Hi there,
I have a .csv file (or, equivalently, I can have it as a .xlsx or .ods file), with the typical following format (suppressing many more intermediate rows):
Vínculo Nome Número de identificação Estado Iniciado em Completo Tempo utilizado Avaliar/10,00 Q. 1 /0,00 Q. 2 /0,50 Q. 3 /0,50 Q. 4 /0,50 Q. 5 /0,50
- Aluno JOSÉ DA SILVA 1890 Finalizada 10 janeiro 2021 15:00 10 janeiro 2021 17:00 2 horas 7,50 - 0,50 0,00 0,50 0,50
- Aluno JOÃO NIGUÉM 1543 Finalizada 10 janeiro 2021 15:00 10 janeiro 2021 16:59 1 hora 59 minutos 0,50 - 0,00 0,00 0,00 0,00
- Aluno MARCOS DA SILVA 1329 Finalizada 10 janeiro 2021 15:00 10 janeiro 2021 16:45 1 hora 44 minutos 8,00 - 0,50 0,00 0,50 0,50
- Aluno BRUNA NINGUÉM 1111 Finalizada 10 janeiro 2021 17:58 10 janeiro 2021 18:00 1 minuto 19 segundos 0,00 - - - - -
Média geral 6,49 - 0,35 0,21 0,34 0,34
I would like, with Julia and any appropriate packages, to:
(1) delete the original columns 1, 2, 3, 4, 7, 8 and 9, originally entitled “Vínculo”, “Nome”, “Número de identificação”, “Estado”, “Tempo utilizado”, “Avaliar/10,00” and “Q. 1 /0,00”, and keep the remaining ones.
(2) delete the last row.
(3) read the original columns 5 and 6, originally entitled “Iniciado em” and “Completo”, as DateTime objects, after first replacing the (Portuguese standard) date/times, such as “10 janeiro 2021 15:00”, with (international standard) “2021-01-10 15:00”. I had problems with this step, mainly possibly because of the Portuguese to international standard conversion of the date and time format.
(4) read the original columns 10 up to 13, originally entitled “Q.2/0,50” up to “Q.5/0,50”, as floating point numbers, after first replacing the (Portuguese standard) decimal commas, such as “7,50”, with the English standard decimal dot, such as “7.50”. Also the “-” strings in those columns should be interpreted (converted/replaced) as missing
type values. I could not make much progress with this, particularly the comma to dot and then string to float conversion, and then the missing value part as well.
(5) finally, rename the remaining 6 columns from: “Iniciado em”, “Completo”, “Q.2/0,50”, … “Q.5/0,50” to: “Start”, “End”, “Q1”, … “Q4”.
I guess I am able to deal with items (1), (2) and (5) reasonably well, at least with the packages XLSX.jl
and DataFrames.jl
, but the items (3) and (4) seem trickier. @Mattriks in question How to convert string column of dataframe to float with missing values? has induced me to use the CSV.jl
package. I am willing to do this, but now I would like to know whether I could deal with all the issues described above, discarding XLSX.jl
altogether, using
only DateTimes.jl
and CSV.jl
.
Thanks for any help.