Most efficient way: create/mutate DataFrame time column

Usor · November 26, 2019, 8:21pm

Suppose you have a DataFrame, which has a column called Year (with Int64). Then it has a column called Month (with String). Yes, suppose you have 12 rows with the integer year 2000 and the 12 months January to December as Strings with that, and under it 2001 and its 12 months.

You look at it. Now you want to create a corresponding column with Julia’s own Date type.

What is the most direct, most efficient, and easy way to get the result whether using some querying package or only Julia’s own included tools?

I thought of using a for loop to create a new array, but it might not be the best here.

Alternatively, is it possible to simply enforce Julia’s recognition of Dates by editing a CSV beforehand somehow?

ExpandingMan · November 26, 2019, 8:35pm

I don’t see a function in Dates for parsing month names, however there is a function for generating them.

You can do

mdict = Dict(monthname(i)=>i for i ∈ 1:12)
Date.(years, [mdict[m] for m ∈ months])

assuming the months and years are in two separate columns.

If you had a column with parseable timestamps you can set the column type with the types keyword, see the CSV.jl documentation, but that doesn’t sound like it’s the case here.

Mattriks · November 26, 2019, 11:40pm

Month names could be parsed like:

# df is the DataFrame described above
Date.(string.(df.Year).*"-".*df.Month, "y-U")

For info on the "y-U" string, see ?DateFormat

Usor · November 27, 2019, 6:27pm

Thanks, both. I will have a look at that (and maybe come back to you).

Usor · November 27, 2019, 9:15pm

Your way, @Mattriks, gives me an error:

ArgumentError: Unable to parse date time. Expected directive DatePart(U) at char 6

I wonder, @ExpandingMan, if yours is really applicable to this situation. Maybe I don’t understand or I explained wrongly.

Let me try to explain (and someone can then extend this code):

### a simple DataFrame
df = DataFrame(
  Year = [2000, 2000, 2000, 2000],
  Month = ["January", "February", "March", "April"]
);

Question: how do you easily, automatically and most efficiently create a third column, which will have Julia’s Date type, where the elements/Dates will have the format according to2000-01-01, 2000-02-01, which you would normally get from Date(2000,1,1), Date(2000,2,1), etc.?

Mattriks · November 27, 2019, 9:23pm

Using the df in your last post, these both work for me:

df.date = Date.(.*(string.(df.Year), "-", df.Month), "y-U")
df.date2 = Date.(string.(df.Year).*"-".*df.Month, "y-U")

Usor · November 28, 2019, 2:26pm

That works.

I wish there were pre-made convenient functions for this kind of thing (which may be common encounters), even that it would nicely work with different String month names. I thought of it, but haven’t come up with a very nice solution.

Is there in any case an alternative to such code as you wrote closer to the source (earlier in the process)? For example, if you edit or make a CSV beforehand, can you do it in such a way (by doing a column in some way) so as to force Date recognition?

Mattriks · November 28, 2019, 6:24pm

You can do (see ?CSV.File):

CSV.read("myfile.csv", dateformat=dateformat"y-U", types=Dict(1=>Date))

which assumes that column 1 contains e.g. 2000-January that can be converted to a Date. But that only works for single columns, I don’t know of a way for combining multiple columns into a Date in CSV.read.

Usor · November 28, 2019, 6:46pm

Thanks.

Topic		Replies	Views
Converting columns in a DataFrame from String to Date (with a specific format) General Usage dates , dataframes	14	3516	October 10, 2022
DataFrame conversion of column names to type Data dates , data , type , dataframes	5	1142	November 25, 2020
Read CSV, converting column containing String "Date" to a specific proper Date format New to Julia	1	337	March 22, 2021
Converting DateTime in DataFrames indo Data New to Julia question , dates , dataframes	9	561	April 15, 2021
Convert date format "dd_mm_yyyy" in dataframe to "dd/mm/yyyy" General Usage dates , dataframes	6	2267	February 10, 2022

Most efficient way: create/mutate DataFrame time column

Related topics