Suppose you have a
DataFrame, which has a column called
Int64). Then it has a column called
String). Yes, suppose you have 12 rows with the integer year
2000 and the 12 months
Strings with that, and under it
2001 and its 12 months.
You look at it. Now you want to create a corresponding column with Julia’s own
What is the most direct, most efficient, and easy way to get the result whether using some querying package or only Julia’s own included tools?
I thought of using a
for loop to create a new array, but it might not be the best here.
Alternatively, is it possible to simply enforce Julia’s recognition of
Dates by editing a CSV beforehand somehow?
I don’t see a function in
Dates for parsing month names, however there is a function for generating them.
You can do
mdict = Dict(monthname(i)=>i for i ∈ 1:12)
Date.(years, [mdict[m] for m ∈ months])
assuming the months and years are in two separate columns.
If you had a column with parseable timestamps you can set the column type with the
types keyword, see the CSV.jl documentation, but that doesn’t sound like it’s the case here.
Month names could be parsed like:
# df is the DataFrame described above
For info on the
"y-U" string, see
Thanks, both. I will have a look at that (and maybe come back to you).
Your way, @Mattriks, gives me an error:
ArgumentError: Unable to parse date time. Expected directive DatePart(U) at char 6
I wonder, @ExpandingMan, if yours is really applicable to this situation. Maybe I don’t understand or I explained wrongly.
Let me try to explain (and someone can then extend this code):
### a simple DataFrame
df = DataFrame(
Year = [2000, 2000, 2000, 2000],
Month = ["January", "February", "March", "April"]
Question: how do you easily, automatically and most efficiently create a third column, which will have Julia’s
Date type, where the elements/
Dates will have the format according to
2000-02-01, which you would normally get from
df in your last post, these both work for me:
df.date = Date.(.*(string.(df.Year), "-", df.Month), "y-U")
df.date2 = Date.(string.(df.Year).*"-".*df.Month, "y-U")
I wish there were pre-made convenient functions for this kind of thing (which may be common encounters), even that it would nicely work with different
String month names. I thought of it, but haven’t come up with a very nice solution.
Is there in any case an alternative to such code as you wrote closer to the source (earlier in the process)? For example, if you edit or make a CSV beforehand, can you do it in such a way (by doing a column in some way) so as to force
You can do (see
CSV.read("myfile.csv", dateformat=dateformat"y-U", types=Dict(1=>Date))
which assumes that column 1 contains e.g. 2000-January that can be converted to a Date. But that only works for single columns, I don’t know of a way for combining multiple columns into a