How to manage various date format in a csv file

Hello,

I have to create a CSV file with various date format. Up to now, I use:

using DataFrames
using Dates
format_date_1=“yyyymmddHHMMSS”
format_date_2=“yyyymmdd”
df_out=DataFrame(hre_deb_calc=DateTime[DateTime(“20190402161901”,format_date_1)],jour_prevu=DateTime[DateTime(“20190326”,format_date_2)])
CSV.write(“resultat.csv”, df_out, delim=“;”, dateformat=format_date_1);

Of course, the CSV file is the following:
hre_deb_calc;jour_prevu
20190402161901;20190326000000

Nevertheless, I would like to obtain:
hre_deb_calc;jour_prevu
20190402161901;20190326

But, I do not succeed to manage diffrent date format with CSV.write

Could you help me ?

Thanks !

NB : When I use

CSV.write(“resultat.csv”, df_out, delim=“;”)

I obtain (which is not OK):

hre_deb_calc;jour_prevu
2019-04-02T16:19:01.0;2019-03-26T00:00:00.0

Hi @steph_de_paris. Sorry for the late reply.

The cause of your problem

  • CSV.write() does not support dates being written with different formats in different columns.

Solution A

  • Write your own custom write_custom_csv(...) function.
  • You can use the Dates.format(::IO, some_date_object, FORMATSTR) function

Should be easy enough, just a bit annoying when you have to support multiple different table formats.

Solution B

Julia makes it possible to create wrapper types to get them to behave differently than the original.

In your case, you would want these types to use different show(::IO, ::MyWrapType) functions. Note that, in a similar fashion, you can create a more convenient constructor that applies the same date format used by show():

import DataFrames: DataFrame
import CSV
using Dates


#==Define date wrapper (parameterized type)
===================================================================#

#Wrapper/tagging system to create a specialized date format:
struct DateFormatted{FMT} #FMT should be a symbol like `:YMD`
	d::DateTime #Acutal data/date/etc
end
#Convenient string-based constructor:
DateFormatted{FMT}(datestr::AbstractString) where FMT =
	DateFormatted{FMT}(DateTime(datestr, DateFormat(DateFormatted{FMT}))) #Use DateFormat()... as specified below

#By default, dates are shown formatted by "user-specified" DateFormat() function
Base.show(io::IO, df::T) where T<:DateFormatted =
	Dates.format(io, df.d, Dates.DateFormat(T))


#==Define your CUSTOM date wrapper types:
===================================================================#
#Create convenient aliases (if you prefer):
DateAMJ = DateFormatted{:AMJ}
DateAMJHMS = DateFormatted{:AMJHMS}

#Overload `DateFormat` for your new wrapper type (to construct/show() your date types):
#(Convenient way for `show()` to figure out how to show)
Dates.DateFormat(::Type{DateAMJ}) = dateformat"yyyymmdd"
Dates.DateFormat(::Type{DateAMJHMS}) = dateformat"yyyymmddHHMMSS"


#==Use your new wrapped date types:
===================================================================#
@show d1 = DateAMJ("20190326")
@show DateAMJHMS("20190402161901")


#==Create your dataframe again:
===================================================================#
#Define your arrays externally for readability purposes:
hre_deb_calc = [DateAMJHMS("20190402161902")]
jour_prevu   = [DateAMJ("20190326")]

#Use `;` to avoid needing `keyname1=keyname1, keyname2=keyname2, ...`:
df_out = DataFrame(; hre_deb_calc, jour_prevu)
CSV.write("resultat.csv", df_out, delim=";")

PS

There are, of course, other solutions possible with Julia types. Some might even be more elegant than what I suggested.

3 Likes

@MA_Laforge, how would you implement solution A?

The code below works but requires copying the dataframe and formatting the DateTime columns to strings prior to saving with the regular CSV.write:

using DataFrames, Dates, CSV

format_date1 = "yyyymmddHHMMSS"
format_date2 = "yyyymmdd"
df = DataFrame(hre_deb_calc=[DateTime("20190402161901",format_date1)], jour_prevu=[DateTime("20190326",format_date2)])

df_out = copy(df)
df_out.hre_deb_calc = Dates.format.(df_out.hre_deb_calc, format_date1)
df_out.jour_prevu = Dates.format.(df_out.jour_prevu, format_date2)
CSV.write("resultat.csv", df_out, delim=";")

Indeed, if you don’t care about runspeed/memory usage, you can copy the entire DataFrame and substitute the two date columns manually with a string value instead. You could also just replace the one column that doesn’t conform to the default dateformat= value in write():

  • CSV.write(“resultat.csv”, df_out, delim=";", dateformat=format_date_1).

But I was thinking of just implementing your own traditional write_X() function using elemental print(), etc commands (in this case Dates.format() as an advanced print()):

import DataFrames: DataFrame
using Dates

format_date1 = "yyyymmddHHMMSS"
format_date2 = "yyyymmdd"
df = DataFrame(hre_deb_calc=[DateTime("20190402161901",format_date1)], jour_prevu=[DateTime("20190326",format_date2)])

function write_csv_custom(filename, df)
	nrows = size(df, 1)
	open(filename, "w") do io
		println(io, "hre_deb_calc;jour_prevu")
		for i in 1:nrows
			Dates.format(io, df.hre_deb_calc[i], DateFormat(format_date1))
			Dates.print(io, ";")
			Dates.format(io, df.jour_prevu[i], DateFormat(format_date2))
		end
	end
end

write_csv_custom("resultat.csv", df)
1 Like