Is the following efficient? I was mainly concerned about the push! function.
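```julia
using CSV, DataFrames

# Am is the poster's element type, with fields name1, name2, name3
function writeToFile(filename::String, am::Vector{Am})
    df = DataFrame(name1 = Int[], name2 = Int[], name3 = Int[])
    for a in am
        push!(df, [a.name1, a.name2, a.name3])  # appends one row per element
    end
    CSV.write(filename, df)
end
```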
I also tried passing a vector of structs straight to CSV.write:
```julia
using CSV

mutable struct myType
    index::Int   # Int64
    sIndex::Int  # Int64
    myType() = new(0, 0)
end

function writeToFile(filename::String, a::Vector{myType})
    @assert length(a) > 0
    CSV.write(filename, a)
end

function main()
    aVector = Vector{myType}(undef, 6)  # Julia 1.0+ requires `undef` here
    for i in 1:6
        aVector[i] = myType()
        aVector[i].index = i
        aVector[i].sIndex = i
    end
    writeToFile("path to my csv", aVector)
end

main()
```
I got an error: "ArgumentError: no default Tables.row implementation for type: Array{myType, 1}".
The following applies only if you have the latest version of Tables.jl (currently 0.1.14); I'm not sure it works with earlier versions.
A NamedTuple is an object such as (a = 1, b = "something"): it is like a Tuple, except that its elements have names. An Array of NamedTuples (for example v = [(a = i, b = i + 1) for i in 1:3]) already counts as a table for Tables.jl, so you can do CSV.write(filename, v) directly. I thought am was a vector of named tuples. If it isn't, you don't need to allocate a DataFrame (which could slow things down if your data is really big): you can simply create an iterator of NamedTuples and write that to CSV:
```julia
# Lazy generator: each element is turned into a NamedTuple only when iterated
itr = ((name1 = a.name1, name2 = a.name2, name3 = a.name3) for a in am)
CSV.write(filename, itr)
```
This way you avoid allocating the DataFrame, and the rows are streamed directly from your vector to the CSV file. itr is probably also a better way to build a DataFrame: DataFrame(itr) gives you the DataFrame directly, without the temporary vector [a.name1, a.name2, a.name3] that your loop allocates for every push!.
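As a sketch of the same idea applied to the vector of myType from the question, assuming recent versions of CSV.jl and DataFrames.jl: the struct below keeps the question's field names but drops the zero-argument constructor so it can be built positionally, and an in-memory IOBuffer stands in for the output file.

```julia
using CSV, DataFrames

# Same fields as the question's myType, but with the default positional constructor.
mutable struct myType
    index::Int
    sIndex::Int
end

aVector = [myType(i, i) for i in 1:6]

# Lazy generator: each struct becomes a NamedTuple only when it is iterated.
itr = ((index = x.index, sIndex = x.sIndex) for x in aVector)

# Stream the rows straight into CSV; the IOBuffer stands in for a file.
io = IOBuffer()
CSV.write(io, itr)
csv_text = String(take!(io))

# The generator can be iterated again to build a DataFrame without push!.
df = DataFrame(itr)
```

With a real path, CSV.write(filename, itr) behaves the same way; nothing besides the output itself is materialized.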
As an alternative, if your data is stored in a DataFrame, and therefore in columnar form (a DataFrame is basically a list of vectors), you can convert it to a NamedTuple of vectors with Tables.columntable(df) and then build a StructArray from that:
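```julia
using StructArrays, Tables

cols = Tables.columntable(df)  # NamedTuple of column vectors
StructArray{MyType}(cols)      # MyType: your struct; its field names must match the column names
```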
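A small round-trip sketch, assuming StructArrays.jl and DataFrames.jl are installed; Point is a hypothetical immutable struct whose field names match the DataFrame's column names.

```julia
using DataFrames, StructArrays, Tables

# Hypothetical struct; its field names match the DataFrame columns below.
struct Point
    x::Int
    y::Int
end

df = DataFrame(x = [1, 2, 3], y = [4, 5, 6])

cols = Tables.columntable(df)  # NamedTuple of vectors: (x = [...], y = [...])
sa = StructArray{Point}(cols)  # acts like a Vector{Point}, but stores one vector per field

p = sa[2]   # builds a Point from the second entry of each column
xs = sa.x   # the x column, accessed by field name
```

The StructArray indexes like a vector of structs while keeping the columnar layout the DataFrame already had.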
Yes, and note that you can do table = CSV.File(file) |> Tables.columntable to materialize a CSV file directly into a NamedTuple of Vectors, with no intermediate DataFrame. More generally, you can pass a CSV file straight to any Tables.jl-compatible "sink" function, such as SQLite.load!, Feather.write, IndexedTables.table, etc. It's nice not to have to go through a DataFrame every time when you don't need one.
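A minimal sketch of the CSV.File |> Tables.columntable pattern, assuming a recent CSV.jl; the temporary file and its contents are made up for illustration.

```julia
using CSV, Tables

# Hypothetical input file with two integer columns.
path = tempname() * ".csv"
write(path, "index,sIndex\n1,10\n2,20\n3,30\n")

# Parse the file and materialize it as a NamedTuple of vectors (no DataFrame involved).
table = CSV.File(path) |> Tables.columntable

idx = table.index          # columns are accessed by name
total = sum(table.sIndex)  # and are ordinary vectors you can compute on
```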