Writing data from Julia to an Excel Spreadsheet

Hi, I am having some issues with exporting data to excel, using the XLSX package. In my Julia script I have a loop (I = 9000) whereby at each step (value oof i) I need to write out a vector y (130 elements) and several other variables, say a,b,c….n I want the vector and each of the variables, for each I, to be written into a row (not a column).The program structure I am guessing would look something like:

For i in 1:9000
	I want to write the following to a file:
 i, y,a,b, c,..n
End 

My question is what is the Julia syntax to do this? I have familiarized my self with the basic commands surrounding the use of XLSX but none seem very relevant to the problem I am have. I am also open to using the CSV package, if this is considered better suited. I have managed to write what I want to a *.dat file (comma separated) but this is a little cumbersome as I then need to read this file into a spreadsheet.

I think that the easiest way to go about this is to focus on creating a DataFrame first. If the DataFrame looks good, you can transform it into a CSV or Excel sheet.

To go to an Excel sheet, you can use this code:

using DataFrames
import XLSX

path = "/path/to/file"
XLSX.writetable(path, collect(eachcol(df)), names(df))

I’ve been running writetable in production for months already and had no issues with it so far.
In general, I do find the CSV.jl interface easier to work with though.

2 Likes

If you are ok to use the OpenDocument Spreadsheet Document format (that can be directly used in any modern version of Excel, not just LibreOffice/OpenOffice) you could use my package OdsIO, in particular:

using OdsIO
cd(@__DIR__)

fileOut   = "myspreadsheet.ods"
sheetName = "outSheet"

y = rand(9000,130)
a = rand(9000)
b = rand(9000)
# ...etc...

# Collect everything to export in a matrix (or DataFrame).. it is quicker than calling the export function 9000 times
out = Array{Any,2}(undef,9000,132) # adjust the type to your case, e.g. Array{Float64,2}
for i in 1:9000
    out[i,:] = vcat(y[i,:],a[i],b[i])
end

ods_write(fileOut,Dict((sheetName,1,1)=>out))

In the export function, the second parameter is a dictionary whose key is a tuple made of three elmements: (1) the sheet name (or number), (2) the row number and (3) the column number of where to start exporting the data (in this cas first row, first column of sheet “outSheet”), and the value is the actual object to export (it can be also a dataframe or an ordered dictionary).
You could obviously put the ods_write function within the for loop and writing the single row each time… it depends if you are more constrained by CPU time (go for the first approach) or memory (go for the second one)…

3 Likes

Transforming your data into a row would look something like this:

for i in 1:9000
    row = hcat(i, permutedims(y), a, b, c, n)
    # append row to file
end

Have you tried opening the .dat file with Excel or just renaming the extension to .csv and then trying to open it? Excel may be able to read the file how you want without it explicitly being .xlsx format.

All,

Many thanks for all of the valuable suggestions. I am now going through the documentation on df’s to get a better understanding. Most of my work deals with numerical methods, so I haven’t bothered much with df’s to this point … now’s the time to start!

Nathan,
In answer to your question about reading the *.dat file into excel, this hasn’t been a problem as I use a small VBA script to open the file (as a *.dat) and then read the contents into specified locations within my GUI master spreadsheet. However, this is surprisingly VERY slow(I am guessing a 3-4 minutes - will measure with a timer) to read in 8760 x 124 Float64 numbers (need double precision in excel otherwise some of my small numbers get truncated to zero) . I am guessing the df methods maybe faster, although I will still need to “pick off” the relevant cells in the *.xls file generated by Julia into my master excel GUI. I will need a VBA script to do this as well I guess or I could set links between the two spreadsheets.

Thanks again for everyone’s useful and supportive suggestions.

Peter, Brisbane, Australia

Sylvaticus,

I find your approach very interesting, particularly the flexibility to get around CPU or memory. I am still establishing which will be my limitation. At the moment my problem takes 7 minutes to run through the 9,000 iterations, writing the *.dat file as it goes. I have not tried loading all of the outputs into a matrices, and then printing out after the main loop has completed. I don’t think there will be a memory problem. Ultimately I want the warp the whole program up in an optimization engine, which will be CPU limited. During these runs I would suppress writing of most of the output results until the optimum solution is found.
Thanks for you ideas

Peter

I am having a little bit of trouble getting the concept to write the variables (a vector y and variables a and b) into a matrix at each iteration step (y, a and b change for each iteration i). I am using a matrix (rather than a DataFrame) at this time as it is easier for me to understand. Please see test code below which has the error below. No doubt I am missing something very basic. Hope someone can set me straight as once I have this method working it will open my way to do a whole lot of interesting options. Thanks Peter

using DataFrames
using DelimitedFiles

function foo()
    out= zeros(5,12)
    y=zeros(5,10)
    a=zeros(5)
    b=zeros(5)

    for i in 1:5
        a=rand(10)       # test data
        b=rand(10)       # test data
        y=rand(1:10, 10)    # test data
        out[i,:] = vcat(y[i,:],a[i],b[i])
    end
    CSV.write("MyOutput.csv", Tables.table(out),delim=',',decimal='.')

end
foo()
ERROR: LoadError: DimensionMismatch("tried to assign 3-element array to 1×12 destination")
Stacktrace:
 [1] throw_setindex_mismatch(::Array{Float64,1}, ::Tuple{Int64,Int64}) at .\indices.jl:193
 [2] setindex_shape_check at .\indices.jl:245 [inlined]
 [3] macro expansion at .\multidimensional.jl:795 [inlined]
 [4] _unsafe_setindex!(::IndexLinear, ::Array{Float64,2}, ::Array{Float64,1}, ::Int64, ::Base.Slice{Base.OneTo{Int64}}) at .\multidimensional.jl:789
 [5] _setindex! at .\multidimensional.jl:785 [inlined]
 [6] setindex! at .\abstractarray.jl:1153 [inlined]
 [7] foo() at c:\Users\peter\Documents\Julia_Code\Greenhouse_Model\Green-Lights-main\Test_Adding_Data.jl:20
 [8] top-level scope at c:\Users\peter\Documents\Julia_Code\Greenhouse_Model\Green-Lights-main\Test_Adding_Data.jl:25
 [9] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1088
 [10] include_string(::Module, ::String, ::String) at .\loading.jl:1096
 [11] invokelatest(::Any, ::Any, ::Vararg{Any,N} where N; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at .\essentials.jl:710
 [12] invokelatest(::Any, ::Any, ::Vararg{Any,N} where N) at .\essentials.jl:709
 [13] inlineeval(::Module, ::String, ::Int64, ::Int64, ::String; softscope::Bool) at c:\Users\peter\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:185
 [14] (::VSCodeServer.var"#61#65"{String,Int64,Int64,String,Module,Bool,VSCodeServer.ReplRunCodeRequestParams})() at c:\Users\peter\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:144
 [15] withpath(::VSCodeServer.var"#61#65"{String,Int64,Int64,String,Module,Bool,VSCodeServer.ReplRunCodeRequestParams}, ::String) at c:\Users\peter\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\repl.jl:124
 [16] (::VSCodeServer.var"#60#64"{String,Int64,Int64,String,Module,Bool,Bool,VSCodeServer.ReplRunCodeRequestParams})() at c:\Users\peter\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:142
 [17] hideprompt(::VSCodeServer.var"#60#64"{String,Int64,Int64,String,Module,Bool,Bool,VSCodeServer.ReplRunCodeRequestParams}) at c:\Users\peter\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\repl.jl:36
 [18] (::VSCodeServer.var"#59#63"{String,Int64,Int64,String,Module,Bool,Bool,VSCodeServer.ReplRunCodeRequestParams})() at c:\Users\peter\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:110
 [19] with_logstate(::Function, ::Any) at .\logging.jl:408
 [20] with_logger at .\logging.jl:514 [inlined]
 [21] (::VSCodeServer.var"#58#62"{VSCodeServer.ReplRunCodeRequestParams})() at c:\Users\peter\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:109
 [22] #invokelatest#1 at .\essentials.jl:710 [inlined]
 [23] invokelatest(::Any) at .\essentials.jl:709
 [24] macro expansion at c:\Users\peter\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:27 [inlined]
 [25] (::VSCodeServer.var"#56#57")() at .\task.jl:356
in expression starting at c:\Users\peter\Documents\Julia_Code\Greenhouse_Model\Green-Lights-main\Test_Adding_Data.jl:25

All of y, a and b are 10 element vectors, so you are picking 3 elements (element i from y, a, and b) respectively and try to store them in the first row of out, which has length 12.

It looks to me like you were intending to create a 10x10 matrix y from which you then want to select a 10-element vector to concatenate with elements a and b? That would be rand(10, 10) rather than rand(1:10, 10) - if you pass an iterable (like the range 1:10 as the first argument to rand, you will get a random draw from that iterable, compare:

julia> rand(1:3, 3)
3-element Vector{Int64}:
 2
 2
 1

julia> rand(3, 3)
3×3 Matrix{Float64}:
 0.196941  0.563995  0.406779
 0.583269  0.523321  0.252729
 0.70422   0.598516  0.239555
1 Like

Ni9ls,

Spot on! It works! I have just copied the command out[i,:] from some where else . Could you give me a simple explanation for the use of “:”? Presumably something like a “wild card”?

Many thanks for your quick response. Much appreciated

Regards

Peter

The colon means essentially “all elements in that dimension”:

julia> a = reshape(1:9, 3, 3)
3×3 reshape(::UnitRange{Int64}, 3, 3) with eltype Int64:
 1  4  7
 2  5  8
 3  6  9

julia> a[1, :]  # first row, all columns in that row
3-element Vector{Int64}:
 1
 4
 7

julia> a[:, 1] # first column, all rows in that column
3-element Vector{Int64}:
 1
 2
 3

Great. Thanks Nils. One more question. IF my y above was a 1D vector (different values generated at each iteration i) how would I concatenate this into the out matrix? Can I use the transpose command to transform y into a row 1D array (vector)? Something like?

out[i,:] = vcat(transpose(y) ,a[i],b[i])
Thanks Peter

Nils

Just to be clear I am trying to get the following to work, but get error below. This test problem better reflects my real world problem. Hoping you can help

using CSV
using DataFrames
using DelimitedFiles

function foo()
    out= zeros(5,12)
#    y=zeros(5,10)
    y=zeros(10)     # Column vectors
    a=zeros(5)
    b=zeros(5)
    
    for i in 1:5
        a[i]=1       # test data
        b[i]=2       # test data
    #    y=rand(10, 10)    # test data
        y=rand(10)
    #    out[i,:] = vcat(y[i,:],a[i],b[i])
        out[i,:] = vcat(transpose(y),a[i],b[i])

end
    CSV.write("MyOutput.csv", Tables.table(out),delim=',',decimal='.')

end
foo()
ERROR: LoadError: DimensionMismatch("mismatch in dimension 2 (expected 10 got 1)")
Stacktrace:
 [1] _cs at .\abstractarray.jl:1501 [inlined]
 [2] _cshp at .\abstractarray.jl:1487 [inlined]
 [3] _cshp at .\abstractarray.jl:1498 [inlined]
 [4] cat_shape(::Tuple{Bool}, ::Tuple{Int64,Int64}, ::Tuple{Int64}, ::Tuple{Int64}) at .\abstractarray.jl:1476 (repeats 2 times)
 [5] _cat_t(::Val{1}, ::Type{T} where T, ::LinearAlgebra.Transpose{Float64,Array{Float64,1}}, ::Vararg{Any,N} where N) at .\abstractarray.jl:1521
 [6] cat_t(::Type{Float64}, ::LinearAlgebra.Transpose{Float64,Array{Float64,1}}, ::Vararg{Any,N} where N; dims::Val{1}) at .\abstractarray.jl:1518
 [7] _cat at .\abstractarray.jl:1516 [inlined]
 [8] #cat#111 at .\abstractarray.jl:1654 [inlined]
 [9] vcat at .\abstractarray.jl:1583 [inlined]
 [10] foo() at c:\Users\peter\Documents\Julia_Code\Greenhouse_Model\Green-Lights-main\Test_Adding_Data.jl:23
 [11] top-level scope at c:\Users\peter\Documents\Julia_Code\Greenhouse_Model\Green-Lights-main\Test_Adding_Data.jl:29
 [12] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1088
 [13] include_string(::Module, ::String, ::String) at .\loading.jl:1096
 [14] invokelatest(::Any, ::Any, ::Vararg{Any,N} where N; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at .\essentials.jl:710
 [15] invokelatest(::Any, ::Any, ::Vararg{Any,N} where N) at .\essentials.jl:709
 [16] inlineeval(::Module, ::String, ::Int64, ::Int64, ::String; softscope::Bool) at c:\Users\peter\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:185
 [17] (::VSCodeServer.var"#61#65"{String,Int64,Int64,String,Module,Bool,VSCodeServer.ReplRunCodeRequestParams})() at c:\Users\peter\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:144
 [18] withpath(::VSCodeServer.var"#61#65"{String,Int64,Int64,String,Module,Bool,VSCodeServer.ReplRunCodeRequestParams}, ::String) at c:\Users\peter\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\repl.jl:124
 [19] (::VSCodeServer.var"#60#64"{String,Int64,Int64,String,Module,Bool,Bool,VSCodeServer.ReplRunCodeRequestParams})() at c:\Users\peter\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:142
 [20] hideprompt(::VSCodeServer.var"#60#64"{String,Int64,Int64,String,Module,Bool,Bool,VSCodeServer.ReplRunCodeRequestParams}) at c:\Users\peter\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\repl.jl:36
 [21] (::VSCodeServer.var"#59#63"{String,Int64,Int64,String,Module,Bool,Bool,VSCodeServer.ReplRunCodeRequestParams})() at c:\Users\peter\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:110
 [22] with_logstate(::Function, ::Any) at .\logging.jl:408
 [23] with_logger at .\logging.jl:514 [inlined]
 [24] (::VSCodeServer.var"#58#62"{VSCodeServer.ReplRunCodeRequestParams})() at c:\Users\peter\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:109
 [25] #invokelatest#1 at .\essentials.jl:710 [inlined]
 [26] invokelatest(::Any) at .\essentials.jl:709
 [27] macro expansion at c:\Users\peter\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:27 [inlined]
 [28] (::VSCodeServer.var"#56#57")() at .\task.jl:356
in expression starting at c:\Users\peter\Documents\Julia_Code\Greenhouse_Model\Green-Lights-main\Test_Adding_Data.jl:29```

I don’t think you need to transpose here? `

julia> vcat(y, a[1], b[1])
12-element Vector{Float64}:
 0.9318158821949039
 0.3359640881496957
 0.16623594634909078
 0.9495470560720898
 0.27541753284252435
 0.16017290691025687
 0.8054610042893757
 0.713729712443578
 0.580308842677498
 0.1172294422664153
 1.0
 2.0

Thanks. Ultimately I want to write out y, a and b into rows for each i. Y is column vector at each i and a and b are just values that vary with each iteration. I think I need to transpose y to load into out? 11.00 pm in Brisbane. Finished fir the evening. Regards Peter

No, that was my point:

julia> out = zeros(5, 12)
5×12 Matrix{Float64}:
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

julia> out[1, :] = [rand(10); 1; 2]; out
5×12 Matrix{Float64}:
 0.318656  0.210235  0.635783  0.357457  0.401695  0.409686  0.980032  0.492942  0.686835  0.119913  1.0  2.0
 0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0  0.0
 0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0  0.0
 0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0  0.0
 0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0  0.0

The not-so-obvious point here is that in out[i,:] =x, x is a column vector, not a row one, even if the destination is a row…

Nils et al,

I am now clear. I think I made it more difficult by choosing a poor example (see below for working code), but I learned in the process. One step forward two steps back as they say! Regards Peter

using CSV
using DataFrames
using DelimitedFiles

function foo()
    out= zeros(5,7)
    y=[1,2,3,4,5]       # Test vector

    for i in 1:5
        a=i             # Test value
        b=i*i           # Test value
        out[i,:] = vcat(y,a,b)
    end
    CSV.write("MyOutput.csv", Tables.table(out),delim=',',decimal='.')
end
foo()
2 Likes

Use this. I think this will solve your problem.

using DataFrames, XLSX

y = 2 
a = 3  
b = 4  
c = 5  
n = 6  

df = DataFrame(i = Int[], y = Int[], a = Int[], b = Int[], c = Int[], n = Int[])

for i in 1:9000
    row = (i, y, a, b, c, n)
    push!(df, row)
end


XLSX.writetable("file.xlsx","sheet1" => df)