Reading data of different types from a data file with multiple lines and headers for each line

Many thanks for the detailed explanation. I have managed to build my complete dictionary and made the conversions using the method you outlined above - a lot of typing but now complete and the method understood.

A couple of remaining questions:

  1. How do you assign “input” to the file containing the values? In your example you contained the data in IOBuffer, which I also used for demonstration purposes(cut and pasted from file). Going forward I want to be able to assign the values by reading from a file, which I presumably open in the usual way:

Once open, how to do the assignment?
2. I have dropped dates from my file, although going forward I would like to include. Is there a simple way to assign which values are dates(like you have done for the strings above)? VBA always puts # around a date when written to a file. eg. ,#9-12-2020#,

  1. If there was a requirement to use the variables in more than one scope, do we have to repeat the unpacking process outlined above for each scope?

I am almost there thanks to all the help I have received. I will also try the CSV package as well, as I will need to get my head around this for this and other aspects of the project. Thanks to all who have grappled with my problem

Nils, as you can see I have made considerable progress with putting data into, with a couple of outstanding questions still pending with Florian. For completeness, I have had a look at re-ordering my output from VBA to be in ordered pairs, written to a file, “MyFile_Pairs.txt”, as you suggested. See small sample below:

"Location","Toowoomba"
"Elevation",598
"Latitude",-27.55
"Longitude",151.95
"Year","2010"
"Sky_TRNSYS",1

The following line reads the file into an array:

df=readdlm("C:\\Users\\peter\\Documents\\Julia_Code\\Learning\\MyFile_Pairs.txt",',',header=false);

println(df[4,1])
println(df[4,2])

Now for some questions:

  1. How best to “package” this array df to be available to other module in my program?
  2. Rather than making a Dictionary is there any other structure, such as struct that might be easier to define? Perhaps and example using my small data file would be useful
  3. If I used struct or just left the df as an array, and then used “using” to include in another scope or module, would I then need a function to “unpack” the array or struct so I can use the variable names directly ( in much the same way Florian outlined to unpack a Dictionary)? I guess I could include this function in the module where I read in the data. In this way this function would be made available in the module with the using xxxx statement as well. Is this thinking correct?

I realise I am a bit slow, but I am making progress due to everyone’s efforts. I am determined to get this inputting of data sorted in the next day or so, so I can get on with the more interesting stuff of solving the equations. Many thanks

Okay here’s an attempt at a reasonably close approximation to what I would try:

Start with a csv file which has 400 columns, each of which has the name of the parameter as the header, and 1 row, which has the values for each parameter. At this point your parsing step is as simple as:

julia> using CSV, DataFrames

julia> input_data = Tables.rowtable(CSV.File("out.csv", pool = false))
1-element Vector{NamedTuple{(:location, :elevation, :latitude, :longitude, :year), Tuple{String, Int64, Float64, Float64, Int64}}}:
 (location = "Toowoomba", elevation = 598, latitude = -27.55, longitude = 151.95, year = 2010)

The result is a vector of NamedTuples (with a length of one, as there’s only one row). NamedTuples work well with the macros in Parameters.jl,

Next, define a struct that holds your parameters:

julia> using Parameters

julia> @with_kw struct ModelParameters
           location::String
           elevation::Int64
           latitude::Float64
           longitude::Float64
           year::Int64
       end
ModelParameters

Note that this is still quite cumbersome if the struct has 400 fields (I also don’t know what the compiler will think about this, as I’ve never encountered such a large struct, but I guess performance won’t matter when just passing the parameters around). There might be a way to group things which makes intuitve sense, but that will depend on your application - as an example, you might have a field position which is a Vector{Float64} that holds elevation, longitude, latitude instead of three separate fields.

With your struct defined and the data from VBA read into a NamedTuple, you can then create a ModelParameters instance to collect the parameters and pass them into your solver:

julia> parameters = ModelParameters(input_data[1]...)
ModelParameters
  location: String "Toowoomba"
  elevation: Int64 598
  latitude: Float64 -27.55
  longitude: Float64 151.95
  year: Int64 2010

This parameters object can then be passed into the solver (so your solver should be defined as solve_ode(parameters::ModelParameters). Then at the start of your solve_ode function you do:

julia> @unpack location, elevation, latitude, longitude, year = parameters
ModelParameters
  location: String "Toowoomba"
  elevation: Int64 598
  latitude: Float64 -27.55
  longitude: Float64 151.95
  year: Int64 2010

julia> location
"Toowoomba"

and all your variables are defined.

Thanks. I need to work my way through this. Please note that many of the 400 parameters are manipulated and combined before appearing as "parameters " in front of the dependant variables on the RHS of the system of ODEs. Again thanks for your help.

So first of all, I like the approach Nils proposed with CSV, especially because CSV.jl takes care of parsing to the appropriate types.

The IOBuffer was indeed for demonstration purposes, but opening anything else is pretty easy too:
(See also Networking and Streams · The Julia Language)

# using the inputtodict function
mydictwithvalues = open(inputtodict, "filename.extension") # if file is in the same folder, see pwd()

mydictwithvalues = open(inputtodict, "path/to/file.xt") # if the file is anywhere on your machine
# equivalently:
mydictwithvalues = open("filename or path") do input
                   inputtodict(input)
                   end

# for CSV:
using CSV, DataFrames
input_data = Tables.rowtable(CSV.File("full path or filename", pool = false)) # as Nils already said

You can do that easily by stripping the "#"s and parsing the Date, the second argument to date is the format string: day-month-year, change as necessary

using Dates

julia> thedate = Date(strip( "#02-03-2045#" ,'#'), "dd-mm-yyyy") # thedate will be a Date object
2045-03-02

julia> year(thedate)
2045

julia> dayname(thedate)
"Thursday

If you want to automatically recognize if the String you input is a Date (in your format, starting with “#”), it’s as simple as:

using Dates
julia> dateorstring(thestring) = startswith(thestring, "#") ? Date(strip( thestring ,'#'), "dd-mm-yyyy") : thestring
dateorstring (generic function with 1 method)

julia> dateorstring("notadate")
"notadate"

julia> typeof(dateorstring("notadate"))
String

julia> dateorstring("#11-12-2013#")
2013-12-11

julia> typeof(dateorstring("#11-12-2013#"))
Date

# and of course violating the assumption that only Dates start with "#" errors out:
julia> dateorstring("#looks like a date but it's not#")
ERROR: ArgumentError: Unable to parse date time. Expected directive DatePart(dd) at char 1
Stacktrace: [...]

(the “if ? then : else” construct is a ternary operator and works just like regular if else end blocks but is shorter)

That’s up to how you write it. If you “unpack” variables in a local scope, then you would have to repeat it in a different scope. If you call your solver as a function, you can unpack all variables in the outer scope and just call the functions with the variables they need (i.e. make the variables arguments of the functions). Using Nils approach with Parameters, you can just pass around the struct to every function or scope you desire and selectively unpack the variables you need there.

A question on the side: how is your code structured? Are you reading in your files in a REPL and the call the solver by hand, or do you have a script of sorts or will input, manipulation and running a solver be integrated into a single function?

Thank you. I will need to work through all this material and see what I get. I am using Vscode to write a script and will hope to run the program in stand alone mode with the solver in my script. I actually have the program running in VBA, but it takes 4.5 hours to solve for every hour of the year, hence the move to Julia. At this time our team wants to define all inputs in Excel and then read the outputs from the Julia script back into Excel for graphing and further analysis. I know it sounds complicated… my initial inclination was to do it in my native language Fortran, but going forward this was seen as a retrograde step.

Whatever works for you I guess, but I had a good laugh at the fact that Fortran is seen as a step back from solving something in VBA and Julia is not :smiley: best of luck to you

If you really intend to transfer things back to Excel, then going with CSV.jl is the most natural solution as it works for both ways.

Florian I tried all of your suggestions and very happy to say they work (I am yet to implement the CSV, struct approach, but I will). Thanks for supporting all my questions. I have a couple of general questions that have intrigued me on my short journey with Julia:

  1. Finding good examples of script. For example in the function you wrote you have the key part,
@assert length(header) == length(values)
           for (k,v) in zip(header, values)
               output[k] = trytoinfertype(v)
           end
How did you know to use these lines? None of the basic documentation material on Julia, I could "google", would lead me to these lines. It just intrigues me how you knew to insert these lines. How long have been using Julia. ? In "my day" you studied the textbook , got to know the language syntax and proceeded with those tools to master the language ... along the way you picked up "tricks" and things to make it easier and write more compact code.
2. Is there a recommended on-line course(s) that would lead one to acquire a comprehensive grasp of the language, such as the code you suggested?
3. I have not used Python, but I am guessing Julia and Python are very similar in structure?
4. Comment: As you probably know Fortran and VBA have both sub-routines and functions, which can appear anywhere in the script. Mostly you would structure your program with a "main program" first and all of the sub-routines and functions, in any order, at the bottom of the program. These programs also accommodate easily global variable definitions and "option explicit" or implicit none, which stops typos. I have struggled to make the move to Julia without these features, but I persist and, thanks to all the help on this site, I can see progress..... I am swept along by the belief of many colleagues that Julia solves the two language program (certainly VBA was a poor choice for the problem I am tackling !) . I do like the concept of "Modelling Toolkit" and some of the other high level representation of problems outlined by Nils and others. This remains the appeal and I sincerely hope I am not let down by speed.

Nils I have transposed much of my input data into a single line file with a header and successfully read in the data. Please note that I stuck with the *.txt extension, as this still gives me a NamedTuple representation (see below for a truncated version of the input file).

input_data = Tables.rowtable(CSV.File("MyFile_Pairs.txt", pool = false));

println(input_data)

@with_kw struct ModelParameters

    Location::String
    Elevation::Float64
    Latitude::Float64
    Longitude::Float64
    Year::String
    Sky_TRNSYS::Int64
    Crop_Model:Int64
    Start_Hr::Int64
    End_Hr:: Int64
    Hours_Total:Int64
end

parameters = ModelParameters(input_data[1]...)

NamedTuple{(:Location, Symbol(" Elevation"), Symbol(" Latitude"), Symbol(" Longitude"), Symbol(" Year"), Symbol(" Sky_TRNSYS"), Symbol(" Crop_Model"), Symbol(" Start_Hr"), Symbol(" End_Hr"), Symbol(" Hours_Total ")),Tuple{String,Int64,Float64,Float64,Int64,Int64,Int64,Int64,Int64,Int64}}[(Location = "Toowoomba",  Elevation = 598,  Latitude = -27.55,  Longitude = 151.95,  Year = 2010,  Sky_TRNSYS = 1,  Crop_Model = 1,  Start_Hr = 1,  End_Hr = 4368,  Hours_Total  = 4368)]

Upon running this program I get the error at the bottom of this text.

Questions:

  1. Any ideas what is causing the error? Please note I have defined my structural variables with capitals to align with my data and program variables.

  2. I thought Tuples used (). eg. nt = (a=1, b=2, …) whereas Arrays used ? I have tried both without effect

  3. In the statement parameters = ModelParameters(input_data[1]…) do you really just incide the ellipses or does the line need to be completed in some other way?

Many thanks. I will wait your answers before transferring all on my inputs.

ERROR: LoadError: invalid redefinition of constant ModelParameters
Stacktrace:
 [1] top-level scope at C:\Users\peter\.julia\packages\Parameters\CVyBv\src\Parameters.jl:572
 [2] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1088
 [3] include_string(::Module, ::String, ::String) at .\loading.jl:1096
 [4] invokelatest(::Any, ::Any, ::Vararg{Any,N} where N; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at .\essentials.jl:710
 [5] invokelatest(::Any, ::Any, ::Vararg{Any,N} where N) at .\essentials.jl:709
 [6] inlineeval(::Module, ::String, ::Int64, ::Int64, ::String; softscope::Bool) at c:\Users\peter\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:185
 [7] (::VSCodeServer.var"#61#65"{String,Int64,Int64,String,Module,Bool,VSCodeServer.ReplRunCodeRequestParams})() at c:\Users\peter\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:144
 [8] withpath(::VSCodeServer.var"#61#65"{String,Int64,Int64,String,Module,Bool,VSCodeServer.ReplRunCodeRequestParams}, ::String) at c:\Users\peter\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\repl.jl:124
 [9] (::VSCodeServer.var"#60#64"{String,Int64,Int64,String,Module,Bool,Bool,VSCodeServer.ReplRunCodeRequestParams})() at c:\Users\peter\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:142
 [10] hideprompt(::VSCodeServer.var"#60#64"{String,Int64,Int64,String,Module,Bool,Bool,VSCodeServer.ReplRunCodeRequestParams}) at c:\Users\peter\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\repl.jl:36
 [11] (::VSCodeServer.var"#59#63"{String,Int64,Int64,String,Module,Bool,Bool,VSCodeServer.ReplRunCodeRequestParams})() at c:\Users\peter\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:110
 [12] with_logstate(::Function, ::Any) at .\logging.jl:408
 [13] with_logger at .\logging.jl:514 [inlined]
 [14] (::VSCodeServer.var"#58#62"{VSCodeServer.ReplRunCodeRequestParams})() at c:\Users\peter\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:109
 [15] #invokelatest#1 at .\essentials.jl:710 [inlined]
 [16] invokelatest(::Any) at .\essentials.jl:709
 [17] macro expansion at c:\Users\peter\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:27 [inlined]
 [18] (::VSCodeServer.var"#56#57")() at .\task.jl:356
in expression starting at c:\Users\peter\Documents\Julia_Code\Learning\Read_List_Variable.jl:22
  1. The error comes from redefinining a struct - this is a fundamental limitation of Julia currently, it is not possible (without some tricks) to change and redefine a struct defined in a session, so if you’ve run the @with_kw struct ModelParameters part of the code once, then change the fields of ModelParameters and run it again you will see that error. If you don’t need loads of other packages and your startup time is low, the easiest way around it is to just re-start Julia when you need to change the definition of ModelParameters.

  2. Yes that’s correct - sorry not sure what your question is here, but given the above I don’t think your problems are related to brackets?

  3. Yes, the ellipses are what’s called splatting, basically passing one argument which is a collection (vector/tuple) and splitting it into its components when passing it into the function. In this case it splits up the NamedTuple and passes every name-value pair as one argument to the constructor. A simpler example to illustrate:

julia> f(arg1, arg2) = println("my first argument is $arg1 and the second is $arg2")
f (generic function with 1 method)

julia> x = [5, 6];

julia> f(x...)
my first argument is 5 and the second is 6
1 Like

Nils, that did the trick (stopping and starting - I also had a couple of typos). I have already read in about 100 of the 400 variables but I have encountered a couple of issues related to VBA (not Julia) , which may require a re-think of the best way to handle the input file. I would value your opinion, as my ultimate goal it to present the problem to the Julia ODE solvers (and ultimately the optimizing engine) in the best way possible:

  1. The maximum number of line continuations in a VBA write/print statement is 25, which is not enough to write all my values on one line. I could possibly join to files together or write the headings and data to a spare spreadsheet in two rows and then read it back into another file that I ultimately present to Julia … bit messy and in any case, suffers from the problem 2 below.

  2. When I write the headings and one line and the data(values) on the line below there is no visible correspondence, as the headings are much longer than the values - basically kills my idea of having an editable data file. This was not so much of a problem when I wrote multiple lines to the file , as the correspondence between headings and values was retained.

  3. If I write the data into two columns (headings in one column and data in the other column) can I still retain the struct format that you say is valuable for presenting to the ODE solver, This file format is essentially a table, which would be the most easy to edit. I think I would need your help on how to read in this file and set up a struct, assuming it was possible?

  4. Note I have completed the Dictionary method that Florin outlined, so I guess this is an option for presenting and unpacking data for the solver? If it is not as good as struct or offers some limitations, then I am all for taking the best route to achieve the ultimate result.

Again many thanks for all of the help and setting me straight on many concepts. I liked the splatting explanation. Peter

1 Like

Yes I would probably just use two columns, one for variable name, one for value. You can then just flip this around e.g. with DataFrames it would be:

julia> using DataFrames

julia> df = DataFrame(variable = ["Location", "Elevation"], value = ["Toowaboo", 534])
2×2 DataFrame
 Row │ variable   value    
     │ String     Any      
─────┼─────────────────────
   1 │ Location   Toowaboo
   2 │ Elevation  534

julia> permutedims(df, "variable")
1×3 DataFrame
 Row │ variable  Location  Elevation 
     │ String    Any       Any       
─────┼───────────────────────────────
   1 │ value     Toowaboo  534

and you can get your NamedTuple from there to pass to your ModelParameters constructor.

EDIT: The downside that I notice now as I post this is that you lose the automatic type detection that CSV.jl gives you, which you could potentially sort by just writing the file back out and reading it in again in the wide format.

I will try that. Thanks. Just to be clear the Dictionary method is not as friendly as the struct approach when setting up an ODE solver problem? The Dictionary seems to have the ability to unpack just the specific variables you need for a particular function. Does the struct method offer this targeted ability as well? Thanks Peter

Yes, if in my example above you replace

julia> @unpack location, elevation, latitude, longitude, year = parameters

with e.g.

julia> @unpack location, year = parameters

you would only define the location and year variables

Great will try to get the full struct working tomorrow. Need to log off as 8.00 pm in Brisbane. Thanks again.

1 Like

Knowing a bit of Python is certainly good to get started with Julia, but it’s not necessary. You could have a look at this, if you have a background in either MATLAB, R, Python, C or Lisp and need a little help for transitioning:

https://docs.julialang.org/en/v1/manual/noteworthy-differences/

I’ve been using Julia for around a year now, and I’m currently working on my master thesis using Julia, so that helps enormously.

The approach is pretty much the same as in your days: I started by reading the manual and coding up really basic examples and things for fun. It’s of great help to have a kind of ‘toy project’ which you can code up bit by bit, seeing progress when you make it and being able to revise and refactor the code many times. You can be sure that I started writing horrible code (by my current standards) at first :smiley: I think I turned the basic bits of my thesis project inside out for the 3rd time now. (I started the project a while ago and it turned into a thesis eventually, it didn’t start as one, else I would have ran out of time by now). [Edit: I’m still far from writing optimal code, but it’s getting better. You’ll get there, too.]

It’s just a process like learning any language. I like revisiting the docs once in a while, and also looking at the code in Julia Base is often helpful, especially if you implement your own types and need help implementing specialized functions. It’s really helpful to look at how other people solved these problems and digest that.

Oh and reading around the julia discourse here is also pretty helpful, as there are a lot of questions and answers and nice examples here.

Can’t really help with that, sorry, but similar questions have been asked multiple times here. Maybe you find something there:

And finally:

Both are fine. With a dictionary holding different types, you need to make sure to annotate the types when you unpack them (for performance reasons), with Nils’ struct approach that comes for free.

Good luck to you!

[Edit: typing errors]

2 Likes

Many thanks for sharing your insights and journey with Julia. Helpful to me and others I am sure. Best wishes Peter

2 Likes

Nils, it appears that the following statement can be used to read in a vertical file by using the transpose = true option in the argument list of the CSV.read function. Do you concur? I am not sure what the pool = false does. Could you elaborate a little

input_data = Tables.rowtable(CSV.File("MyFile_Vertical.txt",transpose = true, delim =',', pool = false));

In conclusion, if the above works, then I now have two paths to import my parameters (Dictionary and struct). Thanks again to you both for staying the journey. Peter

Yes, I forgot about that CSV option - indeed you can just read it in transposed in that way.

As for pool:

help?> CSV.File
  CSV.File(source; kwargs...) => CSV.File

(...)

•    pool::Union{Bool, Float64}=0.1: if true, all columns detected as String will be internally pooled; alternatively, the proportion of unique
              values below which String columns should be pooled (by default 0.1, meaning that if the # of unique strings in a column is under 10%, it will
              be pooled)

“pooling” here refers to compressing a string column by only storing unique strings and a lookup table, and then replacing the strings in the column with the lookup value.

Nils, “pool” is a powerful option! Just to let you know, I have successfully read-in all my variables, into both a struct and a Dictionary. Next week I will try to connect up my modules, and develop the coefficients that the RHS of my 123 ODE’s require. I only have a week left before I go on Xmas break. Best wishes to you both. Peter

2 Likes