I am very new to Julia and I am struggling with the following. I have an HDF5 file full of data in Python, which I convert to a Python NumPy DataFrame and then save to a CSV file. I then want to use this file in Julia, but it looks quite different there than in Python:
You should be able to load the HDF5 file directly using HDF5.jl. If that’s not an option, it’d help if you could provide the raw CSV file (or at least the first few rows) and whatever commands you’re currently using to load it. See this post for pointers on how to make your question easier to answer:
Your CSV file isn’t publicly visible with your current sharing settings. If you already have a GitHub account, the easiest thing is to upload it to gist.github.com.
What you need to figure out is how to “clean” the values of your data frame so that they work with parse.
This involves using strip to remove extra whitespace and using replace to remove the ( and ).
Here is a full example:
julia> using CSV, Chain, DataFrames
julia> df = CSV.read("data.csv", DataFrame; delim = ",", header = false);
julia> function clean_parse_complex(x)
           c = @chain x begin
               strip()                      # remove surrounding whitespace
               replace("(" => "")           # drop the opening parenthesis
               replace(")" => "")           # drop the closing parenthesis
               parse(Complex{Float64}, _)   # parse the cleaned string
           end
       end
clean_parse_complex (generic function with 1 method)
julia> df.c = clean_parse_complex.(df.Column1);
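For context on where the parentheses come from: Python's own string form of a complex number includes them, so they end up in the CSV text. Below is a minimal sketch using only the Python standard library. The sample values and the helper name `clean_cell` are hypothetical, and the exact text that NumPy or pandas writes for your data may differ (for example in whitespace).

```python
import csv
import io

# Hypothetical sample values; the real data set is not reproduced here.
values = [complex(1.5, 0.0), complex(-2.0, 3.25)]

# csv.writer falls back to str() for non-string cells, so each complex
# number is written in Python's own notation, parentheses included.
buffer = io.StringIO()
writer = csv.writer(buffer)
for z in values:
    writer.writerow([z])

# Read the raw text back without any type conversion.
raw_cells = [row[0] for row in csv.reader(io.StringIO(buffer.getvalue()))]
print(raw_cells)  # ['(1.5+0j)', '(-2+3.25j)']


def clean_cell(text):
    """Strip whitespace and drop the parentheses around the value."""
    return text.strip().replace("(", "").replace(")", "")


cleaned = [clean_cell(cell) for cell in raw_cells]
print(cleaned)  # ['1.5+0j', '-2+3.25j']
```

The `clean_cell` helper mirrors the strip-and-replace steps described above, just on the Python side, which can be handy for checking what a file really contains before loading it elsewhere.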
One thing to note, though, is that all your values look real! They all have a zero imaginary component. Maybe you are encoding things as complex in Python when that isn’t necessary?
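If you want to check this on the Python side before exporting, something like the following might do. It is only a sketch with a made-up sample column and a hypothetical helper `is_effectively_real`; it keeps the real parts when every imaginary part is zero (within an optional tolerance).

```python
# Hypothetical sample column; every entry has a zero imaginary part.
column = [complex(1.5, 0.0), complex(-2.0, 0.0), complex(0.25, 0.0)]


def is_effectively_real(values, tol=0.0):
    """Return True when no value has an imaginary part larger than tol."""
    return all(abs(z.imag) <= tol for z in values)


if is_effectively_real(column):
    # Keep only the real parts so the CSV holds plain floats.
    column_to_write = [z.real for z in column]
else:
    # At least one value is genuinely complex; leave the column alone.
    column_to_write = column

print(column_to_write)  # [1.5, -2.0, 0.25]
```

Plain floats in the CSV would not need any string cleaning when read back.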
This is part of my data. I have more with complex values. I will try and see if it works. Not sure how I could figure all this out without assistance here.
Thanks a lot. I am trying to use the function that you defined above for a second dataset which actually involves imaginary parts. Nevertheless I get the error:
ArgumentError: expected trailing "im", found only "m"
Stacktrace:
 [1] tryparse_internal(::Type{Complex{Float64}}, ::String, ::Int64, ::Int64, ::Bool) at ./parse.jl:316
 [2] parse(::Type{Complex{Float64}}, ::String) at ./parse.jl:378
 [3] clean_parse_complex(::String) at ./In[46]:11
 [4] _broadcast_getindex_evalf at ./broadcast.jl:648 [inlined]
 [5] _broadcast_getindex at ./broadcast.jl:621 [inlined]
 [6] getindex at ./broadcast.jl:575 [inlined]
 [7] macro expansion at ./broadcast.jl:932 [inlined]
 [8] macro expansion at ./simdloop.jl:77 [inlined]
 [9] copyto! at ./broadcast.jl:931 [inlined]
 [10] copyto! at ./broadcast.jl:886 [inlined]
 [11] copy at ./broadcast.jl:862 [inlined]
 [12] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(clean_parse_complex),Tuple{Array{String,1}}}) at ./broadcast.jl:837
 [13] top-level scope at In[46]:16
 [14] include_string(::Function, ::Module, ::String, ::String) at ./loading.jl:1091
I cannot understand why it works for the first dataset but not the second one. I attach here the second dataset in case you can provide some assistance.
in the example I initiated this topic with. Then I realized that the rest of my dataset was saved as csv files using:
df.to_csv('ata.csv', index=False)
As mentioned above, the latter way indeed includes a header that I initially did not know how to get rid of. But it all works now. Quite frustrating for a beginner.
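For anyone hitting the same thing: a header line is just text, so treating it as data makes a complex parser choke on it. Here is a small sketch of the effect using only the Python standard library. The file contents, the column name "value" and the helper `try_parse` are made up for illustration. On the pandas side, `to_csv` also accepts `header=False` if you would rather not write the header at all.

```python
import csv
import io

# Hypothetical file contents: a header line followed by complex values
# written in Python's notation.
csv_text = "value\n(1+2j)\n(3-4j)\n"

rows = [row[0] for row in csv.reader(io.StringIO(csv_text))]


def try_parse(cell):
    """Return the cell as a complex number, or None if it is not one."""
    try:
        return complex(cell)
    except ValueError:
        return None


# Treating the first line as data fails, because it is a column name.
header, data = rows[0], rows[1:]
print(try_parse(header))  # None

# Skipping the header leaves only parseable values.
parsed = [complex(cell) for cell in data]
print(parsed)  # [(1+2j), (3-4j)]
```

The same idea applies when reading the file elsewhere: either tell the reader that the first line is a header, or do not write one in the first place.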
Beginning to learn a programming language by parsing strings doesn’t sound like a great idea to me. (And if you do, I don’t see how Julia is any more difficult in that regard than other languages.)
As has already been pointed out above, it is probably a mistake to convert your data from HDF5 to CSV and then parse it in Julia. Why not use HDF5.jl directly, like you’re using h5py in Python? This will preserve all the proper data types and string parsing won’t be necessary at all.
Alternatively, you could try to convert a python data frame (pandas at least) directly to a Julia DataFrame via PyCall.jl and/or Pandas.jl. This would again avoid any csv business.
Thanks. I've been using symbolic tools for years, and programming for them for years as well, but only now do I have to deal with proper data. So sometimes I am unaware of even the proper terminology, e.g. parsing.
to directly import Pandas-style HDF files in Julia - for HDF, it calls Pandas under the hood (with Pandas.jl), therefore it should support all Python data types (including e.g. pickled strings).