I have a huge collection of JSON files, each containing a single pandas DataFrame. I would like to open them in Julia, but I haven’t found a way to do it. One caveat is that the DataFrames use strings as row indexes.
Any help would be appreciated!
This is an interesting format - didn’t realize pandas serialized to JSON but doesn’t surprise me. Could you share an example of what this JSON looks like? First thought is to do something like:
- Use JSON3 to read the JSON files into Julia objects (a JSON3.Object, which behaves much like a Dict)
- Write your own parser for how the data should look
- Convert the data to a DataFrame via DataFrames.jl
- Repeat steps 1 - 3 until you have loaded in all your dataframes
- `vcat` each dataframe from this loading, via something like `vcat(df1, df2, ...)`
- Sort on your column that has string based indices (I don’t think you have to convert to Int here; if you do, it should just be like `df.rownum = parse.(Int, df.rownum)`)
- Off to the rest of your analysis!
Without an example, it is rather hard to imagine exactly how this would work but this is what comes to mind. Does that help?
P.S. Also, welcome to the Julia community!
Thank you very much for your kind response, @TheCedarPrince!
In this first picture, we can see the partial contents of one of my json files, which is a long dataframe (32 rows, 105 columns).
I didn’t understand what you mean by sort my column that has string based indices. The whole dataframe has string indices instead of numbers.
I really don’t have a clue how to proceed, so any further help would be greatly appreciated.
I already managed to read the json file by converting it to a string, as shown in the picture above. But I don’t know how to have it opened as the Dataframe that it is.
Ah got it! So, actually, I think the simplest way may be this:
- Open up your files again as pandas dataframes (within Python)
- Write these dataframes to CSVs instead
- Use CSV.jl to open the CSV into a DataFrame via DataFrames.jl (within Julia)
- Continue with your analysis.
Does that work for your needs?
I converted one of the json files to csv within python to test your solution. My problem now is that I do not know how to open it as a dataframe as you suggest? Do you have the code for that? Thanks again!
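For the CSV route, a minimal sketch could look like the one below. It assumes CSV.jl and DataFrames.jl are installed. The file name, the column names and the tiny file written at the top are made up so the snippet runs on its own; with a real export you would only need the last line. It also assumes the row labels were written to a named column, for example with `df.to_csv("00_stats.csv", index_label="KeyID")` on the pandas side.

```julia
using CSV, DataFrames

# Hypothetical stand-in for a file exported from pandas.
# The first column holds the string row labels.
csv_path = joinpath(mktempdir(), "00_stats.csv")
write(csv_path, "KeyID,mean,std\nrowA,1.0,0.1\nrowB,2.0,0.2\n")

# Read the CSV straight into a DataFrame.
df = CSV.read(csv_path, DataFrame)
```

Here `CSV.read(path, DataFrame)` parses the file and materializes it as a DataFrame in one step; the row labels end up as an ordinary column, since DataFrames.jl has no separate row index.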
I wonder if JSONTables.jl might work here…
using JSONTables, DataFrames
json_string = read("00_stats.json", String)
jtable = JSONTables.jsontable(json_string)
df = DataFrame(jtable)
(Of course, I haven’t tested this against your file format.)
Thanks for your help. I got the following error:
ArgumentError: input JSON3.Object must only have JSON3.Array values to be considered a table
Stacktrace:
[1] jsontable(x::JSON3.Object{Base.CodeUnits{UInt8, String}, Vector{UInt64}})
@ JSONTables C:\Users\sERRa\.julia\packages\JSONTables\EZWPP\src\JSONTables.jl:26
[2] jsontable(source::String)
@ JSONTables C:\Users\sERRa\.julia\packages\JSONTables\EZWPP\src\JSONTables.jl:15
[3] top-level scope
@ In[31]:4
I guess the file format is not “a JSON object of arrays, or a JSON array of objects”, as the JSONTables.jl readme says. It was always a bit unlikely that this would work directly.
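To make the difference concrete, here is a small sketch of the two shapes JSONTables.jl accepts, next to an object of objects, which is the shape the error above complains about (and what pandas writes with its default `orient="columns"`). The JSON strings are invented for illustration and are not taken from the actual file.

```julia
using JSONTables, DataFrames

# Shape 1: a JSON object whose values are arrays (one array per column).
object_of_arrays = """{"a": [1, 2], "b": [3.5, 4.5]}"""
df1 = DataFrame(jsontable(object_of_arrays))

# Shape 2: a JSON array of objects (one object per row).
array_of_objects = """[{"a": 1, "b": 3.5}, {"a": 2, "b": 4.5}]"""
df2 = DataFrame(jsontable(array_of_objects))

# An object of objects is neither shape, so jsontable rejects it.
nested = """{"a": {"r1": 1, "r2": 2}, "b": {"r1": 3.5, "r2": 4.5}}"""
err = try
    jsontable(nested)
    nothing
catch e
    e
end
```

With the nested input, `err` holds the same kind of ArgumentError as in the stack trace, which is why the file needs to be unpacked by hand or re-exported in another layout.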
Can you post a link to an example file you’re using?
This is the json file containing a single Dataframe consisting of 32 rows and 105 columns.
Ok, this basically follows what @TheCedarPrince outlined above:
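using JSON3
using DataFrames
# Read the file as a String and parse it into a JSON3.Object.
json_string = read("00_stats.json", String)
jsobj = JSON3.read(json_string)
# Build one single-row DataFrame per top-level key, then stack them.
rows = DataFrame[]
for sym in keys(jsobj)
    # sym = :Var # For example.
    row = DataFrame([k => jsobj[sym][k] for k in keys(jsobj[sym])])
    # Keep the top-level key as the first column so each row stays identifiable.
    insertcols!(row, 1, :KeyID => String(sym))
    push!(rows, row)
end
df = reduce(vcat, rows)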
(The KeyIDs seem unintelligible to me, but I trust you can figure them out!)
This worked perfectly!! Only thing, rows and columns are inverted, how can I transpose them?
Try permutedims(df, :KeyID)
Do you know the equivalent of df.iloc[1] to extract a row?
ExpandNestedData.jl might help you.
I haven’t registered it yet, so you need to install it with the GitHub link (see the docs)
df[1, :]
Should do it!
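One thing to watch when translating from pandas: Julia indexing is 1-based, so `df[1, :]` is the first row, while `df.iloc[1]` in pandas is the second. A small sketch with made-up data (the `KeyID` column name follows the code earlier in the thread; the values are invented):

```julia
using DataFrames

df = DataFrame(KeyID = ["rowA", "rowB", "rowC"], x = [10, 20, 30])

first_row = df[1, :]    # first row, like df.iloc[0] in pandas
second_row = df[2, :]   # second row, like df.iloc[1] in pandas

# Look a row up by its string label, roughly like df.loc["rowB"] in pandas.
by_label = df[df.KeyID .== "rowB", :]
```

Indexing with a single integer gives a DataFrameRow, while the boolean mask in the last line gives a DataFrame (here with one row).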
I moved the solution mark back to @jd-foster since his response is the solution to OP
Glad things were resolved! Got distracted by hanging out with my family.
That said, welcome to the Julia fam! Glad you got your questions answered and see you around!
How can I fix the for loop so that the columns and rows are switched here, instead of having to use permutedims(df, :KeyID) later?
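One possible way, as an untested sketch against the real file: treat each top-level key as a column instead of a row, and take the row labels from the inner keys. The JSON string below is invented, and the `RowID` column name is just a placeholder. It assumes every top-level entry has the same inner keys; if some are missing, the lookup `inner[r]` will throw.

```julia
using JSON3, DataFrames

# Invented stand-in for the file contents: {column: {row label: value}}.
json_string = """{"colA": {"r1": 1, "r2": 2}, "colB": {"r1": 3.5, "r2": 4.5}}"""
jsobj = JSON3.read(json_string)

# Take the row labels from the inner keys of the first top-level entry.
first_col = jsobj[first(keys(jsobj))]
row_ids = collect(keys(first_col))

# Start with the label column, then add one column per top-level key.
df = DataFrame(:RowID => String.(row_ids))
for (col, inner) in jsobj
    df[!, col] = [inner[r] for r in row_ids]
end
```

This builds the table in the transposed orientation directly, so no `permutedims` call is needed afterwards. For the real data you would replace the literal string with `read("00_stats.json", String)`.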