How to read pandas DataFrames from a JSON file?

sERRa · January 30, 2023, 12:40am

I have a huge collection of JSON files, each containing a single pandas DataFrame. I would like to open them in Julia, but I haven’t found a solution. A caveat is that the DataFrames have strings as row indexes.
Any help would be appreciated!

TheCedarPrince · January 30, 2023, 12:55am

This is an interesting format - didn’t realize pandas serialized to JSON but doesn’t surprise me. Could you share an example of what this JSON looks like? First thought is to do something like:

1. Use JSON3 to convert a JSON file to a Julia Dict-like object
2. Write your own parser for how the data should look
3. Convert the data to a DataFrame via DataFrames.jl
4. Repeat steps 1 - 3 until you have loaded in all your dataframes
5. vcat the dataframes from this loading, via something like vcat(df1, df2, ...)
6. Sort on your column that has string-based indices (I don’t think you have to convert to Int here; if you do, it should just be something like df.rownum = parse.(Int, df.rownum))
7. Off to the rest of your analysis!

Without an example, it is rather hard to imagine exactly how this would work but this is what comes to mind. Does that help?
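The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes each file uses the layout pandas writes by default (an object mapping each column name to an object of row label => value), and that every column lists the same row labels. The helper `to_dataframe`, the `rowlabel` and `file` column names, and the two tiny stand-in files are all made up for the example.

```julia
using JSON3, DataFrames

# Hypothetical helper: convert one parsed file into a DataFrame.
# Assumes the layout {column: {row_label: value}} with identical row labels
# in every column.
function to_dataframe(obj)
    labels = collect(keys(first(values(obj))))   # row labels as Symbols
    cols = [name => [inner[l] for l in labels] for (name, inner) in obj]
    return insertcols!(DataFrame(cols), 1, :rowlabel => String.(labels))
end

# Two tiny stand-in files so the sketch runs on its own.
dir = mktempdir()
write(joinpath(dir, "a.json"), """{"x": {"r1": 1, "r2": 2}, "y": {"r1": 3.5, "r2": 4.5}}""")
write(joinpath(dir, "b.json"), """{"x": {"r3": 5}, "y": {"r3": 6.5}}""")

frames = DataFrame[]
for file in filter(endswith(".json"), readdir(dir))
    obj = JSON3.read(read(joinpath(dir, file), String))  # step 1: parse the JSON
    df = to_dataframe(obj)                               # steps 2-3: build a DataFrame
    insertcols!(df, 1, :file => file)                    # remember where each row came from
    push!(frames, df)                                    # step 4: repeat for every file
end

all_data = reduce(vcat, frames)   # step 5: stack everything
sort!(all_data, :rowlabel)        # step 6: sort on the string labels
```

If the files do not all share the same columns, `vcat` accepts a `cols` keyword (for example `cols = :union`) to control how mismatched columns are handled.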

P.S. Also, welcome to the Julia community!

sERRa · January 30, 2023, 1:09am

Thank you very much for your kind response, @TheCedarPrince!

In this first picture, we can see the partial contents of one of my json files, which is a long dataframe (32 rows, 105 columns).
I didn’t understand what you mean by sort my column that has string based indices. The whole dataframe has string indices instead of numbers.
I really don’t have a clue how to proceed, so any further help would be much appreciated.

sERRa · January 30, 2023, 1:14am

I already managed to read the json file by converting it to a string, as shown in the picture above. But I don’t know how to have it opened as the Dataframe that it is.

TheCedarPrince · January 30, 2023, 1:17am

Ah got it! So, actually, I think the simplest way may be this:

1. Open up your files again as pandas dataframes (within Python)
2. Write these dataframes to CSVs instead
3. Use CSV.jl to open the CSV into a DataFrame via DataFrames.jl (within Julia)
4. Continue with your analysis.

Does that work for your needs?

sERRa · January 30, 2023, 1:34am

I converted one of the JSON files to CSV within Python to test your solution. My problem now is that I do not know how to open it as a DataFrame, as you suggest. Do you have the code for that? Thanks again!
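For reference, reading such a CSV in Julia might look like the sketch below. It assumes the file was written with pandas defaults, so the index column has an empty header cell. The file name, the stand-in contents, and the `rowlabel` column name are only illustrative.

```julia
using CSV, DataFrames

# Stand-in for a file written by pandas; the first header cell is empty
# because pandas leaves the index column unnamed by default.
path = joinpath(mktempdir(), "00_stats.csv")
write(path, """
,mean,std
r1,1.0,2.0
r2,3.0,4.0
""")

# Read the CSV straight into a DataFrame.
df = CSV.read(path, DataFrame)

# CSV.jl auto-generates a name for the unnamed index column;
# rename it by position to something more descriptive.
rename!(df, 1 => :rowlabel)
```

The string row labels simply end up as an ordinary first column, since DataFrames.jl has no separate row index.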

jd-foster · January 30, 2023, 2:07am

I wonder if JSONTables.jl might work here…

using JSONTables, DataFrames

json_string = read("00_stats.json", String)
jtable = JSONTables.jsontable(json_string)
df = DataFrame(jtable)

(Of course, I haven’t tested this against your file format.)

sERRa · January 30, 2023, 2:11am

Thanks for your help. I got the following error:

ArgumentError: input JSON3.Object must only have JSON3.Array values to be considered a table

Stacktrace:
 [1] jsontable(x::JSON3.Object{Base.CodeUnits{UInt8, String}, Vector{UInt64}})
   @ JSONTables C:\Users\sERRa\.julia\packages\JSONTables\EZWPP\src\JSONTables.jl:26
 [2] jsontable(source::String)
   @ JSONTables C:\Users\sERRa\.julia\packages\JSONTables\EZWPP\src\JSONTables.jl:15
 [3] top-level scope
   @ In[31]:4

jd-foster · January 30, 2023, 2:16am

I guess the file format is not

a JSON object of arrays, or a JSON array of objects,

as the JSONTables readme says. It was a bit of a long shot that this would work directly.
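To make the difference between those shapes concrete, here is a small sketch with made-up data. The third string mimics what pandas writes by default (`orient="columns"`): an object whose values are themselves objects keyed by row label, which is neither of the two shapes JSONTables accepts.

```julia
using JSONTables, DataFrames

# Object of arrays: one array per column.
object_of_arrays = """{"a": [1, 2], "b": [3.5, 4.5]}"""

# Array of objects: one object per row.
array_of_objects = """[{"a": 1, "b": 3.5}, {"a": 2, "b": 4.5}]"""

# Object of objects: column name => (row label => value).
object_of_objects = """{"a": {"r1": 1, "r2": 2}, "b": {"r1": 3.5, "r2": 4.5}}"""

df1 = DataFrame(jsontable(object_of_arrays))
df2 = DataFrame(jsontable(array_of_objects))

# The nested layout is expected to be rejected, as in the error reported above.
nested_ok = try
    jsontable(object_of_objects)
    true
catch err
    println("jsontable failed: ", sprint(showerror, err))
    false
end
```

If re-exporting from Python is an option, `to_json(orient="records")` produces the array-of-objects shape, though that orientation drops the index.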

jd-foster · January 30, 2023, 2:19am

Can you post a link to an example file you’re using?

sERRa · January 30, 2023, 2:23am

This is the json file containing a single Dataframe consisting of 32 rows and 105 columns.

jd-foster · January 30, 2023, 2:57am

Ok, this basically follows what @TheCedarPrince outlined above:

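using JSON3
using DataFrames

# Read the whole file as a String and parse it into a JSON3.Object.
json_string = read("00_stats.json", String)
jsobj = JSON3.read(json_string)

# Collect one single-row DataFrame per top-level key.
rows = DataFrame[]

for sym in keys(jsobj)
    # sym = :Var # For example.
    # Each inner key becomes a column of this one-row DataFrame.
    row = DataFrame([k => jsobj[sym][k] for k in keys(jsobj[sym])])
    # Keep the top-level key as an identifying first column.
    insertcols!(row, 1, :KeyID => String(sym))
    push!(rows, row)
end

# Stack the single-row DataFrames into one.
df = reduce(vcat, rows)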

(The KeyIDs seem unintelligible to me but trust you can figure it out!)

sERRa · January 30, 2023, 3:14am

jd-foster:

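using JSON3
using DataFrames

# Read the whole file as a String and parse it into a JSON3.Object.
json_string = read("00_stats.json", String)
jsobj = JSON3.read(json_string)

# Collect one single-row DataFrame per top-level key.
rows = DataFrame[]

for sym in keys(jsobj)
    # sym = :Var # For example.
    # Each inner key becomes a column of this one-row DataFrame.
    row = DataFrame([k => jsobj[sym][k] for k in keys(jsobj[sym])])
    # Keep the top-level key as an identifying first column.
    insertcols!(row, 1, :KeyID => String(sym))
    push!(rows, row)
end

# Stack the single-row DataFrames into one.
df = reduce(vcat, rows)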

This worked perfectly!! Only thing, rows and columns are inverted, how can I transpose them?

jd-foster · January 30, 2023, 3:17am

Try permutedims(df, :KeyID)
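As a small illustration of what that call does, here is a sketch on a made-up DataFrame shaped like the loop's output (one row per pandas column, one column per pandas row label). The data and the `rowlabel` name are only for the example.

```julia
using DataFrames

# Stand-in for the loop's result: :KeyID holds the pandas column names,
# the remaining columns are the pandas row labels.
df = DataFrame(KeyID = ["mean", "std"], r1 = [1.0, 2.0], r2 = [3.0, 4.0])

# The values in :KeyID become the new column names; the old column names
# become the values of the first column (still called :KeyID by default).
flipped = permutedims(df, :KeyID)

# An optional third argument gives that first column a different name.
flipped_named = permutedims(df, :KeyID, :rowlabel)
```

After the call, each row corresponds to one of the original row labels and each remaining column to one of the original pandas columns.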

sERRa · January 30, 2023, 3:33am

Do you know the equivalent of df.iloc[1] to extract a row?

mrufsvold · January 30, 2023, 3:42am

ExpandNestedData.jl might help you.

I haven’t registered it yet, so you need to install it with the GitHub link (see the docs)

mrufsvold · January 30, 2023, 3:43am

df[1, :]

Should do it!
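One thing to keep in mind: Julia indexes from 1 while pandas indexes from 0, so `df[1, :]` is the first row (what `df.iloc[0]` returns in pandas), and `df.iloc[1]` corresponds to `df[2, :]`. A short sketch with made-up data, which also shows one way to look a row up by its string label, assuming the labels live in a column called `rowlabel`:

```julia
using DataFrames

df = DataFrame(rowlabel = ["r1", "r2", "r3"], mean = [1.0, 3.0, 5.0])

first_row  = df[1, :]   # DataFrameRow; counterpart of pandas df.iloc[0]
second_row = df[2, :]   # counterpart of pandas df.iloc[1]

# Label-based lookup, roughly like pandas df.loc["r2"].
# findfirst returns `nothing` if the label is absent, so check for that
# in real code before indexing.
idx = findfirst(==("r2"), df.rowlabel)
r2 = df[idx, :]
```

Individual values are then available by column name, for example `second_row.mean`.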

mrufsvold · January 30, 2023, 3:48am

I moved the solution mark back to @jd-foster since his response is the solution to OP

TheCedarPrince · January 30, 2023, 3:56am

Glad things were resolved! Got distracted by hanging out with my family.

That said, welcome to the Julia fam! Glad you got your questions answered and see you around!

sERRa · January 30, 2023, 4:21am

How can I fix the for loop so that the columns and rows are switched here, instead of having to use permutedims(df, :KeyID) later?
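One possible way to do this is to add each top-level key as a column instead of as a row. The sketch below is untested against the actual file: it uses a small stand-in string in the same assumed layout (column name => row label => value) and assumes every column has the same row labels.

```julia
using JSON3, DataFrames

# Small stand-in for the file contents, in the assumed pandas layout.
json_string = """{"mean": {"r1": 1.0, "r2": 3.0}, "std": {"r1": 2.0, "r2": 4.0}}"""
jsobj = JSON3.read(json_string)

df = DataFrame()
for sym in keys(jsobj)
    inner = jsobj[sym]
    if ncol(df) == 0
        # First pass: store the row labels once, in file order.
        df.KeyID = [String(k) for k in keys(inner)]
    end
    # Each top-level key becomes a column; values are looked up by label
    # so that every column lines up with the KeyID column.
    df[!, sym] = [inner[Symbol(label)] for label in df.KeyID]
end
```

With the real file, `json_string` would come from `read("00_stats.json", String)` as before, and no `permutedims` call is needed afterwards.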
