I have a huge NamedTuple, and `DataFrame(large_namedtuple)` takes a long time (around 12s) and uses quite a bit of RAM.
The eventual target for me is a DataFrame, but I created the named tuple so that users can choose the sink they want. Is it better to skip the named tuple and just create the DataFrame directly? That would force a dependency on DataFrames onto a package that I am only a potential contributor to, so it would be good to avoid that if possible.
Any good solutions?
Use `copycols = false` in the `DataFrame` constructor.

If you can avoid a dependency on DataFrames, that would be best.
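A minimal sketch of the suggestion, assuming the named tuple holds column vectors (the names and values here are made up for illustration):

```julia
using DataFrames

# Hypothetical stand-in for the large named tuple of column vectors
nt = (a = [1, 2, 3], b = ["x", "y", "z"])

# copycols = false tells the constructor to reuse the existing vectors
# instead of copying each one, avoiding the extra allocations and copy time
df = DataFrame(nt; copycols = false)

# The DataFrame column is the very same vector, not a copy
df.a === nt.a
```

The trade-off is aliasing: mutating `nt.a` afterwards also mutates `df.a`, so this is safest when the named tuple is discarded once the DataFrame is built.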
Huge in what sense? Many columns?
Does that make the DataFrame immutable?
As far as I can tell, no. I think the issue you are thinking of is `CSV.read`, which used to return an immutable `AbstractArray` type that would cause problems with `copycols = false`.
@xiaodai, it is a lot easier to help you if you provide more concrete information. Are you passing one named tuple with hundreds of fields, where each field is a vector with one element per row? Or are you passing a vector of named tuples? Or an iterator of named tuples?
Firstly, I create hundreds of vectors using multi-threading, so I have an (unnamed) tuple of hundreds of materialized vectors. Then I give them names via a NamedTuple. Come to think of it, I can just create the DataFrame from the tuple and then assign the names, which skips the NamedTuple step.
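That idea could look something like the following sketch, assuming the tuple holds the materialized column vectors (the vectors and names here are invented placeholders):

```julia
using DataFrames

# Hypothetical stand-in for the (unnamed) tuple of materialized vectors
vecs = ([1, 2, 3], [4.0, 5.0, 6.0])
colnames = [:a, :b]

# Build the DataFrame directly from the vectors plus a separate list of
# names, skipping the intermediate NamedTuple entirely; copycols = false
# again reuses the vectors instead of copying them
df = DataFrame(collect(AbstractVector, vecs), colnames; copycols = false)
```

`collect(AbstractVector, vecs)` turns the tuple into the `Vector{AbstractVector}` form that this `DataFrame` constructor method accepts.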