AbstractDataFrame is not iterable, I just want to make a histogram

Axze-rgb · August 2, 2023, 11:53am

Hello,

Why is this not iterable?

Here is the code

begin
CairoMakie.activate!(type = "svg")
fig2 = Figure(pt_per_unit = 2)
ax2 = Axis(fig[1, 2], xlabel = "MAPQ")
title2 = ("MAPQ of the hifi reads")
hist!(ax2, df2)
CairoMakie.save("MAPG.svg",fig)
fig2
end

and my df2 is a single column DataFrame of Int64

I don’t understand why it wouldn’t be iterable. Or why it’s “abstract”?

Thank you

algunion · August 2, 2023, 12:45pm

Iterating over a data frame without explicitly stating if you iterate over rows or columns can create confusion. Although I think we can agree that iterating over rows might be more frequent than iterating over columns, I think the decision to require the explicit use of eachrow or eachcol is a sane one.

You can try this on your end (will output the same error as the one you reported):

# fails
for x in df2
    println(x)
end

However, this works (and so does eachcol):

# works
for x in eachrow(df2)
    println(x)
end

Now, related to your code snippet: it will fail even if you pass eachrow(df2) instead of the data frame itself, because iterating that way produces DataFrameRow elements. Makie will not do the extra work of detecting that you have a single column and guessing that you want to plot its values.
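To make the difference concrete, here is a minimal sketch of what each iterator yields. The data frame `df` and its column name `MAPQ` are made up for illustration and are not taken from the original post.

```julia
using DataFrames

# Hypothetical single-column data frame standing in for df2.
df = DataFrame(MAPQ = [60, 60, 0, 27])

# eachrow yields DataFrameRow objects, not plain numbers.
row = first(eachrow(df))
row_is_dataframerow = row isa DataFrameRow

# eachcol yields the column vectors themselves.
col = first(eachcol(df))
col_is_vector = col isa Vector{Int64}

# The numeric values have to be pulled out of each row explicitly.
values_from_rows = [r.MAPQ for r in eachrow(df)]
```

So a plotting function that expects a vector of numbers receives row objects when handed eachrow(df), which is why passing the column itself is the simpler route.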

To fix this, you’ll need to actually pass the column values:

hist!(ax2, df2.yourcolumnname)

You can use df2[:, 1] if you just want the first column without bothering with its name.
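Put together, a version of the snippet that passes a column could look roughly like the sketch below. The column name `MAPQ`, the random stand-in data and the output file name are assumptions. The figure options from the original snippet are left out for brevity. The sketch creates the axis on `fig2` and saves `fig2`, so the figure being saved and displayed is the one that holds the plot.

```julia
using DataFrames, CairoMakie

# Assumption: stand-in data with one Int64 column named MAPQ.
df2 = DataFrame(MAPQ = rand(0:60, 1_000))

CairoMakie.activate!(type = "svg")
fig2 = Figure()
ax2 = Axis(fig2[1, 1], xlabel = "MAPQ", title = "MAPQ of the hifi reads")

# Pass the column vector, not the data frame.
mapq = df2.MAPQ        # same values as df2[:, 1], but without making a copy
hist!(ax2, mapq)

save("MAPQ.svg", fig2) # hypothetical output file name
fig2
```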

Now, related to your question of why it is “abstract”:

It is not abstract, but it has an abstract supertype.

Try to run this on your end: isa(df2, AbstractDataFrame). You’ll see that it evaluates to true.

Now, there is a Base.iterate method that is implemented to throw a very informative error (pointing towards eachrow and eachcol usage).

Instead of implementing the method for each concrete type that is a subtype of AbstractDataFrame, the method is implemented for the parent abstract type.
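The pattern can be sketched in plain Julia without DataFrames.jl. All type names below (`AbstractTable`, `ColumnTable`, `ViewTable`) are hypothetical and only mimic the idea. This is not the actual DataFrames.jl source.

```julia
# Hypothetical abstract supertype with two concrete subtypes.
abstract type AbstractTable end

struct ColumnTable <: AbstractTable
    values::Vector{Int}
end

struct ViewTable <: AbstractTable
    parent::ColumnTable
end

# One method on the abstract type covers every subtype.
Base.iterate(::AbstractTable) =
    error("AbstractTable is not iterable. Use an explicit row or column iterator.")

# Helper for the demonstration: try to loop and return the error message, if any.
function iteration_error(t)
    try
        for x in t
        end
        return nothing
    catch err
        return sprint(showerror, err)
    end
end

msg_column = iteration_error(ColumnTable([1, 2, 3]))
msg_view = iteration_error(ViewTable(ColumnTable([1, 2, 3])))
```

Both calls hit the same method, so the error message names the abstract type even though the objects are concrete. That is why the message you saw mentions AbstractDataFrame although df2 is a DataFrame.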

I hope this helps.

Axze-rgb · August 2, 2023, 1:20pm

Thank you and indeed it solved the problem. I was confused because none of the examples in the Makie tutorial have to do that. That being said, I get a white image. The dataframe is 2 million points. It should be able to handle that. I switched to a kernel density, and same, computation is really slow and results in a blank figure.

Therefore, I subsampled my dataframe to 100 (one hundred) points and tried to plot a histogram. But alas, still just a blank figure. Even if Makie is not the best for millions of points, we agree that 100 points should not be outside of its capabilities, right?

I am really sorry, I feel like a total idiot bringing you question after question… Is there a way to contribute and give back to this community? Like a “donate” or something? I would definitely do it at my discretion.

Thanks again

algunion · August 2, 2023, 1:53pm

I am not sure about others but at least in my case, this is my way of giving back to the community (e.g., producing more of the things that I found valuable when starting my Julia journey). So my answer above was actually in the same give-back category.

Now, I understand that there are more ways in which one can give back. If you are inclined towards a donation, my feeling is that supporting the core language development is a valuable way to spend money. However, I am not knowledgeable enough about precisely how things work at that level, so maybe somebody with more information can jump in and add more context.

Regarding your additional questions, I think the best way is to create another topic and provide a minimal working example. That will speed up the time needed to solve your problem and be valuable for others that might encounter similar issues (this will aid the search functionality - both here on discourse and from outside search engines).

bkamins · August 2, 2023, 11:22pm

@algunion is 100% correct. We require an explicit call to eachrow or eachcol because other ecosystems chose one option or the other as the default, and users coming to DataFrames.jl were sometimes confused.

joa-quim · August 3, 2023, 1:08am

Sorry for the low-quality post, but I am writing it from an iPad and Codespaces.
With GMT.jl it takes half a second to compute the histogram of a million points.

algunion · August 3, 2023, 2:14am

Not sure what is going on there. CairoMakie’s hist! takes under 0.04 seconds for 1M points and 0.2 seconds for 10M.

joa-quim · August 3, 2023, 12:42pm

This faintly reminds me of something about the C histogram code in GMT, but I can’t test it right now.

jules · August 3, 2023, 1:01pm

A histogram does not actually plot the points you give it, just a couple bars, so you should be able to throw almost an arbitrary number of points at it. For me, 100 million take about 3 seconds to render. A white image suggests something else is going on.
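As a rough illustration of why the number of points hardly matters for what gets drawn, here is a plain-Julia sketch of equal-width binning. The function `bin_counts` is hypothetical, and this is not how Makie computes its histograms. It only shows that the output has one entry per bin regardless of the input size.

```julia
# Hypothetical helper: count values into nbins equal-width bins.
function bin_counts(values, nbins)
    lo, hi = extrema(values)
    # Guard against all-equal input, where the bin width would be zero.
    width = hi > lo ? (hi - lo) / nbins : 1.0
    counts = zeros(Int, nbins)
    for v in values
        # The maximum value lands one past the last bin, so clamp it back in.
        i = clamp(floor(Int, (v - lo) / width) + 1, 1, nbins)
        counts[i] += 1
    end
    return counts
end

# One million made-up values reduce to 15 bar heights.
counts = bin_counts(rand(0:60, 1_000_000), 15)
```

Only those bar heights need to be rendered, so the input size mainly affects the counting step and not the drawing.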

With GLMakie you should be able to handle a couple million points at interactive speeds.
