Hello, I have a data frame with 2 columns, each with 2 million points. When I try to plot it, Jupyter crashes. I am surprised, because R and Python can handle it.
Here is my command
df2=CSV.read("/home/alessandro/Data/MAexperiment/24-07-2023_Deepvariant_standing_variation_analysis_based_on_HiFi/RefCall_all_GQ_VAF", DataFrame; header = false)
rename!(df2,[:RefCall,:VAF,:GQ])
t=Gadfly.plot(df2, x="GQ", y="VAF", Geom.point())
Is there something I can do? I feel like Julia should be able to handle this, no? Or am I delusional?
Thanks
EDIT: I can plot the data using
using Plots
gr()
@df df2 Plots.scatter(:GQ, :VAF)
but this causes massive performance issues. I therefore suspect there is something wrong with Jupyter or the Brave browser.
EDIT 2: with Python it takes 576 ms to plot
EDIT 3: this works but it’s very slow: run as a .jl script, it takes around 1 minute. I am aware of time to first plot, but there is no improvement on later plots, each one still takes seconds.
using DataFrames
using CSV
df4=CSV.read("/home/alessandro/Data/MAexperiment/24-07-2023_Deepvariant_standing_variation_analysis_based_on_HiFi/RefCall_all_GQ_VAF", DataFrame; header = false)
rename!(df4,[:RefCall,:VAF,:GQ])
using Gadfly
t=Gadfly.plot(df4, x="GQ", y="VAF", Geom.point, Theme(background_color=color("white"), grid_color=color("white")))
using Cairo
using Fontconfig
t |> PNG("density.png", 30cm, 25cm)
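A possible workaround, sketched here under assumptions and not benchmarked on the real file: count the points on a coarse grid first, so the plotting backend only has to draw a few thousand cells and not 2 million individual markers. The helper name `bin2d`, the grid sizes and the synthetic `gq`/`vaf` vectors are all hypothetical stand-ins, and the sketch assumes the columns contain no `missing` or `NaN` values.

```julia
# Hypothetical helper (not part of any package): count how many (x, y)
# points fall into each cell of an nx-by-ny grid spanning the data range.
function bin2d(x::AbstractVector{<:Real}, y::AbstractVector{<:Real};
               nx::Int=200, ny::Int=200)
    length(x) == length(y) || throw(ArgumentError("x and y must have the same length"))
    xmin, xmax = extrema(x)
    ymin, ymax = extrema(y)
    # Guard against a zero-width range (all values identical).
    xspan = xmax > xmin ? xmax - xmin : one(xmax - xmin)
    yspan = ymax > ymin ? ymax - ymin : one(ymax - ymin)
    counts = zeros(Int, nx, ny)
    for (xi, yi) in zip(x, y)
        # Map each value to a 1-based cell index; the maximum lands in the last cell.
        i = clamp(floor(Int, (xi - xmin) / xspan * nx) + 1, 1, nx)
        j = clamp(floor(Int, (yi - ymin) / yspan * ny) + 1, 1, ny)
        counts[i, j] += 1
    end
    return counts
end

# Synthetic stand-in for the GQ and VAF columns (random values, not real data).
gq  = rand(0:60, 2_000_000)
vaf = rand(2_000_000)

counts = bin2d(gq, vaf; nx=61, ny=100)
println(size(counts), " cells hold ", sum(counts), " points")
```

The resulting `counts` matrix could then be handed to a heatmap-style plot. Gadfly also ships binned geometries (`Geom.hexbin`, `Geom.histogram2d`) that do this kind of aggregation internally, which may be worth trying in place of `Geom.point`.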
EDIT 4: it also crashes Pluto… So far only VisualStudio is able to render my plot, though very slowly. I don’t believe it’s my computer’s fault; here is my hardware:
product: Intel(R) Xeon(R) CPU E3-1505M v5 @ 2.80GHz
64GiB System Memory
My issue is reminiscent of this one: Why is Julia's graphics system so slow? - #32 by evan-wehi
I don’t know enough about computing to understand all the implications. Can you tell me whether what’s happening is actually normal behaviour, and whether that means I shouldn’t use Julia for data exploration? I am really puzzled.
Thanks and sorry for the many edits, I kept doing research.