I am plotting a simple x,y line plot. See the partial plot I posted in reply to the other post
So, keeping it simple. For example:
julia> using Plots
julia> x = 0:1e-6:2π; y = sin.(x); # ~2π million points
julia> plot(layout=(2,1))
julia> plot!(x,y,linewidth=2,subplot=1)
julia> sample = 1:1000:length(x);
julia> plot!(x[sample],y[sample],linewidth=2,subplot=2)
Gives you two essentially identical plots, but the second one has 1/1000 of the points:
You perhaps need a smarter sampling strategy if some part of the plot doesn't look like it should, but that can even be done "by hand".
Yes, this is the first thing I tried but it doesn't preserve the shape of the data. Do you have any suggestions for a smarter sampling strategy?
I was thinking that if you are ok with that syntax, you could simply concatenate by hand different sampling frequencies tuned to the data variation in each region, as sketched below. That should be acceptable since it seems you are only doing this for a single figure. For a more automated and sophisticated solution, we would have to implement one of the smarter sampling methods suggested above.
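Something along these lines, for instance (a minimal sketch; the breakpoints and strides are invented here and would have to be tuned to where your data actually varies):

```julia
# Hypothetical hand-tuned sampling: dense where the curve varies quickly,
# sparse where it is nearly straight. The index ranges are made up here.
i_fast = 1:100:1_000_000                 # rapidly varying region: every 100th point
i_slow = 1_000_001:10_000:length(x)      # smooth region: every 10_000th point
sample = vcat(i_fast, i_slow)

plot(x[sample], y[sample], linewidth=2)
```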
If your plot is log-spaced, your sampling indices should likewise be log-spaced using something like this:
julia> function logsample(x, y, N)
           length(x) == length(y) || throw(ArgumentError("Mismatched input sizes"))
           idx = unique([round(Int, exp10(v)) for v in LinRange(0, log10(length(x)), N)])
           x[idx], y[idx]
       end
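For example, on the `x`, `y` from the example above (note that `unique` may return fewer than `N` indices):

```julia
julia> xs, ys = logsample(x, y, 500);   # at most 500 samples, densest at small indices

julia> plot(xs, ys, linewidth=2)
```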
I recall people being satisfied with the publication quality graphs they constructed using PyPlot.jl. It wraps matplotlib, which is a well known and highly developed Python plotting package.
I don't understand why GR should not produce publication-quality plots. Could you provide an MWE?
I suspect it's because that plot makes a heatmap (with a very colorful color map), which doesn't reproduce well in print. For the 2D line data of the OP, a (simplified) line plot would probably be best in terms of plot quality.
Yes, PyPlot is nice but it can't plot 10 billion data points efficiently.
GR.shade produces a grainy curve since it is sort of a heatmap of the data.
GMT lets you plot as many billion points as you want, but it's constrained to the same memory conditions as the others, so you would have to do it in chunks and hope that Ghostscript is able to convert the huge resulting PostScript file.
However, I maintain that no one is able to see billions of points in a plot, so I don't understand this request.
One question then is how long you are willing to wait for the plot to finish, and how often you need to repeat it. Or is the memory requirement the issue (I don't actually know whether PyPlot can stream the data)?
The goal is probably not to directly show all billions of points, but to produce a faithful visual representation of the underlying data that shows the overall trajectory in a certain amount of detail. And the challenge here is that the underlying data isn't straightforward to subsample/decimate without losing the overall shape of the curve. Or at least, it takes a specific type of decimation algorithm, like one of the algorithms/papers mentioned earlier (or something like CGAL 5.6 - 2D Polyline Simplification: User Manual). And plotting as a heatmap doesn't make much sense for a trajectory.
So brute-force plotting of the 2D line segments can still be the best approach, as it is simple (if somewhat time-consuming) and does not introduce the smoothing artifacts that decimation most likely would.
Also, I somewhat feel that doing a simulation at such a high resolution and saving that data, and then not looking at the evolution at its full resolution, seems a waste. Usually only a subset of simulation timesteps is saved (mostly due to storage restrictions), which provides enough insight into the system behaviour, and then the OP's problem would simply not exist. But apparently there's a reason here to save such a high number of time steps and visualize them.
Edit: bad grammar fixes, missing word, clarified question
Sure, but the point is that they can't all be visualized at the same time.
That depends on your definition of "at the same time". If you take those billions of points and represent them with a heat map (which would involve mapping/plotting them all), then the plot still shows them "all at the same time". And making a heatmap of a subset of the data will most likely lead to a different plot output (depending on the plot setup). But I agree that for the goal of providing an overview of the data it's not very useful to directly plot all those points.
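To make that concrete, "mapping them all" just means binning every point onto a fixed pixel grid; something like this rough sketch (the grid size here is picked arbitrarily):

```julia
# Every point contributes to a bin, but what actually gets plotted is only
# an ny-by-nx grid of counts. Grid size is arbitrary in this sketch.
function bin2d(x, y; nx=800, ny=600)
    xmin, xmax = extrema(x)
    ymin, ymax = extrema(y)
    counts = zeros(Int, ny, nx)
    for (xi, yi) in zip(x, y)
        ix = clamp(1 + floor(Int, (xi - xmin) / (xmax - xmin) * nx), 1, nx)
        iy = clamp(1 + floor(Int, (yi - ymin) / (ymax - ymin) * ny), 1, ny)
        counts[iy, ix] += 1
    end
    counts
end

using Plots
heatmap(bin2d(x, y))
```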
I've seen examples of N-body simulations of billions of mass particles where an interactive viewer was built to be able to visualize both the high-level structure as well as zoom down to the individual point level. Yet such exploratory visualization doesn't seem to be the goal here.
Wow, that must be a daunting job. I can only admire and envy someone who has a tool like this.
I would try something like this: basically, I go over the vectors and add indices for plotting only when the angle changes significantly, so that points that keep following the same direction are skipped. You might want to add a minimum step to make sure you reduce enough, and also do something to deal with the log scale.
function downsample(x, y, tol)
    idx = [1]
    last_idx = 1
    last_angle = 0
    for i in 2:length(y)  # don't use OffsetArray ;)
        δy = y[i] - y[i-1]
        δx = x[i] - x[i-1]
        angle = atan(δy, δx)
        if abs(last_angle - angle) > tol
            push!(idx, i)
            last_idx = i
            last_angle = angle
        end
    end
    idx
end
using Gadfly  # the plot call below uses Gadfly syntax

tol = 0.1
x = LinRange(0, 2π, 10_000)
y = @. sin(x) + cos(2x)^4
idx = downsample(x, y, tol)
@show length(idx)
plot(x = x[idx], y = y[idx], Geom.line, Geom.point)
That is a nice idea. From the quick test below, using the derivative instead of the angle (atan) may provide better coverage for the same level of data reduction?
Code
function downsample(x, y, tol)
    idx = [1]
    last_idx = 1
    last_angle = 0
    for i in 2:length(y)  # don't use OffsetArray ;)
        δy = y[i] - y[i-1]
        δx = x[i] - x[i-1]
        angle = atan(δy, δx)
        if abs(last_angle - angle) > tol
            push!(idx, i)
            last_idx = i
            last_angle = angle
        end
    end
    idx
end

function downsample2(x, y, tol)
    idx = [1]
    last_idx = 1
    last_derivative = 0
    for i in 2:length(y)  # don't use OffsetArray ;)
        δy = y[i] - y[i-1]
        δx = x[i] - x[i-1]
        deriv = δx == 0 ? Inf : δy/δx
        if abs(last_derivative - deriv) > tol
            push!(idx, i)
            last_idx = i
            last_derivative = deriv
        end
    end
    idx
end
tol = 0.1
x = LinRange(0, 2π, 10_000)
y = @. 2x * sin(x)^2 + 4x * cos(2x)^4
idx = downsample(x, y, 0.1)
idx2 = downsample2(x, y, 2.6)

using Plots
p1 = plot(x, y, c=:blue)
plot!(x[idx], y[idx], marker=:o, ms=2, ls=:dash, c=:red, title="atan angle ($(length(idx)) points)")
p2 = plot(x, y, c=:blue)
plot!(x[idx2], y[idx2], marker=:o, ms=2, ls=:dash, c=:green, title="derivative ($(length(idx2)) points)")
plot(p1, p2, size=(1000,500))
In any case, for examples like this, wouldn't FFT-based data compression be more common?
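By that I mean something like this rough sketch (assuming FFTW.jl and a reasonably smooth signal; note this compresses storage rather than reducing the number of plotted points):

```julia
using FFTW

# Keep only the k largest-magnitude Fourier coefficients, zero the rest,
# and reconstruct an approximation of the original curve.
function fft_compress(y, k)
    Y = rfft(y)
    keep = partialsortperm(abs.(Y), 1:k, rev=true)   # indices of the k dominant coefficients
    Yc = zeros(eltype(Y), length(Y))
    Yc[keep] .= Y[keep]
    irfft(Yc, length(y))
end

y_approx = fft_compress(y, 100)   # e.g. keep 100 coefficients for the 10_000-sample example
```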
This blog post sounds relevant: Drawing 2.7 billion Points in 10s | by Simon Danisch | HackerNoon.com | Medium
Guys, those Douglas-Peucker guys (sorry Ramer, I didn't know about you) have worked on this:
using GMT
x = LinRange(0, 2π, 10_000);
y = @. 2x * sin(x)^2 + 4x * cos(2x)^4;
xy = gmtsimplify([x y], tol=0.005);
imshow(xy, marker=:point, lw=0.5, figsize=(12,12), title="Douglas-Peucker ($(size(xy,1)) points)")
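For reference, the same Ramer-Douglas-Peucker idea can be written in a few lines of plain Julia if you don't want the GMT dependency (a minimal recursive sketch, not GMT's implementation; it assumes the endpoints of each segment are distinct):

```julia
# Minimal recursive Ramer-Douglas-Peucker simplification: keep points whose
# perpendicular distance to the chord between the current endpoints exceeds tol.
function rdp(x, y, tol)
    n = length(x)
    n <= 2 && return collect(1:n)
    dx, dy = x[n] - x[1], y[n] - y[1]
    L = hypot(dx, dy)                      # assumes the two endpoints are distinct
    dmax, imax = -1.0, 2
    for i in 2:n-1
        d = abs(dy * (x[i] - x[1]) - dx * (y[i] - y[1])) / L   # perpendicular distance
        if d > dmax
            dmax, imax = d, i
        end
    end
    if dmax > tol
        left  = rdp(x[1:imax], y[1:imax], tol)
        right = rdp(x[imax:end], y[imax:end], tol)
        return vcat(left, right[2:end] .+ (imax - 1))
    else
        return [1, n]
    end
end

idx = rdp(collect(x), collect(y), 0.005);
@show length(idx)
```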
Very ingenious, but since the image has 600 x 960 pixels, continents cover about 1/3 of the Earth's surface, and more than half of that has no points, it turns out that of those 2.7 billion points we are only seeing about 600 * 960 / 3 / 2 ≈ 100_000 of them.