Need to plot high resolution time series data

I have successfully created a time series plot using TimeSeries.jl on Julia Version 1.5.3 (2020-11-09), in Juno (installed with JuliaPro), with the following code:

ATTEMPT 1:

using IterableTables
using DataFrames
using CSV
using Dates
using TimeSeries
using Plots


myfile="test2.csv"
dmft = dateformat"d/m/yyyy HH:MM:SS"
df = DataFrame(CSV.File(joinpath(@__DIR__,myfile); dateformat=dmft))
println(first(df,10))

ta = TimeArray(df; timestamp = :Date)
println(colnames(ta))
display(plot(ta[:Col3]))

plot omitted as new user cannot add more than one media

with the following output in my REPL

10×5 DataFrame
│ Row │ Date                │ Col1    │ Col2    │ Col3    │ Col4    │
│     │ DateTime            │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ 1   │ 2020-08-10T00:00:00 │ 507.28  │ 181.34  │ 1532.96 │ 183.16  │
│ 2   │ 2020-08-10T00:01:00 │ 507.29  │ 181.34  │ 1532.95 │ 183.16  │
│ 3   │ 2020-08-10T00:02:00 │ 507.27  │ 181.34  │ 1532.94 │ 183.16  │
│ 4   │ 2020-08-10T00:03:00 │ 507.28  │ 181.34  │ 1532.97 │ 183.16  │
│ 5   │ 2020-08-10T00:04:00 │ 507.29  │ 181.33  │ 1532.97 │ 183.16  │
│ 6   │ 2020-08-10T00:05:00 │ 507.29  │ 181.33  │ 1532.96 │ 183.16  │
│ 7   │ 2020-08-10T00:06:00 │ 507.27  │ 181.33  │ 1532.95 │ 183.16  │
│ 8   │ 2020-08-10T00:07:00 │ 507.28  │ 181.33  │ 1532.96 │ 183.16  │
│ 9   │ 2020-08-10T00:08:00 │ 507.27  │ 181.33  │ 1532.95 │ 183.16  │
│ 10  │ 2020-08-10T00:09:00 │ 507.28  │ 181.32  │ 1532.96 │ 183.16  │
[:Col1, :Col2, :Col3, :Col4]

Unfortunately, the plot came out as an image, and when I zoom in the resolution is poor, as can be seen below.

image

WHAT I LIKE TO ACHIEVE:

Ideally, I would prefer a high-resolution plot like the one below, which I can zoom into properly using Shift and the left mouse button.

Image omitted as new user cannot have more than one image

The DataFrame for the above image looks like this:

julia> print(first(mydf2,10))
10×8 DataFrame
│ Row │ ticker │ timestamp  │ Open    │ High    │ Low     │ Close   │ AdjClose │ Volume    │
│     │ String │ Date       │ Float64 │ Float64 │ Float64 │ Float64 │ Float64  │ Float64   │
├─────┼────────┼────────────┼─────────┼─────────┼─────────┼─────────┼──────────┼───────────┤
│ 1   │ MSFT   │ 2010-12-27 │ 28.12   │ 28.2    │ 27.88   │ 28.07   │ 22.3176  │ 2.16528e7 │
│ 2   │ MSFT   │ 2010-12-28 │ 27.97   │ 28.17   │ 27.96   │ 28.01   │ 22.2699  │ 2.30422e7 │
│ 3   │ MSFT   │ 2010-12-29 │ 27.94   │ 28.12   │ 27.88   │ 27.97   │ 22.2381  │ 1.95025e7 │
│ 4   │ MSFT   │ 2010-12-30 │ 27.92   │ 28.0    │ 27.78   │ 27.85   │ 22.1427  │ 2.07861e7 │
│ 5   │ MSFT   │ 2010-12-31 │ 27.8    │ 27.92   │ 27.63   │ 27.91   │ 22.1904  │ 2.4752e7  │
│ 6   │ MSFT   │ 2011-01-03 │ 28.05   │ 28.18   │ 27.92   │ 27.98   │ 22.2461  │ 5.34438e7 │
│ 7   │ MSFT   │ 2011-01-04 │ 27.94   │ 28.17   │ 27.85   │ 28.09   │ 22.3335  │ 5.44056e7 │
│ 8   │ MSFT   │ 2011-01-05 │ 27.9    │ 28.01   │ 27.77   │ 28.0    │ 22.262   │ 5.89987e7 │
│ 9   │ MSFT   │ 2011-01-06 │ 28.04   │ 28.85   │ 27.86   │ 28.82   │ 22.9139  │ 8.80263e7 │
│ 10  │ MSFT   │ 2011-01-07 │ 28.64   │ 28.74   │ 28.25   │ 28.6    │ 22.739   │ 7.3762e7  │

using data from MarketData.jl with the following code to plot:

using Gadfly
display(plot(mydf2,x="timestamp",y="AdjClose", Geom.line))

ATTEMPT 2:

I tried to achieve similar results with my first data series, just ignoring the TimeArray (since it didn’t help in Attempt 1), and got the following error:

myfile="test2.csv"
df = DataFrame(CSV.File(joinpath(@__DIR__,myfile); dateformat=dmft)) # dmft as defined in Attempt 1; the output below shows Date parsed as DateTime
println(first(df,10))
display(plot(df,x="Date",y="Col3", Geom.line))

I got the following dataframe and error message:

10×5 DataFrame
│ Row │ Date                │ Col1    │ Col2    │ Col3    │ Col4    │
│     │ DateTime            │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ 1   │ 2020-08-10T00:00:00 │ 507.28  │ 181.34  │ 1532.96 │ 183.16  │
│ 2   │ 2020-08-10T00:01:00 │ 507.29  │ 181.34  │ 1532.95 │ 183.16  │
│ 3   │ 2020-08-10T00:02:00 │ 507.27  │ 181.34  │ 1532.94 │ 183.16  │
│ 4   │ 2020-08-10T00:03:00 │ 507.28  │ 181.34  │ 1532.97 │ 183.16  │
│ 5   │ 2020-08-10T00:04:00 │ 507.29  │ 181.33  │ 1532.97 │ 183.16  │
│ 6   │ 2020-08-10T00:05:00 │ 507.29  │ 181.33  │ 1532.96 │ 183.16  │
│ 7   │ 2020-08-10T00:06:00 │ 507.27  │ 181.33  │ 1532.95 │ 183.16  │
│ 8   │ 2020-08-10T00:07:00 │ 507.28  │ 181.33  │ 1532.96 │ 183.16  │
│ 9   │ 2020-08-10T00:08:00 │ 507.27  │ 181.33  │ 1532.95 │ 183.16  │
│ 10  │ 2020-08-10T00:09:00 │ 507.28  │ 181.32  │ 1532.96 │ 183.16  │
ERROR: LoadError: Cannot convert DataFrame to series data for plotting

ATTEMPT 3:

Since it is in DateTime format, I wonder why that is an issue. Ok so I tried something different now, not changing the format when loading the data, and still not using the TimeArray:

myfile="test2.csv"
# dmft = dateformat"d/m/yyyy HH:MM:SS"
df = DataFrame(CSV.File(joinpath(@__DIR__,myfile))) # dateformat=dmft removed
println(first(df,10))

display(plot(df,x="Date",y="Col3", Geom.line))

but I still got this result:

10×5 DataFrame
│ Row │ Date           │ Col1    │ Col2    │ Col3    │ Col4    │
│     │ String         │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼────────────────┼─────────┼─────────┼─────────┼─────────┤
│ 1   │ 10/8/2020 0:00 │ 507.28  │ 181.34  │ 1532.96 │ 183.16  │
│ 2   │ 10/8/2020 0:01 │ 507.29  │ 181.34  │ 1532.95 │ 183.16  │
│ 3   │ 10/8/2020 0:02 │ 507.27  │ 181.34  │ 1532.94 │ 183.16  │
│ 4   │ 10/8/2020 0:03 │ 507.28  │ 181.34  │ 1532.97 │ 183.16  │
│ 5   │ 10/8/2020 0:04 │ 507.29  │ 181.33  │ 1532.97 │ 183.16  │
│ 6   │ 10/8/2020 0:05 │ 507.29  │ 181.33  │ 1532.96 │ 183.16  │
│ 7   │ 10/8/2020 0:06 │ 507.27  │ 181.33  │ 1532.95 │ 183.16  │
│ 8   │ 10/8/2020 0:07 │ 507.28  │ 181.33  │ 1532.96 │ 183.16  │
│ 9   │ 10/8/2020 0:08 │ 507.27  │ 181.33  │ 1532.95 │ 183.16  │
│ 10  │ 10/8/2020 0:09 │ 507.28  │ 181.32  │ 1532.96 │ 183.16  │
ERROR: LoadError: Cannot convert DataFrame to series data for plotting
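
For reference, the String versus DateTime difference between the two tables can be reproduced without CSV at all. This is only a sketch using the Dates standard library; the format string `d/m/yyyy HH:MM` is an assumption chosen to match the sample rows (which carry no seconds), not the exact format used above.

```julia
using Dates

# Assumed format for rows like "10/8/2020 0:00" (day/month/year, no seconds).
fmt = dateformat"d/m/yyyy HH:MM"

# A few raw strings copied from the sample CSV.
raw = ["10/8/2020 0:00", "10/8/2020 0:01", "10/8/2020 0:02"]

# Parse each string explicitly; the result is a Vector{DateTime}.
stamps = [DateTime(s, fmt) for s in raw]

println(stamps[2])  # 2020-08-10T00:01:00
```

Without a format the values stay plain strings, which is what the `String` column type in the table above reflects.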

I suspect the issue is with the Date or DateTime, but I haven’t been able to nail it down. There was a post on plotting time series data that used String instead ("Gadfly.jl : How to plot date time based?"), which led to my attempt below:

ATTEMPT 4:

myfile="test2.csv"
dmft = dateformat"d/m/yyyy HH:MM:SS"
df = DataFrame(CSV.File(joinpath(@__DIR__,myfile); dateformat=dmft)) # historical data for the ticker

dt = Array(df.Date)
dt_str = Array(String,length(dt))
for i=1:length(dt)
    dt_str[i] = string(dt[i]);
end

with the following error message:

ERROR: LoadError: MethodError: no method matching Array(::Type{String}, ::Int64)

This is a small snippet of my csv, in case you want to try it out.

Date,Col1,Col2,Col3,Col4
10/8/2020 0:00,507.28,181.34,1532.96,183.16
10/8/2020 0:01,507.29,181.34,1532.95,183.16
10/8/2020 0:02,507.27,181.34,1532.94,183.16
10/8/2020 0:03,507.28,181.34,1532.97,183.16
10/8/2020 0:04,507.29,181.33,1532.97,183.16
10/8/2020 0:05,507.29,181.33,1532.96,183.16
10/8/2020 0:06,507.27,181.33,1532.95,183.16
10/8/2020 0:07,507.28,181.33,1532.96,183.16
10/8/2020 0:08,507.27,181.33,1532.95,183.16
10/8/2020 0:09,507.28,181.32,1532.96,183.16
10/8/2020 0:10,507.29,181.32,1532.97,183.16
10/8/2020 0:11,507.28,181.33,1532.94,183.16
10/8/2020 0:12,507.27,181.33,1532.96,183.16
10/8/2020 0:13,507.31,181.33,1532.96,183.17

I am a newcomer to Julia, any beginner’s level guide is most appreciated.


That’s a very long post but it seems to me that it has nothing to do with time series, the structure of the underlying DataFrame you’re plotting etc. and is simply a question about either plotting in high resolution, and/or zooming plots?

I’ve never used Gadfly, but can’t you just output a vector format like svg? E.g.

using Plots

savefig(plot(rand(100)), "out.svg")

when I open this in my web browser and zoom to 800% I get:

If your issue is specifically with zooming into a plot in the Juno plot pane, there’s an old issue here which seems to suggest that this should be working. If it isn’t, it might be worth coming up with a smaller minimum working example that illustrates your problem. Your post above has a lot of other issues, like incorrectly calling plot and incorrectly assigning elements to vectors of a different type, which are completely unrelated to zooming in the Juno plot pane and therefore distract from what I think you intended the topic of this conversation to be.
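
On the "incorrectly calling plot" point: the message `Cannot convert DataFrame to series data for plotting` is, as far as I can tell, raised by Plots rather than Gadfly, which would fit a session where both packages export `plot` and the unqualified name refers to the Plots one. The sketch below uses two made-up stand-in modules (`PlotsLike` and `GadflyLike`, purely hypothetical) to show that qualifying the call with the module name removes any doubt about which function runs.

```julia
# Two hypothetical stand-in modules that both export a function named `plot`.
module PlotsLike
export plot
plot(data) = "PlotsLike.plot called"
end

module GadflyLike
export plot
plot(data; x = nothing, y = nothing) = "GadflyLike.plot called"
end

# Qualifying the call with the module name picks the function explicitly,
# regardless of which `plot` an unqualified call would resolve to.
a = PlotsLike.plot([1, 2, 3])
b = GadflyLike.plot([1, 2, 3]; x = :Date, y = :Col3)

println(a)
println(b)
```

With the real packages, the analogous call would presumably be `Gadfly.plot(df, x=:Date, y=:Col3, Geom.line)`; I have not run that against your data.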


Thanks nilshg!

  1. I don’t have the issue with the Juno plot pane. Under WHAT I LIKE TO ACHIEVE, I was able to do that in Juno. The DataFrame there was built correctly, using Date. The funny thing is, why doesn’t it work when I use DateTime? And how can I make it work?

  2. The issue here is that the plot is rendered as an image. I tried the svg and this is what I get. Not very appealing, right? All the high-resolution data got clustered. The data is produced every minute or less. I need to be able to zoom in to each data point.
    image

Once it is rendered as an image, which is what TimeSeries.jl does as opposed to plotting with plotly or Gadfly (or whatever other backend engines), I lose the ability to zoom into the plot.

  3. You hit the nail on the head, I am sure I have made plenty of mistakes here; I started learning Julia only a few days ago. I was hoping for some pointers on how to do the DataFrame properly with Julia.

  4. As long as it is high resolution and not rendered as an image, I am fine whether it is plotly or Gadfly or others.

  5. Yes, the post is long. Since that doesn’t help, just ignore my code then. At the end of the post, I have supplied a short csv if anyone doesn’t mind showing me how it is supposed to be done correctly. Here it is again.

"""
Date,Col1,Col2,Col3,Col4
10/8/2020 0:00,507.28,181.34,1532.96,183.16
10/8/2020 0:01,507.29,181.34,1532.95,183.16
10/8/2020 0:02,507.27,181.34,1532.94,183.16
10/8/2020 0:03,507.28,181.34,1532.97,183.16
10/8/2020 0:04,507.29,181.33,1532.97,183.16
10/8/2020 0:05,507.29,181.33,1532.96,183.16
10/8/2020 0:06,507.27,181.33,1532.95,183.16
10/8/2020 0:07,507.28,181.33,1532.96,183.16
10/8/2020 0:08,507.27,181.33,1532.95,183.16
10/8/2020 0:09,507.28,181.32,1532.96,183.16
10/8/2020 0:10,507.29,181.32,1532.97,183.16
10/8/2020 0:11,507.28,181.33,1532.94,183.16
10/8/2020 0:12,507.27,181.33,1532.96,183.16
10/8/2020 0:13,507.31,181.33,1532.96,183.17
"""

I don’t think it is very clear what your problem is. But if it is being able to zoom into the interactive plot, in Linux at least you can use the pyplot backend, which opens a more option-rich window allowing you to zoom into the plot:

julia> using Plots

julia> pyplot()
Plots.PyPlotBackend()

julia> plot(rand(10),rand(10))

The problem is that the plot is rendered as an image. Then I am at the mercy of the resolution of the image, which needs to cater for at least 60 x 24 = 1440 data points per day, usually more.

Under the section WHAT I LIKE TO ACHIEVE, I can successfully make a plot which is not an image, and then I can zoom in to the per-minute data.

BUT, I don’t know how to do this with my data. I have shared the csv at the end of my post.

I am not using Linux, but it seems the plot above is an image? Unless it is super high resolution, it will still cluster the data, which is at least 1440 data points per day.

What is high resolution? It means I can zoom in to each data point, and there are thousands per day.

The above is an interactive window, and you can zoom as much as you want.

But you can also do that with SVG images, as suggested above. Those are vector images, so every point is an object, you can zoom as much as you want too.

What we do not understand is how this has anything to do with your data being of one type or the other. Those are properties of the way you generate the plot, not of the data.

Ok, let me see if I get this right. I tried using your code, and the Plots pane shows the plot okay. Then when I zoom in, I see this, which means that it is rendered as an image. And this is just with a handful of points. Imagine if I have thousands in close proximity.

If you zoom in, do you see it at such low resolution, or does it remain sharp?

image

On the data type, I am baffled as well why it doesn’t work when I change the data type. Obviously I did something wrong, but I don’t know what. I have shared the error and the data and everything.

That is a property of the application generating the plot visualization. In Linux, with the default pyplot visualizer, I can zoom indefinitely. So I do not think you need to change anything in your Julia code as far as this issue is concerned; instead, try to visualize the plot with other software. One alternative is to save the plot as a PDF and open it in your preferred PDF viewer.

Ah I see.
I think saving it as a PDF or an image will only work if I can make it super high resolution.
I will try something else.

I just used Gadfly and it is awesome, if only I could work with my DateTime data without getting the error message.
I zoomed in, and every time I zoomed in, it rendered the plot again, so no issue.

image


SVGs and PDFs are scalable vector graphics; they don’t have a “resolution”. You can zoom them as much as you want. (Both can have embedded pixel images, but that is not the case for the plots here.)


That should be:

dt_str = Vector{String}(undef, length(dt))
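
For context, `Array(String, n)` is older Julia syntax; the `undef` constructor replaces it. Here is a minimal sketch of the full loop with that fix, plus a broadcast one-liner that should give the same result. The sample timestamps are made up for illustration and stand in for `df.Date`.

```julia
using Dates

# Made-up sample timestamps standing in for `df.Date`.
dt = [DateTime(2020, 8, 10, 0, m) for m in 0:2]

# Pre-allocate an uninitialized Vector{String}, then fill it element by element.
dt_str = Vector{String}(undef, length(dt))
for i in eachindex(dt)
    dt_str[i] = string(dt[i])
end

# Broadcasting `string` over the vector avoids the explicit loop.
dt_str2 = string.(dt)

println(dt_str)
```

The broadcast form is usually the shorter choice when nothing else happens inside the loop.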

Thanks! Lots for me to learn :slight_smile:

This one now works fine. Not sure why I didn’t get this right yesterday

myfile="test2.csv"
dmft = dateformat"d/m/yyyy HH:MM:SS"
df = DataFrame(CSV.File(joinpath(@__DIR__,myfile)))
println(first(df,10))
using Gadfly
display(plot(df, x="Date", y="Col3", Guide.xticks(label=false), Geom.line, Theme(grid_line_width=0mm))) # plots df as loaded above; df2 is not defined in this snippet

image

EDIT:
I think I was confused because there is a post that says the DateTime must be in String for plotting. That is not true. That led me down the rabbit hole. Here it is with plotly. All is good now.

using IterableTables
using DataFrames
using CSV
using Dates
using Plots
myfile="test2.csv"
dmft = dateformat"d/m/yyyy HH:MM:SS"
df = DataFrame(CSV.File(joinpath(@__DIR__,myfile); dateformat=dmft))
println(first(df,10))
df2 = filter(row -> row[:Date] <= Dates.DateTime("2020-10-15T00:06:00"), df)
plotly()
using StatsPlots
@df df plot(:Date, :Col3)
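
One thing to watch in the `filter` line: the sample rows are all from 10 August 2020, so a cutoff of 2020-10-15 keeps every one of them, and `df2` is not the frame that gets plotted. A small sketch with plain DateTime values, using a hypothetical cutoff inside the sample range, shows the comparison itself behaving as expected:

```julia
using Dates

# One timestamp per minute, mirroring the 0:00 to 0:13 rows of the sample CSV.
timestamps = collect(DateTime(2020, 8, 10, 0, 0):Minute(1):DateTime(2020, 8, 10, 0, 13))

# Hypothetical cutoff inside the sample range (ISO format parses without a dateformat).
cutoff = DateTime("2020-08-10T00:06:00")

# Keep only the timestamps at or before the cutoff.
kept = filter(t -> t <= cutoff, timestamps)

println(length(timestamps))  # 14
println(length(kept))        # 7
```

The same predicate should carry over to the DataFrame version by comparing `row[:Date]` against the cutoff.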
