Create Histogram of Monthly Return with CSV and DataFrames

Freya_the_Goddess · February 16, 2023, 4:04pm

Hi all,

I am reading this:

and I want to plot a histogram for monthly return of a Shanghai Composite Index based on this code:

using CSV, DataFrames
    
#Read the CSV file and transform it into a DataFrame
    
rain_data = CSV.read("./csv/historico_precipitaciones.csv", DataFrame)
  
#Rename the columns 
colnames = ["Year", "Month", "mm", "Days"]
rename!(rain_data, Symbol.(colnames)) #Symbol is the type of object used to represent the labels of a dataset
    
#We use a dictionary to translate de Month names 
translate = Dict("Enero" => "January" ,"Febrero" => "February" ,"Marzo" => "March" ,"Abril" => "April" ,"Mayo" => "May" ,"Junio" => "June"  ,"Julio" => "July"   ,"Agosto" => "August"  ,"Septiembre" => "September"  ,"Octubre" => "October" ,"Noviembre" => "November" ,"Diciembre" => "December")
    
    for i in 1:length(rain_data[:,:Month])
        rain_data[i,:Month] = translate[rain_data[i,:Month]]
    end

# See the first 8 rows
# first(rain_data, 8)

begin
    histogram(rain_data[:,"mm"], bins=20, legend=false, size=(450, 300))
    title!("Monthly rainfall in Buenos Aires")
    xlabel!("Rainfall (mm)")
    ylabel!("Frequency")
end

But the csv is a monthly return that you can retrieve from

it is Shanghai Composite Index Data.

I want to know can the x axis have negative value too? Since the return can be negative, if the price falls from the previous month.

so just:

y-axis = the frequency (the number of months the return at certain interval occuring)
x-axis = the return in percent.

All is available in the investing.com website.

Thanks!

nilshg · February 16, 2023, 4:56pm

Why shouldn’t the x-axis be able to have negative values?

(Disclaimer: I just used randn for convenience, I am not implying returns are normally distributed, don’t sue me if you implement a trading strategy that assumes this!)

rafael.guerra · February 16, 2023, 7:04pm

You could use the linked site option to download the monthly data in CSV format and then use the customary tools:

using CSV, DataFrames, Plots
file = "Shanghai Composite Historical Data_Monthly.csv"
df = CSV.read(file, DataFrame)
df.Price .= parse.(Float64, replace.(df.Price, "," => ""))
histogram(df.Price)

NB: had to get rid of the comma separator for thousands

Freya_the_Goddess · February 17, 2023, 4:39am

No I won’t sue you, it is on almost all Finance books that assume the return has normal distribution.

Freya_the_Goddess · February 17, 2023, 5:01am

Thanks @rafael.guerra

I was trying to make histogram from the return columns, the title is “Change %”

but it does not working:

using CSV, DataFrames, Plots

file = "./csv/Shanghai Composite Historical Data.csv"
df = CSV.read(file, DataFrame)
df.Change .= parse.(String)

histogram(df.Change, title="Shanghai Composite Index Histogram",
	label="", xlabel="Return (%)", ylabel="Frequency")

Freya_the_Goddess · February 17, 2023, 5:25am

Another one,

I try to make multiple layouts, there are problems:

The resolution is too small, the ylabel is too big, thus the histograms becoming fat
Why the label are wrong even if I put [s1 s2 s3 s4] it only shows s1 and s3 ?
Can I color the histogram? I use color or hist_color they don’t work.

using CSV, DataFrames, Plots

file = "./csv/Shanghai Composite Historical Data.csv"
filedj = "./csv/Dow Jones Industrial Average Historical Data.csv"
filenasdaq = "./csv/NASDAQ Composite Historical Data.csv"
filenikkei = "./csv/Nikkei 225 Historical Data.csv"

df = CSV.read(file, DataFrame)
df2 = CSV.read(filedj, DataFrame)
df3 = CSV.read(filenasdaq, DataFrame)
df4 = CSV.read(filenikkei, DataFrame)

df.Price .= parse.(Float64, replace.(df.Price, "," => ""))
df2.Price .= parse.(Float64, replace.(df2.Price, "," => ""))
df3.Price .= parse.(Float64, replace.(df3.Price, "," => ""))
df4.Price .= parse.(Float64, replace.(df4.Price, "," => ""))

p1 = histogram(df.Price, title="",
	label="", xlabel="", ylabel="")
p2 = histogram(df2.Price, title="",
	label="", xlabel="", ylabel="")
p3 = histogram(df3.Price, title="",
	label="", xlabel="", ylabel="")
p4 = histogram(df4.Price, title="",
	label="", xlabel="", ylabel="")

s1 = "Shanghai Composite Index (SSE)";
s2 = "Dow Jones Industrial Average";
s3 = "NASDAQ";
s4 = "Nikkei 225";

plot(p1, p2, p3, p4, layout = (4, 1), legend=:outerright,
     xaxis = "Price", yaxis = "Frequencies", label=[s1 s2 s3 s4])

Capture d’écran_2023-02-17_12-22-09

rafael.guerra · February 17, 2023, 8:49am

When there are spaces in the dataframes’ column names, we need to use a different syntax:

df[!,"Change %"] .= parse.(Float64, replace.(df[!,"Change %"], "%" => ""))
histogram(df[!,"Change %"])

Regarding the new question about formatting subplots, I recommend you to play with figure size, dpi, font size, margins, and the like, to get the most suitable results. Look for many similar posts on this topic and if nothing works, open a separate thread. As for the labels, you can define them individually for each subplot at the source.

Topic		Replies	Views
Plotting data showing 2 sets of data New to Julia syntax , pluto	1	257	December 7, 2023
DataFrame, aggregate by month of date General Usage dataframes	4	2263	February 2, 2022
Help with dataframes please New to Julia question , dataframes	5	282	August 29, 2024
where does this histogram come from? New to Julia	2	463	January 28, 2019
Most efficient way: create/mutate DataFrame time column General Usage	8	1076	November 28, 2019

Create Histogram of Monthly Return with CSV and DataFrames

Related topics