Create Histogram of Monthly Return with CSV and DataFrames

Hi all,

I am reading this:

and I want to plot a histogram for monthly return of a Shanghai Composite Index based on this code:

using CSV, DataFrames
    
#Read the CSV file and transform it into a DataFrame
    
rain_data = CSV.read("./csv/historico_precipitaciones.csv", DataFrame)
  
#Rename the columns 
colnames = ["Year", "Month", "mm", "Days"]
rename!(rain_data, Symbol.(colnames)) #Symbol is the type of object used to represent the labels of a dataset
    
#We use a dictionary to translate de Month names 
translate = Dict("Enero" => "January" ,"Febrero" => "February" ,"Marzo" => "March" ,"Abril" => "April" ,"Mayo" => "May" ,"Junio" => "June"  ,"Julio" => "July"   ,"Agosto" => "August"  ,"Septiembre" => "September"  ,"Octubre" => "October" ,"Noviembre" => "November" ,"Diciembre" => "December")
    
    for i in 1:length(rain_data[:,:Month])
        rain_data[i,:Month] = translate[rain_data[i,:Month]]
    end

# See the first 8 rows
# first(rain_data, 8)

begin
    histogram(rain_data[:,"mm"], bins=20, legend=false, size=(450, 300))
    title!("Monthly rainfall in Buenos Aires")
    xlabel!("Rainfall (mm)")
    ylabel!("Frequency")
end

But the csv is a monthly return that you can retrieve from :slight_smile:

it is Shanghai Composite Index Data.

I want to know can the x axis have negative value too? Since the return can be negative, if the price falls from the previous month.

so just:

  • y-axis = the frequency (the number of months the return at certain interval occuring)

  • x-axis = the return in percent.

All is available in the investing.com website.

Thanks!

Why shouldn’t the x-axis be able to have negative values?

image

(Disclaimer: I just used randn for convenience, I am not implying returns are normally distributed, don’t sue me if you implement a trading strategy that assumes this!)

You could use the linked site option to download the monthly data in CSV format and then use the customary tools:

using CSV, DataFrames, Plots
file = "Shanghai Composite Historical Data_Monthly.csv"
df = CSV.read(file, DataFrame)
df.Price .= parse.(Float64, replace.(df.Price, "," => ""))
histogram(df.Price)

NB: had to get rid of the comma separator for thousands

1 Like

No I won’t sue you, it is on almost all Finance books that assume the return has normal distribution.

Thanks @rafael.guerra

I was trying to make histogram from the return columns, the title is “Change %”

but it does not working:

using CSV, DataFrames, Plots

file = "./csv/Shanghai Composite Historical Data.csv"
df = CSV.read(file, DataFrame)
df.Change .= parse.(String)

histogram(df.Change, title="Shanghai Composite Index Histogram",
	label="", xlabel="Return (%)", ylabel="Frequency")

Another one,

I try to make multiple layouts, there are problems:

  1. The resolution is too small, the ylabel is too big, thus the histograms becoming fat
  2. Why the label are wrong even if I put [s1 s2 s3 s4] it only shows s1 and s3 ?
  3. Can I color the histogram? I use color or hist_color they don’t work.
using CSV, DataFrames, Plots

file = "./csv/Shanghai Composite Historical Data.csv"
filedj = "./csv/Dow Jones Industrial Average Historical Data.csv"
filenasdaq = "./csv/NASDAQ Composite Historical Data.csv"
filenikkei = "./csv/Nikkei 225 Historical Data.csv"

df = CSV.read(file, DataFrame)
df2 = CSV.read(filedj, DataFrame)
df3 = CSV.read(filenasdaq, DataFrame)
df4 = CSV.read(filenikkei, DataFrame)

df.Price .= parse.(Float64, replace.(df.Price, "," => ""))
df2.Price .= parse.(Float64, replace.(df2.Price, "," => ""))
df3.Price .= parse.(Float64, replace.(df3.Price, "," => ""))
df4.Price .= parse.(Float64, replace.(df4.Price, "," => ""))

p1 = histogram(df.Price, title="",
	label="", xlabel="", ylabel="")
p2 = histogram(df2.Price, title="",
	label="", xlabel="", ylabel="")
p3 = histogram(df3.Price, title="",
	label="", xlabel="", ylabel="")
p4 = histogram(df4.Price, title="",
	label="", xlabel="", ylabel="")

s1 = "Shanghai Composite Index (SSE)";
s2 = "Dow Jones Industrial Average";
s3 = "NASDAQ";
s4 = "Nikkei 225";

plot(p1, p2, p3, p4, layout = (4, 1), legend=:outerright,
     xaxis = "Price", yaxis = "Frequencies", label=[s1 s2 s3 s4])

Capture d’écran_2023-02-17_12-22-09

When there are spaces in the dataframes’ column names, we need to use a different syntax:

df[!,"Change %"] .= parse.(Float64, replace.(df[!,"Change %"], "%" => ""))
histogram(df[!,"Change %"])

Regarding the new question about formatting subplots, I recommend you to play with figure size, dpi, font size, margins, and the like, to get the most suitable results. Look for many similar posts on this topic and if nothing works, open a separate thread. As for the labels, you can define them individually for each subplot at the source.

1 Like