Question about my data

using CSV,Clustering,Statistics,DataFrames,TimeSeries,MarketData

data=CSV.read("E:\\测试数据\\huizong15min.csv")
heping_1MX_V_35kV=convert(Array,data[:,3])
xiangshan_1MX_V_35kV=convert(Array,data[:,4])
dafeng_1MX_V_110kV=convert(Array,data[:,5])
heping_1MX_V_110kV=convert(Array,data[:,6])
dafeng_dahe_P_220kV=convert(Array,data[:,7])
dafeng_1MX_V_220kV=convert(Array,data[:,8])
heping_1MX_V_220kV=convert(Array,data[:,9])
guilin_MX_V_500kV=convert(Array,data[:,10])
Thermal_Q=convert(Array,data[:,11])
Thermal_P=convert(Array,data[:,12])
Wind_Q=convert(Array,data[:,13])
Wind_P=convert(Array,data[:,14])
Load_Q=convert(Array,data[:,15])
Load_P=convert(Array,data[:,16])
guilin_guidu_P_220kV=convert(Array,data[:,17])
heping_MU_cos=convert(Array,data[:,18])
Water_S=convert(Array,data[:,19])


Why do the columns of my CSV file read in as String? Thanks!

What is the type of data[:, 16]? Also, why are you doing these conversions to Array? A column of a DataFrame is already a vector, so there’s no need to convert.

It also seems slightly unusual to decompose a DataFrame into a bunch of individual vectors like this. You might want to think about just renaming the columns of the DataFrame so that you can then access them by their names (e.g. data.Thermal_Q).
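As a rough sketch of the renaming idea, using a small made-up table (the placeholder names x1, x2, x3 and the values are assumptions for illustration, not taken from the real file):

```julia
using DataFrames

# Hypothetical stand-in for the CSV contents; the real file has more columns.
data = DataFrame(x1 = [1.0, 2.0], x2 = [0.5, 0.7], x3 = [10.0, 11.0])

# Replace all column names at once (one name per column, in order).
rename!(data, [:Timestamp, :Thermal_Q, :Thermal_P])

# Columns are now available by name and are already plain vectors.
thermal_q = data.Thermal_Q
```

No convert(Array, ...) call is needed here, since data.Thermal_Q is already a Vector.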

This is hourly data, and I’m going to convert each of its columns into daily data (24 hours per day). As far as I can tell, this is the only way to make that conversion easy.

And how do you convert a vector of type String to Float64?

But presumably this could be done in DataFrames as well, e.g.

combine(groupby(data, :Day), :Thermal_P => sum => :Thermal_P)

(assuming that your hour-to-day conversion is just summing over all hours, and that you have a column Day that gives you the day for each hour)
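To make that concrete, here is a minimal sketch on made-up data. The Day column and the values are assumptions, and the real aggregation might be a mean or something else rather than a sum:

```julia
using DataFrames

# Hypothetical data: two days with three readings each.
data = DataFrame(Day = [1, 1, 1, 2, 2, 2],
                 Thermal_P = [10.0, 12.0, 11.0, 9.0, 8.0, 10.0])

# One row per day, summing the readings within each group.
daily = combine(groupby(data, :Day), :Thermal_P => sum => :Thermal_P)
```

The result has one row per distinct value of Day, so no manual slicing into vectors is required.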

parse.(Float64, data.Load_P)

But I’d have a look at your csv file - it’s likely that Excel has screwed up the formatting which is preventing CSV.jl from correctly reading the numbers. Open it up in Excel again and set the formatting to “General”, then save and reload with CSV, it should correctly parse the numbers (assuming that you don’t have non-numerical data in the same column)
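If the column does contain a stray non-numeric cell, parse will throw on it. A hedged sketch of a more forgiving conversion, using invented values and a hypothetical helper to_float:

```julia
# Made-up raw column, as it might look when one cell is not numeric.
raw = ["231.4", "230.9", " ", "232.1"]

# tryparse returns nothing instead of throwing, so failures can be
# mapped to missing. to_float is a helper defined just for this example.
function to_float(s::AbstractString)
    v = tryparse(Float64, strip(s))
    return v === nothing ? missing : v
end

parsed = to_float.(raw)

# Positions of the entries that could not be parsed.
bad_rows = findall(ismissing, parsed)
```

Looking at bad_rows is also a quick way to find out which rows of the file are causing the column to be read as String.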

I need all of the data; I just want to rearrange it (a: M×1 into a: N×24).
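For a pure rearrangement like that, a minimal sketch with reshape, assuming the vector length is an exact multiple of 24 and using invented values:

```julia
# Hypothetical series: 48 hourly values, i.e. two full days.
a = collect(1.0:48.0)

# reshape fills column by column, so build a 24×N matrix first
# and then swap the dimensions to get one row per day (N×24).
by_day = permutedims(reshape(a, 24, :))
```

If the data is actually at 15-minute resolution, as the file name suggests, the 24 would presumably become 96.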

It’s still not in the right format :persevere:

That’s unfortunate, but also pretty much impossible to debug remotely unless you can share the csv file or some dummy data that reproduces your issue.

How can I send my data to you?

Depends on how large it is and whether you can make it public or not - you could just upload it on GitHub or some other filesharing service, or if you can’t make it public and it’s not too large I can DM you an email address to send it to

OK, I will send it to your email, thank you.

Would you mind giving me your email address?

Sent you a DM

I have sent it to you, :grin:

Thanks - the issue you have is that there is one row in your data where missing values are encoded not with an empty cell, but with a single whitespace character. The simplest thing to do is to just use Ctrl+H (find/replace) in Excel to replace " " in your file with "" (i.e. put a single space character in the top box and nothing in the bottom box). This should make 12 replacements in your file, and if you then do

df = DataFrame(CSV.File("huizong15min.csv"))

you should get all numerical columns (although note that you’ll have Union{Float64, Missing} data type, as there are missing values now, you can use dropmissing!(df) to get rid of those)
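An alternative that avoids editing the file is to tell CSV.jl to treat a single space as missing. This is only a sketch on made-up CSV text, and it assumes a CSV.jl version in which the missingstring keyword accepts a vector of strings (check the docs for your installed version):

```julia
using CSV, DataFrames

# Made-up CSV text in which one cell holds a single space instead of being empty.
text = "time,Load_P\n1.0,10.5\n2.0, \n3.0,11.5\n"

# Treat both an empty cell and a single space as missing.
df = DataFrame(CSV.File(IOBuffer(text); missingstring = ["", " "]))

# Drop the rows that contain missing values.
dropmissing!(df)
```

After dropmissing!, the Load_P column should be a plain Float64 column again.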

I did as you asked, but nothing seems to have changed, and Excel can’t find " " in the file.

The problem is in row 17,702 (timestamp 43648.38542), where almost all of your data is missing. Just manually delete the content of the empty cells (or the whole row, if you would drop it afterwards anyway).

Sorry, that was my formatting error. I have corrected it; that column should be the date.