Question about my data

abcde · August 4, 2020, 8:06am

using CSV,Clustering,Statistics,DataFrames,TimeSeries,MarketData

data=CSV.read("E:\\测试数据\\huizong15min.csv")
heping_1MX_V_35kV=convert(Array,data[:,3])
xiangshan_1MX_V_35kV=convert(Array,data[:,4])
dafeng_1MX_V_110kV=convert(Array,data[:,5])
heping_1MX_V_110kV=convert(Array,data[:,6])
dafeng_dahe_P_220kV=convert(Array,data[:,7])
dafeng_1MX_V_220kV=convert(Array,data[:,8])
heping_1MX_V_220kV=convert(Array,data[:,9])
guilin_MX_V_500kV=convert(Array,data[:,10])
Thermal_Q=convert(Array,data[:,11])
Thermal_P=convert(Array,data[:,12])
Wind_Q=convert(Array,data[:,13])
Wind_P=convert(Array,data[:,14])
Load_Q=convert(Array,data[:,15])
Load_P=convert(Array,data[:,16])
guilin_guidu_P_220kV=convert(Array,data[:,17])
heping_MU_cos=convert(Array,data[:,18])
Water_S=convert(Array,data[:,19])

Why are the columns of my CSV file read in as “String”? Thanks!

nilshg · August 4, 2020, 8:38am

What is the type of data[:, 16]? Also, why are you doing these conversions to Array? A column of a DataFrame is already a vector, so there’s no need to convert.

It also seems slightly unusual to decompose a DataFrame into a bunch of individual vectors like this. You might want to think about just renaming the columns of the DataFrame so that you can then access them by their names (e.g. data.Thermal_Q).

abcde · August 4, 2020, 8:42am

This is hourly data, and I’m going to convert each of its columns into daily data (24 hours per day). I think separate vectors are the only way to make that conversion easy.

abcde · August 4, 2020, 8:45am

And how do you convert a vector of type String to Float64?

nilshg · August 4, 2020, 8:46am

But presumably this could be done in DataFrames as well, e.g.

combine(groupby(data, :Day), :Thermal_P => sum => :Thermal_P)

(assuming that your hour-to-day conversion is just summing over all hours, and that you have a column Day that gives you the day for each hour)
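As a minimal sketch of that call on made-up data (the Day and Hour columns and all values below are assumptions for illustration, not taken from the original file):

```julia
using DataFrames

# Toy hourly data: 2 days x 24 hours, values invented for illustration
data = DataFrame(Day = repeat(1:2, inner = 24),
                 Hour = repeat(0:23, outer = 2),
                 Thermal_P = collect(1.0:48.0))

# One row per day, with the 24 hourly values of Thermal_P summed
daily = combine(groupby(data, :Day), :Thermal_P => sum => :Thermal_P)
```

Swapping sum for another reducer such as maximum gives a different daily aggregate without changing the rest of the call.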

nilshg · August 4, 2020, 8:47am

parse.(Float64, data.Load_P)

But I’d have a look at your csv file - it’s likely that Excel has screwed up the formatting which is preventing CSV.jl from correctly reading the numbers. Open it up in Excel again and set the formatting to “General”, then save and reload with CSV, it should correctly parse the numbers (assuming that you don’t have non-numerical data in the same column)
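If the column cannot be cleaned at the source, a tolerant parse is one option. The sketch below assumes the column arrived as strings with one cell holding a single space; the helper name to_float and the sample values are hypothetical.

```julia
# Hypothetical string column, as it might look when one cell holds a single space
raw = ["1.5", "2.25", " ", "4.0"]

# parse.(Float64, raw) would throw an ArgumentError on the " " entry.
# tryparse returns nothing instead of throwing; turn that nothing into missing.
to_float(s) = something(tryparse(Float64, s), missing)

parsed = to_float.(raw)
```

The result has element type Union{Missing, Float64}, so the unparseable cells stay visible as missing instead of silently becoming zeros.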

abcde · August 4, 2020, 8:50am

I need all the data; I just want to rearrange it (a->{M×1} into a->{N×24}).
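A rearrangement of this kind can be sketched with Base Julia alone, assuming the vector length is an exact multiple of 24 and the values are in time order (the data below is made up):

```julia
# Made-up hourly series covering 3 days (M = 72 values)
a = collect(1.0:72.0)

# reshape fills column by column, so build a 24 x N matrix first,
# then swap the dimensions so that each row holds one day
n_days = div(length(a), 24)
daily = permutedims(reshape(a, 24, n_days))
```

Here daily[d, h] is the value for hour h of day d, and no values are dropped or aggregated.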

abcde · August 4, 2020, 8:57am

It’s still not in the right format

nilshg · August 4, 2020, 8:58am

That’s unfortunate, but also pretty much impossible to debug remotely unless you can share the csv file or some dummy data that reproduces your issue.

abcde · August 4, 2020, 9:01am

How can I send my data to you?

nilshg · August 4, 2020, 9:03am

Depends on how large it is and whether you can make it public or not - you could just upload it on GitHub or some other filesharing service, or if you can’t make it public and it’s not too large I can DM you an email address to send it to

abcde · August 4, 2020, 9:04am

OK, I will send it to your email. Thank you.

abcde · August 4, 2020, 9:05am

Would you mind giving me your email address?

nilshg · August 4, 2020, 9:05am

Sent you a DM

abcde · August 4, 2020, 9:11am

I have sent it to you.

nilshg · August 4, 2020, 9:21am

Thanks - the issue you have is that there is one row in your data where missing values are encoded not with an empty cell, but with a single whitespace character. The simplest thing to do is to just use Ctrl+H (find/replace) in Excel to replace " " in your file with "" (i.e. put a single space character in the top box and nothing in the bottom box). This should make 12 replacements in your file, and if you then do

df = DataFrame(CSV.File("huizong15min.csv"))

you should get all numerical columns (although note that the columns will have element type Union{Float64, Missing}, as there are missing values now; you can use dropmissing!(df) to get rid of those).
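The effect of dropmissing! on the element type can be seen on a small made-up table (the column names are borrowed from the thread, the values are invented):

```julia
using DataFrames

# Two numeric columns that each contain a missing value
df = DataFrame(Load_P = [1.0, missing, 3.0],
               Wind_P = [0.5, 0.7, missing])

type_before = eltype(df.Load_P)   # element type still allows missing

# Drop every row that contains at least one missing value, in place
dropmissing!(df)

type_after = eltype(df.Load_P)    # missing is no longer allowed
```

Only the first row survives here, because each of the other two rows has a missing value in one column.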

abcde · August 4, 2020, 9:54am

I did as you asked, but nothing seems to have changed, and Excel does not find " ".

nilshg · August 4, 2020, 9:56am

The problem is in row 17,702 (timestamp 43648.38542), where almost all of your data is missing. Just manually delete the content of the empty cells (or the whole row if you would drop it afterwards anyway).
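When Excel's search does not find such cells, the raw text can be scanned instead. This is a rough sketch using only Base Julia: the function name blank_cell_rows is hypothetical, the sample lines are invented apart from the timestamp quoted above, and the naive split on commas assumes no quoted fields containing commas.

```julia
# Return the line numbers that contain a field made up only of whitespace
function blank_cell_rows(lines)
    hits = Int[]
    for (i, line) in enumerate(lines)
        fields = split(line, ',')
        # non-empty but whitespace-only fields are the problematic ones
        if any(f -> !isempty(f) && all(isspace, f), fields)
            push!(hits, i)
        end
    end
    return hits
end

# Invented sample: the third line has two cells holding a single space
lines = ["time,Load_P,Wind_P",
         "43648.36458,10.5,3.2",
         "43648.38542, , ",
         "43648.39583,11.0,3.4"]

bad_rows = blank_cell_rows(lines)
```

For a real file, passing eachline("huizong15min.csv") in place of lines would report line numbers in the file, with the header counted as line 1.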

abcde · August 4, 2020, 9:58am

Sorry, that was my formatting error. I have corrected it; that column should be the date.

abcde · August 4, 2020, 9:59am
