This is a simple task. Suppose I have a data frame named x, and I’d like to extract the columns x.name1, x.name3, and x.name4 and combine them into a new data frame using Julia.
I tried the following and got an error message.
newDF = DataFrame[newDF.name1=x.name1, newDF.name2=x.name3, newDF.name3=x.name4]
Can anyone let me know how to fix this?
Moreover, is this a fairly efficient way to handle large CSV files? I got data frame x from a CSV file, and eventually I need to write newDF to a CSV file.
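newDF = x[:, [:name1, :name3, :name4]]
will do the trick. (Older DataFrames.jl versions also accepted x[[:name1,:name3,:name4]] without the row selector, but current versions require both a row and a column selector.)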
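A minimal sketch of the column selection, assuming a recent DataFrames.jl release. The contents of x are made up for illustration, since the original CSV is not shown.

```julia
using DataFrames

# Hypothetical stand-in for the data frame x read from the CSV file
x = DataFrame(name1 = 1:3, name2 = 4:6, name3 = 7:9, name4 = 10:12)

# Row selector `:` plus a vector of column names; the selected columns are copied
newDF = x[:, [:name1, :name3, :name4]]

# The same selection written with the select function
newDF2 = select(x, :name1, :name3, :name4)
```

Both forms return a new data frame and leave x unchanged.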
DataFramesMeta also has some helper macros like @select that would help with this in more complex scenarios.
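For illustration, the same selection with @select might look like the sketch below. It assumes the DataFramesMeta package is installed, and the contents of x are again invented.

```julia
using DataFrames, DataFramesMeta

# Hypothetical stand-in for the data frame x
x = DataFrame(name1 = 1:3, name2 = 4:6, name3 = 7:9, name4 = 10:12)

# @select keeps only the listed columns and returns a new data frame
newDF = @select(x, :name1, :name3, :name4)
```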
Also worth noting the reason you got an error: the constructor for DataFrame needs to be called with parentheses, not square brackets. And once you change the brackets, DataFrame tries to parse newDF.name1 etc. as Symbols to use as column names, but can’t figure them out because they contain dots.
Efficiency-wise, as long as the whole data frame fits in memory, yes, it is fairly efficient.
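If only a few columns of a large file are needed, one option is to skip the others at read time. The sketch below assumes the CSV.jl package and uses made-up file names and data; the select keyword of CSV.read restricts which columns are parsed.

```julia
using CSV, DataFrames

# Hypothetical input file, written here only so the sketch is self-contained
inpath = joinpath(mktempdir(), "input.csv")
CSV.write(inpath, DataFrame(name1 = 1:3, name2 = 4:6, name3 = 7:9, name4 = 10:12))

# Read only the columns that are needed; name2 is never materialized
x = CSV.read(inpath, DataFrame; select = [:name1, :name3, :name4])

# Write the result back out to a new CSV file
outpath = joinpath(dirname(inpath), "output.csv")
CSV.write(outpath, x)
```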
Thank you for your reply. But in the new data frame newDF, I’d like the column names to be different than simply name1, name3 and name4. I can create the dataframe first and then change it later on. Was wondering, is it possible to change it directly from the very beginning of the construction of the new data frame?
Yes, your code works if you follow @ElOceanografo’s suggestion: change the square brackets to parentheses in the constructor, and don’t use periods in your column names. Or if you really want periods, use for example Symbol("df1.name1") as the column name.
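A sketch of what that could look like, assuming a recent DataFrames.jl release and invented data for x. The new column names name1, name2, and name3 follow the original question; the dotted names are only there to show the Symbol form.

```julia
using DataFrames

# Hypothetical stand-in for the data frame x
x = DataFrame(name1 = 1:3, name2 = 4:6, name3 = 7:9, name4 = 10:12)

# Keyword arguments: the left-hand side is the new column name
newDF = DataFrame(name1 = x.name1, name2 = x.name3, name3 = x.name4)

# Pairs allow names that are not valid identifiers, such as ones with periods
dotted = DataFrame(Symbol("df1.name1") => x.name1,
                   Symbol("df1.name3") => x.name3)

# select can pick and rename columns in one step
renamed = select(x, :name1 => :name1, :name3 => :name2, :name4 => :name3)
```

The constructor copies the columns by default, so later changes to newDF do not affect x.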
Thank you!