I’m trying to create a “long”-format data frame (i.e. several observations per individual, as in a longitudinal study). However, the rows for each individual depend on certain parameters, so the operation is not the same for all N individuals: some will have more rows, some fewer. I was able to do it by creating an empty data frame and then repeatedly updating it within a for loop using vcat.
In terms of pseudocode, this is how it currently looks:
df = DataFrame(ID = Int64[], nᵢ = Int64[], ...) # Define the actual data frame I want
for i in 1:N
    dftmp = DataFrame(ID = i, nᵢ = Int64[], ...) # Create a temporary data frame (same args as df)
    .
    .
    .
    df = vcat(df, dftmp) # Combine dftmp with df (i.e. update df)
end
However, I feel like there must be a more efficient way of doing this and wanted guidance. I come from an R background and have manipulated data frames using tidyverse.
However, I am not sure how the command below works:
[DataFrame(ID = i, nᵢ = ...) for i in 1:N]
In addition, the source data frames will have the same rows but different sets of columns.
Pseudocode:
for i in 1:N
    dftmp = DataFrame(ID = i, nᵢ = Int64[], ...) # Create a temporary data frame (same args as df)
    .
    if (arg is true)
        dftmp = ...
    else
        dftmp = ...
    end
    df = vcat(df, dftmp) # Combine dftmp with df (i.e. update df)
end
Is there a better way to handle the if/else inside the loop?
The OP used ... as a placeholder for some other operations, so I re-used it. Clearly, the code was not meant to be runnable.
the source data frames will have the same rows but different sets of columns.
As I commented above, please read the vcat documentation. I am copying the relevant part here:
The cols keyword argument determines the columns of the returned data frame:
• :setequal: require all data frames to have the same column names disregarding order. If they
appear in different orders, the order of the first provided data frame is used.
• :orderequal: require all data frames to have the same column names and in the same order.
• :intersect: only the columns present in all provided data frames are kept. If the
intersection is empty, an empty data frame is returned.
• :union: columns present in at least one of the provided data frames are kept. Columns not
present in some data frames are filled with missing where necessary.
• A vector of Symbols or strings: only listed columns are kept. Columns not present in some
data frames are filled with missing where necessary.
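To make the difference between these options concrete, here is a minimal sketch. The two toy data frames and their column names (x, y) are made up for illustration and are not taken from your code:

```julia
using DataFrames

# Hypothetical toy frames: both have `ID`, but the other column differs.
df1 = DataFrame(ID = [1, 1], x = [0.1, 0.2])
df2 = DataFrame(ID = [2], y = ["a"])

# :union keeps every column seen in either frame;
# cells with no source value are filled with `missing`.
u = vcat(df1, df2; cols = :union)

# :intersect keeps only the columns present in both frames (here just `ID`).
s = vcat(df1, df2; cols = :intersect)

# The default (:setequal) would throw an error for these two frames,
# because their column names differ.
```

Here u has the columns ID, x and y, with missing wherever a frame did not supply a value, while s keeps only ID.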
Now regarding your question:
Pseudocode:
Please share the full code if you want fully working code in response.
[DataFrame(ID = i, nᵢ = ...) for i in 1:N] is a comprehension: it builds a vector containing one data frame per value of i.
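As a rough sketch of how the comprehension fits together with vcat (the rule "individual i contributes i rows" and the visit column are assumptions standing in for your real logic):

```julia
using DataFrames

N = 3

# One small data frame per individual; the scalar `ID = i` is recycled
# to match the length of `visit`. Individual i gets i rows here,
# which is a made-up rule for illustration only.
parts = [DataFrame(ID = i, visit = 1:i) for i in 1:N]

# Stack all pieces in a single call instead of growing `df` inside the loop.
df = reduce(vcat, parts)
```

Collecting the pieces first and combining them once avoids copying the accumulated data frame on every iteration, which is what repeated df = vcat(df, dftmp) does.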
Alternatively you can define e.g. a function taking one argument (i):
function gen_df(i)
    ...
end
and then use broadcasting like this:
reduce(vcat, gen_df.(1:N), cols=:union) # I use :union as I understand you want union of the columns
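For illustration only, here is what a filled-in version might look like. The body of gen_df below is hypothetical: the iseven(i) test stands in for whatever your real condition is, and the visit and dose columns are invented:

```julia
using DataFrames

# Hypothetical builder: the branch decides both the number of rows
# and which columns the individual's data frame has.
function gen_df(i)
    if iseven(i)
        return DataFrame(ID = i, visit = 1:2, dose = [0.5, 1.0])
    else
        return DataFrame(ID = i, visit = 1:3)
    end
end

N = 4

# Broadcast gen_df over 1:N, then stack the results;
# cols = :union fills `dose` with `missing` for the odd-numbered individuals.
df = reduce(vcat, gen_df.(1:N), cols = :union)
```

Keeping the if/else inside a function means the loop and the running vcat disappear entirely; each branch just returns its own data frame.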