Hi,
I am trying to read the row names and some specific columns of a CSV file (to save RAM), so I use the “select” option. The problem with this option is that the row names are dropped, so I want to include the first column by its index, because this column does not always have the same name.
So my question is: how can I read the row names and other columns without reading the whole dataframe?
One possibility is to do it in two steps: read the row names into one dataframe, read the columns selected by name into a second one, and then join the two. But maybe there is a better solution?
Thanks for your advice!
julia> DataFrame(CSV.File("test.csv"))
2×4 DataFrame
 Row │ variable_name  col2   col3   col4
     │ String         Int64  Int64  Int64
─────┼────────────────────────────────────
   1 │ A                  1      2      3
   2 │ B                  4      5      6
# This is the result I want to obtain,
# but I want to select col3 by name, because this column is not always at position 3...
julia> DataFrame(CSV.File("test.csv", select=[1,3]))
2×2 DataFrame
 Row │ variable_name  col3
     │ String         Int64
─────┼──────────────────────
   1 │ A                  2
   2 │ B                  5
# Now I want to read col3 and the row names.
# The problem is that the first column does not always have the same name, so I use column position 1 to get the row names.
julia> DataFrame(CSV.File("test.csv", select=[1,:col3]))
ERROR: `select` keyword argument must be an `AbstractVector` of `Int`, `Symbol`, `String`, or `Bool`, or a selector function of the form `(i, name) -> keep::Bool`
# My two-step solution to the problem, but maybe there is a better one...
julia> col1 = DataFrame(CSV.File("test.csv", select=[1]))
2×1 DataFrame
 Row │ variable_name
     │ String
─────┼───────────────
   1 │ A
   2 │ B
julia> df2 = DataFrame(CSV.File("test.csv", select=[:col3]))
2×1 DataFrame
 Row │ col3
     │ Int64
─────┼───────
   1 │     2
   2 │     5
julia> hcat(col1,df2)
2×2 DataFrame
 Row │ variable_name  col3
     │ String         Int64
─────┼──────────────────────
   1 │ A                  2
   2 │ B                  5
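
A possible single-read alternative, based on the error message above: `select` also accepts a selector function of the form `(i, name) -> keep::Bool`, so the position test and the name test can be combined in one predicate. The mixed vector `[1, :col3]` fails because its elements are not all of one supported type. The sketch below is only an illustration: the helper name `keep_first_and` is hypothetical (it is not part of CSV.jl), the file is recreated in a temporary directory from the example data, and `Symbol(name)` is used defensively so the comparison does not depend on whether the name arrives as a `Symbol` or a `String`.

```julia
using CSV, DataFrames

# Hypothetical helper (not part of CSV.jl): build a selector that keeps
# column 1 (the row names) plus every column whose name is in `wanted`.
keep_first_and(wanted) = (i, name) -> i == 1 || Symbol(name) in wanted

# Recreate the example file from this post in a temporary directory.
path = joinpath(mktempdir(), "test.csv")
write(path, "variable_name,col2,col3,col4\nA,1,2,3\nB,4,5,6\n")

# Single read: the first column is chosen by position, col3 by name.
df = DataFrame(CSV.File(path; select = keep_first_and([:col3])))
```

With this approach the position of `col3` and the name of the first column can both change without touching the call, and no `hcat` of two separate reads is needed.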