I want to be able to get the numerical data for a single row, say, “Algeria”. This is the function that I wrote for that.
function get_country_data(country)
    data_by_country |>
        data -> filter(:country => val -> val == country, data) |>
        data -> data[1, 2:end] |>
        data -> convert(Vector, data)
end;
I try to call the function like this:
get_country_data("Algeria")
But this is the error I’m getting:
ArgumentError: column name :country not found in the data frame
lookupname@index.jl:288[inlined]
getindex@index.jl:297[inlined]
#filter#79@abstractdataframe.jl:1002[inlined]
filter@abstractdataframe.jl:1002[inlined]
(::Main.workspace79.var"#1#5"{String})(::DataFrames.DataFrame)@Other: 3
|>(::DataFrames.DataFrame, ::Main.workspace79.var"#1#5"{String})@operators.jl:834
get_country_data(::String)@Other: 2
top-level scope@Local: 1[inlined]
It looks like this post suffers from the same confusion as your other post about converting from CSV to Parquet. What you are working with after reading in from CSV or Parquet is (most likely) a DataFrame object, which is the same irrespective of how it was constructed (e.g. read from a CSV file, Parquet file, Arrow file,… or indeed constructed “manually” like DataFrame(col1 = rand(10), col2 = rand(10))).
So your title “… retrieve a single row of data from Parquet” is a bit misleading: you are probably asking for a way to select a row from a DataFrame object (unless you are reading your Parquet data into some other tabular type, in which case please specify this).
Selecting rows works by simple indexing, just as for a standard two-dimensional Julia Array:
df[df."Country/Region" .== "Algeria", :]
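will give you all rows for which the Country/Region column has the value Algeria, and all columns of those rows. The ArgumentError in your post says that the data frame has no column named :country, so the filter has to refer to the column by its actual name.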
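As a minimal sketch of how the whole function could look with this kind of indexing: the toy data frame, its date column names, and the `col` keyword below are made up for illustration, and the sketch assumes the country name sits in the first column with only numerical columns after it. Running `names(df)` on your real data frame will show what the columns are actually called.

```julia
using DataFrames

# Hypothetical toy data standing in for the real table; the actual
# column names and values may well differ (check with names(df)).
df = DataFrame("Country/Region" => ["Albania", "Algeria", "Andorra"],
               "1/22/20" => [0, 1, 2],
               "1/23/20" => [3, 4, 5])

# Return the numerical values of the first row matching `country`.
# `col` is an assumed keyword naming the column that holds the country.
function get_country_data(df, country; col = "Country/Region")
    # Boolean mask over the rows, keeping all columns
    rows = df[df[!, col] .== country, :]
    # First matching row without the name column, as a plain Vector
    return Vector(rows[1, 2:end])
end

algeria = get_country_data(df, "Algeria")
```

Passing the data frame as an argument, rather than reading a global such as data_by_country, is a design choice that makes the function easier to test. If no row matches, rows[1, 2:end] throws a BoundsError, so you may want to check nrow(rows) first.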