How to use a function to retrieve a single row of data from Parquet

Hi. I have a dataframe like this:

I want to be able to get the numerical data for a single row, say, “Algeria”. This is the function that I wrote for that.

function get_country_data(country)
	data_by_country |>
		data -> filter(:country => val -> val == country, data) |>
		data -> data[1, 2:end] |>
		data -> convert(Vector, data)
end;

I try to call the function like this:

get_country_data("Algeria")

But this is the error I’m getting:

ArgumentError: column name :country not found in the data frame

lookupname@index.jl:288[inlined]
getindex@index.jl:297[inlined]
#filter#79@abstractdataframe.jl:1002[inlined]
filter@abstractdataframe.jl:1002[inlined]
(::Main.workspace79.var"#1#5"{String})(::DataFrames.DataFrame)@Other: 3
|>(::DataFrames.DataFrame, ::Main.workspace79.var"#1#5"{String})@operators.jl:834
get_country_data(::String)@Other: 2
top-level scope@Local: 1[inlined]

It’s called Country/Region in your screenshot.
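For what it's worth, here is a minimal sketch of the function with only the column name changed. The small table is made up to stand in for the screenshot (the date column names and the numbers are assumptions), and the data frame is passed in as an argument instead of being read from a global. Calling `names` on the data frame lists the real column names, which is a quick way to spot this kind of mismatch.

```julia
using DataFrames

# Hypothetical stand-in for the table in the screenshot; only the
# "Country/Region" column name is taken from the thread.
data_by_country = DataFrame(
    "Country/Region" => ["Albania", "Algeria", "Andorra"],
    "1/22/20" => [0, 0, 0],
    "1/23/20" => [0, 1, 0],
    "1/24/20" => [2, 3, 1],
)

# List the actual column names to see what can be referenced.
column_names = names(data_by_country)

function get_country_data(df, country)
    # Keep only the rows whose "Country/Region" value equals `country`.
    rows = filter("Country/Region" => ==(country), df)
    # Take the first matching row, drop the first (name) column,
    # and turn the remaining values into a plain Vector.
    return Vector(rows[1, 2:end])
end

algeria = get_country_data(data_by_country, "Algeria")
```

Note that `rows[1, 2:end]` throws a `BoundsError` if no row matches, so a misspelled country name fails loudly rather than returning an empty result.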


I suspect this video would help you: “Best Julia Data Manipulation packages combo 2020-09” (YouTube).

Looking at the data, I don’t think your code works at all.

It looks like this post suffers from the same confusion as your other post about converting from CSV to Parquet. What you are working with after reading in from CSV or Parquet is (most likely) a DataFrame object, which is the same irrespective of how it was constructed (i.e. read from a CSV file, Parquet file, Arrow file, … or indeed constructed “manually” like DataFrame(col1 = rand(10), col2 = rand(10))).

So your title “… retrieve a single row of data from Parquet” is a bit misleading - you are probably asking for a way to select a row from a DataFrame object (unless you are reading your Parquet data into some other tabular type, in which case please specify this).

This works by simple indexing like for a standard two-dimensional Julia Array:

df[df."Country/Region" .== "Algeria", :]

will give you all rows for which the Country/Region column has the value Algeria, with all columns for those rows.
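As a rough sketch of how that indexing could replace the whole pipeline in the original function: the table below is invented for illustration (every column name other than Country/Region, and all values, are assumptions), and `get_row_values` is a hypothetical helper name, not something from DataFrames.jl.

```julia
using DataFrames

# Made-up data standing in for the real table.
df = DataFrame(
    "Country/Region" => ["Albania", "Algeria", "Andorra"],
    "1/22/20" => [0, 0, 0],
    "1/23/20" => [0, 1, 0],
    "1/24/20" => [2, 3, 1],
)

function get_row_values(df, country)
    # Boolean-mask indexing: keep rows where the name column matches.
    matches = df[df."Country/Region" .== country, :]
    # Return `nothing` instead of erroring when the country is absent.
    nrow(matches) == 0 && return nothing
    # First matching row, every column except the name column, as a Vector.
    return Vector(matches[1, Not("Country/Region")])
end

algeria = get_row_values(df, "Algeria")
unknown = get_row_values(df, "Atlantis")
```

Selecting columns with `Not("Country/Region")` instead of `2:end` avoids depending on the name column being first; whether to return `nothing` or throw for an unknown country is a design choice.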

If you intend to work with DataFrames, I’d recommend you work through Bogumil’s excellent introduction here: GitHub - bkamins/Julia-DataFrames-Tutorial: A tutorial on Julia DataFrames package.
