Interface for loading GBIF datasets into Flux

Here is an example of a dataset resulting from my query against the GBIF occurrences database.

As you can see, it has its own DOI, is well-structured, and should be a good candidate for loading as a data source into Flux. I’m interested in applying Flux (fluxml.ai) and transfer learning to it.

As I’m new to the Julia/ML ecosystems, could some folks please advise on the best way to load this data in a reusable way, or to build some lightweight infrastructure to do so?

Making it easy to get your data into Flux is a great way to build community.
Jeremy Howard, FastAI.jl Live Q&A (ML Community Call, 2021-08-02) - YouTube, 44 mins in

I would say the way to think about this is “how do I get this data into Julia”. Unlike some other language ecosystems, Julia’s ML stack doesn’t try to invent a whole new set of data formats for inputs and outputs. So if you can read in that dataset as an array or dataframe, for example, you’ll be able to use it for ML.
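
To make the "read it in as an array" idea concrete, here is a minimal sketch using only the DelimitedFiles standard library. The sample rows are made up, and the column names (gbifID, species, decimalLatitude, decimalLongitude) are assumed to match the tab-delimited GBIF download you have; for a real download, CSV.jl with DataFrames.jl is probably the more comfortable route.

```julia
using DelimitedFiles

# Made-up stand-in for a tab-delimited GBIF occurrence download.
sample = "gbifID\tspecies\tdecimalLatitude\tdecimalLongitude\n" *
         "1\tParus major\t52.1\t5.2\n" *
         "2\tParus major\t48.9\t2.3\n" *
         "3\tCyanistes caeruleus\t51.5\t-0.1\n"

# With header=true, readdlm returns the data cells and a 1×N header row.
data, header = readdlm(IOBuffer(sample), '\t'; header=true)

# Look up a column position by its name in the header.
col(name) = findfirst(==(name), vec(header))

# Pick the numeric columns and convert them to Float32.
features = Float32.(data[:, [col("decimalLatitude"), col("decimalLongitude")]])

# Flux layers treat each column as one observation, so transpose
# from (observations × features) to (features × observations).
X = permutedims(features)

# Keep the species names as labels, one per observation.
y = String.(data[:, col("species")])
```

At this point `X` is a plain `Matrix{Float32}` and `y` a `Vector{String}`, which is all the ML stack needs as input.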

Thanks, and I have loaded data subsets into dataframes by hand. But the schema for GBIF occurrence datasets is fixed, and I’d like to have a standard way of creating training and reference sets from that schema.
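
For what it's worth, the sort of thing I have in mind is roughly the following. This is only a sketch: `split_observations` is a hypothetical helper name of my own, not something from Flux or any GBIF package, and the coordinates and labels are made up. It assumes the features are already in a matrix with one column per occurrence.

```julia
using Random

# Hypothetical helper: shuffle observations (columns of X) and split them
# into a training set and a held-out reference set.
function split_observations(X::AbstractMatrix, y::AbstractVector;
                            train_fraction::Real=0.8,
                            rng::AbstractRNG=Random.default_rng())
    n = size(X, 2)
    length(y) == n ||
        throw(ArgumentError("X has $n columns but y has $(length(y)) labels"))
    idx = randperm(rng, n)                    # random order of observations
    n_train = round(Int, train_fraction * n)  # size of the training set
    train_idx = idx[1:n_train]
    ref_idx = idx[n_train+1:end]
    return (train = (X[:, train_idx], y[train_idx]),
            reference = (X[:, ref_idx], y[ref_idx]))
end

# Made-up data: latitude/longitude per column, species name per observation.
X = Float32[52.1 48.9 51.5 40.4 59.3; 5.2 2.3 -0.1 -3.7 18.1]
y = ["Parus major", "Parus major", "Cyanistes caeruleus",
     "Parus major", "Cyanistes caeruleus"]

parts = split_observations(X, y; train_fraction=0.8, rng=MersenneTwister(42))
```

Passing the random number generator explicitly keeps the split reproducible, which matters if the reference set is meant to stay fixed between runs.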

I’m sure I’ll work it out…