Interface for loading GBIF datasets into Flux

banksiaboy · August 7, 2021, 2:35am

Here is an example of a dataset resulting from my query against the GBIF occurrences database.

As you can see, it has its own DOI, is well structured, and should be a good candidate for loading as a data source into Flux. I’m interested in applying Flux (fluxml.ai) and transfer learning to it.

As I’m new to the Julia/ML ecosystems, could some folks please advise on the best way to build a reusable loader for this data, or some lightweight infrastructure to do so?

banksiaboy · August 7, 2021, 5:44am

Making it easy to get your data into Flux is a great way to build community.
Jeremy Howard, FastAI.jl Live Q&A (ML Community Call, 2021-08-02), YouTube, about 44 minutes in.

ToucheSir · August 11, 2021, 6:35pm

I would say the way to think about this is “how do I get this data into Julia”. Unlike some other language ecosystems, Julia’s ML stack doesn’t try to invent a whole new set of data formats for inputs and outputs. So if you can read in that dataset as an array or dataframe, for example, you’ll be able to use it for ML.
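To make the “read it as a dataframe, then use it for ML” path concrete, here is a minimal sketch, assuming CSV.jl, DataFrames.jl and a recent Flux are installed. The small inline table is a hypothetical stand-in for a GBIF occurrence export (assumed tab-delimited), and the column names `species`, `decimalLatitude` and `decimalLongitude` are assumptions about that export, not a confirmed schema. It turns coordinates into a Float32 feature matrix, one-hot encodes species, and wraps both in `Flux.DataLoader`.

```julia
using CSV, DataFrames, Flux

# Hypothetical stand-in for a tab-delimited GBIF occurrence export;
# the column names are assumptions, check them against your actual download.
tsv = """
gbifID\tspecies\tdecimalLatitude\tdecimalLongitude
1\tBanksia serrata\t-33.8\t151.2
2\tBanksia integrifolia\t-37.8\t145.0
3\tBanksia serrata\t-34.0\t150.9
4\tBanksia integrifolia\t-38.1\t144.4
"""

df = CSV.read(IOBuffer(tsv), DataFrame; delim = '\t')

# Drop rows lacking a label or coordinates (a no-op on this toy table)
df = dropmissing(df, [:species, :decimalLatitude, :decimalLongitude])

# Features: a 2×N Float32 matrix, since Flux layers expect observations
# along the last dimension
X = permutedims(Float32.(Matrix(df[:, [:decimalLatitude, :decimalLongitude]])))

# Labels: one-hot encode species names against a sorted class list
classes = sort(unique(df.species))
Y = Flux.onehotbatch(df.species, classes)

# Mini-batches of (features, labels) for a training loop
loader = Flux.DataLoader((X, Y); batchsize = 2, shuffle = true)
for (x, y) in loader
    @show size(x), size(y)
end
```

For a real download you would pass a file path to `CSV.read` instead of an `IOBuffer`; nothing Flux-specific is needed until the final `DataLoader` step.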

banksiaboy · August 20, 2021, 6:47am

Thanks. I have loaded data subsets into dataframes by hand, but the schema for GBIF occurrence datasets is fixed, and I’d like a standard way of creating training and reference sets from that schema.
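Because the schema is fixed, one lightweight option is a single helper that validates the expected columns and splits any GBIF DataFrame the same way every time. The sketch below assumes DataFrames.jl; `gbif_split` and `GBIF_COLUMNS` are hypothetical names, and the chosen columns are an assumed subset of the GBIF fields rather than an official list. It does a plain random split, not a stratified or spatially blocked one, which may matter for species data.

```julia
using DataFrames, Random

# Hypothetical subset of GBIF occurrence columns this helper relies on;
# adjust to whichever fields your download actually contains.
const GBIF_COLUMNS = [:species, :decimalLatitude, :decimalLongitude, :year]

"""
    gbif_split(df; train_frac = 0.8, rng = Random.default_rng())

Hypothetical helper: keep the expected GBIF columns, drop incomplete rows,
and return shuffled `(train, reference)` DataFrames.
"""
function gbif_split(df::DataFrame; train_frac = 0.8, rng = Random.default_rng())
    # Fail early if the download does not match the assumed schema
    missing_cols = setdiff(string.(GBIF_COLUMNS), names(df))
    isempty(missing_cols) || error("missing GBIF columns: $(join(missing_cols, ", "))")

    # Keep only the schema columns and drop rows with any missing value
    clean = dropmissing(select(df, GBIF_COLUMNS))

    # Random (non-stratified) split by shuffled row indices
    idx = shuffle(rng, 1:nrow(clean))
    ntrain = round(Int, train_frac * nrow(clean))
    return clean[idx[1:ntrain], :], clean[idx[ntrain+1:end], :]
end

# Toy input with one incomplete row and one extra column that gets dropped
df = DataFrame(
    species = ["Banksia serrata", "Banksia integrifolia", missing, "Banksia serrata", "Banksia ericifolia"],
    decimalLatitude = [-33.8, -37.8, -35.0, -34.0, -33.7],
    decimalLongitude = [151.2, 145.0, 149.1, 150.9, 151.1],
    year = [2019, 2020, 2018, 2021, 2017],
    basisOfRecord = fill("HUMAN_OBSERVATION", 5),
)

train, reference = gbif_split(df; rng = MersenneTwister(42))
```

Passing an explicit `rng` makes the split reproducible, which helps when comparing models trained on the same reference set.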

I’m sure I’ll work it out…

Related topics

Topic	Category (tags)	Replies	Views	Activity
Data loader for Flux	Package Announcements (package, data, flux)	2	648	September 17, 2020
Where to begin for using Julia ML (Flux/MLJ/MLJFlux) with custom datasets?	New to Julia (question, dataframes, machine-learning)	10	250	October 21, 2024
PyTorch DataLoader equivalent for training large models with Flux	Machine Learning (flux)	16	4096	November 8, 2020
How to use dataloader	New to Julia (flux)	0	289	October 31, 2020
Machine Learning using Julia - Aim/Idealogy of Flux.jl to for simplicity over compexity for programmers	Machine Learning (question, flux, machine-learning)	11	1898	February 8, 2022