Hello again, friends,
I thought I’d ask this here, since ML and data science aren’t really my forte.
I’d like to use Julia for analyzing various datasets, and not just for work. To give you an idea, here’s a simple example:
Say we have an S3 bucket somewhere holding Parquet files. We can read them in via DuckDB with DuckDB.jl, then view, plot, and slice up the data with DataFrames.jl much faster than someone working through a CSV file in Excel. Lovely. But now say we want to use ML in Julia to make some predictions. For the sake of the example (because I’m not typing out a Parquet file), we have the following data as a CSV:
Quarter,Year,"Total Sales","Total Business Expenses","CAPEX","Sales Percent Change from Previous Quarter","Percent Market Inflation Change","Insurance Paid"
1,2010,4502,1030,1000,0.12,0.1,2001
etc.
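For the reading-in step, here’s a minimal sketch of pulling Parquet files from S3 into a DataFrame via DuckDB.jl. The bucket path is entirely made up, and I’m assuming DuckDB.jl and DataFrames.jl are installed and that credentials are already configured:

```julia
using DuckDB, DataFrames

# Open an in-memory DuckDB database.
con = DBInterface.connect(DuckDB.DB)

# The httpfs extension lets DuckDB read directly from S3.
DBInterface.execute(con, "INSTALL httpfs; LOAD httpfs;")

# Hypothetical bucket/path; substitute your own.
df = DBInterface.execute(con,
    "SELECT * FROM read_parquet('s3://my-bucket/sales/*.parquet')") |> DataFrame
```

From there `df` is an ordinary DataFrame, so plotting and slicing work the same as with any other table.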
Let’s say this data goes up to the current quarter and year. We want a decent estimate of numbers like total sales, adjusted for inflation, for the next 12 quarters. There could be lots of other columns in this example dataset (realistically there would have to be, for the predictions to be any good), but for the sake of the example that’s all I’m typing out, haha.
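One way to frame a forecast like this as supervised learning (an assumption on my part, not the only option) is to make each row’s features the previous few quarters of sales and the target the following quarter’s value. A minimal sketch in plain Julia, with made-up numbers and a made-up helper name:

```julia
# Build a lagged feature matrix: row i holds `nlags` consecutive quarters,
# and y[i] is the quarter that immediately follows them.
function make_lagged(series::Vector{<:Real}, nlags::Int)
    n = length(series) - nlags
    X = [series[i + j - 1] for i in 1:n, j in 1:nlags]  # n × nlags features
    y = series[nlags+1:end]                             # next-quarter targets
    return X, y
end

sales = [4502.0, 4610.0, 4490.0, 4705.0, 4820.0, 4900.0]
X, y = make_lagged(sales, 2)
# X is 4×2 (each row = two consecutive quarters), y is the 4 quarters after them.
```

To forecast 12 quarters ahead you’d predict one quarter, append the prediction to the series, and repeat.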
My question is…where does one start when using data like this with Flux and/or MLJ? I still need to learn more about things like loss functions and the different kinds of regression, but the guides for Flux and MLJ seem to use either built-in datasets or arrays of randomly generated data, not custom ones.
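For what it’s worth, as far as I understand, Flux doesn’t care where the numbers came from: once your table is a `Float32` matrix (features as columns of the matrix, one column per observation), training looks the same as in the tutorials. A hedged sketch with synthetic stand-in data, assuming a recent Flux version (the `Flux.setup`/`Flux.update!` style API):

```julia
using Flux

# Stand-in for data pulled out of your DataFrame, e.g.
# X = Float32.(permutedims(Matrix(df[:, feature_cols]))).
X = rand(Float32, 2, 100)                      # 2 features × 100 observations
y = reshape(2f0 .* X[1, :] .+ 3f0 .* X[2, :], 1, :)  # fake target

model = Dense(2 => 1)                          # simple linear regression
opt = Flux.setup(Adam(0.01), model)

for epoch in 1:200
    # Mean squared error is a common loss for regression.
    grads = Flux.gradient(m -> Flux.mse(m(X), y), model)
    Flux.update!(opt, grads[1], model)
end
```

MLJ is similar in spirit: it accepts any Tables.jl-compatible table (a DataFrame qualifies) directly in `machine(model, X, y)`, so the "custom data" part is mostly just column selection and type coercion.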
I could be asking this and the answer is right in front of me, but I could also just not be too bright.