I am currently processing a dataset of 31 columns and 100,000 rows. Loading the data, describing it, etc. is fine; however, running a query to match some rows is very slow. 100K rows is sizeable but still not very big (I might be mistaken here).
At first I thought the latency came from Jupyter Notebook and/or JupyterLab, but then I ran a `*.jl` script and it was just as slow.
I then started Julia as `julia -p 16` (on a server with 16 cores) and ran the script again, but it was still very slow.
I am currently using Query.jl for my queries, but I assume it would be the same for other packages (I still need to check).
I have not played with multiprocessing in Julia yet (I thought it would do a bit of magic for me automatically)…
Does anybody have a solution for how to use multiprocessing with tables/DataFrames?
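For what it's worth, starting Julia with `-p 16` only launches worker processes; nothing gets parallelized automatically. Code has to opt in explicitly through the `Distributed` standard library. A minimal sketch (the squaring task is just a placeholder):

```julia
using Distributed
addprocs(4)  # equivalent to starting with `julia -p 4`

# Work must be handed to the workers explicitly, e.g. with pmap,
# which farms each element of the range out to a worker process.
squares = pmap(x -> x^2, 1:8)  # → [1, 4, 9, 16, 25, 36, 49, 64]
```

For a row-filtering query like mine, though, I suspect the per-row predicate is too cheap for process-level parallelism to pay off; the serialization overhead can easily dominate.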
My query is as follows:
```julia
@from row in df begin
    @where ((row.X < 10 && row.Y > 15) || (row.X > 15 && row.Y < 10)) && (row.O > 10)
    @select row
    @collect DataFrame
end
```
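As a point of comparison, the same predicate can be expressed as a plain vectorized boolean mask with DataFrames row indexing, which avoids the query-macro machinery. A sketch with made-up data standing in for my actual 100K-row table (column names `X`, `Y`, `O` match the query above):

```julia
using DataFrames

# Hypothetical data with the same column names as the query above.
df = DataFrame(X = rand(1:20, 100_000),
               Y = rand(1:20, 100_000),
               O = rand(1:20, 100_000))

# The same predicate as a broadcast boolean mask; note the parentheses
# around each comparison, since .& binds tighter than .< and .>.
mask = ((df.X .< 10) .& (df.Y .> 15) .| (df.X .> 15) .& (df.Y .< 10)) .& (df.O .> 10)
result = df[mask, :]
</imports>
```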
In the meantime I also came across JuliaDB, which seems to do the job: I ran the same query… and it comes back within a second.
The columns of the dataset are correctly typed, so at the moment I am not sure why the previous approach was so slow… However, JuliaDB seems to do the job. Having both an IndexedTable and a DataFrame seems useful, as DataFrame has handy functions for easily exploring/summarizing the data.
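For completeness, this is roughly how the same filter looks on a JuliaDB table (again with made-up data standing in for my dataset; `filter` with a row predicate is the JuliaDB call I used):

```julia
using JuliaDB

# Hypothetical IndexedTable with the same columns as the DataFrame above.
t = table(rand(1:20, 100_000), rand(1:20, 100_000), rand(1:20, 100_000),
          names = [:X, :Y, :O])

# Same predicate as the Query.jl version, as a row-wise filter.
result = filter(r -> ((r.X < 10 && r.Y > 15) || (r.X > 15 && r.Y < 10)) && r.O > 10, t)
```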
No, this is not related to my previous post. Query.jl seems to be working and performing OK now. I need to reproduce the results and provide more data before an action/issue can be raised. Thank you.
It might have been a display issue. However, I did try the experiment both on my work laptop and on a compute server, and I had the issue in both.