I am doing a comparison between Python and Julia by analyzing the same data set.
I am using a data set containing nearly 800,000 rows. I load the data from two CSV files, each containing an equal amount of data, and then merge them together. I then applied a Random Forest algorithm to the data.
Given below are the times the Python and Julia code took to do the same tasks:
| Task | Python | Julia |
| --- | --- | --- |
| Loading data | 2.195 s | 15.232 s |
| Merging data | 0.1505 s | 5.55 s |
| Prediction | 10.2617 s | 24.5291 s |
| Visualization | 0.3434 s | 35.338 s |
Can anyone help me understand why Julia is so much slower than Python at the above-mentioned tasks?
I’m guessing that this means you are including loading/compilation time. When you launch Julia and load a plotting package (e.g. using Plots) and run your first plot (e.g. plot(...)), you spend a lot of time waiting for everything to compile, after which point the code is fast.
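You can see this directly by timing the same call twice in a fresh session (any minimal plot will do; `rand(10)` here is just a placeholder):

```julia
using Plots  # loading the package itself takes a while in a fresh session

@time plot(rand(10))   # first call: includes JIT compilation, can take many seconds
@time plot(rand(10))   # second call: reuses the compiled code, typically milliseconds
```

The second `@time` is the number that matters for any long-running workload.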
Compilation time is irrelevant for large computational tasks because it scales with the size of the code, not with the size of the computation. That is, if you are running something for an hour, waiting 30 seconds to compile at the beginning is negligible.
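Here is a minimal sketch of that scaling behavior, using nothing beyond Base Julia (`sumsq` is just a stand-in function):

```julia
# Compilation happens once per method/argument-type combination,
# independently of how much data the call processes.
sumsq(x) = sum(abs2, x)

@time sumsq(rand(10))     # first call: pays the one-time compilation cost
@time sumsq(rand(10))     # second call: already compiled
@time sumsq(rand(10^7))   # big input: same compiled code, time scales with the data only
```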
For interactive usage, the compilation delay is annoying, but it is something that will be improved in future Julia versions; it is just a matter of caching compiled code, nothing fundamental. In the meantime, for interactive exploration I would typically recommend opening a Jupyter notebook, loading Plots and whatever other modules you need, and leaving the notebook open as you work (creating and evaluating new notebook cells as needed). If you are working interactively for more than a few minutes, a 30-second delay at the beginning quickly becomes irrelevant.
(Even if you are doing development work, the Revise package means that you rarely need to restart an interactive Julia session.)
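A quick sketch of that workflow (`MyAnalysis` and `run_model` are hypothetical names standing in for your own code):

```julia
using Revise            # load Revise first so it tracks subsequently loaded code
using MyAnalysis        # hypothetical package/module under development

MyAnalysis.run_model()  # run it once...
# ...now edit the source of run_model in your editor and just call it again:
MyAnalysis.run_model()  # Revise picks up the change, no session restart needed
```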