I’m comparing the performance of Julia and Python on a task that involves feature extraction from time series data. I’ve implemented the same parallel computation in both languages and noticed that Python is significantly faster than Julia in this scenario. I’d like to understand why this happens and would appreciate suggestions for improving the performance of the Julia implementation.
In terms of speed, Python has been about 2x faster on average.
Julia implementation:
using Catch22
using Random
using Base.Threads
# Function to compute all features for a given time series
function compute_features(x::AbstractVector)
    res = catch22(x)
    return res
end
# Generate the dataset
dataset = [randn(2000) for _ in 1:5000] # 5000 time series, each with 2000 samples
# Print number of threads available
println("Number of threads available: ", Threads.nthreads())
# Measure the start time
start_time = time()
# Function to compute features in parallel
function compute_all_features_parallel(dataset)
    results = Vector{Any}(undef, length(dataset))
    @threads for i in 1:length(dataset)
        results[i] = compute_features(dataset[i])
    end
    return results
end
# Compute features in parallel
results_list = compute_all_features_parallel(dataset)
# Compute the elapsed time
elapsed_time = time() - start_time
println("Multithreaded method time: $(elapsed_time) seconds")
Python implementation:
import pycatch22
import os
import time
import numpy as np
from joblib import Parallel, delayed
dataset = [np.random.randn(2000) for _ in range(5000)]
# Calling catch22_all on a NumPy array is the fastest option.
def compute_features(x):
    res = pycatch22.catch22_all(x)
    return res  # just return the values
print(f"Number of cores available: {os.cpu_count()}")
start_time = time.time()
threads_to_use = os.cpu_count()
results_list = Parallel(n_jobs=threads_to_use)(
    delayed(compute_features)(dataset[i]) for i in range(len(dataset))
)
joblib_time = time.time() - start_time
print(f"Joblib method time: {joblib_time:.2f} seconds")
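One thing that may affect the comparison: both scripts time a single run, so each number includes one-time costs. In the Julia script the timer starts before the parallel function is defined and called for the first time, so compilation is part of the measurement. In the Python script the first `Parallel` call also has to start joblib's worker processes. Below is a minimal sketch of a timing helper that runs an untimed warm-up call first and then reports the minimum and median over several timed runs. The names `benchmark` and `stand_in_workload` are hypothetical and used only for illustration; the placeholder workload stands in for the real feature-extraction call.

```python
import statistics
import time


def benchmark(fn, *args, warmup=1, repeats=5):
    """Time fn(*args): run `warmup` untimed calls, then `repeats` timed calls."""
    for _ in range(warmup):
        fn(*args)  # untimed: absorbs one-time costs such as pool start-up
    timings = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - t0)
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "runs": len(timings),
    }


def stand_in_workload(n):
    """Hypothetical placeholder for the real feature-extraction call."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    stats = benchmark(stand_in_workload, 200_000, warmup=1, repeats=5)
    print(
        f"min {stats['min']:.4f} s, median {stats['median']:.4f} s "
        f"over {stats['runs']} runs"
    )
```

To apply this to the script above, `fn` could be a small function that wraps the `Parallel(...)` call. On the Julia side, the equivalent would be to call `compute_all_features_parallel` once before starting the timer, so that the reported time excludes compilation.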