Hi there! Sorry to jump on this so late
I’m the author of Catch22.jl; thanks for bringing this to my attention. I had no idea the performance difference was so dramatic!
The main hit to performance didn’t come from any of the Julia code but from the compiled binaries (see here); even calling the shared library directly was twice as slow. Adding `-O2` to the gcc compilation flags solved the issue, roughly halving the run time for each feature.
I’ve just released a new version, v0.7.0, of Catch22.jl (which uses new optimized binaries), along with v0.6.0 of TimeseriesFeatures.jl (which cleans up many of the type instabilities related to feature set calculation).
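If you want to pick up the new releases, a standard package update should do it (just a minimal sketch; the package names are as above):

```julia
using Pkg

# Update both packages to pull in the new releases
# (Catch22.jl v0.7.0 and TimeseriesFeatures.jl v0.6.0)
Pkg.update(["Catch22", "TimeseriesFeatures"])
```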
Performance should now be equivalent to (or better than) the pycatch22 package, especially if you use `FeatureSet`s [i.e. `Catch22.catch22(x)`]:
| Method (1000 time series of 10000 samples) | Time | Memory |
|---|---|---|
| Python loop (with z-score) | 8.83 s | 324 MiB (final), 644 MiB (peak) |
| Manual Julia loop (no z-score) | 8.75 s | 1.29 MiB |
| Julia FeatureSet (with z-score) | 8.23 s | 78.8 MiB |
**Python loop**: Similar to the original post, but with multiple runs and memory profiling. The pycatch22 package applies a z-score to the input time series for each feature.
**Manual Julia loop**: Like the original post, this test calls the method for each catch22 feature individually and operates on the raw time series rather than the z-scored series.
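For reference, the manual loop looks roughly like this (a sketch only; I’m assuming the individual feature methods can be called directly, e.g. `Catch22.DN_HistogramMode_5(x)`, as in the original post, and only two of the 22 features are shown):

```julia
using Catch22

# 1000 time series of 10000 samples (dummy data for this sketch)
X = [randn(10_000) for _ in 1:1_000]

results = map(X) do x
    # Call each catch22 feature method individually on the raw
    # (non-z-scored) series; only two of the 22 are shown here
    (Catch22.DN_HistogramMode_5(x), Catch22.DN_HistogramMode_10(x))
end
```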
**Julia FeatureSet**: This uses the exported functionality of TimeseriesFeatures.jl, automatically parallelizing feature calculations and preprocessing the input data with a z-score.
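The FeatureSet version is essentially the one-liner mentioned above (a minimal sketch; I’m assuming here that a matrix with one time series per column is accepted, otherwise broadcast over a vector of series):

```julia
using Catch22

# 1000 time series of 10000 samples, one per column (dummy data)
X = randn(10_000, 1_000)

# catch22 is a FeatureSet: calling it z-scores the input and
# evaluates all 22 features for every time series
F = Catch22.catch22(X)
```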