Why do the Mac mini M1 and MacBook Air have a 4x performance difference?

I tested exactly the same Julia code on a MacBook Air 2020 (M1, 8 GB RAM) and a Mac mini 2020 (M1, 16 GB RAM).

The code is

using LaTeXStrings
using DelimitedFiles
using Printf
using GZip
using Statistics
using LinearAlgebra
using PyCall
using Plots

py"""
import numpy as np
import pickle as pk

def load_comm():
    with open("./pr_comm.pk", "rb") as f:
        com_p = pk.load(f)
    with open("./cn_comm.pk", "rb") as f:
        com_c = pk.load(f)

    com = [0 for _ in range(len(com_p) + len(com_c))]

    for i in range(0, len(com_p)*2, 2):
        if i % 8 < 4:
            com[i] = np.array(com_p[i//2])
            com[i+1] = np.array(com_c[i//2])
        else:
            com[i] = np.array(com_c[i//2])
            com[i+1] = np.array(com_p[i//2])

    com = np.array(com)
    return com
"""

function load_sample(N::Int64, sam::String, K::Float64, n::Int64)
    path_step = @sprintf "./%d_step%s_n_%d_Kr_%.2lf.gz" N sam n K
    step = GZip.open(path_step, "r") do file
        # Read the file and parse it with readdlm
        readdlm(file, ',', Float64, '\n')
    end
    return step
end

function reshape_step(step::Array{Float64, 2}, N::Int64)
    step_reshape = permutedims(reshape(step', N*2+5, 1002, 1000), [2,1,3])
    return step_reshape
end

function load_and_process_sample(N::Int64, sam::String, K::Float64, n::Int64)
    step = load_sample(N, sam, K, n)
    step_reshape = reshape_step(step, N)
    omega = step_reshape[:,6:N+5,:]
    ke = omega.^2 .* 0.5
    time_ = step_reshape[:,3,1]
    time_[3:end] = time_[3:end]./100
    return time_, ke
end

function plot_ke(ke, time_, comm_elem)
    p1 = plot(time_, ke[:,1,1], title = "Perturbed Node 0", xtick = [])
    p2 = plot(time_, ke[:,comm_elem[1,2:end],1], title = "Perturbed Node's Community", xtick = [])
    p3 = plot(time_, ke[:,comm_elem[2,1:end],1], title = "Nearest Community", xtick = [])
    p3 = plot!(time_, ke[:,comm_elem[5,1:end],1])
    p4 = plot(time_, ke[:,comm_elem[3,1:end],1], xlabel = "Time (s)", title = "Second Community", xtick = [])
    p4 = plot!(time_, ke[:,comm_elem[6,1:end],1])
    p4 = plot!(time_, ke[:,comm_elem[9,1:end],1])
    p = plot(p1, p2, p3, p4, layout = (4,1), legend = false, grid = false, ylabel = "KE", size=(700,800),
        fmt = :png, dpi=300, xlabelformat = :plain)
    return p
end

elapsed = @elapsed begin
N = 144
K = 6.0
n = 0
comm = py"load_comm"
comm_elem = comm() .+ 1

samples = ["00"]
titles = ["RegularLattice"]

for (i, sam) in enumerate(samples)
    print("Loading samples\n")
    time_, ke = load_and_process_sample(N, sam, K, n)
    println("loaded sample ", sam)
    p = plot_ke(ke, time_, comm_elem)
    savefig(p, "./$(titles[i]).png")
end
end
println("Elapsed time: ", elapsed, " seconds")

The result on the Mac mini is about 53 seconds, while the MacBook Air takes about 203 seconds…
I cannot understand this difference…

I ran it in the VS Code terminal with julia [filename].jl. The loaded file is about 800 MB.

The most likely explanation is that you are running out of RAM. 8 GB isn't a lot, and the OS (plus VS Code, browser, etc.) all take some.


Actually, Julia may occupy a lot of memory. The same code in Python takes just 20 seconds, and the memory it occupies is much smaller.

I don’t understand why people say that Julia is faster than Python. I have also noticed many times that a Python script is faster than the equivalent one in Julia. :roll_eyes: :innocent:

As I wrote in my blog: Why am I using Julia? | Julia programming notes

If you only use Python with algorithms for which C++ or Fortran libraries are available, then you will not see much of an advantage from using Julia. But if you write new programs or libraries using new algorithms Julia is much faster (if you follow the Performance Tips).

And yes, for larger Julia projects it is a good idea to have at least 16GB of RAM. The Julia compiler needs more RAM than the Python interpreter.


This partly comes down to benchmarking differences. Running

julia myfile.jl

is a debatable way of measuring the performance of a Julia program, because it includes the (significant) startup and just-in-time compilation latency. The idea is that if you’re gonna run a function a thousand times in operational settings, you don’t really care that the first time takes 10 more seconds.
Try BenchmarkTools.jl or Chairmarks.jl if you want more robust Julia benchmarking.
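As a minimal sketch (assuming BenchmarkTools.jl is installed, and using a toy function rather than your actual pipeline), `@btime` runs the function many times and reports the best time, so compilation latency is excluded:

```julia
using BenchmarkTools

# Toy stand-in for one of your processing functions
f(x) = sum(abs2, x)

x = rand(10_000)
@btime f($x)   # interpolate globals with $ to avoid measuring dynamic dispatch
```

In contrast, a plain `@time f(x)` on the very first call includes JIT compilation; calling it a second time shows the steady-state cost.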

I don’t think that this makes a significant difference in this case. If you process a large amount of data and your OS starts to swap, things will always become terribly slow no matter how you benchmark. And the main difference between the two computers in question is the amount of RAM.

As far as I know, Julia’s slogan is “Julia is fast like C/C++ or Fortran, but comfortable like Python”.
Moreover, it’s true that Julia’s runtime for `for` or `while` loops is faster than Python’s.

The slogan really should be “looks like Python and runs like C, when properly optimized”.

Julia gives you a LOT more power than Python. And you know what they say about great power!

I don’t know why there would be a such a difference between computers, but it would be worth looking into @Oscar_Smith’s suggestion about running out of memory. Try opening a resource monitor and see what the peak memory usage is.

One thing to note as well is that (typically) Python/NumPy defaults to using views for slices, while Julia makes copies. Using the @view macro for single slices or @views for a whole function can save a lot of memory and time.
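A small illustration of the difference (array sizes here are made up, roughly matching the shapes in the original code):

```julia
A = rand(1002, 149, 1000)

slice_copy = A[:, 6:149, :]         # allocates a fresh array (~1 GB here)
slice_view = @view A[:, 6:149, :]   # just wraps A's memory, no copy

# Both yield the same values; only the allocation behavior differs.
sum(slice_copy) ≈ sum(slice_view)
```
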

Try using BenchmarkTools.jl to profile your individual functions. It looks to me like load_and_process_sample and especially reshape_step could be using a lot of memory unnecessarily.

Yeah… after receiving the replies, I checked the memory. The occupied memory on the MacBook Air is about 10 GB…
Could you tell me how I can optimize the code?

Although I converted reshape_step and load_sample to Python functions using PyCall, I want to know whether there is another way.

As a first step, take a look at the official performance tips, in particular the “Pre-allocating outputs” and “Consider using views for slices” sections.
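As a rough sketch of the pre-allocation idea (hypothetical helper names, applied to the `omega.^2 .* 0.5` step from the original code):

```julia
# Allocating version: builds a new array on every call
ke_alloc(omega) = omega .^ 2 .* 0.5

# Pre-allocating version: writes into a buffer supplied by the caller
function ke!(out, omega)
    @. out = omega^2 * 0.5   # fused, in-place broadcast, no new allocation
    return out
end

omega = rand(1002, 144)
buf = similar(omega)   # allocate once, reuse across calls
ke!(buf, omega)
```
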


Thanks a lot!!
