The One Billion Row Challenge

Oh the file itself is loaded into RAM? That will definitely change my timings.

1 Like

yep, standard string manipulation error. @view not to blame.
Fix by using:

station = @view(measurement[1:prevind(measurement,pos)])

Using this @view approach should improve speed.

prevind :person_facepalming: I should have figure it out. thnx.
now, don’t think it can go any “baseline” than this then and it runs in about 4min 47sec on my laptop.

ndé=-24.3/23.8/76.3, Yellowknife=-54.2/-4.3/43.7, Yerevan=-43.4/12.4/61.4, Yinchuan=-41.3/9.0/61.1, Zagreb=-38.3/10.7/62.9, Zanzibar City=-21.8/26.0/75.3, Zürich=-38.6/9.3/61.5, Ürümqi=-43.0/7.4/58.1, İzmir=-34.3/17.9/70.2}
./calculate_average_3j5a_julia.sh  281.46s user 5.44s system 100% cpu 4:46.84 total

below is full updated version of it:

mutable struct MeasurementsStats
    min::Float32; max::Float32; sum::Float64; count::Int64
end

roundjava(it, digits) = round(it, RoundNearestTiesUp; digits = digits)
roundjava(it) = roundjava(it, 1)

function outformat((station, stats))
     average = roundjava(10*stats.sum/stats.count)/10
    "$station=$(roundjava(stats.min))/$(roundjava(average))/$(roundjava(stats.max))"
end

function calculate_average(measurements)
    station_measurements = Dict{String,MeasurementsStats}()
    open(measurements, "r") do file
        for measurement in eachline(file)
            pos = findfirst(';', measurement)
            station = @view(measurement[1:prevind(measurement, pos)])
            temperature = parse(Float32, @view(measurement[pos+1:end]))
            stats = get!(station_measurements, station) do
                MeasurementsStats(temperature, temperature, 0, 0)
            end
            stats.min = min(stats.min, temperature)
            stats.max = max(stats.max, temperature)
            stats.sum += temperature
            stats.count += 1
        end
    end
    results = collect(station_measurements)
    sort!(results, by = it -> it[1])
    print("{")
    join(stdout, (outformat(result) for result in results), ", ")
    println("}")
end

calculate_average(isempty(ARGS) ? "./measurements.txt" : ARGS[1])

From the prior thread on this, the baseline could be the duckdb entry loaded into a DataFrame. Preferably with the latest dev version of duckdb because the version on the registry had an issue previously.

1 Like

The mutable struct might cause some allocation slowdown in the big loop. Replacing it with a Tuple might help. The following is untested, but should be fixable:

function calculate_average(measurements)
    station_measurements = Tuple{Float32, Float32, Float64, Int}[]
    # use sizehint! on station_measurements to avoid some allocations?
    station_index = Dict{String,Int}()
    station_count = 0
    open(measurements, "r") do file
        for measurement in eachline(file)
            pos = findfirst(';', measurement)
            station = @view(measurement[1:prevind(measurement, pos)])
            temperature = parse(Float32, @view(measurement[pos+1:end]))
            idx = get!(station_index, station) do
                push!(station_measurements, (temperature, temperature, 0.0, 0))
                station_count += 1
            end
            stats = station_measurements[idx]
            station_measurements[idx] = (min(stats.min, temperature),
                                         max(stats.max, temperature),
                                         stats.sum + temperature,
                                         stats.count + 1)
        end
    end
    print("{")
    for (i,p) in enumerate(sort(stations_measurements))
        print(i > 1 ? ", " : "", outformat(p))
    end
    println("}")
end

outformat becomes:

function outformat((station, stats))
    average = roundjava(10*stats[3]/stats[4])/10
    "$station=$(roundjava(stats[1]))/$(roundjava(average))/$(roundjava(stats[2]))"
end
1 Like

They explain that in the challenge somewhere. Official measurement is from cold-start of the program, but the actual data file is already in page-cache.

Ideally no IO should hit the bus and the data is processed zero-copy (just mapped).

For julia, you should probably modify the requirements, i.e. report the following performance numbers:

  1. Naive (script runs, i.e. needs compilation)
  2. The standard julia way. You run the code on a tiny sample datafile in order to get everything compiled, then you time running it on the actual data.
  3. With precompilation (i.e. you make a package and precompile, then run the script)
  4. With prepared sysimage.

Of these, (1) and (2) are low-effort, and (1) will have the worst timings (startup + compilation included) and (2) will have the best timings (startup + compilation excluded).

The way I understand it, the 1brc benchmark ensures that stuff is in the page-cache by running the entire thing multiple times on a linux machine with ample memory, without any flushing. But afaiu you can also just put your data file into /tmp on a machine with swap disabled.

I’d like to note that 1brc invites a complete anti-pattern: If the file is not already loaded into page-cache, then a naive mmap on SSD will be worse that read, because you can’t feed the ssd read queue unless you massively multi-thread (page-faults block!). Consider MAP_POPULATE.

An excellent example is 1brc/src/main/java/dev/morling/onebrc/CalculateAverage_shipilev.java at main · gunnarmorling/1brc · GitHub

This is by far not the fastest submission to 1brc, but it is well-documented, careful and idiomatic. On modern java, it should work around that issue (it spawns enough tasks to fill the io queue if you are in a post-loom world), and it also takes care to work in settings where the data doesn’t even fit into main memory.

That being said, I have not verified that the shipilev submission actually works well in that setting :wink:

3 Likes

Right, I see what you mean. Mutable struct is an attempt for naive but reasonable first approach to the problem… but maybe nobody in Julia land would start with it anyway? I was looking at the baseline implementation of the challenge, where results aggregator is a mutable struct (well kind of obviously, it’s a class with mutable fields), and results get added to a kind of standard map/dictionary.
It goes beyond a naive implementation very quickly, so it’s not a real “baseline” to the problem as it’s ridiculously easy to beat it. But I’m already learning things by just starting from it. Thnx!

1 Like

Hi, I was about to make a post on 1BRC in Julia, and then found this discussion. Have there been any benchmarks shared already?

My implementation makes use of external packages with a best time of 89.459 s. I am sure there’s a lot more room for improvement. It would therefore be great to share ideas/solutions here.

1 Like

how fast is their disk… at that point should the benchmark starts with data in RAM?

The file starts in RAM. I think they don’t force it (e.g. by disabling swap and putting it on /tmp/), but it is a natural consequence of methodology, kernel and machine sizing, and it is known to them.

1 Like