CPU cycles and syscalls

I am doing a project on Julia's performance compared to assembly, C, and FPGA programming. What I have noticed so far is:
Julia uses 300,000,000 cycles and 130.7 msec to calculate 2+2
C uses 290,000 cycles and 0.37 msec
I am using perf stat to do my measurements.
How can I improve Julia's performance, or get more accurate measurements? Is it possible to compile a Julia binary so I don't have to invoke the JIT compiler on every run?

julia> @btime 2+2
  0.001 ns (0 allocations: 0 bytes)

I’m not sure that this is a terribly useful benchmark - it might be more meaningful to measure operations that your application actually spends a significant amount of time on performing?

The 0.001 ns result is due to constant propagation: the compiler folds 2+2 into the constant 4, so no addition is actually timed. A single addition is not meaningful to measure, but it certainly takes longer than 0.001 ns.
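One way to see the difference is to keep the operands out of the compiler's sight. The following is a minimal sketch, assuming the BenchmarkTools package is installed; the variable names `a`, `b`, `t_folded`, and `t_runtime` are made up for illustration. It wraps each operand in a `Ref` and interpolates it with `$`, so the values are loaded at run time instead of being folded as literals.

```julia
using BenchmarkTools

# Hypothetical operands, stored in Refs so they are read from memory at run time.
a = Ref(2)
b = Ref(2)

# Literal operands: the compiler can fold this to the constant 4.
t_folded = @belapsed 2 + 2

# Interpolated Refs: each evaluation dereferences both operands before adding.
t_runtime = @belapsed $(a)[] + $(b)[]

# @belapsed returns the minimum time in seconds; convert to nanoseconds.
println("literal operands: ", t_folded * 1e9, " ns")
println("runtime operands: ", t_runtime * 1e9, " ns")
```

The exact numbers depend on the machine, but the second figure should be closer to the cost of a real load-and-add than the first.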

Agreed, it was a silly example - but it goes to show that one can do 2+2 in Julia in fewer than 300 million (!?) cycles. It was rather meant to prompt OP to provide a bit more context, so we can understand how he arrives at the conclusion that Julia is slower than C by a factor of 350 for adding integers.

The ultimate purpose is to show the pros and cons of synthesized hardware for calculations. My proposed calculation is Fibonacci sequences, and this addition test was just to see if the experiment is viable. My idea is to implement the same algorithm in Julia and on my Xilinx FPGA and compare CPU usage, time, operations, cycles, etc.
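
For the Fibonacci comparison, timing inside the process rather than with perf stat on the whole run keeps Julia's startup and JIT compilation out of the measurement. Below is a rough sketch using only Base; `fib` and `time_fib` are hypothetical helper names, and the iterative algorithm is an assumption, since the thread does not say which Fibonacci variant is intended.

```julia
# Iterative Fibonacci (hypothetical helper): fib(0) == 0, fib(1) == 1.
function fib(n::Int)
    a, b = 0, 1
    for _ in 1:n
        a, b = b, a + b
    end
    return a
end

# Time `reps` calls from inside the process and return the average
# nanoseconds per call together with a checksum of the results.
function time_fib(n::Int, reps::Int)
    fib(n)                        # warm-up call before the timed region
    acc = 0
    t0 = time_ns()
    for i in 1:reps
        # Alternate between n and n + 1 so the call is less likely to be hoisted.
        acc += fib(n + (i & 1))
    end
    elapsed = time_ns() - t0
    # Returning acc keeps the results live so the loop is not discarded.
    return elapsed / reps, acc
end

ns_per_call, checksum = time_fib(40, 1_000_000)
println("about ", ns_per_call, " ns per call (checksum ", checksum, ")")
```

The same loop structure could be mirrored in the other implementations so that the compared quantity is time per call, not time per process launch.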