Why are two functions behave differently?

I’m doing some exercises in the ThinkJulia book.
Specifically, I try to solve the exercise 10-10:

Write a function that reads the file words.txt and builds an array with one element per word. Write two versions of this function, one using push! and the other using the idiom t = [t..., x]. Which one takes longer to run? Why?

Here is my code:

function version1(filename::String)
    array = []
    open(filename) do file
        for line in readlines(file)
            push!(array, line)
        end
    end
    array
end

function version2(filename::String)
    array = []
    open(filename) do file
        for line in readlines(file)
            array = [array..., line]
        end
    end
    array
end

@time println(version1(joinpath("..", "words.txt")))
@time println(version2(joinpath("..", "words.txt")))

When running two function, I observed that version1 prints the array after running function while version2 hangs for a while and then print the array.

version 1:
3.709137 seconds (4.47 M allocations: 115.453 MiB, 0.39% gc time, 2.21% compilation time)
version 2:
77.762709 seconds (5.06 M allocations: 96.644 GiB, 1.11% gc time, 0.14% compilation time)

What actually runs under 2 function? Why did we see the big gap between them? Thank you.

Your first function allocates an array and then asks for that array to be given one more element of space over and over.

Your second function allocates a whole new array each time that is one element bigger than the last and then copies all the old words into the new array, adding one new word to the end.

Allocation and copying data are both very expensive operations, so the second one takes a lot longer!

1 Like

A few more things:

  1. Splatting of dynamically sized containers is very slow.
  2. The used vector is untyped: array = []. This compounds the problem. Use a typed container: array = String[].
  3. The function does not solve the exercise correctly. It should return a vector with one word per element, not one line per element.
2 Likes