Extract relevant lines from large file using IDs from other file

There are a few things that stand out immediately:

  1. You’re using non-constant global variables, which are particularly slow in Julia. Instead, put your actual code into a function and call it.
  2. Any[] is an abstractly-typed container, which will be slower than a container with a concrete element type (like String[]).
  3. You’re building a vector and then repeatedly searching it for items, so a vector is probably not the right data structure: a Set{String}() will make the ID in queries line much faster (a Set has O(1) average lookup, whereas searching a vector is O(N)).
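
A minimal sketch that combines the three points above might look like the following. The helper names read_ids and extract_lines are hypothetical, and the file layout (one ID per line in the ID file, the ID as the first tab-separated field in the large file) is an assumption, since it depends on your actual data:

```julia
# Hypothetical helper: collect IDs (assumed one per line) into a Set{String}.
# Everything lives inside functions, so no non-constant globals are involved.
function read_ids(io::IO)
    ids = Set{String}()              # concretely typed, O(1) average lookup
    for line in eachline(io)
        id = strip(line)
        isempty(id) || push!(ids, String(id))
    end
    return ids
end

# Hypothetical helper: keep the lines whose first field is one of `ids`.
# Assumes the ID is the first `delim`-separated field of each line.
function extract_lines(io::IO, ids::Set{String}; delim::Char = '\t')
    hits = String[]                  # concrete element type instead of Any[]
    for line in eachline(io)
        id = first(split(line, delim; limit = 2))
        id in ids && push!(hits, line)
    end
    return hits
end

# Small in-memory demonstration using IOBuffer in place of real files.
ids  = read_ids(IOBuffer("id2\nid4\n"))
hits = extract_lines(IOBuffer("id1\ta\nid2\tb\nid3\tc\nid4\td\n"), ids)
```

With real files, the same functions could be called as open(read_ids, "ids.txt") and open(io -> extract_lines(io, ids), "data.tsv"), where both file names are placeholders.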

Please read the Performance Tips section of the Julia manual, which covers general Julia performance best practices (for example, “Avoid Global Variables” is item 1).
