Understanding several concepts from a julia> ?filter example

I’m new to the language, but somewhat experienced with programming in general. I am learning from the Julia Data Science book and have reached the section discussing filters. Having some trouble understanding the material there, I checked the built-in help by typing ?filter in the terminal. One of the examples is this:

julia> d = Dict(1=>"a", 2=>"b")
Dict{Int64, String} with 2 entries:
  2 => "b"
  1 => "a"

julia> filter(p->isodd(p.first), d)
Dict{Int64, String} with 1 entry:
  1 => "a"

Now, I’m completely baffled by the first argument in filter(p->isodd(p.first), d). As per the documentation,

filter(f, d::AbstractDict) removes elements for which f is false.

p->isodd(p.first) looks like an anonymous function that calls isodd(p.first) on its parameter p. What does the parameter p represent here? Is it the key, the value, or the pair?

Next, the Base.isodd documentation says that isodd returns a boolean based on its number argument, isodd(x::Number) -> Bool. From this I understand that p.first is a number, and the only numbers I have in the Dict are its keys. So here again I am confused by what p.first does. The Base.first documentation doesn’t say anything useful. When I try to change .first to .last I get an error:

type Pair has no field last

So I checked the Base.Pair documentation, which says that

Pair is treated as a single “scalar” for broadcasting operations

Is this what is happening here? Is p.first necessary because the whole key/value pair is treated as a single “scalar”? That’s as far as I got in deciphering this. So, 1 => "a" is a pair (or “scalar”) whose key 1 is odd, so the boolean is true and it is not removed; 2 => "b" is a pair whose key 2 is even, so the boolean is false and it is removed? Please help me understand this. Even if this is true, why do I need .first? When I try just p -> isodd(p), I get an error saying there is no matching method for isodd, and again I can’t figure out why from reading the Base.isodd documentation.

And finally, is this really the best way to do it? I mean, can you filter Dict elements from this example in a less convoluted way? Do you have to go with the anonymous function and the obscure “scalar” behavior of the Core.Pair objects when broadcasting? I’m asking honestly; this is all very beginner-unfriendly. I would like to go through something more basic and logical with this example, in the hope of understanding why it has been set up this way.

Thanks in advance.

All you need to know is that dictionaries are iterators of pairs and that each pair has a pair.first (key) and pair.second (value). Everything else follows from that.

If you are unsure about the iterator items, you can always collect(iterator) to get a vector of items.
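As a minimal sketch of that tip, here is what collecting the dictionary from the question looks like. The names items and sorted_items are just for illustration, and the sort is only there because a Dict does not promise any particular iteration order:

```julia
d = Dict(1 => "a", 2 => "b")

# collect turns the dictionary into a Vector of its key-value pairs
items = collect(d)

# every element of that vector is a Pair holding one key and one value
p = first(items)
println(typeof(p))

# Dict iteration order is not guaranteed, so sort by key before printing
sorted_items = sort(items; by = pair -> pair.first)
println(sorted_items)
```

Each element you see there is exactly what filter hands to your function, one at a time.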


Hi @MK-1138 and welcome to the Julia community! :wave:

I can see why you are getting confused here - especially if you are coming from another language!
Let’s break this down one step at a time.

First, given our dictionary d:

d = Dict(1 => "a", 2 => "b")

The keys of this dictionary will be:

julia> keys(d)
KeySet for a Dict{Int64, String} with 2 entries. Keys:
  2
  1

The keys of this dictionary are in fact the integers 1 and 2.
This is fully valid in Julia, though I admit it is somewhat strange to see integers used as dictionary keys; that is what I thought when I first saw it.
Moving on, let’s look at our values:

julia> d |> values
ValueIterator for a Dict{Int64, String} with 2 entries. Values:
  "b"
  "a"

So here, we can see that the values are the strings "a" and "b".
To get one of these values, we can index using our keys:

julia> d[1]
"a"

This works because we are using the integer 1 as a key which has the value "a".
Now, moving on to the filter function: it accepts a function as its first argument (here, an anonymous function) and a collection as its second argument (such as an array, a dictionary, etc.).
Anonymous functions in Julia look like this:

x -> x^2

This is a valid anonymous function in Julia, and it can also look strange!
What it does is take whatever value is given to it and square it.
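If it helps, here is a small sketch comparing that anonymous function with an ordinary named one. The names square and square_named are made up for this example:

```julia
# the anonymous function from above, bound to a name so we can call it
square = x -> x^2

# an equivalent named function, written the long way
function square_named(x)
    return x^2
end

println(square(3))          # prints 9
println(square_named(3))    # prints 9

# anonymous functions are usually passed straight to another function
squares = map(x -> x^2, [1, 2, 3])
println(squares)            # prints [1, 4, 9]
```

The arrow form is handy when the function is short and only needed once, which is the situation in the filter call.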
Let’s take d and use an anonymous function to print its values:

julia> values(d) .|> val -> println(val);
b
a

Here, we can read this statement as, “using the values of d, send each one to be printed.”
The .|> operator can be thought of as the “each one” in that statement (if you are unfamiliar with broadcasting, please check out the broadcasting docs).
And val is just the name of the anonymous function’s parameter; it holds each value of d in turn.
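To make the “each one” idea concrete, here is a rough sketch that swaps println for uppercase so there is a result to compare. The names piped and looped are mine; the point is only that the broadcasted pipe does the same work as an explicit comprehension:

```julia
d = Dict(1 => "a", 2 => "b")

# broadcasted pipe: send each value through uppercase and gather the results
piped = values(d) .|> uppercase

# the same thing written as a comprehension over the values
looped = [uppercase(val) for val in values(d)]

println(piped == looped)    # prints true
```

Both versions walk over the values in the same order, so the two vectors come out identical.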

Where it gets confusing in the statement filter(p -> isodd(p.first), d) is this: how is our dictionary being passed to the anonymous function?
In this case, as d is a dictionary, filter iterates over it and passes each key-value entry to the anonymous function as a Pair. A Pair is not a dictionary itself; it is a small type that stores exactly two fields, first (the key) and second (the value).
Here is an example:

julia> p = 1 => "a"
1 => "a"

and because it is a Pair, we can use its first field to get the key back:

julia> p.first
1

So, going back to our filter statement, filter(p -> isodd(p.first), d) can be translated as: “For each key-value Pair in my dictionary, keep that Pair if its key is odd, and return a dictionary of everything that was kept.”
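This also explains the two errors from the question. A short sketch, using only functions from Base, of why .last fails and why isodd(p) has no method:

```julia
p = 1 => "a"

# a Pair has exactly two fields, and neither is called last
println(fieldnames(typeof(p)))   # prints (:first, :second)

# field access and the generic first/last functions give the same answers
println(p.first == first(p))     # prints true (the key)
println(p.second == last(p))     # prints true (the value)

# isodd is defined for numbers, not for a whole Pair
println(isodd(p.first))          # prints true
println(applicable(isodd, p))    # prints false, so isodd(p) throws a MethodError
```

So p.last is an error because the field is named second, while the function call last(p) works fine. And .first is needed because isodd wants the number inside the Pair, not the Pair itself.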

Another simpler example might be something like this:

julia> fruits = Dict("apples" => 1.5, "bananas" => 3.0, "oranges" => 2)
Dict{String, Real} with 3 entries:
  "bananas" => 3.0
  "oranges" => 2
  "apples"  => 1.5

julia> filter(pair -> pair.second <= 2.0, fruits)
Dict{String, Real} with 2 entries:
  "oranges" => 2
  "apples"  => 1.5

Or if you want even more verbosity:

julia> cost = 2.0
julia> for fruit in keys(fruits)
           if fruits[fruit] <= cost
               println(fruit)
           end
       end

oranges
apples
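As for whether there is a less convoluted way: that is partly a matter of taste, but here are a few alternative spellings of the original filter, sketched on the dictionary from the question. The variable names are just labels for this comparison, and none of these is the one "official" way:

```julia
d = Dict(1 => "a", 2 => "b")

# the version from the docs, for comparison
by_field = filter(p -> isodd(p.first), d)

# destructure the Pair into key and value right in the argument list
by_destructuring = filter(((k, v),) -> isodd(k), d)

# build a new Dict from a generator with an if clause
by_generator = Dict(k => v for (k, v) in d if isodd(k))

# do-block syntax: the same call as the first version, laid out differently
by_do_block = filter(d) do p
    isodd(p.first)
end

println(by_field == by_destructuring == by_generator == by_do_block)   # prints true
```

All four produce the same one-entry dictionary, so you can pick whichever reads most naturally to you.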

Hope that clears up some of the confusion!

~ tcp :deciduous_tree:


Thank you both @juliohm and @TheCedarPrince. You both explained beautifully, in your own two very different ways, what I asked. I appreciate it.

dictionaries are iterators of pairs and (…) each pair has a pair.first (key) and pair.second (value).

Only now do I understand what I read in the Core.Pair docs:

The elements are stored in the fields first and second.

Thank you also for the recommendation to check the broadcasting docs. The answers are helping me understand how to read the docs as well.