Hi!
I started to learn Julia by solving Advent of Code exercises from previous years. My background is in R, but I'm a hobbyist dealing mostly with text mining for fun.
So far everything is fun and games: I'm going through AoC 2020 and I'm at day 6. Now I have the following task (paraphrased, hopefully correctly):
My input is a text file with chunks of strings, one string per line.
The chunks are separated by empty lines.
For each chunk I'm supposed to count the unique lowercase letters that occur in it.
The result is the sum of these counts over all chunks.
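For example (my own made-up input, not the puzzle's): if the file contains

abc
abd

xy

then the first chunk counts as 4 (the unique letters a, b, c, d), the second as 2, and the answer would be 6.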
Guided by the manual and Stack Overflow I ended up with the code in the attached screenshot. My plan:
Create a vector vec of vectors (chunks); the nested vectors (chunks) contain the strings (lines) of my input.
Loop through the elements of the nested vectors (the lines) and count all occurrences of each letter with the countmap function from StatsBase, which gives me a dictionary per line.
For each chunk, merge the dictionaries countmapped for its lines. That gives me the occurrences of each letter within the chunk as a single merged dictionary.
Count the keys of each merged dictionary, ignoring the number of occurrences; I'm after the number of unique letters.
Sum up these counts over all chunks.
My problem: the number I'm getting is too high, and I don't know what's wrong with my reasoning and/or my code.
I think you might have overcomplicated the solution slightly. I’m not 100% sure I understand what is going on, but I have a feeling that the bug in this particular solution is that you keep adding unique counts of earlier chunks into the mix by merging cnt with mrg. Instead, you should simply take the length of cnt when accumulating the sub-results into cnt2.
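If I read the screenshot correctly, the accumulation part does something along these lines – only my reconstruction, reusing your names cnt, mrg and cnt2 on a toy vec, so the details may well differ:

using StatsBase

vec = [["abc", "abd"], ["xy"]]       # toy stand-in for your vector of chunks

cnt2 = 0
mrg = Dict{Char,Int}()               # survives across chunks -- the culprit
for chunk in vec
    cnt = Dict{Char,Int}()
    for line in chunk
        cnt = merge(+, cnt, countmap(collect(line)))  # letter counts for this chunk
    end
    global mrg = merge(+, mrg, cnt)  # mixes in letters from earlier chunks as well
    global cnt2 += length(mrg)       # so old letters get counted again
end
println(cnt2)                        # prints 10, although the right answer is 6

With global cnt2 += length(cnt) instead (and mrg dropped entirely), the same toy input gives the expected 6 = 4 + 2.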
That being said, I suggest you follow these rules of thumb:
avoid using global variables – they are error-prone and bad for performance (a consequence of how the Julia compiler works); a small sketch of working without them follows after this list;
prefer declarative over imperative – the fewer side effects you have, the less error-prone your code is (and the easier it is to test);
prefer built-in and stdlib functions over external libraries.
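To illustrate the first point, here is a rough sketch (the function name is made up, and it uses a Set instead of countmap) of the same loopy idea with nothing global:

function unique_letters_sum(path)
    total = 0
    seen = Set{Char}()               # letters seen in the current chunk
    for line in eachline(path)
        if isempty(line)             # blank line ends a chunk
            total += length(seen)
            empty!(seen)
        else
            union!(seen, line)       # add this line's letters to the set
        end
    end
    return total + length(seen)      # the last chunk has no trailing blank line
end

unique_letters_sum("aoc2020d06input.txt")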
This particular problem can be solved with the following code:
sum(split(read("aoc2020d06input.txt", String), "\n\n")) do group
    chars_in_group = collect(replace(group, '\n' => ""))
    return length(unique(chars_in_group))
end
Taking it apart:
read() reads the entire contents of a file into a string.
split() breaks this up into chunks along double line breaks.
replace() removes the line breaks within a chunk (joining its lines), and collect() turns the resulting string into a vector of its characters.
unique() drops repetitions, and length() returns the total (unique) count.
The outer sum adds up the sub-results for every chunk.
The do syntax is a Julia specialty; I suggest you look it up in the manual, it is a real treat.
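To give a tiny, unrelated illustration: these two calls are equivalent, the do block simply becomes the anonymous function that is passed as sum's first argument:

sum(x -> x^2, [1, 2, 3])     # 14

sum([1, 2, 3]) do x          # same thing, written with do
    x^2
end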
On a related note, I happen to have solved all the tasks of AoC 2020 in Julia. You can find my solutions on GitHub, for reference.
To see what happens to your code, try putting printlns in “appropriate” places.
In my opinion the problem is the failure to reset the chunk vector after each chunk has been pushed.
See the difference by running it with the printlns in place, first with the marked statement commented out and then uncommented.
PS
Next time don’t attach a screenshot, but copy and paste the instructions so that I (or whoever else) doesn’t have to rewrite them from scratch.
file = open(".\\v1.8.3\\aoc1 2020.txt")
v = Vector{String}[]              # one entry per chunk
chunk = String[]                  # lines of the current chunk
for line in eachline(file)
    if line == ""
        push!(v, chunk)           # blank line: the chunk is complete
        # chunk = String[]        # uncomment to start a fresh chunk (needs a leading global if run as a script)
        println("vector ", v)
    else
        push!(chunk, line)
        println("chunk ", chunk)
    end
end
push!(v, chunk)                   # don't forget the last chunk
close(file)
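To make the difference concrete with a toy input of mine (the lines abc, abd, an empty line, xy): with the reset in place you end up with

v == [["abc", "abd"], ["xy"]]

whereas with it commented out chunk just keeps growing, and since push!(v, chunk) stores a reference to the very same array, both entries of v point to it:

v == [["abc", "abd", "xy"], ["abc", "abd", "xy"]]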
Thank you for the tips, they were very helpful. It seems I have to think a little bit more about how to handle the problem before I actually start solving it. Seeing that it can be done in a very simple way is always very inspiring to me.