Help with Advent of Code exercise

thetathetatheta · January 27, 2023, 3:38pm

Hi!
I started to learn Julia by solving Advent of Code exercises from previous years. My background is R but I’m a hobbyist dealing mostly with text mining for fun.
So far it has been fun and games: I’m going through AoC 2020 and I’m at day 6. Now I have the following task (paraphrased, hopefully correctly):

My input contains chunks of strings in lines.
The chunks are separated by empty lines.
I’m supposed to count all unique occurrences of lowercase letters for each chunk.
And sum up the occurrences for each chunk as the result.

(two first chunks in my input file:)

fekdcbayqxnwvh
fycwqktvxandeb
kqbafvcxyewrdn
akwqcvenxfydbs
ewbaxdcvnkyfq

timjneyhbvxkfagdpzrous
gsumijvxoheptbafnkyzrd
yxtbnupramvdezhkfojsig
soaruhxnpiemjvytzbfdkg
vfanlgjoiskzmubtxhceyprd

Guided by the manual and Stack Overflow I ended up with the following code. My plan:

Create a vector vec of vectors (chunks). Nested vectors (chunks) contain strings (lines) of my input.
Loop through the elements of the nested vectors (strings/lines) and find all occurrences of each letter with the countmap function from StatsBase. That gives me a dictionary for each line.
For each chunk, merge the dictionaries that countmap produced for the separate lines. That will give me the occurrences of each letter for a chunk as a merged dictionary.
Count elements of merged dictionaries, ignoring number of occurrences. I’m going for number of unique letters.
Sum up counted elements of merged dictionaries for each chunk.

My problem - the number I’m getting is too high and I don’t know what’s wrong with my reasoning and/or my code.

The code:

I’d be very grateful if someone could advise me on what I’m doing wrong and push me in the right direction.

HanD · January 27, 2023, 4:02pm

Hello and welcome to the Julia community!

I think you might have overcomplicated the solution slightly. I’m not 100% sure I understand what is going on, but I have a feeling that the bug in this particular solution is that you keep adding unique counts of earlier chunks into the mix by merging cnt with mrg. Instead, you should simply take the length of cnt when accumulating the sub-results into cnt2.

That being said, I suggest you follow these rules of thumb:

avoid using global variables – they are error-prone and bad for performance (a consequence of how the Julia compiler works);
prefer declarative over imperative – the fewer side effects you have, the less error-prone your code is (and it is also easier to test);
prefer built-in and stdlib functions over external libraries.
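As a rough illustration of these three rules, here is a minimal sketch of the per-chunk step written as a small pure function that uses only Base. The name unique_letter_count is made up for this example and does not come from the thread.

```julia
# Hypothetical helper (not from the thread): number of distinct lowercase
# letters in one chunk. It takes its input as an argument and returns a value,
# so it touches no global state, and it needs nothing beyond Base.
unique_letter_count(chunk::AbstractString) = length(Set(filter(islowercase, chunk)))

# Because it has no side effects, it can be checked on a tiny made-up input.
unique_letter_count("ab\nac")   # 3: the letters a, b and c
```

The islowercase filter also drops the line breaks inside a chunk, so the chunk does not have to be split into lines first.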

This particular problem can be solved with the following code:

sum(split(read("aoc2020d06input.txt", String), "\n\n")) do group
    chars_in_group = collect(replace(group, '\n' => ""))
    return length(unique(chars_in_group))
end

Taking it apart:

read() reads the entire contents of a file into a string.
split() breaks this up into chunks along double line breaks.
replace() joins the lines of the chunk, collect() takes all the characters in the string.
unique() drops repetitions, and length() returns the total (unique) count.
The external sum sums up the sub-sums for every chunk.
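To make the intermediate values visible, the same steps can be sketched on a small in-memory string instead of a file. The helper name unique_counts and the sample text are assumptions made for this illustration only.

```julia
# Hypothetical helper: the steps described above, returning the count for each
# chunk instead of only the grand total.
function unique_counts(text::AbstractString)
    groups = split(text, "\n\n")                         # one entry per chunk
    joined = [replace(g, '\n' => "") for g in groups]    # remove line breaks inside a chunk
    return [length(unique(collect(s))) for s in joined]  # distinct characters per chunk
end

sample = "abc\n\na\nb\nc\n\nab\nac"   # made-up input with three chunks
unique_counts(sample)        # [3, 3, 3]
sum(unique_counts(sample))   # 9
```

One caveat: this relies on the input using plain "\n" line endings. A file saved with Windows-style "\r\n" endings contains no "\n\n" sequence, so split would return the whole text as a single chunk.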

The do syntax is a Julia specialty, I suggest you look it up in the manual, it is a real treat.

On a related note, I happen to have solved all the tasks of AoC2020 in Julia. You can find my solutions on GitHub, for reference.

HTH!

cjdoris · January 27, 2023, 4:35pm

@HanD Can I suggest you put the solution and explanation into a “details” block, since it is a spoiler for the OP.

rocco_sprmnt21 · January 28, 2023, 9:03am

To see what happens in your code, try putting printlns in “appropriate” places.
In my opinion the problem is that the chunk vector is not reset after each subcycle.
Compare the println output with and without the commented-out statement below.

PS
Next time don’t attach a screenshot, but copy and paste the code so that I (or whoever else) don’t have to retype it from scratch.

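file=open(".\\v1.8.3\\aoc1 2020.txt")
v=Vector{String}[]   # all chunks
chunk=String[]       # lines of the current chunk
for line in eachline(file)
    if line == ""
        push!(v,chunk)
        # chunk=String[]   # uncomment to start a fresh chunk after each empty line
        println("vector   ", v)
    else
        push!(chunk,line)
        println("chunk    ", chunk)
    end
end

push!(v,chunk)   # the last chunk is not followed by an empty line
close(file)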
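For comparison, here is a minimal sketch of the same loop with the reset enabled and wrapped in a function. The name read_chunks is made up for this example, and an IOBuffer stands in for the input file. Inside a function the reassignment of chunk is an ordinary local assignment, which avoids the scoping differences between the REPL and a script that can show up when a global is reassigned inside a top-level loop.

```julia
# Hypothetical helper: group the lines of `io` into chunks separated by empty lines.
function read_chunks(io::IO)
    chunks = Vector{String}[]
    chunk = String[]
    for line in eachline(io)
        if isempty(line)
            push!(chunks, chunk)
            chunk = String[]      # start a fresh vector for the next chunk
        else
            push!(chunk, line)
        end
    end
    push!(chunks, chunk)          # the last chunk is not followed by an empty line
    return chunks
end

# Made-up input held in memory instead of a file.
read_chunks(IOBuffer("ab\nac\n\nxyz\n"))   # [["ab", "ac"], ["xyz"]]
```

Without the reset, every push!(chunks, chunk) would store the same vector object again, so all entries would end up holding the lines of the whole file.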

rocco_sprmnt21 · January 28, 2023, 9:07am

Like this?


ss=split(read(".\\v1.8.3\\aoc1 2020.txt", String),"\n\n")
mapreduce(uss->count(islowercase,uss),+, unique.(ss))

thetathetatheta · January 30, 2023, 8:28am

Thank you for the tips, they were very helpful. It seems I have to think a little bit more on handling the problem before I actually start to solve it. Seeing that it can be done in a very simple way is always very inspiring to me.
