I have a dataset that looks like this:
Elapsed Days |
Insects Collected on that day |
1 |
0 |
2 |
3 |
4 |
25 |
5 |
21 |
6 |
14 |
8 |
2 |
and I have a second dataset that is similar but is missing different days
Elapsed Days |
Insects Collected on that day |
1 |
1 |
3 |
4 |
4 |
17 |
5 |
43 |
8 |
2 |
12 |
3 |
I want to fill in the missing day in the sequence and place a zero in insects collected column
I could do this manually, but I have 34 datasets some with only 4 data points and some with 18 data points. I want all of the datasets to have the same number of rows. I know I could do it with and βifβ or βfor/endβ loop but that seems like the brute force method and was hoping Julia had a more elegant way to do this
Thank you
Mike
master = DataFrame(elapsed = 1:18)
master.collected .= 0
set_1 = DataFrame(elapsed = [1,2,4,5,6,8], collected = [0,3,25,21,14,2])
set_2 = DataFrame(elapsed = [1,3,4,5,8,12], collected = [1,4,17,43,2,3])
master = leftjoin(master, set_1, on=:elapsed, makeunique=true)
master = leftjoin(master, set_2, on=:elapsed, makeunique=true)
master.collected_1 = coalesce.(master.collected_1, 0)
master.collected_2 = coalesce.(master.collected_2, 0)
master.collected = master.collected_1 .+ master.collected_2
sort!(master, :elapsed)
println(master)
18Γ4 DataFrame
Row β elapsed collected collected_1 collected_2
β Int64 Int64 Int64 Int64
ββββββΌββββββββββββββββββββββββββββββββββββββββββββββ
1 β 1 1 0 1
2 β 2 3 3 0
3 β 3 4 0 4
4 β 4 42 25 17
5 β 5 64 21 43
6 β 6 14 14 0
7 β 7 0 0 0
8 β 8 4 2 2
9 β 9 0 0 0
10 β 10 0 0 0
11 β 11 0 0 0
12 β 12 3 0 3
13 β 13 0 0 0
14 β 14 0 0 0
15 β 15 0 0 0
16 β 16 0 0 0
17 β 17 0 0 0
18 β 18 0 0 0
Thank you so much. I never thought about LeftJoin. Makes perfect sense now.
2 Likes