I’m very new to Julia and to coding in general. I want to prep my data for a classification task, so I’m trying to write a function that looks through a DataFrame and finds any rows that are identical except for their value in the classification column (it would be silly to try to classify those rows if the same inputs result in different classifications). I started writing a for loop to check this, but then I realized it only compares neighboring rows: a row at index 3 and another at index 30 could be identical apart from the classification column, and the loop wouldn’t catch that (at least I think it wouldn’t…). I’ll attach what I wrote so far, but I’m not really sure where to go with this one… Any advice would be much appreciated!
print("Enter name of classification column \n\n")
class_name = readline()
for i = 1:size(df, 2)
    for j = 1:size(df, 1)
        if df[j, :class_name] == df[j+1, :class_name]
One thing to note: `:class_name` and `class_name` are different things. The first is a `Symbol`, which is kind of like a `String`, and the second is the actual value you get from `readline()`.
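A minimal sketch of that difference, assuming a made-up two-column table and a hard-coded string standing in for what `readline()` would return (the column name `label` is hypothetical):

```julia
using DataFrames

# Hypothetical data; `class_name` stands in for the string returned by readline().
df = DataFrame(x1 = ["a", "b"], label = [0, 1])
class_name = "label"

# :class_name is the literal Symbol `class_name`, not the contents of the variable,
# so df[1, :class_name] would look for a column that is literally named "class_name".
same = (:class_name == Symbol(class_name))   # false

# Either convert the string to a Symbol, or index with the String directly.
first_label_sym = df[1, Symbol(class_name)]
first_label_str = df[1, class_name]
```

Both forms of indexing pick out the same column, so in the loop above `df[j, class_name]` is probably what was intended.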
I’m not sure I agree with your premise - it’s not uncommon in regression and classification tasks for the same covariates to produce different outcomes, as in general you wouldn’t expect to perfectly observe all covariates (and hence have no error in the outcome).
In any case, it sounds like you might be interested in a `groupby` operation on all your covariates, something like:
julia> df = DataFrame(y = [1, 2, 3, 4, 5], x1 = ["a", "b", "c", "c", "e"], x2 = ["f", "g", "d", "d", "e"])
5×3 DataFrame
 Row │ y      x1      x2
     │ Int64  String  String
─────┼───────────────────────
   1 │     1  a       f
   2 │     2  b       g
   3 │     3  c       d
   4 │     4  c       d
   5 │     5  e       e

julia> combine(groupby(df, [:x1, :x2]), :y => Ref => :y)
4×3 DataFrame
 Row │ x1      x2      y
     │ String  String  SubArray…
─────┼───────────────────────────
   1 │ a       f       [1]
   2 │ b       g       [2]
   3 │ c       d       [3, 4]
   4 │ e       e       [5]
When doing this you have to decide what to do with the multiple different outcomes (you could replace `Ref` by e.g. `first` to keep the first observed value, `mean` from the Statistics standard library to get the average, or whatever other function is appropriate in your case).
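If the goal is to list only the rows the original question asks about (same covariates, different class), one option is to filter the groups before combining them. This is a sketch rather than the only way to do it; the names `class_col`, `covariates`, `conflicts` and `resolved` are made up for illustration, and it reuses the example data from above:

```julia
using DataFrames

df = DataFrame(y = [1, 2, 3, 4, 5],
               x1 = ["a", "b", "c", "c", "e"],
               x2 = ["f", "g", "d", "d", "e"])

class_col = :y                                          # the classification column
covariates = setdiff(propertynames(df), [class_col])    # every other column

# Group rows that share all covariate values, wherever they sit in the table.
gdf = groupby(df, covariates)

# Keep only the groups whose class column holds more than one distinct value.
conflicts = filter(class_col => (ys -> length(unique(ys)) > 1), gdf)
conflicting_rows = DataFrame(conflicts)

# One possible resolution: keep the first observed class for each covariate combination.
resolved = combine(gdf, class_col => first => class_col)
```

Here `conflicting_rows` holds the two `c`/`d` rows, and `resolved` has one row per covariate combination. Because `groupby` matches rows regardless of position, this also catches the "row 3 versus row 30" case that a neighbor-by-neighbor loop would miss.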