DataFrame: Cannot have duplicated names for indices

Hello everyone,

I’ve just spent 5 hours on what should be a simple problem: joining two dataframes.
I basically have two dataframes: one with a column of sectors and a value, the other with new sectors mapped to the previous ones, along with their associated weights.

My issue is that I can’t perform leftjoin on the column sector in my real example, due to the error message “Cannot have duplicated names for indices”.

I’m not able to produce an MWE that gives the same error message, but here is an example of what I want to do:

using DataFrames

df_initial = DataFrame(code = ["a", "b", "c"], country = ["AU", "AU", "AU"], value = [10, 68, 50])
insertcols!(df_initial, :sector => string.(df_initial[:, :code], "-", df_initial[:, :country]))
df_weights = DataFrame(code = ["a", "a", "a", "b", "b", "c"],
    country = ["AU", "AU", "AU", "AU", "AU", "AU"],
    new_sector = ["new-1", "new-2", "new-3", "new-4", "new-2", "new-1"],
    weights = [0.2, 0.2, 0.6, 0.4, 0.6, 1])
insertcols!(df_weights, :sector => string.(df_weights[:, :code], "-", df_weights[:, :country]))

df_test = leftjoin(df_weights[:, [:sector, :new_sector, :weights]], df_initial[:, [:sector, :value]], on = :sector)

Do you have any idea what could cause a “duplicated names for indices” message?

You could add ;makeunique=true to the join call and see which columns clash.

Unfortunately that doesn’t help; I still get the same error with makeunique = true.

It would help if the issue were about column names, but it isn’t. It seems that duplicated keys for merging aren’t allowed.

I can see it by using

leftjoin(df, df_2, on = :sector, validate = (true, true))

which gives the following error (in my real example):

Merge key(s) in df1 are not unique. df1 contains 1452 duplicate keys: (sector = "AUT-A01",), ..., (sector = "ROW-J59_J60",).

But I really don’t understand why I can’t have nonunique keys in this real example while it is allowed in my MWE above…

Can you please show a full stack trace? This is not an error in DataFrames.jl, but in the package that provides the vectors for your columns. Most likely some of your columns come from such a package, and this causes the error (such columns do not allow for duplicate rows).

You’re right! Some of my columns come from a NamedArray initially.

Here is an MWE that does reproduce the error message I get:

using DataFrames
using NamedArrays

initial = NamedArray([5748.61], ["AUT-A01"])
df_initial = DataFrame(:sector => names(initial,1), :value => initial[:,1])
df_weights = DataFrame(sector = ["AUT-A01","AUT-A01","AUT-A01","AUT-A01","AUT-A01"],
    new = ["a","b","c","d","e"],
    weights = [0.2, 0.2, 0.4, 0.1, 0.1])

test = leftjoin(df_weights, df_initial, on = :sector)

You need to convert these columns to Vector, because your join keys contain duplicates.
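
A minimal sketch of that conversion, reusing the data from the MWE above. It assumes that `collect` gives back a plain `Vector` for a NamedArray column (the generic Base behaviour for arrays, not something confirmed in this thread); the loop itself is only an illustration of how one might find and replace such columns.

```julia
using DataFrames
using NamedArrays

# Same data as in the MWE: a one-element NamedArray keyed by sector.
initial = NamedArray([5748.61], ["AUT-A01"])
df_initial = DataFrame(:sector => names(initial, 1), :value => initial[:, 1])
df_weights = DataFrame(sector = fill("AUT-A01", 5),
    new = ["a", "b", "c", "d", "e"],
    weights = [0.2, 0.2, 0.4, 0.1, 0.1])

# Print the storage type of each column; anything that is not a plain
# Vector is a candidate source of the error.
for (name, col) in pairs(eachcol(df_initial))
    println(name, " => ", typeof(col))
end

# Replace every non-Vector column with a plain Vector copy.
# Assumption: collect returns a Base Array for these columns.
for name in names(df_initial)
    col = df_initial[!, name]
    col isa Vector || (df_initial[!, name] = collect(col))
end

# The join can now repeat rows of df_initial for the duplicated keys.
test = leftjoin(df_weights, df_initial, on = :sector)
```

Checking the column types first is useful on a real dataset, where it may not be obvious which columns were built from a NamedArray.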


Thank you so much!

For those wondering, this is the solution with the MWE thanks to @bkamins:

using DataFrames
using NamedArrays

initial = NamedArray([5748.61], ["AUT-A01"])
df_initial = DataFrame(:sector => vec(names(initial,1)), :value => vec(initial[:,1]))
df_weights = DataFrame(sector = ["AUT-A01","AUT-A01","AUT-A01","AUT-A01","AUT-A01"],
    new = ["a","b","c","d","e"],
    weights = [0.2, 0.2, 0.4, 0.1, 0.1])

test = leftjoin(df_weights, df_initial, on = :sector)