DataFrame: Cannot have duplicated names for indices

tlorans · March 24, 2022, 2:22pm

Hello everyone,

I’ve just spent 5 hours on a simple issue with joining two dataframes.
I basically have two dataframes: one with a column of sectors and a value, the other with new sectors mapped to the previous ones, along with the associated weights.

My issue is that I can’t perform a leftjoin on the column sector in my real example, because of the error message “Cannot have duplicated names for indices”.

I’m not able to produce an MWE that gives the same error message, but here is an example of what I want to do:

using DataFrames

df_initial = DataFrame(code = ["a", "b", "c"], country = ["AU", "AU", "AU"], value = [10, 68, 50])
insertcols!(df_initial, :sector => string.(df_initial[:, :code], "-", df_initial[:, :country]))
df_weights = DataFrame(code = ["a", "a", "a", "b", "b", "c"], country = ["AU", "AU", "AU", "AU", "AU", "AU"], new_sector = ["new-1", "new-2", "new-3", "new-4", "new-2", "new-1"], weights = [0.2, 0.2, 0.6, 0.4, 0.6, 1])
insertcols!(df_weights, :sector => string.(df_weights[:, :code], "-", df_weights[:, :country]))

df_test = leftjoin(df_weights[:, [:sector, :new_sector, :weights]], df_initial[:, [:sector, :value]], on = :sector)

Do you have any idea what could be the source of a “duplicated names for indices” message?

lawless-m · March 24, 2022, 2:25pm

You could add ; makeunique=true to the join call and see which columns clash.

tlorans · March 24, 2022, 2:31pm

Unfortunately that doesn’t help; I still have the same issue with makeunique = true.

Indeed, this would help if the issue were about column names, but it isn’t. It seems that duplicated keys for merging aren’t allowed.

I can see it by using

leftjoin(df, df_2, on = :sector, validate = (true, true))

which gives the following error (in my real example):

Merge key(s) in df1 are not unique. df1 contains 1452 duplicate keys: (sector = "AUT-A01",), ..., (sector = "ROW-J59_J60",).

But I really don’t understand why I can’t have nonunique keys in this real example while it is allowed in my MWE above…
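
For readers following along, here is a minimal sketch of what the validate check appears to do, assuming only DataFrames.jl and made-up data (the names left, right, joined and threw are hypothetical). With plain Vector columns, duplicate keys join without trouble by default. The call only throws once validate asks for unique keys, so the message above comes from the check itself rather than from the original failure.

```julia
using DataFrames

# Hypothetical data: the left table repeats its key, the right one does not.
left = DataFrame(sector = ["AUT-A01", "AUT-A01"], weights = [0.4, 0.6])
right = DataFrame(sector = ["AUT-A01"], value = [5748.61])

# Duplicate keys in the left table are accepted by default.
joined = leftjoin(left, right, on = :sector)

# validate = (true, false) demands unique keys in the left table only,
# so this call is expected to throw.
threw = try
    leftjoin(left, right, on = :sector, validate = (true, false))
    false
catch
    true
end
```

If that holds, threw is true while joined still has one row per row of left.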

bkamins · March 24, 2022, 2:53pm

Can you please show a full stack trace? This is an error not in DataFrames.jl, but in the package that provides the vectors for your columns. Most likely some of your columns come from https://github.com/davidavdav/NamedArrays.jl and this causes the error (such columns do not allow for duplicate rows).

tlorans · March 24, 2022, 2:59pm

You’re right! Some of my columns come from a NamedArray initially.

Here is an MWE that does give the error message I get:


using DataFrames
using NamedArrays

initial = NamedArray([5748.61], ["AUT-A01"])
df_initial = DataFrame(:sector => names(initial,1), :value => initial[:,1])
df_weights = DataFrame(
    sector = ["AUT-A01", "AUT-A01", "AUT-A01", "AUT-A01", "AUT-A01"],
    new = ["a", "b", "c", "d", "e"],
    weights = [0.2, 0.2, 0.4, 0.1, 0.1]
)

test = leftjoin(df_weights, df_initial, on = :sector)

bkamins · March 24, 2022, 3:06pm

You need to convert these columns to Vector because you have duplicates.
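
One possible way to do that conversion, sketched here under assumptions: collect is used because Base's collect copies any AbstractArray into a plain Array, which is not necessarily the call meant above. The name values_plain and the shortened weights table are made up for illustration.

```julia
using DataFrames
using NamedArrays

initial = NamedArray([5748.61], ["AUT-A01"])

# Assumption: collect copies the NamedArray into an ordinary Vector{Float64},
# dropping the names that must stay unique.
values_plain = collect(initial)

df_initial = DataFrame(sector = names(initial, 1), value = values_plain)

# Hypothetical, shortened weights table with a repeated key.
df_weights = DataFrame(sector = fill("AUT-A01", 3), weights = [0.2, 0.2, 0.6])

# With a plain Vector column the join can repeat the value row freely.
test = leftjoin(df_weights, df_initial, on = :sector)
```

The point of the sketch is only that the value column no longer carries unique index names when the join repeats its single row.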

tlorans · March 24, 2022, 3:13pm

Thank you so much!

For those wondering, here is the solution applied to the MWE, thanks to @bkamins:


using DataFrames
using NamedArrays

initial = NamedArray([5748.61], ["AUT-A01"])
df_initial = DataFrame(:sector => vec(names(initial,1)), :value => vec(initial[:,1]))
df_weights = DataFrame(
    sector = ["AUT-A01", "AUT-A01", "AUT-A01", "AUT-A01", "AUT-A01"],
    new = ["a", "b", "c", "d", "e"],
    weights = [0.2, 0.2, 0.4, 0.1, 0.1]
)

test = leftjoin(df_weights, df_initial, on = :sector)

mbauman · March 29, 2022, 2:48pm

7 posts were split to a new topic: Column types in DataFrames vs. InMemoryDatasets
