# How to group by multiple data types?

Hello all,

I’m a bit new to the Julia programming language and haven’t been able to find an answer that solves my problem. I have the following dataframe:

``````
10×5 DataFrame
 Row │ DATE        TOPIC_I    TOPIC_J   JOINT_PROB       DOC_COUNT
     │ Date        String15   String15  Any              Any
─────┼────────────────────────────────────────────────────────────
   1 │ 2000-09-01  TOPIC_153  TOPIC_87  0.03806723138    979
   2 │ 2000-09-01  TOPIC_81   TOPIC_87  0.01825187194    979
   3 │ 2000-09-01  TOPIC_249  TOPIC_87  0.01616933848    979
   4 │ 2000-09-01  TOPIC_124  TOPIC_87  0.006607188145   979
   5 │ 2000-09-01  TOPIC_140  TOPIC_87  0.0008916937195  979
   6 │ 2000-09-01  TOPIC_101  TOPIC_87  0.001341542903   979
   7 │ 2000-09-01  TOPIC_89   TOPIC_87  0.07842244991    979
   8 │ 2000-09-01  TOPIC_233  TOPIC_87  0.01956784903    979
   9 │ 2000-09-01  TOPIC_144  TOPIC_87  0.01501348474    979
  10 │ 2000-09-01  TOPIC_201  TOPIC_87  0.007407990334   979
``````

I am trying to group by the DATE and TOPIC_I columns, sum the JOINT_PROB column, and take the average of the DOC_COUNT column within each group. I have implemented the code below:

``````
# Convert the joint probabilities column and document count column to the correct types.
stuff = [typeof(x) for x in probabilities_data[!, :JOINT_PROB]]
println(unique(stuff))

probabilities_data[!, :JOINT_PROB] = [typeof(x) == String ? tryparse(Float64,x) : x for x in probabilities_data[!, :JOINT_PROB]]

stuff = [typeof(x) for x in probabilities_data[!, :JOINT_PROB]]
println(unique(stuff))

p_i_group = groupby(probabilities_data, [:DATE, :TOPIC_I])
pi_df = combine(p_i_group, :TOPIC_J => sum => :PROB_I)
``````

I continue to get the following error related to the last line of code above:

``````
TaskFailedException:
MethodError: no method matching +(::String15, ::String15)
``````

As far as I can tell my syntax is correct. Can someone help me find what I am missing?

`TOPIC_J` is a string column… perhaps you meant to sum the probabilities?
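
To illustrate why the error mentions `+`: `sum` reduces a column by adding its elements, and strings have no `+` method. Here is a minimal sketch using hypothetical toy vectors of plain `String` and `Float64` values, not the `String15` column from the question (the same reasoning should apply there):

```julia
# Hypothetical stand-ins for the TOPIC_J and JOINT_PROB columns.
topics = ["TOPIC_87", "TOPIC_87"]
probs = [0.038, 0.018]

# sum adds elements with +, which is not defined for two strings,
# so we capture the exception instead of letting it stop the script.
err = try
    sum(topics)
catch e
    e
end
println(err isa MethodError)

# The numeric vector sums without trouble.
println(sum(probs))
```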

My apologies. You are correct. I want to sum the “JOINT_PROB” column. I updated the question to reflect this.

Both your `JOINT_PROB` and `DOC_COUNT` columns have `eltype` `Any`, which signals a risk that they contain values other than numbers. If they contain only numbers, the following will work:

``````
using Statistics
p_i_group = groupby(probabilities_data, [:DATE, :TOPIC_I])
pi_df = combine(p_i_group, :JOINT_PROB => sum => :PROB_I, :DOC_COUNT => mean => :MEAN_COUNT)
``````

The easiest way to check whether a column contains only numbers is to run, e.g., `float.(probabilities_data.JOINT_PROB)`. If this throws an error, you have some bad data in that column.
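
Here is a minimal sketch of that check followed by the grouped aggregation, on a hypothetical toy table. The values and the names `toy`, `grouped`, and `result` are made up for illustration; the column names follow the question. It assumes every entry is already numeric, since a leftover string would make `float.` throw.

```julia
using DataFrames, Dates, Statistics

# Toy data with the same Any-typed columns as in the question.
toy = DataFrame(
    DATE = [Date(2000, 9, 1), Date(2000, 9, 1), Date(2000, 9, 1)],
    TOPIC_I = ["TOPIC_153", "TOPIC_153", "TOPIC_81"],
    JOINT_PROB = Any[0.25, 0.5, 0.125],
    DOC_COUNT = Any[979, 981, 979],
)

# float. throws if an element cannot be converted to a float; on success,
# storing the result back gives the columns a concrete element type.
toy.JOINT_PROB = float.(toy.JOINT_PROB)
toy.DOC_COUNT = float.(toy.DOC_COUNT)

# Sum the probabilities and average the counts within each group.
grouped = groupby(toy, [:DATE, :TOPIC_I])
result = combine(grouped,
    :JOINT_PROB => sum => :PROB_I,
    :DOC_COUNT => mean => :MEAN_COUNT)
println(result)
```

Assigning the converted vectors back is optional for the check itself, but it means later operations run on `Float64` columns rather than `Any`.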

It seems there was an issue with the input data. After fixing that and changing the summed column to “JOINT_PROB”, based on your and @tbeason's feedback, the code appears to be working. Thank you both for your help!