Error in JuMP / dataframe: How to get rid of missing and replace with blanks in DataFrames?

I am running into an error, and the error message suggests that the source is JuMP. I think the real cause is the missing values in my DataFrame, though. I would like to get rid of every missing so that those entries show up as blanks. All the useful functions I can find replace values one column at a time, whereas I would like to go through the whole DataFrame and replace every missing.

The code below works fine when I create the DataFrame by writing the data out in the code. It has one blank entry, which I define with "".

using JuMP, Clp, DataFrames
function get_data()
    Relationship = DataFrame(ID = ["A1","A2","A4"], LINK_1 = ["A5","A6","A10"],
    LINK_2 = ["A3","A7",""], LINK_3 = ["A9","A11","A12"])
    ID_t =  collect(1:15)
    ID = Vector{String}(undef,15)
    for i in 1:15
        ID[i]="A".*string(ID_t[i])
    end

    PRICE = rand(10.0:100.0,15)
    Master = DataFrame(ID = ID,PRICE = PRICE)
    return Relationship, Master
end

Relationship, Master = get_data()
price = Master.PRICE
model = Model(Clp.Optimizer)
@variable(model, 0 <= x[Master.ID] <= 1)
@variable(model, t[1:length(Relationship.ID)])

for (i, row) in enumerate(eachrow(Relationship))
    for key in values(row)
        if !isempty(key)
            @constraint(model, t[i] == x[key])
        end
    end
end

Now, if I write the DataFrames to CSV files and read them back into Julia, the blank entry comes back as missing and I run into an error.

using CSV

Relationship, Master = get_data()
CSV.write("Relat_2.csv", Relationship)
CSV.write("Master_2.csv", Master)

# Reading the files back turns the empty LINK_2 entry into `missing`
Relationship = CSV.read("Relat_2.csv", DataFrame)
Master = CSV.read("Master_2.csv", DataFrame)
price = Master.PRICE
model = Model(Clp.Optimizer)
@variable(model, 0 <= x[Master.ID] <= 1)
@variable(model, t[1:length(Relationship.ID)])

for (i, row) in enumerate(eachrow(Relationship))
    for key in values(row)
        if !isempty(key)   # fails here when key is `missing`
            @constraint(model, t[i] == x[key])
        end
    end
end

MethodError: no method matching iterate(::Missing)
Closest candidates are:
  iterate(!Matched::ExponentialBackOff) at error.jl:252
  iterate(!Matched::ExponentialBackOff, !Matched::Any) at error.jl:252
  iterate(!Matched::MathOptInterface.Bridges.Objective.Map, !Matched::Any...) at C:\Users\.julia\packages\MathOptInterface\ZJFKw\src\Bridges\Objective\map.jl:30
  ...
in top-level scope at test_example.jl:30
in isempty at base\essentials.jl:737

Initially, I got the error message below on the actual problem, but I may have made some other errors there as well.
[Image: JuMP error]

How can I replace missing with blanks, assuming that missing may appear in many columns and in any row? I don’t want to delete whole columns or whole rows.

Seems like just checking if key is missing should be enough?
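
A minimal sketch of that check, assuming a small stand-in table (the data here is hypothetical and the JuMP constraint is replaced by collecting the keys, so the snippet runs without a solver):

```julia
using DataFrames

# Hypothetical stand-in for the table read back from CSV
Relationship = DataFrame(ID = ["A1", "A2", "A4"],
                         LINK_2 = ["A3", "A7", missing])

# Collect the keys that would be used in constraints
kept = String[]
for row in eachrow(Relationship)
    for key in values(row)
        # Test for `missing` first; `isempty(missing)` throws a MethodError
        if !ismissing(key) && !isempty(key)
            push!(kept, key)
        end
    end
end
```

In the original loop, the `push!` line would be the `@constraint(model, t[i] == x[key])` call.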

In general, you can use either replace or coalesce to replace missing values with anything you want.
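
As a rough illustration of the coalesce route, assuming a small hypothetical table in place of the one read from CSV:

```julia
using DataFrames

# Hypothetical stand-in for the table read back from CSV
df = DataFrame(ID = ["A1", "A2", "A4"],
               LINK_2 = ["A3", "A7", missing])

# Replace `missing` with "" in every column; broadcasting `coalesce`
# builds a new vector, so the column no longer needs to allow `missing`
for col in names(df)
    df[!, col] = coalesce.(df[!, col], "")
end
```

Assigning with `df[!, col] = ...` swaps in the new vector, which is what lets the column type change.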

Thank you. This is what I came up with but your suggestion seems more efficient.

for col in names(Relationship)
    replace!(Relationship[!, col], missing => "")
end

You should use replace rather than replace! since replace! won’t change the column type to exclude missing values.
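
A small sketch of the difference on a plain vector (the values are made up for illustration):

```julia
# A column as it might look after reading a CSV with an empty field
col = Union{Missing, String}["A3", "A7", missing]

# In-place replacement keeps the original element type
in_place = replace!(copy(col), missing => "")

# Out-of-place replacement returns a new vector with a narrower element type
fresh = replace(col, missing => "")

println(eltype(in_place))
println(eltype(fresh))
```

For a DataFrame column, the same idea would be `df[!, c] = replace(df[!, c], missing => "")`, which assigns the new vector back.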

Thank you. It took me a while to figure out how to use replace. I think what @quinnj suggested may be more efficient.

Now I have run into another issue when I try to multiply a numerical matrix by x:
ERROR: MethodError: no method matching -(::CartesianIndex{1}, ::Int64)
This should reproduce the error:
ones(5,15)*x
Would you know how to multiply a vector that is indexed like a tuple or dictionary by a matrix?

You would have to be clear about what output you expect to get. Note that above, x is not defined… so I can’t reproduce the error

In the original post, the code defines x as a JuMP variable, but the issue has more to do with basic Julia usage: multiplying a matrix by a vector that has non-integer indices. For example, to do a sum comprehension, I would use sum(price*x[id] for (id, price) in zip(Master.ID, Master.PRICE)). I don’t know how multiplication with a matrix would work.
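
One possible workaround, sketched under the assumption that the matrix columns follow the order of Master.ID: gather the variables into an ordinary Vector first, then multiply. The names ids, xs, A and exprs are hypothetical, and the model is built without a solver since nothing is optimized here.

```julia
using JuMP

ids = ["A" * string(i) for i in 1:15]   # same IDs as Master.ID
model = Model()                          # no optimizer needed for this sketch
@variable(model, 0 <= x[ids] <= 1)       # indexed by strings, as in the post

# Ordinary vector of variables, in the same order as `ids`
xs = [x[id] for id in ids]

A = ones(5, 15)
exprs = A * xs                           # one affine expression per row of A
```

Each entry of `exprs` can then be used in a constraint or objective like any other expression.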

The full code that reproduces the error is given below

using JuMP, Clp, DataFrames
function get_data()
    Relationship = DataFrame(ID = ["A1","A2","A4"], LINK_1 = ["A5","A6","A10"],
    LINK_2 = ["A3","A7",""], LINK_3 = ["A9","A11","A12"])
    ID_t =  collect(1:15)
    ID = Vector{String}(undef,15)
    for i in 1:15
        ID[i]="A".*string(ID_t[i])
    end

    PRICE = rand(10.0:100.0,15)
    Master = DataFrame(ID = ID,PRICE = PRICE)
    return Relationship, Master
end


Relationship, Master = get_data()
price = Master.PRICE
model = Model(Clp.Optimizer)
@variable(model, 0 <= x[Master.ID] <= 1)
@variable(model, t[1:length(Relationship.ID)])

for (i, row) in enumerate(eachrow(Relationship))
    for key in values(row)
        if !isempty(key)
            @constraint(model, t[i] == x[key])
        end
    end
end
ones(5,15)*x   # This gives the error
julia> x
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
    Dimension 1, ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A9", "A10", "A11", "A12", "A13", "A14", "A15"]
And data, a 15-element Array{VariableRef,1}:

I will post this as a separate question, as I suspect it arises from JuMP.

Yeah, sorry, I have no idea how JuMP works, so I can’t be of much help.