# How would I divide each row of a Julia DataFrame by that row's maximum and return a new DataFrame?

Hi,

I’ve been looking around and messing with map, eachrow and different things but I haven’t been able to figure it out.

Essentially, if I have an N × M dataframe, I want to return a new N × M dataframe in which each value is the corresponding value of the old dataframe divided by the maximum of the row it sits in.

For a one-row dataframe of [2, 4, 6] it should return [0.33, 0.67, 1].

In my use case, though, this has to be applied to a dataframe with many rows.

Probably not the most efficient solution:

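``````
using DataFrames

df = DataFrame(a = rand(1:10, 3), b = rand(1:10, 3), c = rand(1:10, 3))
# empty Float64 dataframe with the same column names as df
# (DataFrame(Float64, 0, 3) is not available in DataFrames.jl 1.x)
dfn = DataFrame([name => Float64[] for name in names(df)])
for r in eachrow(df)
    m = collect(r) ./ maximum(r)  # divide the row by its maximum
    push!(dfn, m)
end
``````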

Have you tried

``````
df ./ maximum.(eachrow(df))
``````

Maybe it is possible to use a `Matrix` instead of a DataFrame? A rectangular table filled with values of the same type is, well, a matrix.

Yeah, dataframe `eachrow` is taking too long, damn. How would I do this with a Matrix? There isn’t an `eachrow` method.

There is an `eachrow` function for matrices. What version of Julia are you using?
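For reference, `eachrow` on plain arrays has been in Base since Julia 1.1. A minimal sketch of the row-maximum division on a small matrix, using only Base functions (the variable names `row_max` and `scaled` are just for illustration):

```julia
# Two rows, three columns; the first row is the [2, 4, 6] example from the question.
m = [2.0 4.0 6.0;
     1.0 5.0 2.0]

# One maximum per row, collected into a vector of length size(m, 1).
row_max = map(maximum, eachrow(m))

# Broadcasting a 2×3 matrix against a length-2 vector divides each row by its own maximum.
scaled = m ./ row_max
```

The first row of `scaled` comes out as roughly `[0.33, 0.67, 1.0]`, which matches what the question asks for.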


The `eachrow` version seems slow. Try the second version below:

``````
julia> foo(df) = df ./ maximum.(eachrow(df));

julia> bar(df) = df ./ [maximum(df[i,:]) for i in 1:size(df,1)];
``````

Performance test:

``````
julia> df = DataFrame([Symbol("c$i") => rand(1000) for i in 1:100]...);

julia> size(df)
(1000, 100)

julia> @btime foo($df);
  5.139 s (55639946 allocations: 1003.91 MiB)

julia> @btime bar($df);
  41.489 ms (555946 allocations: 10.83 MiB)
``````

I suppose it can be code-golfed, but generally it can be something like this:

``````
m = rand(1000, 100)
function baz!(m)
    for i in axes(m, 1)
        @views m[i, :] .= m[i, :] ./ maximum(m[i, :])
    end
end

@btime baz!($m) # 386.500 μs (3000 allocations: 140.63 KiB)
``````
``````
function fasterbaz!(m)
    m ./= maximum(m; dims = 2)
end
``````

``````
@btime fasterbaz!($m);
  181.505 μs (22 allocations: 8.52 KiB)
``````
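Since the original question asked for a new dataframe, the matrix version can be wrapped in a round trip. This is only a sketch, assuming DataFrames.jl 1.0 or later and that every column is numeric; `normalize_rows` is a made-up name, not something from DataFrames.jl:

```julia
using DataFrames

# Hypothetical helper: returns a new DataFrame with each row divided by its maximum.
function normalize_rows(df::AbstractDataFrame)
    m = Matrix{Float64}(df)          # copies the data, so df itself is left untouched
    m ./= maximum(m; dims = 2)       # same operation as the in-place matrix version
    return DataFrame(m, names(df))   # rebuild a DataFrame with the original column names
end

df = DataFrame(a = [2, 1], b = [4, 5], c = [6, 2])
dfn = normalize_rows(df)
```

The conversion costs one extra copy of the data, so whether it beats the pure dataframe versions on a given table is something to benchmark rather than assume.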

Interestingly, my non-allocating version is slower, probably because it’s accessing m’s memory in non-optimal order:

``````
function fasterbaz2!(m)
    ncols = size(m, 2)
    for i in axes(m, 1)
        maxi = maximum(m[i, j] for j in 1:ncols)
        for j in 1:ncols
            m[i, j] /= maxi
        end
    end
    m
end
``````

``````
@btime fasterbaz2!($m);
  338.469 μs (0 allocations: 0 bytes)
``````
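To spell out the memory-order point: Julia stores matrices column-major, so elements of one column sit next to each other in memory while elements of one row are `size(m, 1)` apart. A small sketch of the two loop orders (both function names are made up for illustration, and no timings are claimed here):

```julia
# Both functions compute the same sum; only the loop order differs.

# Inner loop runs over j, i.e. along a row: consecutive accesses are strided.
function sum_row_inner(m)
    s = zero(eltype(m))
    for i in axes(m, 1), j in axes(m, 2)
        s += m[i, j]
    end
    return s
end

# Inner loop runs over i, i.e. down a column: consecutive accesses are contiguous.
function sum_col_inner(m)
    s = zero(eltype(m))
    for j in axes(m, 2), i in axes(m, 1)
        s += m[i, j]
    end
    return s
end

m = reshape(collect(1.0:12.0), 3, 4)
sum_row_inner(m) == sum_col_inner(m)
```

In `for a in A, b in B` the last variable is the innermost loop, which is why the second form walks memory in order.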

It can be written as

``````
function baa!(m)
    for i in axes(m, 1)
        mval = -Inf
        for j in axes(m, 2)
            mval = mval < m[i, j] ? m[i, j] : mval
        end
        for j in axes(m, 2)
            m[i, j] /= mval
        end
    end
end

@btime baa!($m)   # 202.932 μs (0 allocations: 0 bytes)
``````

which is still slower than the allocating version.

But you gave me an idea

``````
function baa2!(m)
    maxi = Vector{Float64}(undef, size(m, 1))
    @inbounds for i in axes(m, 1)
        mval = -Inf
        for j in axes(m, 2)
            mval = mval < m[i, j] ? m[i, j] : mval
        end
        maxi[i] = mval
    end
    @inbounds for j in axes(m, 2)
        for i in axes(m, 1)
            m[i, j] /= maxi[i]
        end
    end
end

@btime baa2!($m)  # 123.239 μs (1 allocation: 7.94 KiB)
``````

I got one more, although it’s getting a bit ridiculous syntax-wise:

``````
function fasterbaz3!(m)
    nrows, ncols = size(m)

    maximums = m[:, 1] # copying the first col saves one col in the first iteration haha

    # iterate down the rows first which matches julia's memory layout
    @inbounds for j in 2:ncols, i in 1:nrows
        maximums[i] = max(maximums[i], m[i, j])
    end

    @inbounds for j in 1:ncols, i in 1:nrows
        m[i, j] /= maximums[i]
    end
end
``````

``````
@btime fasterbaz3!($m);
  113.390 μs (1 allocation: 7.94 KiB)
``````

haha nice exactly the same moment

ok ok very last one! let’s use the fact that multiplications are faster than divisions…

``````
function fasterbaz4!(m)
    nrows, ncols = size(m)

    maximums = m[:, 1]

    @inbounds for j in 2:ncols, i in 1:nrows
        maximums[i] = max(maximums[i], m[i, j])
    end

    # now maximums are actually their inverse for multiplication below
    maximums .= 1 ./ maximums

    @inbounds for j in 1:ncols, i in 1:nrows
        m[i, j] *= maximums[i]
    end
end
``````

``````
@btime fasterbaz4!($m);
  79.004 μs (1 allocation: 7.94 KiB)
``````
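One caveat with the reciprocal trick: `x * (1 / y)` rounds twice, so it is not guaranteed to be bit-identical to `x / y`, only equal up to floating-point rounding. A quick sketch to check that the two agree within tolerance (the variable names are just for illustration):

```julia
m = rand(1000, 100)
row_max = maximum(m; dims = 2)         # 1000×1 matrix of row maxima

by_division   = m ./ row_max           # one rounding per element
by_reciprocal = m .* (1 ./ row_max)    # reciprocal first, then multiply

# isapprox (≈) compares with a relative tolerance instead of exact equality
agree = by_division ≈ by_reciprocal
```

If exact reproducibility against the division version matters, compare with `==` on your own data before switching.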

This is so so so cool!!!

And now my turn

``````
function baa3!(m)
    maxi = m[:, 1]
    ncol = size(m, 2)
    @inbounds for j in 2:ncol
        for i in axes(m, 1)
            maxi[i] = maxi[i] < m[i, j] ? m[i, j] : maxi[i]
        end
    end

    maxi .= 1 ./ maxi
    @inbounds for j in axes(m, 2)
        for i in axes(m, 1)
            m[i, j] *= maxi[i]
        end
    end
end

@btime baa3!($m) # 34.706 μs (1 allocation: 7.94 KiB)
``````

Try changing `? :` to `ifelse`
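A sketch of what that suggestion could look like, based on the `baa3!` function above. The name `baa3_ifelse!` is made up for this example, and I have not benchmarked it, so whether it is actually faster is something to measure:

```julia
# Hypothetical variant of baa3! with the ternary replaced by ifelse.
function baa3_ifelse!(m)
    maxi = m[:, 1]                        # running row maxima, seeded with the first column
    @inbounds for j in 2:size(m, 2)
        for i in axes(m, 1)
            # ifelse is an ordinary function: both value arguments are evaluated,
            # which can let the compiler avoid a branch
            maxi[i] = ifelse(maxi[i] < m[i, j], m[i, j], maxi[i])
        end
    end

    maxi .= 1 ./ maxi                     # store reciprocals so the second pass multiplies
    @inbounds for j in axes(m, 2)
        for i in axes(m, 1)
            m[i, j] *= maxi[i]
        end
    end
    return m
end

m = rand(1000, 100)
expected = m ./ maximum(m; dims = 2)      # reference result computed before mutating m
baa3_ifelse!(m)
```

After the call, `m ≈ expected` should hold, which is a cheap sanity check before timing it with `@btime`.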

Wow! I’m shocked at that performance difference. There was definitely going to be a penalty for iterating on a DataFrame but… wow.

Something is going wrong with that `maximum` broadcast.

``````
julia> foo2(df) = df ./ [maximum(row) for row in eachrow(df)]
foo2 (generic function with 1 method)

julia> @btime foo2($df)
  29.195 ms (554947 allocations: 10.81 MiB)
``````

That being said, it’s a lot better with Tables.rows instead of eachrow.

``````
julia> foo2(df) = df ./ [maximum(row) for row in Tables.rows(df)]
foo2 (generic function with 1 method)

julia> @btime foo2($df)
  12.237 ms (9063 allocations: 985.81 KiB)
``````