Fitting dose response curves


#1

Dear community!

I have been trying to find a package for fitting sigmoidal dose-response data for the last two days. However, I completely failed.

I meanwhile found a way to solve the problem using the nonlinear fitting algorithm of the CurveFit.jl package. However, that was also hard, because most of the documentation is hard to understand and it took me several examples from other forums to actually convert my formula into something the CurveFit package could digest.

There are still some problems, though.

  1. The fitting via the CurveFit-package is rather slow.
  2. The algorithm heavily depends on my first guess of the attribute values, especially the center (EC50) of the curve. If I guess too far away, the algorithm will not converge and only produce NaN-values.

So these are my questions:

  1. Is there a package for dose-response-curve fitting out there, that is not officially listed? If so, can somebody please point me to it?
  2. (This might be more of a developmental question:) Is there any chance to include this fitting function into Julia or the CurveFit-package?

Thank you in advance.
Best

Moses


#2

It would be helpful to post a minimum working example of your problem. Without that, my guess would be that you run you run into numerical problems when feeding extreme values into the sigmoid function (or similar) and get NaNs; these should be fixed first, then the general approach should work.


#3

Hello,

here’s a part of my code (hope it works). It is working, but I need to guess the log(EC50)-value rather close to the mark.

using Plots
using CurveFit
plotlyjs() # I am using Juno, so I plot to the Plot pane

x     = [linspace(-8, -4, 9);] # sample x-values
y     = [0, 0, 0.05, 0.2, 0.5, 0.8, 0.95, 1, 1] # sample y-values
xbase = collect(linspace(minimum(x), maximum(x), 100)) # x-values for fitted curve

# function to calculate residuals
fun(x,p) = x[2] - (p[1] + ((p[2] - p[1]) / (1 + 10^((p[3] - x[1]) * p[4]))))

#= 
x[2] is DoseResp(x) (see below)
x[1] is x
p[1] is A1
p[2] is A2
p[3] is x0
p[4] is p

the actual function is:
DoseResp(x) = A1 + ((A2 - A1) / (1 + 10^((x0 - x) * p)))
A1: bottom asymptote
A2: top asymptote
x0: center of the curve (this will be the log(EC50) in the end)
p: Hill slope
=#

# initial guess for Parameters
#= These values are crucial, especially the third one (p[3]). The final result will be -6.
The Fitting works if the guess is close enough to that value. But if the initial guess is more than
0.5 away from it, it only results in NaN for the parameters. =#
p0 = [0.2, 1, -5.9, 0.9]

# fitting
xy = [x y] # concatenate x and y
coefs, converged, iter = CurveFit.nonlinear_fit(xy, fun, p0, 1e-3, 20000)

# generating fitted curve
yfit = [coefs[1] + ((coefs[2] - coefs[1]) / (1 + 10^((coefs[3] - x) * coefs[4]))) for x in xbase] 

# plotting
plot(x, y, seriestype=:scatter)
plot!(xbase, yfit)

#4

Please quote your code (you can edit your previous post) and use standard indentation.


#5

Hello,

obviously some moderator had already added the citation for the code. Thank you!

I must admit that I do not know about standard indentations, because I am not a programmer or IT person (I am a pharmacist who picked up some programming skills along the way).
Could you please point me at some “rules” about indentions?

Thanks ahead.
Best

Moses


#6

I don’t see anything obviously wrong with your code, but I’m not familiar with the CurveFit package, so i can’t tell if you’re using it incorrectly.

However, I was able to implement your problem pretty easily using Optim.jl, which is a general-purpose optimization tool in Julia. Optim isn’t specifically meant for curve fitting, but it’s easy to make it work. Here’s my code:

x     = linspace(-8, -4, 9) # sample x-values
y     = [0, 0, 0.05, 0.2, 0.5, 0.8, 0.95, 1, 1] # sample y-values

# Define a helper function for the dose response
dose_response(x, p) =  p[1] + ((p[2] - p[1]) / (1 + 10^((p[3] - x) * p[4])))

# Create an anonymous function called `cost` which 
# does the following: 
#   * given parameter values p
#   * for each sample in zip(x, y): 
#      * compute the expected dose response of x given p
#      * take the squared absolute value of the error between
#        that expected response and the measured response y
#   * sum those squared errors to compute the total cost
# 
# Optim.jl will try to find the values of p that minimize
# that cost. 
cost = let samples = zip(x, y)
    function (p) 
        sum(samples) do sample
            x, y = sample
            abs2(y - dose_response(x, p))
        end
    end
end

using Optim

# Construct a wrapper around our cost function. This wrapper
# knows how to compute its own derivatives with respect to the
# elements of p using automatic differentiation. 
f = OnceDifferentiable(cost, zeros(4), autodiff=:forward)

# Create a totally random initial guess for p:
p0 = rand(4)

# Optimize the parameters p starting at that random guess
# using the LBFGS algorithm:
result = Optim.optimize(f, p0, LBFGS())

# Extract the optimal parameters from the result:
pstar = Optim.minimizer(result)

# Plot the result
plt = plot(x, y, marker=:dot, line=nothing)
xbase = collect(linspace(minimum(x), maximum(x), 100)) # x-values for fitted curve
plot!(plt, xbase, dose_response.(xbase, (pstar,)))
plt

Using the above code, I can reliably find what look (to my untrained eye) to be the correct parameters. Moreover, it works even if I start with a random p0, and the optimization takes less than 10 milliseconds on my laptop.

So, given that this is definitely a solvable problem, I would venture that something is going wrong in CurveFit or in the way you’re using it. It’s hard to say more about that without any knowledge of the package, though.

Hope this helps!


#7

@rdeits beat me to it but here’s another solution that also uses Optim:

x = [linspace(-8,-4,9);] # sample x-values
y = [0,0,0.05,0.2,0.5,0.8,0.95,1,1] # sample y-values
xbase = collect(linspace(minimum(x),maximum(x),100)) # x-values for fitted curve

DoseResp(x,A1,A2,x0,p) = A1+((A2-A1)/(1+10^((x0-x)*p)))

# error function, square difference between data and model
errfun(x,y,p) = sum(abs2(y[i]-DoseResp(x[i],p...)) for i=1:length(x))

# fit 10 times the model with random initial conditions
mfit = [optimize(p->errfun(x,y,p), rand(4), NelderMead()) for repeat = 1:10]

# keep only the best one (smallest error)
mfit = mfit[indmin(Optim.minimum.(mfit))]

pmin = Optim.minimizer(mfit)
yfit = DoseResp.(xbase,pmin...)

plot(x,y,seriestype=:scatter)
plot!(xbase,yfit)

#8

@rdeits and @jonathanBieler
Thank you for your suggestions. I will test them on Monday, when I am back at work and have access to my computer.
I’ll let you know about the results then.


#9

@rdeits and @jonathanBieler
Thank you again. I tested both solutions and compared them against commercial software. The results are identical, minor differences coming from rounding. Both solutions are also much faster than my approach using CurveFit.jl.


#10

Optim developer here, good to see it finds the same optimum as your suggested by the software you trust (the commercial software).

“We” (the github organization Optim lives in) also have an LsqFit.jl package. It is currently in active development, but it’s probably meant to be something very close to what you were looking for. @iewaij might be able to quickly whip together an example using LsqFit (most recently tagged version or master, either one will be cool).

Thanks for writing this example up! It should be possible to avoid creating the OnceDifferentiable-instance (as this seems to confuse many users, and it doesn’t seem to be reused here)

optimize(f, p0, LBFGS(); autodiff = :forward)

#11

@pkofod It would be nice, if somebody could provide an example with LsqFit.jl.
I had already tried to use LsqFit, but failed. I get an error that looks like some parameter type is not correct:

x0 = [linspace(-8,-4,9);] # sample x values
y0 = [0, 0, 0.05, 0.2, 0.5, 0.8, 0.95, 1, 1] # sample y values
xbase = collect(linspace(minimum(x), maximum(x), 100)) # x values for fitted curve

DoseResp(x0, p) = p[1] + ((p[2] - p[1]) / (1 + 10^((p[3] - x) * p[4])))

p0 = [0.2, 1, -5.8, 0.9] # initial guess of the parameters, close to final values

# fitting
fit = LsqFit.curve_fit(DoseResp, x0, y0, p0)

# generate fitted curve
yfit=[fit.param[1] + ((fit.param[2] - fit.param[1]) / (1 + 10^((fit.param[3] - x) * fit.param[4]))) for x in xbase]

However, the line fit = LsqFit.curve_fit(DoseResp, x0, y0, p0) throws the following error:

MethodError: no method matching ^(::Int64, ::Array{Float64,1})
Closest candidates are:
  ^(::Integer, !Matched::Bool) at bool.jl:93
  ^(::Number, !Matched::Rational) at rational.jl:423
  ^(::Integer, !Matched::BigInt) at gmp.jl:483
  ...
in curve_fit at LsqFit\src\curve_fit.jl:43
in #curve_fit#4 at LsqFit\src\curve_fit.jl:45
in #lmfit#3 at LsqFit\src\curve_fit.jl:38
in #lmfit#2 at LsqFit\src\curve_fit.jl:13
in #levenberg_marquardt#1 at LsqFit\src\levenberg_marquardt.jl:62

I do not understand the Int64 part of the message, because all parameters and variables are Float64.

One more question: Does Optim.jl by any chance provide the total sum of squares (TSS)? I found the residual sum of squares being represented by Optim.mimimum. TSS would help to calculate R^2 (frequently used in Biology/Biochemistry).

Best
Moses


#12

How would this have to be implemented then? If I use the line with the f parameter, I get

UndefVarError: f not defined

If I exchange f for the cost function of @rdeits’ initial post, I get the following error:

MethodError: no method matching optimize(::##1#3{Base.Iterators.Zip2{Array{Float64,1},Array{Float64,1}}}, ::Array{Float64,1}, ::Optim.LBFGS{Void,LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64},Optim.##73#75}; autodiff=:forward)
Closest candidates are:
  optimize(::Any, ::AbstractArray, ::Optim.Optimizer) at C:\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\JuliaPro\pkgs-0.6.2.1\v0.6\Optim\src\multivariate/optimize/interface.jl:74 got unsupported keyword argument "autodiff"
  optimize(::Any, ::AbstractArray, ::Optim.Optimizer, !Matched::Optim.Options) at C:\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\JuliaPro\pkgs-0.6.2.1\v0.6\Optim\src\multivariate/optimize/interface.jl:74 got unsupported keyword argument "autodiff"
  optimize(::Any, ::Any, ::Any, !Matched::AbstractArray; kwargs...) at C:\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\JuliaPro\pkgs-0.6.2.1\v0.6\Optim\src\multivariate/optimize/interface.jl:56
  ...
Console

#13

What version of Optim is in your version of juliapro?


#14

Hi Moses,

The error using LsqFit.jl is caused by the model definition, you have to add a dot(.) for element-wise calculation (yeah bit tricky!). So the function should be:

DoseResp(x, p) = p[1] + ((p[2] - p[1]) ./ (1 .+ 10.^((p[3] .- x) .* p[4])))

or just use @. macro in the example below:

using LsqFit

@. DoseResp(x, p) = p[1] + ((p[2] - p[1]) / (1 + 10^((p[3] - x) * p[4])))

# example data
xdata = linspace(-8,-4,9)
ydata = DoseResp(xdata, [0.2, 1.0, -5.8, 0.9]) + 0.01*randn(length(xdata))
initial_p = [0.0, 1.0, -6.0, 1.0]

# fit data
fit = curve_fit(DoseResp, xdata, ydata, initial_p)
fit.param

There’s a r2() and other tools function in LsqFit.jl, it haven’t been merged unfortunately because we are making a lot of changes on LsqFit. If you don’t mind, you can check out to my curve_fit_tools branch or update the source code in your Julia library from here.

Pkg.rm("LsqFit") # remove the current version
Pkg.clone("https://github.com/iewaij/LsqFit.jl.git")
Pkg.checkout("LsqFit", "curve-fit-tools")

The routine is the same but with more functionalities:

# fit data
fit = curve_fit(DoseResp, xdata, ydata, initial_p)

# output
Results of Least Squares Fitting:
* Algorithm: Levenberg-Marquardt
* Iterations: 8
* Converged: true
* Estimated Parameters: [0.178863, 1.00522, -5.82878, 0.830257]
* Sample Size: 9
* Degrees of Freedom: 5
* Weights: Float64[]
* Sum of Squared Errors: 0.0007
* Mean Squared Errors: 0.0001
* R²: 0.9991
* Adjusted R²: 0.9983

Variance Inferences:
k   value std error     95% conf int
1  0.1789    0.0116   (0.149, 0.209)
2  1.0052    0.0145   (0.968, 1.043)
3 -5.8288    0.0321 (-5.911, -5.746)
4  0.8303    0.0519   (0.697, 0.964)

# goodness of fit
tss = sst(fit)
r_square = r2(fit)

#15

Thanks!


#16

@pkofod: I am using JuliaPro 0.62 with Optim 0.11.0.


#17

@iewaij Thank you very much! The curve_fit_tools would basically cover everything I am looking for. Just roughly guessed: When do you expect this to be merged with LsqFit.jl?

Where/How do I define the iterations? I am getting an error from the fit = curve_fit(DoseResp, xdata, ydata, initial_p) command

UndefVarError: iterations not defined

I tried to use it with the keyword iterations but that produced another error:

fit = curve_fit(DoseResp, xdata, ydata, initial_p, iterations = 200)

MethodError: no method matching levenberg_marquardt(::LsqFit.#f#5{#DoseResp,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}},Array{Float64,1}}, ::Calculus.#g#5{LsqFit.#f#5{#DoseResp,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}},Array{Float64,1}},Symbol}, ::Array{Float64,1}; iterations=200)

I removed my installed LsqFit and checked out your tools as you suggested. Should I rather update the code of the library?


#18

Oh sorry I was experimenting some modifications on OptimBase. The 21st line of curve_fit.jl should be:

return LsqFitResult(n, dof, p, ydata, f(p), g(p), wt, summary(results), OptimBase.iterations(results), converged(results))

I added the prefix in my repo, maybe try again?


#19

The current master branch of LsqFit.jl is for Julia v0.7/1.0, there’s also a huge reconstruction underway, at least several weeks are expected for code review. But if there’re features needed and could be released early, maybe we should push these features first @pkofod?


#20

@Moses I’ve been interested in developing a dose-response package for Julia. I only saw this thread on Monday, when a lot of your questions have been answered, but I ran into some of the same issues you had and have tried different approaches. I’m not a Julia developer, but my believe is that a dedicated package can be very useful in many fields, with good tools for different models and clear estimates of the different parameters of the models. I know we probably have enough tools already with Optim, JuMP, LsqFit and others, but many casual users would benefit from a package specifically designed for dose response analysis. What do you think?