Domain error when running user-defined function, but not when code is run outside function

I am trying to write a function that will calculate deviance residuals for a general linear model (specified using GLM.jl). The function extracts predictions from the fitted model (pred), a vector containing the response variable (response_y) and the error distribution specified for the fitted model (distType). I then use a conditional if-elseif-else statement to calculate the deviance residuals depending on the output of distType.

# Write function 
using Test
using Distributions
using Random
using DataFrames
using StatsBase
using GLM
using Gadfly
using Statistics
using Compose

# Define function to calculate deviance residuals 
function calcDevResids(;model, df)
    
    ###################################################
    # Section #1: Extract response vector and model residuals 
    ###################################################

    # Extract fitted/predicted values from model object
    pred = GLM.predict(model)

    # Extract vector containing response variable 
    response_y = df.y

    ###################################################
    # Section #2: Calculate deviance residuals
    #             - requires conditional statements to 
    #               loop through different data distributions
    ###################################################

    # Extract statistical distribution specified in model 
    distType = typeof(model).parameters[1].parameters[1].parameters[2]

    # Use if-elseif-else statements to calculate deviance residuals depending
    # on which errors were specified during model specification 

        # If model was specified with Poisson errors
        if distType == "Poisson"
            sign.(response_y .- pred) .* sqrt.(GLM.devresid.(Distributions.Poisson(), response_y, pred))
        
        # If model was specified with Gamma errors
        elseif distType == "Gamma"
            return sign.(response_y .- pred) .* sqrt.(GLM.devresid.(Distributions.Gamma(), response_y, pred))
        
        # If model was specified with Bernoulli errors 
        elseif distType == "Bernoulli"
            sign.(response_y .- pred) .* sqrt.(GLM.devresid.(Distributions.Bernoulli(), response_y, pred))
        
        # If model was specified with Binomial errors 
        else distType == "Binomial"
            sign.(response_y .- pred) .* sqrt.(GLM.devresid.(Distributions.Binomial(), response_y, pred))
        
        end 
end

And here is a basic model to test the function:

# Set reproducible seed
Random.seed!(123)

# Simulate n = 100 data points from a Poisson distribution (lambda = 7)
y = rand(Poisson(7), 100)

# Assign grouping variables (5 groups of 20)
x = repeat([1, 2, 3, 4, 5],
            inner = 20,
            outer = 1)

df = hcat(y, x)
df = DataFrame(df)
colnames = ["y","x"]
rename!(df, Symbol.(colnames))

# Run model
modPoisson = fit(GeneralizedLinearModel,
                    @formula(y ~ 1 + x),
                    df,
                    Poisson(),
                    GLM.LogLink())

However, I keep getting the follow error when I run the function:

ERROR: DomainError with -10.0: log1p will only return a complex result if called with a complex argument. Try log1p(Complex(x)).

But, when I run the code outside of the function call (see below), I get the expected output and no error.

pred = GLM.predict(modPoisson)
response_y = df.y
dv = sign.(response_y .- pred) .* sqrt.(GLM.devresid.(Distributions.Poisson(), response_y, pred))

julia> dv = sign.(response_y .- pred) .* sqrt.(GLM.devresid.(Distributions.Poisson(), response_y, pred)) 100-element Array{Float64,1}: 0.9061633419878498 1.5493622873332258 0.9061633419878498 0.217165948855438 -0.9383362226732821 -0.9383362226732821 ⋮ 0.07437940612764843 0.07437940612764843 0.8013566375722013 0.07437940612764843 1.4748443746174682 -0.7265187364238832

I have tried adding complex() and Complex() to the terms in the code to create dv, with no success.

Any pointers or advice would be much appreciated. I am a Julia-newbie (< 3 months experience), so I am sure I am missing something quite simple.

Try to add @show distType after its definition in calcDevResids. It prints the following:

distType = Distributions.Poisson{Float64}

so the test if distType == "Poisson" will fail: distType is not the string “Poisson” but the type Distributions.Poisson{Float64}. Instead, you can write if distType <: Poisson.

A couple of remarks:

  • The lone return looks weird.
  • In else distType == "Binomial" the comparison is useless since there’s no if.
  • Instead of
    df = hcat(y, x)
    df = DataFrame(df)
    colnames = ["y","x"]
    rename!(df, Symbol.(colnames))
    
    you can write
    DataFrame(; y, x)
    
1 Like

Instead of

distType = typeof(model).parameters[1].parameters[1].parameters[2]

I would prefer the more robust

resp_type(m :: GeneralizedLinearModel{T1, T2})  where {T1, T2} = T1;
distr_type(::Type{GLM.GlmResp{T1, T2, T3}}) where {T1,T2,T3} = T2;
distType2 = distr_type(resp_type(model.model));

Then it would be nice to define

deviance_residual(D, response_y, pred) = 
	sign.(response_y .- pred) .* sqrt.(GLM.devresid.(D, response_y, pred))

If one could then construct the distribution from its type somehow (from the return arg of distr_type above), the entire if ... else loop could be replaced with

distType = distr_type(resp_type(model.model));
# Construct the distribution object - how?
D = ??
return deviance_residual(D, response_y, pred)

However, I am not sure how to call the constructor Poisson() from the data type Poisson{Float64} that distr_type returns.

@hendri54 Thanks for your suggestions - I am going to need some time to work through your suggestions and wrap my head around the code.

@sudete. This seemed to solve the problem - Thanks very much. However, it created another problem later on in the workflow. When I pass the example model to the updated function, I get a new error message. Can you maybe advise as to what I am missing? Full updated code below:

##########################
# WRITE UPDATED FUNCTION
##########################

using Test
using Distributions
using Random
using DataFrames
using StatsBase
using GLM
using Statistics
using Compose

# Define function to calculate deviance residuals 
function calcDevResids(;model, df)
    
    ###################################################
    # Section #1: Extract response vector and model residuals 
    ###################################################

    # Extract fitted/predicted values from model object
    pred = GLM.predict(model)

    # Extract vector containing response variable 
    response_y = df.y

    ###################################################
    # Section #2: Calculate deviance residuals
    #             - requires conditional statements to 
    #               loop through different data distributions
    ###################################################

    # Extract statistical distribution specified in model 
    distType = typeof(model).parameters[1].parameters[1].parameters[2]

    # Use if-elseif-else statements to calculate deviance residuals depending
    # on which errors were specified during model specification 

       # If model was specified with Poisson errors
        if distType <: Poisson
            sign.(response_y .- pred) .* sqrt.(GLM.devresid.(Distributions.Poisson(), response_y, pred))

        # If model was specified with negative binomial errors
        elseif distType <: NegativeBinomial
            sign.(response_y .- pred) .* sqrt.(GLM.devresid.(Distributions.NegativeBinomial(), response_y, pred))
        
        # If model was specified with Gamma errors
        elseif distType <: Gamma
            sign.(response_y .- pred) .* sqrt.(GLM.devresid.(Distributions.Gamma(), response_y, pred))
        
        # If model was specified with Bernoulli errors 
        elseif distType <: Bernoulli
            sign.(response_y .- pred) .* sqrt.(GLM.devresid.(Distributions.Bernoulli(), response_y, pred))
        
        # If model was specified with Binomial errors 
        elseif distType <: Binomial
            sign.(response_y .- pred) .* sqrt.(GLM.devresid.(Distributions.Binomial(), response_y, pred))
        
        end 
end

###########################
# MAKE SOME EXAMPLE DATA
###########################

# Set reproducible seed
Random.seed!(123)

# Simulate n = 100 data points from a Poisson distribution (lambda = 7)
y = rand(Poisson(7), 100)

# Assign grouping variables (5 groups of 20)
x = repeat([1, 2, 3, 4, 5],
            inner = 20,
            outer = 1)

df = DataFrame(; y, x)

# Run model
modPoisson = fit(GeneralizedLinearModel,
                    @formula(y ~ 1 + x),
                    df,
                    Poisson(),
                    GLM.LogLink())

Applying the updated function returns the new error. I assume that I am not declaring the condition arguments in the if-elseif statements properly (i.e. the distType <: Poisson/Bernoulli/Binomial/Gamma parts will all return the same error)? I didn’t think I could/should declare the conditions passed to distType <: x?

##################################################
# REPRODUCE NEW ERROR WHEN APPLYING FUNCTION
##################################################

calcDevResids(model = modPoisson, df = df)

This returns the following error.

ERROR: UndefVarError: Poisson not defined

For the conditions, they look fine to me (assuming Gamma, etc. are the correct types for the respective cases. I would add a print statement in each if/else to check everything works as you expect.

The code actually runs without error for me (on Julia 1.6-rc3). The Poisson symbol should be defined since you use Distributions (even without that, it should be defined (reexported) by GLM). So maybe there’s something wrong in your session… Did you try executing this code in a fresh Julia session?

In any case it would be useful to have the full error message.

1 Like