I have not figured out why the Julia and Python implementations yield different results for this “simple” linear regression on the data below. While sklearn agrees with Excel, GLM effectively fails to fit the model and returns a zero slope.
- Julia
using DataFrames, GLM
x = [6.20E-13, 5.83E-11, 1.06E-10, 1.70E-10, 2.64E-10, 3.79E-10, 4.51E-10,
6.25E-10, 6.92E-10, 5.45E-10, 8.22E-10, 1.11E-09, 1.41E-09]
y = [0, 0.009836193, 0.017599565, 0.025362938, 0.040889682, 0.056416427, 0.071943172,
0.087469917, 0.102996662, 0.098084522, 0.137400303, 0.175161579, 0.214743511]
data = DataFrame(x=x, y=y)
model = lm(@formula(y ~ x), data)  # OLS fit of y on x, with an intercept
println(model)
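For reference, the same fit can be computed from the closed-form least-squares solution, bypassing GLM entirely. A minimal sketch (my own cross-check, reusing x and y from above), which should match sklearn's coefficients:
X = [ones(length(x)) x]  # design matrix: intercept column plus x
b = X \ y                # least-squares solve via QR; b = [intercept, slope]
println(b)               # slope should be on the order of 1e8, not zero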
- Python
import numpy as np
from sklearn.linear_model import LinearRegression
x = np.array([6.20E-13, 5.83E-11, 1.06E-10, 1.70E-10, 2.64E-10, 3.79E-10, 4.51E-10,
6.25E-10, 6.92E-10, 5.45E-10, 8.22E-10, 1.11E-09, 1.41E-09]).reshape(-1, 1)
y = np.array([0, 0.009836193, 0.017599565, 0.025362938, 0.040889682, 0.056416427, 0.071943172,
0.087469917, 0.102996662, 0.098084522, 0.137400303, 0.175161579, 0.214743511])
model = LinearRegression()
model.fit(x, y)           # OLS fit of y on x, with an intercept
print(model.coef_)        # slope
print(model.intercept_)
Any insights are appreciated!
Note that GLM does yield a non-zero slope when the axes are reversed, and in that case its results are consistent with Python (and Excel); a sketch of that check follows.
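Reversed-axes check (reusing the DataFrame from the Julia snippet above):
model_rev = lm(@formula(x ~ y), data)  # regress x on y instead
println(model_rev)                     # this slope comes back non-zero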