Can we use logistic regression directly in a case control study?

Juan · May 18, 2023, 7:44am

Hello.

Say I have a very imbalanced dataset, so I decide to do a case-control study.

For example, we have 1 million healthy people and 300 people with cancer.
I take 300 people from each group and I want to fit a logistic model.

How should I adapt the model or adjust the results to account for the fact that the data don’t come from random sampling?
Or can I just use it as is, since logistic models work with odds ratios (OR), which suit case-control designs?

Another option would be to get a random sample from the population and use different weights for ill and healthy people.

nilshg · May 18, 2023, 8:29am

I don’t think this is a question about logistic regression at all, but about causal inference more broadly. Your issue is that you are comparing a “treatment” group (those with cancer) to a control group, but the treatment group is selected on the observed outcome. It is therefore unlikely that potential outcomes are independent of treatment status, which is what you need in order to identify treatment effects.

Since this is about the basics of causal inference rather than Julia, I would recommend referring to the standard literature in the field, such as the Imbens/Rubin textbook.

PharmCat · June 6, 2023, 12:32am

You can also look at propensity score matching techniques. Try looking at this repo.
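
On the original question about sampling 300 cases and 300 controls: a standard result for logistic regression under outcome-dependent sampling is that the slope coefficients (log odds ratios) are still estimated consistently, while the intercept is shifted by the log ratio of the two sampling fractions. The sketch below only does that arithmetic for the numbers in the question, written in Python for illustration. It assumes the model is correctly specified and that selection depends only on case/control status; the function names are hypothetical, and it does not address the causal identification concerns raised above.

```python
import math

# Numbers from the question: 1,000,000 healthy people, 300 with cancer,
# and 300 people sampled from each group.
n_cases_pop, n_controls_pop = 300, 1_000_000
n_cases_sample, n_controls_sample = 300, 300

# Sampling fractions: every case is kept, only a sliver of controls is.
f_cases = n_cases_sample / n_cases_pop
f_controls = n_controls_sample / n_controls_pop

# Outcome-dependent sampling shifts the fitted intercept by this offset;
# the slope coefficients (log odds ratios) are not shifted.
intercept_offset = math.log(f_cases / f_controls)


def corrected_intercept(fitted_intercept):
    """Map an intercept fitted on the case-control sample to the population scale."""
    return fitted_intercept - intercept_offset


def predicted_risk(fitted_intercept, slopes, x):
    """Population-scale probability for covariates x (hypothetical helper)."""
    eta = corrected_intercept(fitted_intercept)
    eta += sum(b * xi for b, xi in zip(slopes, x))
    return 1.0 / (1.0 + math.exp(-eta))


# Alternative mentioned in the question: weight each observation by the
# inverse of its sampling fraction and run a weighted fit instead.
w_case = 1.0 / f_cases
w_control = 1.0 / f_controls
```

If the goal is only odds ratios, the fitted slopes can be read off as they are under these assumptions; the intercept correction (or the weights) matters when absolute risks are needed.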
