How to solve a regression problem when the variables seem unrelated to each other

RickandMortyforever · March 3, 2022, 11:30am

This is more of a problem-solving question than a programming one. I’ll delete this post if it is not appropriate.

I am trying to solve a regression problem, and the scatter plot I looked at is frustrating me.
I can’t see any continuous independent variable that has a positive or negative relationship with the target variable “perf”.

Are there any models or algorithms that can help in this situation?
Do I need to do feature transformation?

I would really appreciate any advice.

juliohm · March 3, 2022, 11:59am

I would start with more basic questions and try to test hypotheses before attempting any regression model.

Formulate your hypothesis in a frequentist or Bayesian setting (Julia is awesome for that) and then you will have more evidence to guide your next steps. Keep in mind that it is not always possible to build a predictive model, and that these kinds of tests can really help you gain insight about the problem at hand.
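
As one possible frequentist version of this advice, here is a minimal sketch of a permutation test for the hypothesis "this feature is unrelated to the target". The data are synthetic stand-ins (not the poster's data), and `permutation_pvalue` is a hypothetical helper name, not something from a package. It only uses the standard library.

```julia
using Random, Statistics

# Permutation test for H0: "x and y are unrelated".
# Shuffling y breaks any link to x, so the shuffled correlations
# show what |cor| looks like when H0 is true.
function permutation_pvalue(x, y; nperm = 5_000, rng = Random.default_rng())
    observed = abs(cor(x, y))
    hits = count(_ -> abs(cor(x, shuffle(rng, y))) >= observed, 1:nperm)
    return (hits + 1) / (nperm + 1)   # add-one correction avoids p = 0
end

# Synthetic stand-ins for one feature and the target (assumptions for illustration).
rng = MersenneTwister(42)
feature = randn(rng, 200)
perf_noise = randn(rng, 200)                      # unrelated to `feature`
perf_signal = 0.5 .* feature .+ randn(rng, 200)   # weakly related to `feature`

p_noise = permutation_pvalue(feature, perf_noise; rng = rng)
p_signal = permutation_pvalue(feature, perf_signal; rng = rng)
println("p (no relationship):   ", p_noise)
println("p (weak relationship): ", p_signal)
```

A small p-value suggests the feature carries some information about the target. A large one does not prove there is no relationship, only that this test could not detect a linear one.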

RickandMortyforever · March 3, 2022, 12:03pm

Thanks for replying.

tlienart · March 3, 2022, 12:55pm

Sometimes there’s just nothing in your data… and this may be such a case. I agree with the advice to formulate hypotheses and work with that, but eyeballing the plot, I’d be surprised if you found a strong regression model.

Another thing you can do is bin your target variable, i.e. instead of trying to predict perf you try to predict perf < t1 vs. perf >= t1 (possibly with more classes, but if you already get bad results with 2 classes it’s not a great sign).
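
A rough sketch of the binning idea, on synthetic stand-in data (not the poster's data). The threshold `t1` is set to the median of the target, which is an assumption made here for illustration, and the "classifier" is deliberately the simplest one imaginable. The accuracy is computed on the same data that defined the thresholds, so treat it as a sanity check rather than a proper evaluation.

```julia
using Random, Statistics

# Synthetic stand-ins (assumptions for illustration): one feature and the target.
rng = MersenneTwister(1)
feature = randn(rng, 300)
perf = 0.8 .* feature .+ randn(rng, 300)

# Bin the target into two classes around a threshold t1 (the median here).
t1 = median(perf)
is_high = perf .>= t1

# Simplest possible classifier: predict "high" when the feature is above its own median.
predicted_high = feature .>= median(feature)
accuracy = mean(predicted_high .== is_high)

# Baseline to beat: always predicting the more common class.
baseline = max(mean(is_high), 1 - mean(is_high))

println("accuracy = ", accuracy, ", baseline = ", baseline)
```

If even a two-class version cannot beat the baseline for any feature, that is consistent with there being little signal to exploit.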

If you have a strong sense that there should be an exploitable relationship, then I’d dig a bit deeper in the data to try to figure out whether there are sources of noise that you could eliminate.

Just a few thoughts though, good luck!

RickandMortyforever · March 3, 2022, 12:57pm

Thanks for the advice.

zxzkja · March 4, 2022, 1:21am

I agree with the general comments that you should form a hypothesis first, but it’s worth noting that this plot hides the density of points because they are overlapping. This is most evident in the sex-work plot which only has four points showing. However, there are many more points overlapping, so you can’t see which of the four corners is more common. If the top right and bottom left have more observations than the other two, then you’d have a positive correlation between sex and work.
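
To make the overlap point concrete, here is a small sketch that counts how many observations sit on each of the four visible points. The `sex` and `work` vectors are synthetic binary stand-ins (not the poster's data), generated so that they agree 70% of the time.

```julia
using Random, Statistics

# Synthetic stand-ins for two binary columns (assumptions for illustration).
rng = MersenneTwister(7)
sex = rand(rng, 0:1, 500)
# Make `work` agree with `sex` about 70% of the time, so the diagonal corners dominate.
work = [rand(rng) < 0.7 ? s : 1 - s for s in sex]

# Count how many observations are hidden behind each of the four plotted points.
corner_counts = Dict{Tuple{Int,Int},Int}()
for pair in zip(sex, work)
    corner_counts[pair] = get(corner_counts, pair, 0) + 1
end

for key in sort(collect(keys(corner_counts)))
    println(key, " => ", corner_counts[key])
end
println("cor(sex, work) = ", cor(sex, work))
```

A plain scatter plot of these 500 rows would show the same four dots whether the correlation were strongly positive, strongly negative, or zero.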

One way to better see the density of points is to set alpha to a lower value, e.g. 0.3, so you can see where the points overlap. Another option is to look at a 2d histogram or density plot, e.g.
https://docs.juliaplots.org/latest/generated/gr/#gr-ref10

Here is the equivalent idea in R: https://www.r-graph-gallery.com/2d-density-plot-with-ggplot2.html
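
A minimal sketch of both options, assuming Plots.jl is installed. The data are synthetic stand-ins (not the poster's data), rounded so that many points land on exactly the same coordinates.

```julia
using Plots   # assumes Plots.jl is installed

# Synthetic stand-ins (assumptions for illustration): heavily overlapping points.
x = round.(randn(5_000); digits = 1)
y = round.(x .+ randn(5_000); digits = 1)

# Semi-transparent markers: darker regions contain more overlapping points.
p1 = scatter(x, y; alpha = 0.3, markerstrokewidth = 0, legend = false, title = "alpha = 0.3")

# 2D histogram: color encodes the number of points in each bin.
p2 = histogram2d(x, y; nbins = 30, title = "2D histogram")

plot(p1, p2; layout = (1, 2), size = (900, 400))
```

The transparency approach is usually enough for a few thousand points. For larger data sets the 2D histogram tends to be easier to read.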
