This isn’t really a question about Julia, so maybe it should be tagged as Offtopic, but I’m working with a data set containing several series that all follow a log-normal distribution closely, but not quite. The plot below shows three of these series: the dots are the empirical CDF of each series (Distributions.ecdf), while the lines are the CDFs of the fitted log-normal distributions.
When sampling from the fitted distributions, the mean/median values of the samples are consistently greater than the corresponding values in the data because of the consistent error in the fit (as seen above).
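To make the mismatch concrete, a check along the following lines compares the empirical and fitted CDFs at a few quantiles, and then compares the mean and median of the data with those of draws from the fit. This is only a sketch: the `data` vector is a synthetic stand-in (a Gamma sample, which is close to but not exactly log-normal), not one of the real series, and the helper `empirical_cdf` is a hypothetical name introduced for illustration.

```julia
using Distributions, Random, Statistics

Random.seed!(1)

# Synthetic stand-in for one series (assumption: NOT the real data).
# A Gamma sample is close to, but not exactly, log-normal.
data = rand(Gamma(2.0, 3.0), 5_000)

# Maximum-likelihood log-normal fit
fitted = fit(LogNormal, data)

# Empirical CDF evaluated by hand at a threshold t (hypothetical helper)
empirical_cdf(x, t) = mean(x .<= t)

# Compare the empirical and fitted CDFs at a few data quantiles
for p in (0.1, 0.25, 0.5, 0.75, 0.9)
    t = quantile(data, p)
    println("p = ", p,
            "  empirical = ", round(empirical_cdf(data, t); digits = 3),
            "  fitted = ", round(cdf(fitted, t); digits = 3))
end

# Compare summary statistics of the data with draws from the fit
draws = rand(fitted, 100_000)
println("median: data = ", median(data), "  draws = ", median(draws))
println("mean:   data = ", mean(data), "  draws = ", mean(draws))
```

If the fitted column drifts away from the empirical column in the same direction for every series, that is the systematic misfit described above, expressed in numbers rather than read off a plot.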
My questions are:
1. Is it better/easier to transform the data so that they better fit the distribution, or to tweak the cumulative distribution function so that it fits the data?
2. If implementing a custom distribution is the recommended approach, can anyone provide some guidance on how I would go about tweaking the log-normal CDF? I’m at the boundaries of my stats/math knowledge here and I don’t really know where to start. I have more series than the 3 shown above, and the error in the fit looks the same for all of them: the slope of the curve needs to be a bit flatter at lower values, and then there is a point at which the slope of the tails needs to be steeper.
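One option that avoids modifying the log-normal CDF at all is to sample directly from the empirical distribution by inverse-transform sampling: draw uniform numbers and push them through the empirical quantile function. This is a different technique from a custom parametric distribution, and the sketch below is only an illustration. The `observed` vector is synthetic stand-in data and `sample_empirical` is a hypothetical helper name.

```julia
using Random, Statistics

Random.seed!(2)

# Synthetic stand-in for one observed series (assumption: not real data)
observed = exp.(randn(2_000)) .+ 0.3 .* rand(2_000)

# Inverse-transform sampling: map uniform draws through the empirical
# quantile function (Statistics.quantile interpolates linearly between
# order statistics by default).
sample_empirical(x, n) = quantile(x, rand(n))

simulated = sample_empirical(observed, 50_000)

println("median: observed = ", median(observed), "  simulated = ", median(simulated))
println("mean:   observed = ", mean(observed), "  simulated = ", mean(simulated))
```

Because the draws come from the data's own quantiles, the simulated median and mean track the observed ones closely. The trade-off is that this approach never produces values outside the observed range, so it says nothing about tails beyond the data.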
The reason I want to do this is that I have additional series of similar data that don’t have nearly as many observations. I’d like to be able to run simulations/make predictions about those data sets, knowing that they will follow this same shape as more observations become available (basically, I need to be able to predict what the future observations might look like).
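One way this idea of a shared shape could be sketched, assuming the series differ only in location and scale on the log scale (an assumption, not something established above): standardize the log of a well-observed series to get a pool of shape residuals, estimate the log-scale mean and standard deviation of the sparse series, then resample the residuals and rescale them. All data and names below (`large_series`, `small_series`, `simulate`) are synthetic or hypothetical.

```julia
using Random, Statistics

Random.seed!(3)

# Synthetic stand-ins (assumption): one well-observed and one sparse series
large_series = exp.(1.0 .+ 0.5 .* randn(5_000))
small_series = exp.(2.0 .+ 0.8 .* randn(25))

# Standardize the log of the large series so that only its shape remains
log_large = log.(large_series)
shape = (log_large .- mean(log_large)) ./ std(log_large)

# Location and scale of the sparse series on the log scale
mu_small = mean(log.(small_series))
sigma_small = std(log.(small_series))

# Resample the shape residuals with replacement, then rescale them
# to the sparse series and map back from the log scale
simulate(n) = exp.(mu_small .+ sigma_small .* rand(shape, n))

future = simulate(10_000)
println("simulated median = ", median(future))
```

With only a handful of observations, the estimates of `mu_small` and `sigma_small` are themselves noisy, so simulations built on them carry that uncertainty as well.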