Is there a good tool for finding the best curve to fit my data?

#42
This is what I get: (1) including the zeroes (for which I had to assign an error of 1), and (2) without the zeroes; in each case with the parameter N fixed (it should be the total number of wins, 464) and with N free. BTW, this is ordinary least squares regression with iteration.
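The kind of fit described here can be sketched as follows: ordinary least squares of a Gaussian curve to binned count data, once with the total N fixed at 464 and once with N free. This is a hedged illustration, not the poster's FittingKVdm output; the sample below is synthetic, since the thread's win data are not reproduced here.

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss(x, n, mu, sigma):
    """Gaussian bell curve scaled so that its total area equals n (a count)."""
    return n / (sigma * np.sqrt(2 * np.pi)) * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Synthetic stand-in for the win counts (the real data are not in the thread text).
rng = np.random.default_rng(0)
sample = rng.normal(loc=10, scale=3, size=464)
counts, edges = np.histogram(sample, bins=15)
centers = 0.5 * (edges[:-1] + edges[1:])
width = edges[1] - edges[0]

# Free N: all three parameters are fitted.
popt_free, _ = curve_fit(lambda x, n, mu, s: width * gauss(x, n, mu, s),
                         centers, counts, p0=[400, 9, 2])

# Fixed N = 464: only mu and sigma are free.
popt_fixed, _ = curve_fit(lambda x, mu, s: width * gauss(x, 464, mu, s),
                          centers, counts, p0=[9, 2])
```

With the fixed-N variant, the curve is constrained to integrate to the known total, so a poor fit shows up as shape mismatch rather than being absorbed into the amplitude.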
 

#45
@Koen Van de moortel, that is the density for the normal distribution. Do you think that it fits the data well?

There are several data points in the upper tail where your density says there would be essentially zero probability of having an observation. It did not fit the data well in post #7. It did not fit well with a truncated normal either. I tried the skewed distributions: the gamma, the lognormal and the Weibull. They did not fit well. The best fit, in my simple view, is the empirical distribution.
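The comparison described here can be sketched roughly as follows: fit each skewed candidate by maximum likelihood and compare the fits with the Kolmogorov-Smirnov statistic. A hedged sketch, not the poster's actual procedure; the sample is synthetic.

```python
import numpy as np
from scipy import stats

# Synthetic right-skewed sample standing in for the thread's data.
rng = np.random.default_rng(1)
sample = rng.gamma(shape=2.0, scale=3.0, size=500)

fits = {}
for name, dist in [("gamma", stats.gamma),
                   ("lognorm", stats.lognorm),
                   ("weibull", stats.weibull_min)]:
    params = dist.fit(sample)                              # maximum-likelihood fit
    ks = stats.kstest(sample, dist.cdf, args=params).statistic
    fits[name] = ks                                        # smaller = closer fit

best = min(fits, key=fits.get)
```

A small KS statistic only says the fitted curve tracks the sample; it does not by itself justify the parametric family, which is the point made in the thread about preferring the empirical distribution.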

But what do I know? You are a physicist and I am just a statistician.
 
#47
All I can say is that it obviously fits a lot better than the author's graph. I just wondered how his result could be so strange, so I tried it myself. I'm not claiming to know whether this has to be a normal distribution; that would not be an easy thing to calculate, I guess, and as mentioned above, it might depend on the players' tactics and skills.
And what on earth is "the" empirical distribution? One that fits all?

And BTW, there is no need to be cynical here; I hate cynicism. You have no clue about my experience. I also don't like it when people hide their real names; I like to know who I'm talking to.
 
#48
@Koen Van de moortel Maybe I misunderstood what you had in post #42 and how it estimates the distribution. Maybe you can explain that. Is it supposed to be a mixture distribution? Do you know what I mean by that? Why don't you take the opportunity to explain what your estimator is about? How does it differ from total least squares (TLS)?
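For readers unfamiliar with the TLS comparison being asked about: ordinary least squares minimises vertical residuals only, while total least squares minimises perpendicular distances, allowing for errors in x as well as y. For a straight line, the TLS solution is the direction of the first principal component of the centred data. A minimal sketch on synthetic data (not the thread's data, and not the poster's "multidirectional regression"):

```python
import numpy as np

rng = np.random.default_rng(2)
x_true = np.linspace(0, 10, 200)
x = x_true + rng.normal(0, 1.0, x_true.size)          # noise in x as well
y = 2.0 * x_true + 1.0 + rng.normal(0, 1.0, x_true.size)

# OLS slope: minimises the sum of squared vertical residuals.
# With noisy x it is attenuated (biased toward zero).
b_ols = np.polyfit(x, y, 1)[0]

# TLS slope: direction of largest variance of the centred (x, y) cloud,
# obtained from the first right singular vector.
X = np.column_stack([x - x.mean(), y - y.mean()])
_, _, vt = np.linalg.svd(X, full_matrices=False)
b_tls = vt[0, 1] / vt[0, 0]
```

Here the true slope is 2; the OLS estimate comes out noticeably smaller because the x-noise dilutes the slope, while the TLS estimate stays close to 2 (TLS is consistent when the x- and y-error variances are equal, as they are in this sketch).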

And what on earth is "the" empirical distribution?
Seriously? Have a look at this.


You have no clue about my experience.
Now I have a clue.
 
#49
You said "THE" empirical distribution. I can only imagine AN empirical distribution. And that doesn't explain anything about the underlying mechanism; just fitting a polynomial doesn't explain anything. Seriously.
And my graphs in #42 are just ordinary OLS fits of the given data to the Gaussian distribution.
 

Dason

Ambassador to the humans
#50
For a sample of data there is a corresponding empirical distribution. That would be "the" empirical distribution being discussed. It's the relevant empirical distribution.
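Concretely, the empirical distribution of a sample is the step function F_n(t) = (number of observations ≤ t) / n. A minimal sketch:

```python
import numpy as np

def ecdf(sample):
    """Return a function computing the empirical CDF of `sample`."""
    xs = np.sort(np.asarray(sample, dtype=float))
    n = xs.size
    def F(t):
        # Proportion of observations less than or equal to t.
        return np.searchsorted(xs, t, side="right") / n
    return F

F = ecdf([3, 1, 4, 1, 5])
```

For the sample [3, 1, 4, 1, 5], F(1) = 0.4, F(3) = 0.6, and F(5) = 1.0; there is exactly one such function per sample, which is why one says "the" empirical distribution.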

How much stats training do you have by the way?
 
#51
I don't know all the inventions of the number crunchers, who are, in my modest opinion, often making things way too complicated, but I did take the courses we needed in physics: probability, distributions, error propagation, least squares regression, numerical analysis, etc. And I do have experience with measurement methodology!
What surprises me on all these statistics forums is that so many people seem to have no clue what they are doing; they have learned hundreds of tricks and techniques, but they don't know what to do with them. I see questions all the time like "Should I use this or that test? Is this significant, yes or no? There is something wrong with my data, should I do this or that transformation? Should I use linear or nonlinear regression?", etc. etc., as if they are running around like headless chickens. They just see the numbers, believe in them like they are the holy gospel, and forget to ask the essential questions: "What is the mechanism behind the numbers? How do we expect, using plain common-sense logical thinking, x to influence y?"

Nobody seems to care about measurement precision, about errors caused by transformations, or about the logical validity of their model (e.g. does it make sense to extrapolate it?); they just believe whatever comes out of their software, without having any idea about the precision of the parameters it produces. They don't question the techniques used, etc. I've seen the weirdest things, like a statistics professor who does linear regression of "miles per gallon" vs "car weight", seriously, or all these people who still use the BMI (m/h²) when it is obvious from a physical (and experimental) perspective that one should use the CI (m/h³), just because their professor told them to, etc.
That's why I wrote my FittingKVdm software: it is interactive, you see the iteration working, and you get a good intuitive feeling for how the parameters change your model; you see how measurement errors influence the confidence limits; and, most importantly, for invertible model functions it offers a better algorithm (multidirectional regression) that is not implemented in any other program I know of. You can read about my latest findings here:
https://www.researchgate.net/profile/Koen-Van-De-Moortel/research
 
#54
Is there a particular reason you think there should be an explicit parameterized distribution for your situation? And what do you plan on doing once you have such a formula?
Hmmm... I see that I never replied to this.

(a) It's performance data, which, in my experience, varies about a mean. Since many biological processes do follow a normal distribution, it seemed like this one might well, too.

(b) It's partly curiosity, but I also planned to use it to assess whether additional data, either from a particular player or in the aggregate, tended to come closer to a normal distribution. I would calculate the least squares or chi-square at various points to assess this.
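The assessment described in (b) can be sketched as a chi-square goodness-of-fit check against a fitted normal: bin the data, compute expected bin counts under the fitted curve, and compare. A hedged illustration on synthetic data, not the poster's planned analysis.

```python
import numpy as np
from scipy import stats

# Synthetic sample standing in for the win data.
rng = np.random.default_rng(3)
sample = rng.normal(50, 8, size=464)

mu, sigma = sample.mean(), sample.std(ddof=1)

# Ten equal-count bins keep every expected count comfortably large.
edges = np.quantile(sample, np.linspace(0, 1, 11))
observed, _ = np.histogram(sample, bins=edges)

# Expected counts in the same bins under the fitted N(mu, sigma).
probs = np.diff(stats.norm.cdf(edges, mu, sigma))
expected = probs * sample.size

chi2 = np.sum((observed - expected) ** 2 / expected)
dof = observed.size - 1 - 2        # bins minus 1, minus 2 fitted parameters
p_value = stats.chi2.sf(chi2, dof)
```

A large p-value means the binned counts are consistent with the fitted normal; repeating this as data accumulate would show whether the fit tends to improve, which is the stated goal.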

This is not an ultra-serious piece of work that needs to survive peer review. It's partly curiosity and partly educational for me.
 

hlsmith

Less is more. Stay pure. Stay poor.
#55
Yes, knowing the data generating process (or function) is very important. That is why everyone should write out the nonparametric structural causal model before beginning to model.

P.S. It is also typically fun to play around with genetic algorithms to explore function spaces. The traditional algorithms are fine, but if one gets interested enough: people have coupled them with neural networks, and software like AI.Feynman has been able to recover the natural data-generating structure for ~99 physics formulas used in the Feynman lecture series. However, if your data do not cover all of the possible data-generating processes, or if there is too much epistemological error, it won't get things quite right.
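A toy illustration of the genetic-algorithm idea (not AI.Feynman itself): a tiny evolutionary search over the coefficients of y = a·x² + b·x + c against synthetic data. Real symbolic-regression systems also evolve the formula's *structure*; this sketch only evolves coefficients, purely to show the select/mutate loop.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(-2, 2, 50)
y = 3 * x**2 - 1 * x + 0.5                # hidden "law" to be recovered

def fitness(coeffs):
    a, b, c = coeffs
    return -np.mean((a * x**2 + b * x + c - y) ** 2)   # negative MSE

pop = rng.normal(0, 2, size=(40, 3))      # initial random population
for _ in range(200):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]            # keep the 10 fittest
    # Children: random parents plus small Gaussian mutations.
    children = parents[rng.integers(0, 10, 30)] + rng.normal(0, 0.1, (30, 3))
    pop = np.vstack([parents, children])               # elitism + mutation

best = pop[np.argmax([fitness(ind) for ind in pop])]
```

With elitism the best solution never gets lost, so the population drifts toward the true coefficients (3, -1, 0.5); as the post above notes, this only works when the data actually contain the generating process and the noise is modest.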
 