# which optimal bin width formula to choose for histogram

#### stony

##### New Member
For histogram construction, I learned from [Scott 79] that for Gaussian-distributed samples, 3.49*sigma*N^(-1/3) is the optimal bin width. However, for more general distributions with less regular shapes, such as a mixture of Gaussians or a pdf of approximately generalized Gaussian type, is there an optimal bin width formula to use?

#### Daniel Dvorkin

##### New Member
There's no hard-and-fast answer to this one. The default in R is Sturges' Rule: k = 1 + log2(n), where k is the desired number of equally-spaced bins and n is the sample size, which works surprisingly well for a lot of samples but can also break badly in some cases. The Wikipedia page gives a list of alternatives:

http://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width

FWIW, I've found Scott's rule (which is also an option in R's hist() function) to be best overall, but YMMV.
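As a rough illustration of how the rules mentioned above differ, here is a minimal Python sketch using NumPy. The function names and the bimodal test sample are made up for this example; the Freedman-Diaconis rule is one of the alternatives on the Wikipedia list, not something discussed in this thread.

```python
import numpy as np

def sturges_bins(x):
    """Number of bins from Sturges' rule, k = 1 + log2(n), rounded up."""
    return int(np.ceil(1 + np.log2(len(x))))

def scott_width(x):
    """Bin width from Scott's rule, h = 3.49 * sigma * n^(-1/3)."""
    return 3.49 * np.std(x, ddof=1) * len(x) ** (-1 / 3)

def freedman_diaconis_width(x):
    """Bin width from the Freedman-Diaconis rule, h = 2 * IQR * n^(-1/3)."""
    q75, q25 = np.percentile(x, [75, 25])
    return 2 * (q75 - q25) * len(x) ** (-1 / 3)

# Hypothetical bimodal sample (a mixture of two Gaussians), n = 300.
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(-2, 0.5, 150), rng.normal(3, 1.0, 150)])

data_range = sample.max() - sample.min()
print("Sturges bins:          ", sturges_bins(sample))
print("Scott bins:            ", int(np.ceil(data_range / scott_width(sample))))
print("Freedman-Diaconis bins:", int(np.ceil(data_range / freedman_diaconis_width(sample))))
```

Sturges' rule depends only on n, while the other two also use the spread of the data, which is one reason the rules can disagree on non-Gaussian samples.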

#### stony

##### New Member
Thanks a lot. I don't have access to the original paper on Scott's rule. I wonder whether there are any analytical results on the accuracy of this or other methods, such as the MSE of the estimated pdf?
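For reference, here is a sketch of the standard asymptotic argument that leads to a rule of this form. It assumes the true density f is smooth enough that the integral of f'(x)^2 is finite; the symbols h (bin width), n (sample size) and R(f') are notation introduced here, not taken from the thread.

$$
\mathrm{AMISE}(h) = \frac{1}{nh} + \frac{h^{2}}{12}\,R(f'), \qquad R(f') = \int f'(x)^{2}\,dx
$$

$$
h^{*} = \left(\frac{6}{n\,R(f')}\right)^{1/3}, \qquad \mathrm{AMISE}(h^{*}) = \frac{3}{2\,n\,h^{*}} \propto n^{-2/3}
$$

$$
f = \mathcal{N}(\mu, \sigma^{2}) \;\Rightarrow\; R(f') = \frac{1}{4\sqrt{\pi}\,\sigma^{3}} \;\Rightarrow\; h^{*} = \left(24\sqrt{\pi}\right)^{1/3} \sigma\, n^{-1/3} \approx 3.49\,\sigma\, n^{-1/3}
$$

The first term is the integrated variance and the second the integrated squared bias, so the formula with the 3.49 constant is optimal only under the Gaussian assumption; for other shapes the constant changes through R(f').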

#### Dason

Is there a reason you would prefer a histogram over an estimated density using something like a kernel density estimate?

#### stony

##### New Member
I am not familiar with kernel density estimation yet. Is KDE always better than a histogram? If I use KDE without knowing the distribution, how do I know which kernel function to choose and what bandwidth to use?

Thanks

#### Dason

The Gaussian or Epanechnikov kernels are typically used. The bandwidth question is similar to the optimal bin width question. It depends partially on what you think the underlying density is like but there are some algorithms that give nice properties asymptotically. What software are you using?

#### stony

##### New Member
Then, comparing histogram and KDE, is there any condition that can indicate which method is better to use?

I am using MATLAB. Is there any built-in function to determine the bandwidth for KDE?

Does using a Gaussian kernel mean decomposing the pdf as a mixture of Gaussians?

#### Dason

> Then, comparing histogram and KDE, is there any condition that can indicate which method is better to use?

I'm partially under the impression that KDE is almost always better. If you're trying to estimate a continuous density, then it makes sense to me to use one of the default kernels used in KDE, because they work pretty well.

> I am using MATLAB. Is there any built-in function to determine the bandwidth for KDE?

I don't know. Maybe?

> Does using a Gaussian kernel mean decomposing the pdf as a mixture of Gaussians?

Well, I wouldn't say you're decomposing the pdf as a mixture of Gaussians, because you don't have a density to decompose in the first place. You're estimating the density as a mixture of Gaussians, yes. But with a histogram you're estimating the density as a mixture of... uniforms. I think in most cases it makes more sense to use something like a mixture of Gaussians (although the Epanechnikov kernel has been shown to have really nice properties as well).
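To make the "mixture of Gaussians" picture concrete, here is a minimal NumPy sketch of a Gaussian-kernel density estimate written as an equal-weight average of one Gaussian per observation. The function name `gaussian_kde_pdf`, the simulated sample, and the hand-picked bandwidth `h=0.4` are assumptions for illustration only.

```python
import numpy as np

def gaussian_kde_pdf(x_eval, data, h):
    """Average of N(x_i, h^2) densities, one component per observation x_i."""
    x_eval = np.asarray(x_eval, dtype=float)
    data = np.asarray(data, dtype=float)
    # z[j, i] is the standardized distance from evaluation point j to observation i.
    z = (x_eval[:, None] - data[None, :]) / h
    components = np.exp(-0.5 * z**2) / (h * np.sqrt(2 * np.pi))
    # Equal mixture weights 1/n: take the mean over observations.
    return components.mean(axis=1)

rng = np.random.default_rng(1)
data = rng.normal(0.0, 1.0, 200)   # hypothetical sample
grid = np.linspace(-6, 6, 1201)
density = gaussian_kde_pdf(grid, data, h=0.4)  # h picked by hand, not optimized

# Riemann-sum check that the estimate integrates to roughly 1.
area = np.sum(density) * (grid[1] - grid[0])
print("area under estimate:", area)
```

Each observation contributes one Gaussian bump of standard deviation h, so the bandwidth plays the same smoothing role that the bin width plays in a histogram.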

#### stony

##### New Member
So that means a histogram is a special case of kernel density estimation with a uniform kernel function, right?

#### noetsi

##### Fortran must die
The only reason to use a histogram is that it is commonly used in industry and is intuitively obvious, including to many who won't be interested in (or understand) KDE.

#### Dason

Well not really because you aren't centering the uniform around the observed values. But it is attempting to do the same thing.
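The distinction can be sketched in a few lines of Python: a histogram places its boxes on a fixed grid of bin edges, whereas a uniform-kernel estimate centers a box on each observation. The function names and the four-point toy sample below are hypothetical and chosen only to make the difference visible.

```python
import numpy as np

def histogram_density(x_eval, data, edges):
    """Histogram estimate: boxes sit on a fixed grid of bin edges."""
    x_eval = np.asarray(x_eval, dtype=float)
    counts, _ = np.histogram(data, bins=edges)
    heights = counts / (len(data) * np.diff(edges))
    # Index of the bin containing each evaluation point.
    idx = np.searchsorted(edges, x_eval, side="right") - 1
    inside = (idx >= 0) & (idx < len(heights))
    out = np.zeros(len(x_eval))
    out[inside] = heights[idx[inside]]
    return out

def uniform_kde(x_eval, data, h):
    """Uniform-kernel KDE: a box of half-width h centered on each observation."""
    x_eval = np.asarray(x_eval, dtype=float)
    data = np.asarray(data, dtype=float)
    within = np.abs(x_eval[:, None] - data[None, :]) <= h
    return within.sum(axis=1) / (2 * h * len(data))

data = np.array([0.9, 1.1, 1.9, 2.1])   # toy sample
edges = np.array([0.0, 1.0, 2.0, 3.0])  # bins of width 1
x = np.array([0.95])

hist_value = histogram_density(x, data, edges)
kde_value = uniform_kde(x, data, h=0.5)  # boxes of total width 1, like the bins
print(hist_value, kde_value)
```

With the same box width, the two estimates disagree at x = 0.95 (0.25 for the histogram, 0.5 for the uniform kernel), because the histogram counts only the points that fall in the bin [0, 1), while the kernel estimate counts every point within 0.5 of x.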

#### noetsi

##### Fortran must die
If it matters there are different rules built into software to determine the number of bins and width. I can find those.

#### stony

##### New Member
> If it matters there are different rules built into software to determine the number of bins and width. I can find those.

I am trying to estimate a pdf from a few hundred data points, or even fewer. For the optimal KDE bandwidth, is there a simple formula to use, like Scott's rule for histograms, so that a numerical search is not needed? Thanks for any hints.
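One commonly quoted closed-form choice for a Gaussian kernel is Silverman's rule of thumb, h = 0.9 * min(sigma, IQR/1.34) * n^(-1/5). Like Scott's rule for histograms, it is derived under a normal reference density, so it may oversmooth strongly multimodal data. The sketch below is illustrative only; the function name and the simulated sample are assumptions, not anything from this thread.

```python
import numpy as np

def silverman_bandwidth(data):
    """Silverman's rule of thumb for a Gaussian kernel:
    h = 0.9 * min(sigma, IQR / 1.34) * n^(-1/5)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    sigma = data.std(ddof=1)
    q75, q25 = np.percentile(data, [75, 25])
    # Use the smaller of the two spread estimates to be robust to heavy tails.
    spread = min(sigma, (q75 - q25) / 1.34)
    return 0.9 * spread * n ** (-1 / 5)

rng = np.random.default_rng(2)
sample = rng.normal(10.0, 2.0, 250)   # hypothetical sample of a few hundred points
h = silverman_bandwidth(sample)
print("rule-of-thumb bandwidth:", h)
```

Note the n^(-1/5) rate, compared with n^(-1/3) for the histogram bin width: the bandwidth shrinks more slowly as the sample grows.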

#### stony

##### New Member
> Well not really because you aren't centering the uniform around the observed values. But it is attempting to do the same thing.

Then, for KDE with a uniform kernel function, is there a good bandwidth formula that one could say performs, in most cases, no worse than a histogram using Scott's rule?

#### noetsi

##### Fortran must die
> I am trying to estimate a pdf from a few hundred data points, or even fewer. For the optimal KDE bandwidth, is there a simple formula to use, like Scott's rule for histograms, so that a numerical search is not needed? Thanks for any hints.

Sorry, I only know about histograms, not KDE.