# Central Limit Theorem, the heart of inferential statistics!

In one of my previous posts I attached the following image: https://datasciencecgp.files.wordpress.com/2015/01/roadtodatascientist1.png . In it we have an interesting roadmap to follow on the way to becoming a Horizontal Data Scientist.

The next station on that road is a deep dive into the heart of inferential statistics: the Central Limit Theorem.

The central limit theorem has an interesting history. The first version of this theorem was postulated by the French-born mathematician Abraham de Moivre who, in a remarkable article published in 1733, used the normal distribution to approximate the distribution of the number of heads resulting from many tosses of a fair coin. This finding was far ahead of its time, and was nearly forgotten until the famous French mathematician Pierre-Simon Laplace rescued it from obscurity in his monumental work Théorie Analytique des Probabilités, which was published in 1812. Laplace expanded De Moivre’s finding by approximating the binomial distribution with the normal distribution. But as with De Moivre, Laplace’s finding received little attention in his own time. It was not until the nineteenth century was at an end that the importance of the central limit theorem was discerned, when, in 1901, Russian mathematician Aleksandr Lyapunov defined it in general terms and proved precisely how it worked mathematically. Nowadays, the central limit theorem is considered to be the unofficial sovereign of probability theory. (Source: Wikipedia)

Turning back to reality, the Central Limit Theorem tells us that, for a reasonably large sample size n, the sampling distribution of the mean (the distribution of the means of all possible samples of size n) is approximated by a Normal curve whose mean is mu, the mean of the population, and whose standard deviation is the standard deviation of the population divided by the square root of the sample size n.
The cool part about the central limit theorem is that the sampling distribution of the means is also normally distributed even if the population is not.

As an R lover, let’s see our R example: consider a population that follows an exponential distribution with rate parameter 1. This population has mean mu = 1 and standard deviation sigma = 1, and we will draw samples of sizes ranging from n = 1 up to n = 500.

Let’s see what happens when we construct sampling distributions of various sample sizes from a population. For a given sample size n, the first interesting question is how many samples of size n we can take from the population. Before we can answer that, we have to answer another question: are we sampling with or without replacement?
With replacement: When we sample with replacement, we draw an element from the population, record its value, and then replace it. Therefore, the same element might be drawn into a sample more than once.

Without replacement: In most practical sampling problems we sample without replacement. That is, we draw one element from the population, record it, then draw another. It is not possible for the same element to be drawn more than once.
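In R, the base `sample` function makes this distinction concrete. Here is a minimal sketch on a small toy population (the population and sample size are illustrative assumptions, not part of the example above):

```r
pop <- 1:10                       # a small toy population
set.seed(1)                       # for reproducibility
sample(pop, 5, replace = TRUE)    # with replacement: the same element may appear twice
sample(pop, 5, replace = FALSE)   # without replacement: all five draws are distinct
```

Note that a sample drawn without replacement can never be larger than the population itself, while with replacement it can be as large as you like.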

Turning back to our R example: in base R, the exponential distribution is handled by the functions `dexp`, `pexp`, `qexp` and `rexp`, which give the density, distribution function, quantile function and random generation, respectively, for the exponential distribution with a given rate parameter (the mean is 1/rate). The `gamlss.dist` package offers an equivalent `EXP` family, a one-parameter `gamlss.family` object to be used in GAMLSS fitting via the function `gamlss()`, where the `mu` parameter represents the mean of the distribution and the functions `dEXP`, `pEXP`, `qEXP` and `rEXP` play the same four roles.

Let’s use `rexp`, which draws random numbers from a distribution: we’ll draw 1000 numbers at random from the exponential distribution having mean 1 and standard deviation 1.
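As a quick sanity check (the seed is just an assumption for reproducibility), the sample mean and standard deviation of such a draw should both land close to 1:

```r
set.seed(123)
x <- rexp(1000, rate = 1)  # rate = 1 gives mean = 1/rate = 1 and sd = 1/rate = 1
mean(x)                    # close to 1
sd(x)                      # close to 1
```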

Now we will repeatedly sample from the exponential distribution. Each sample selects n random numbers from the exponential distribution having mean and standard deviation equal to 1, and we then compute the mean of the elements in that sample. Starting with n = 1, we repeat this experiment 1000 times:

n <- 1
vectormeans <- rep(0, 1000)
for (i in 1:1000) { vectormeans[i] <- mean(rexp(n, rate = 1)) }
hist(vectormeans, prob = TRUE, breaks = 12,
     main = paste0("Exponential Distribution with sample n=", n))
lines(density(vectormeans))

Now repeat the experiment, increasing the sample size n from 1 to 500, and we have the following results:

Then we can see an incredible evolution from the Exponential to the Normal distribution: Central Limit Theorem demonstrated! Take a look at the sample size of 500, amazing!
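The whole progression can be reproduced with a single loop. The particular sizes used here (1, 5, 50, 500) are an assumption, since any increasing sequence shows the same effect:

```r
for (n in c(1, 5, 50, 500)) {
  # 1000 sample means, each computed from n exponential draws
  vectormeans <- replicate(1000, mean(rexp(n, rate = 1)))
  hist(vectormeans, prob = TRUE, breaks = 12,
       main = paste0("Exponential Distribution with sample n=", n))
  lines(density(vectormeans))
}
```

By the theorem, the histogram for n = 500 should be centered near 1 with a spread of about 1/sqrt(500), roughly 0.045, which is why it looks so tight and bell-shaped.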

But what kind of applications does this theorem have? As a healthcare lover, just think of a sampling distribution of serum cholesterol: you can take a sample of x individuals, measure their cholesterol levels, and from this information compute different kinds of probabilities to find an interesting pattern.

Calculating probabilities using the central limit theorem is quite similar to calculating them from the normal distribution:

1. Calculate the standard error: SE = sigma / sqrt(n), where sigma is the population standard deviation.
2. Draw a picture of the normal approximation to the sampling distribution and shade in the appropriate probability.
3. Convert to standard units: z = (x – mu) / SE, where x is the sample mean and mu is the population mean.
4. Determine the area under the normal curve.
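The four steps above can be sketched in R with purely hypothetical cholesterol figures (a population mean of 200 mg/dl, a standard deviation of 40 mg/dl, and a sample of 25 people are assumptions for illustration only). The probability that the sample average exceeds 210 mg/dl would then be:

```r
mu    <- 200                 # hypothetical population mean (mg/dl)
sigma <- 40                  # hypothetical population sd (mg/dl)
n     <- 25                  # sample size
se    <- sigma / sqrt(n)     # step 1: standard error = 8
z     <- (210 - mu) / se     # step 3: standard units, z = 1.25
1 - pnorm(z)                 # step 4: upper-tail area, about 0.106
```

Step 2, drawing the picture, is the one the computer skips, but it is still the best way to avoid shading the wrong tail.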

We can also use the central limit theorem to approximate percentiles of the sampling distribution, or to determine how large the sample size would need to be in order to ensure a 95% probability that the sample average will be within x mg/dl of the population mean.
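For instance, under the same hypothetical assumption of sigma = 40 mg/dl, the sample size needed so that the sample average falls within 10 mg/dl of the population mean with 95% probability can be sketched as:

```r
sigma <- 40                            # hypothetical population sd (mg/dl)
d     <- 10                            # desired margin (mg/dl)
z95   <- qnorm(0.975)                  # two-sided 95% critical value, about 1.96
n_req <- ceiling((z95 * sigma / d)^2)  # solve d = z95 * sigma / sqrt(n) for n
n_req                                  # 62 people would be enough
```

Because n appears under a square root, halving the margin d quadruples the required sample size.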

Finally, I recommend you take a look at this site, where you’ll find the bean machine, also known as the quincunx or Galton box: a device invented by Sir Francis Galton to demonstrate the Central Limit Theorem, in particular that the Normal distribution can be approximated from the Binomial distribution (or, properly speaking, the de Moivre–Laplace theorem). Enjoy it!