Sunday, October 4, 2015

Sample size

Estimate the mean and sample size

In this post I estimate the mean using different sample sizes. The goal is to get a better feel for a sample size of 30, the rule-of-thumb "magic number" often used to separate small samples from large ones.

require(ggplot2)
## Loading required package: ggplot2
#number of tests
tests <- seq(1, 1000, by=1)
#sample sizes to test with
samplesize <- seq(1, 60, by=1)


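#for each sample size p
results <- sapply(samplesize, function(p) {
  #draw 1000 samples of size p from a normal distribution (mean 80, sd 2),
  #take the mean of each sample, then calculate the sd of those 1000 means.
  #a smaller sd means the sample mean is a more stable estimate.
  sd(sapply(tests, function(i) { mean(rnorm(p, mean=80, sd=2)) }))
})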

#prep results: one row per sample size, with the sd of the sample means
results <- data.frame(sample_size = samplesize, sd = results)
ggplot(results, aes(sample_size, sd)) + geom_point() + 
  ggtitle("Estimate mean with increasing sample size")

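As you can see, the spread of the estimated mean shrinks quickly at first and then levels off. At a sample size of 20 the standard deviation of the sample means should already be close to 2/sqrt(20), or about 0.45, compared with a population standard deviation of 2. Beyond that point, each extra observation buys relatively little. Other types of estimates might require a larger sample size, though!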
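
One way to sanity-check the plot is to compare the simulated values with the textbook standard error of the mean, sd / sqrt(n), for independent draws. The sketch below is not part of the original analysis: the seed, the use of replicate(), and the names sigma, theoretical_se, simulated_sd and comparison are my own assumptions for illustration.

```r
# Population sd and sample sizes, matching the simulation above
sigma <- 2
samplesize <- 1:60

# Theoretical standard error of the mean for each sample size
theoretical_se <- sigma / sqrt(samplesize)

# Re-run the simulation (assumption: 1000 repetitions, arbitrary seed)
set.seed(1)
simulated_sd <- sapply(samplesize, function(n) {
  sd(replicate(1000, mean(rnorm(n, mean = 80, sd = sigma))))
})

# Put both side by side and look at a few sample sizes
comparison <- data.frame(sample_size = samplesize,
                         simulated = simulated_sd,
                         theoretical = theoretical_se)
print(comparison[c(5, 20, 30, 60), ], digits = 3)
```

If the simulation behaves as expected, the two columns should sit close together. Because the standard error falls with the square root of n, going from 20 to 30 observations only lowers it from roughly 0.45 to roughly 0.37.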