Estimate the mean and sample size
Steve de Peijper
2015
In this post I will estimate the mean using different sample sizes. The goal is to get a better understanding of sample size 30, the "magic number" that is commonly used to distinguish between small and large sample sizes.
library(ggplot2)
#fix the random seed so the results are reproducible
set.seed(2015)
#number of repeated samples (tests) per sample size
n_tests <- 1000
#sample sizes to test with: 1 to 60
samplesize <- 1:60
#for each sample size p: draw n_tests samples of size p from N(80, 2^2),
#take the mean of each sample, then calculate the sd of those means
results <- sapply(samplesize, function(p) {
  means <- replicate(n_tests, mean(rnorm(p, mean = 80, sd = 2)))
  sd(means)
})
#prep results: one row per sample size
results <- data.frame(sample_size = samplesize, sd = results)
ggplot(results, aes(sample_size, sd)) + geom_point() +
  ggtitle("Estimate mean with increasing sample size") +
  xlab("Sample size") + ylab("SD of sample means")
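As you can see, the estimate is already quite accurate with a sample size of 20: the standard deviation of the sample means drops from about 2 at a sample size of 1 to roughly 0.45, and the curve flattens out, so each extra observation buys less additional precision. Other types of estimates might require a larger sample size, though!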
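For a normal population the curve in this plot has a known closed form: the standard deviation of the sample mean (the standard error) equals sd / sqrt(n). The sketch below assumes the same setup as above (N(80, 2^2), 1000 repeated samples per sample size), reruns the simulation and puts it next to 2 / sqrt(n). The names `sigma`, `sim_sd`, `theory_sd` and `compare` are hypothetical, chosen for illustration only.

```r
# Sketch: compare the simulated SD of the sample mean with the theoretical
# standard error sigma / sqrt(n). Assumes the post's setup: N(80, 2^2) and
# 1000 repeated samples per sample size. All names here are hypothetical.
set.seed(42)
sigma <- 2
n_tests <- 1000
samplesize <- 1:60

# simulated SD of the sample mean for each sample size
sim_sd <- sapply(samplesize, function(n) {
  sd(replicate(n_tests, mean(rnorm(n, mean = 80, sd = sigma))))
})

# theoretical standard error of the mean for each sample size
theory_sd <- sigma / sqrt(samplesize)

compare <- data.frame(sample_size = samplesize,
                      simulated = sim_sd,
                      theoretical = theory_sd)

# show a few representative sample sizes
print(compare[c(1, 10, 20, 30, 60), ])
```

The two columns should agree closely. Because the standard error shrinks with 1/sqrt(n), quadrupling the sample size only halves it, which is why the points level off instead of continuing to drop steeply.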