Sunday, November 1, 2015

Inference on the slope

Inference on the slope

In this post I will explore the concept of inference on the slope. When you fit a model you get a confidence interval for the slope. This is explained in this video.

In this post I will simulate this to get a better understanding of the theory.

set.seed(300)
n=100
x <- c(1:n) 
beta0 <- 3 # intercept
beta1 <- 1.8 # slope
y <- beta0 +  beta1 * x + rnorm(n,mean = 0,sd = 20) # sd of error is 6
fit <- lm(y ~ x)
plot(x,y)
abline(fit)

f = summary(fit)
e = resid(fit)

#slope
f$coef[2,1]
## [1] 1.753188
#standard error of the slope
f$coef[2,2]
## [1] 0.06440277
#standard error of the slope, calculated manually
(sqrt(sum(e^2)/(n-2))) / sqrt(((sum(x^2)) - sum(x)^2/100))
## [1] 0.06440277

What if we fit a model 1000 times

library(ggplot2)

ss = data.frame(1,1)
reeks <- c(1:1000)
for (i in reeks) {
  y <- beta0 +  beta1 * x + rnorm(n,mean = 0,sd = 20) # sd of error is 6
  fit <- lm(y ~ x)
  ss[i,1] = summary(fit)$coef[2,1]
  ss[i,2] = summary(fit)$coef[1,1]
  }

colnames(ss) = c('slopes', 'intercept')
ds = data.frame(cbind(x,y)); colnames(ds) = c('x','y')

g = ggplot(ds, aes(x,y)) + geom_point()
g + geom_abline(data=ss[1:150,], aes(slope=slopes, intercept=intercept, alpha="0.005"))

ggplot(ss, aes(slopes)) +  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

mean(ss[,1])
## [1] 1.800199
sd(ss[,1])
## [1] 0.07030048