We want to fit a line (\(y = ax + b\)) to some data, for example
a <- runif(1) + 0.5
b <- runif(1) + 2
example_data <- tibble(x=runif(100, 0, 10), e=rnorm(100)) %>%
mutate(y=a*x+b+e)
ggplot(example_data, aes(x=x, y=y)) +
geom_point() +
scale_y_continuous(limits = c(0, 15)) +
scale_x_continuous(limits = c(0, 10))
## Warning: Removed 13 rows containing missing values (geom_point).
Given data \(x\) and \(y\), for estimated values \(\hat{a}, \hat{b}\), define the loss function
\[l(a,b) = \frac{1}{N}\sum_i^N(y_i-(\hat{a}x_i+\hat{b}))^2\]
mse <- function(y, x, a, b) {
mean((y - (a*x + b))^2)
}
Partial derivatives for each parameter \(a, b\) are calculated:
\[ \frac{\partial}{\partial a} = \frac{2}{N}\sum_i^N-x_i(y_i-(ax_i+b)) \] \[ \frac{\partial}{\partial b} = \frac{2}{N}\sum_i^N-(y_i-(ax_i+b)) \]
and a single iteration, given starting points for \(a, b\), the data, and learning rate can be written:
step_gradient <- function(a_now, b_now, data, rate) {
a_grad <- 2 * mean(-data$x * (data$y - (a_now * data$x + b_now)))
b_grad <- 2 * mean(-(data$y - (a_now * data$x + b_now)))
a_new <- a_now - (rate * a_grad)
b_new <- b_now - (rate * b_grad)
return(tibble(a=a_new, b=b_new))
}
Now, we can take multiple iterations.
coefs <- tibble(a=0, b=0)
rate <- 0.01
MAX_ITER <- 500
for (i in 1:MAX_ITER) {
now <- coefs %>%
tail(1)
coefs <- bind_rows(coefs, step_gradient(now$a, now$b, example_data, rate))
}
For brevity, let’s just take a sample of all iterations
sample_coefs <- coefs %>%
mutate(iter=1, original_iter=cumsum(iter)) %>%
slice(c(1:4, seq(5, MAX_ITER, MAX_ITER/50), MAX_ITER)) %>%
rowwise() %>%
mutate(loss=mse(example_data$y, example_data$x, a, b)) %>%
ungroup() %>%
mutate(iter=cumsum(iter))