Gradient descent is an iterative technique, commonly used in machine learning and deep learning, for finding the best possible set of parameters (coefficients) for a given model, data points, and loss function, starting from an initial, usually random, guess.
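
Before walking through the steps below, here is a minimal, self-contained sketch of that idea on a toy one-dimensional problem (the loss function here is made up purely for illustration): starting from a random guess, repeatedly move the parameter a small step against its gradient.

import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal()  # initial, random guess
lr = 0.1                   # learning rate (step size)

# Toy loss f(w) = (w - 3) ** 2, whose gradient is 2 * (w - 3)
for _ in range(100):
    grad = 2 * (w - 3)     # gradient of the loss at the current w
    w = w - lr * grad      # update rule: step against the gradient

print(w)  # converges to the minimizer, w ~ 3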

 

import numpy as np


true_b = 1
true_w = 2
N = 100

# Data Generation
rng = np.random.default_rng(54321)
x = rng.random(N)
epsilon = (0.1 * rng.standard_normal(N))
y = true_b + true_w * x + epsilon

 

print(x.shape)
print(y.shape)

 

(100,)
(100,)

 

# Shuffle the indices
idx = np.arange(N)
np.random.shuffle(idx)

# Use first 80 random indices for train
train_idx = idx[:int(N * 0.8)]
# Use the remaining indices for validation
val_idx = idx[int(N * 0.8):]

# Generate train and validation sets
x_train, y_train = x[train_idx], y[train_idx]
x_val, y_val = x[val_idx], y[val_idx]

 

 

# Step 0 - Initialize parameters "b" and "w" randomly
rng = np.random.default_rng(54321)
b = rng.standard_normal(1)
w = rng.standard_normal(1)

print(b, w) # [0.82483768] [0.40749068]

 

# Step 1 - Compute our model's predicted output - forward pass
yhat = b + w * x_train

 

# Step 2 - Compute the loss
# We are using ALL data points, so this is BATCH gradient descent. 
# How wrong is our model? That's the error!
error = yhat - y_train

# It is a regression, so it computes mean squared error (MSE)
loss = (error ** 2).mean()

 

# Reminder:
# true_b = 1
# true_w = 2

# We split each range into 100 evenly spaced intervals (101 points each).
# Sure, we’re somewhat cheating here, since we know the true values of b and w, so we can 
# choose the perfect ranges for the parameters. But it is for educational purposes only.
b_range = np.linspace(true_b - 3, true_b + 3, 101)
w_range = np.linspace(true_w - 3, true_w + 3, 101)
# meshgrid is a handy function that generates a grid of b and w values for all combinations
bs, ws = np.meshgrid(b_range, w_range)
bs.shape, ws.shape  # ((101, 101), (101, 101))
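
To make that concrete, here is a tiny, self-contained example of what meshgrid produces (the small arrays are for illustration only):

small_b = np.array([1, 2, 3])
small_w = np.array([10, 20])
grid_b, grid_w = np.meshgrid(small_b, small_w)
print(grid_b)  # [[ 1  2  3]
               #  [ 1  2  3]]
print(grid_w)  # [[10 10 10]
               #  [20 20 20]]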

 

print(x_train.reshape(-1, 1).shape)

# (80, 1)

 

# For each training point x (each row of the (80, 1) array below), compute
# the grid of predictions bs + ws * x, i.e. one (101, 101) matrix per point
all_predictions = np.apply_along_axis(
    func1d=lambda x: bs + ws * x,
    axis=1,
    arr=x_train.reshape(-1, 1),
)
all_predictions.shape # (80, 101, 101)
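
The same grid of predictions can also be computed with plain NumPy broadcasting, without apply_along_axis. A minimal equivalent sketch, reusing the bs, ws, and x_train defined above:

# (80, 1, 1) broadcast against (101, 101) -> (80, 101, 101)
all_predictions_bc = bs + ws * x_train.reshape(-1, 1, 1)
print(np.allclose(all_predictions, all_predictions_bc))  # should print True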

 

all_labels = y_train.reshape(-1, 1, 1)
all_labels.shape # (80, 1, 1)

 

all_errors = all_predictions - all_labels
all_errors.shape # (80, 101, 101)

Each prediction has its own error, so we get 80 matrices of shape (101, 101), again, one matrix for each data point, each matrix containing a grid of errors.

 

# Average the squared errors over the 80 data points (axis 0) to get
# the MSE for every combination of b and w in the grid
all_losses = (all_errors ** 2).mean(axis=0)
all_losses.shape # (101, 101)
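
This grid of losses can be visualized as a loss surface over b and w. A minimal plotting sketch (matplotlib is an assumption here, it is not used elsewhere in this post):

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 5))
# Contour plot of the MSE loss over the grid of b and w values
cs = ax.contour(bs, ws, all_losses, levels=20, cmap='viridis')
ax.clabel(cs, inline=True, fontsize=8)
ax.scatter(true_b, true_w, marker='*', c='red', label='true (b, w)')
ax.set_xlabel('b')
ax.set_ylabel('w')
ax.set_title('Loss surface')
ax.legend()
plt.show()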

# Step 3 - Compute gradients for both "b" and "w" parameters
b_grad = 2 * error.mean()
w_grad = 2 * (x_train * error).mean()
print(b_grad, w_grad) # -1.965279236007916 -1.3213968667849743
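
These are the analytical gradients of the MSE loss: with loss = mean((b + w * x - y) ** 2), the partial derivative with respect to b is 2 * mean(error) and with respect to w is 2 * mean(x * error). A quick sanity check with central finite differences (a sketch, not part of the original steps):

def mse(b_, w_):
    return ((b_ + w_ * x_train - y_train) ** 2).mean()

eps = 1e-6
# Central differences approximate the analytical gradients above
b_grad_num = (mse(b + eps, w) - mse(b - eps, w)) / (2 * eps)
w_grad_num = (mse(b, w + eps) - mse(b, w - eps)) / (2 * eps)
print(b_grad_num, w_grad_num)  # should closely match b_grad and w_grad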

Update the parameters using the gradients weighted by the learning rate.

# Set learning rate - this is "eta"
lr = .1
print(b, w)
# [0.82483768] [0.40749068]

# Step 4 - Update parameters using gradients and the learning rate
b = b - lr * b_grad
w = w - lr * w_grad

print(b, w)
# [1.02136561] [0.53963036]

 

Figure 0.9 - Updated model’s predictions

It looks better.
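
In practice, Steps 1 to 4 are repeated many times until the loss stops improving. A sketch of the full loop, reusing the variables above (the number of epochs is an arbitrary choice here):

# Step 0 - re-initialize the parameters so the loop starts from scratch
rng = np.random.default_rng(54321)
b = rng.standard_normal(1)
w = rng.standard_normal(1)

lr = 0.1
n_epochs = 1000

for epoch in range(n_epochs):
    # Step 1 - forward pass
    yhat = b + w * x_train
    # Step 2 - compute the loss (MSE)
    error = yhat - y_train
    loss = (error ** 2).mean()
    # Step 3 - compute the gradients
    b_grad = 2 * error.mean()
    w_grad = 2 * (x_train * error).mean()
    # Step 4 - update the parameters
    b = b - lr * b_grad
    w = w - lr * w_grad

print(b, w)  # should end up close to true_b = 1 and true_w = 2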

true_b = 1
true_w = 2
N = 100

# Data Generation
rng = np.random.default_rng(54321)

# We divide w by 10
bad_w = true_w / 10
# And multiply x by 10
bad_x = rng.random(N) * 10

epsilon = 0.1 * rng.standard_normal(N)

# So, the net effect on y is zero - it is still the same as before
y = true_b + bad_w * bad_x + epsilon

 

 

# Generate train and validation sets
# Use the same train_idx and val_idx as before, but apply to bad_x
bad_x_train, y_train = bad_x[train_idx], y[train_idx]
bad_x_val, y_val = bad_x[val_idx], y[val_idx]
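
One way to see why the rescaled feature is a problem: the gradient with respect to w scales with x, so multiplying x by 10 makes that gradient much larger in magnitude, which in turn forces a smaller learning rate. A short sketch comparing the two (re-initializing b and w as in Step 0):

rng = np.random.default_rng(54321)
b = rng.standard_normal(1)
w = rng.standard_normal(1)

# Gradient of the loss w.r.t. w using the original x (values in [0, 1))
error = (b + w * x_train) - y_train
print(2 * (x_train * error).mean())

# Gradient of the loss w.r.t. w using bad_x (values in [0, 10)) -
# its magnitude is much larger, even though y is unchanged
bad_error = (b + w * bad_x_train) - y_train
print(2 * (bad_x_train * bad_error).mean())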

from sklearn.preprocessing import StandardScaler


scaler = StandardScaler(with_mean=True, with_std=True)
# We use the TRAIN set ONLY to fit the scaler
scaler.fit(x_train.reshape(-1, 1))

# Now we can use the already fit scaler to TRANSFORM both TRAIN and VALIDATION sets
scaled_x_train = scaler.transform(x_train.reshape(-1, 1))
scaled_x_val = scaler.transform(x_val.reshape(-1, 1))
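
Under the hood, StandardScaler simply subtracts the training-set mean and divides by the training-set standard deviation. A short sketch of the equivalent manual computation (for illustration only):

# Statistics come from the TRAIN set only, then are applied to both sets
mu = x_train.mean()
sigma = x_train.std()

manual_scaled_x_train = (x_train.reshape(-1, 1) - mu) / sigma
manual_scaled_x_val = (x_val.reshape(-1, 1) - mu) / sigma

print(np.allclose(scaled_x_train, manual_scaled_x_train))  # should print True
print(np.allclose(scaled_x_val, manual_scaled_x_val))      # should print True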
