借助一个例子简要了解机器学习

练习:训练一个模型, `基于适合狗的护具的大小`来预测适合狗的靴子尺寸

环境: azureml_py

import pandas
!wget https://raw.githubusercontent.com/MicrosoftDocs/mslearn-introduction-to-machine-learning/main/graphing.py
!wget https://raw.githubusercontent.com/MicrosoftDocs/mslearn-introduction-to-machine-learning/main/Data/doggy-boot-harness.csv
!pip install statsmodels


# Make a dictionary of data for boot sizes and harness sizes in cm
data = {
    'boot_size' : [ 39, 38, 37, 39, 38, 35, 37, 36, 35, 40, 
                    40, 36, 38, 39, 42, 42, 36, 36, 35, 41, 
                    42, 38, 37, 35, 40, 36, 35, 39, 41, 37, 
                    35, 41, 39, 41, 42, 42, 36, 37, 37, 39,
                    42, 35, 36, 41, 41, 41, 39, 39, 35, 39
 ],
    'harness_size': [ 58, 58, 52, 58, 57, 52, 55, 53, 49, 54,
                59, 56, 53, 58, 57, 58, 56, 51, 50, 59,
                59, 59, 55, 50, 55, 52, 53, 54, 61, 56,
                55, 60, 57, 56, 61, 58, 53, 57, 57, 55,
                60, 51, 52, 56, 55, 57, 58, 57, 51, 59
                ]
}

# Convert it into a table using pandas
dataset = pandas.DataFrame(data)
dataset

	boot_size	harness_size
0	39	58
1	38	58
2	37	52
3	39	58
4	38	57
5	35	52
6	37	55
7	36	53
8	35	49
9	40	54
10	40	59
11	36	56
12	38	53
13	39	58
14	42	57
15	42	58
16	36	56
17	36	51
18	35	50
19	41	59
20	42	59
21	38	59
22	37	55
23	35	50
24	40	55
25	36	52
26	35	53
27	39	54
28	41	61
29	37	56
30	35	55
31	41	60
32	39	57
33	41	56
34	42	61
35	42	58
36	36	53
37	37	57
38	37	57
39	39	55
40	42	60
41	35	51
42	36	52
43	41	56
44	41	55
45	41	57
46	39	58
47	39	57
48	35	51
49	39	59

我们希望使用"harness size"来估计"boot size"。这意味着' harness_size '是我们的input。我们需要一个模型来处理输入并对boot size输出)做出自己的估计。

普通最小二乘法Ordinary Least Squares (OLS)

示例: Linear Regression Example — scikit-learn 1.4.0 documentation

这里使用一个现有的库来创建我们的模型, 但是我们还不训练它

import statsmodels.formula.api as smf

# 首先, 我们使用一种特殊语法定义公式, 说明boot_size由harness_size解释
formula = "boot_size ~ harness_size"
model = smf.ols(formula = formula, data = dataset)

# 已经创建了模型, 但它没有内部参数的设置
if not hasattr(model, 'params'):
    print("Model selected but it does not have parameters set. We need to train it!")

OLS模型有两个参数(斜率和偏移量), 但这些参数还没有在我们的模型中设置。我们需要训练我们的模型来找到这些值, 这样模型就可以根据狗的护具尺寸可靠地估计狗的靴子尺寸。

fitted_model = model.fit()

# 现在已拟合, 输出我们模型的信息
print("The following model parameters have been found:\n" +
        f"Line slope: {fitted_model.params[1]}\n"+
        f"Line Intercept: {fitted_model.params[0]}")

The following model parameters have been found:

Line slope: 0.5859254167382711

Line Intercept: 5.719109812682577

import matplotlib.pyplot as plt

# 显示数据点的散点图, 并添加拟合线
plt.scatter(dataset["harness_size"], dataset["boot_size"])
plt.plot(dataset["harness_size"], fitted_model.params[1] * dataset["harness_size"] + fitted_model.params[0], 'r', label='Fitted line')

# 添加标签和图例
plt.xlabel("harness_size")
plt.ylabel("boot_size")
plt.legend()

在这里插入图片描述

# 预测护具大小为61.4时对应的靴子大小
harness_size = { 'harness_size' : [61.4] }
approximate_boot_size = fitted_model.predict(harness_size)

print("Estimated approximate_boot_size:")
print(approximate_boot_size[0])

Estimated approximate_boot_size:

41.694930400412424

输入和输出

在这里插入图片描述

训练的目标是改进模型, 使它可以进行高质量的估计或预测

模型不会自行训练, 它们使用数据和两段代码（目标函数和优化器）进行训练

目标函数判断模型表现得不错（正确地估计了靴子尺寸）还是糟糕

在训练期间, 模型进行预测, 目标函数计算其性能。优化器是随后更改模型参数的代码, 以便模型下次会做得更好, 通常可以使用开源框架

目标、数据和优化器只是训练模型的一种手段, 训练完成后, 就不再需要它们

训练只会改变模型内部的参数值;它不会改变使用的模型类型

练习:可视化输入和输出

这次，我们从文件加载数据，对其进行过滤，并将其绘制成图形。为了更好地了解模型构建或局限性

这里使用 Pandas 加载数据

import pandas
!wget https://raw.githubusercontent.com/MicrosoftDocs/mslearn-introduction-to-machine-learning/main/graphing.py
!wget https://raw.githubusercontent.com/MicrosoftDocs/mslearn-introduction-to-machine-learning/main/Data/doggy-boot-harness.csv


dataset = pandas.read_csv('doggy-boot-harness.csv')

# 数据很多，使用head()只打印前几行
dataset.head()

	boot_size	harness_size	sex	age_years
0	39	58	male	12.0
1	38	58	male	9.6
2	37	52	female	8.6
3	39	58	male	10.2
4	38	57	male	7.8

按列筛选数据

print("Harness sizes")
print(dataset.harness_size)

del dataset["sex"]
del dataset["age_years"]


print("\nAvailable columns after deleting sex and age information:")
print(dataset.columns.values)

在这里插入图片描述

Available columns after deleting sex and age information:

['boot_size' 'harness_size']

print("TOP OF TABLE")
print(dataset.head())

print("\nBOTTOM OF TABLE")
print(dataset.tail())

在这里插入图片描述

print(f"We have {len(dataset)} rows of data")

is_small = dataset.harness_size < 55
print("\nWhether the dog's harness was smaller than size 55:")
print(is_small)

在这里插入图片描述

data_from_small_dogs = dataset[is_small]
print("\nData for dogs with harness smaller than size 55:")
print(data_from_small_dogs)

print(f"\nNumber of dogs with harness size less than 55: {len(data_from_small_dogs)}")

在这里插入图片描述

上面也可写作

data_from_small_dogs = dataset[dataset.boot_size < 55].copy()

对copy()的调用是可选的，但有助于避免在更复杂的场景中出现意外行为

Python 直接赋值、浅拷贝和深度拷贝

Python 字典(Dictionary) copy()方法 | 菜鸟教程 (runoob.com)

Python 直接赋值、浅拷贝和深度拷贝解析 | 菜鸟教程 (runoob.com)
1、b = a: 赋值引用，a 和 b 都指向同一个对象。

2、b = a.copy(): 浅拷贝, a 和 b 是一个独立的对象，但他们的子对象还是指向统一对象（是引用）。

浅拷贝：深拷贝父对象（一级目录），子对象（二级目录）不拷贝，还是引用

b = copy.deepcopy(a): 深度拷贝, a 和 b 完全拷贝了父对象及其子对象，两者是完全独立的。

import matplotlib.pyplot as plt

plt.scatter(data_smaller_paws["harness_size"], data_smaller_paws["boot_size"])

plt.xlabel("harness_size")
plt.ylabel("boot_size")

在这里插入图片描述

某些客户可能希望以英寸为单位，而不是以厘米为单位

那么我们需要创建新的横坐标

data_smaller_paws['harness_size_imperial'] = data_smaller_paws.harness_size / 2.54

plt.scatter(data_smaller_paws["harness_size_imperial"], data_smaller_paws["boot_size"])
plt.xlabel("harness_size_imperial")
plt.ylabel("boot_size")

在这里插入图片描述

如何使用模型

在这里插入图片描述

在训练过程中，目标函数通常需要知道模型的输出和正确答案是什么。这些值称为标签。在我们的场景中，如果我们的模型预测了靴子大小，则靴子大小就是我们的标签。

模型完成训练后，可以将其单独保存到文件中。我们不再需要原始数据、目标函数或模型优化器。当我们想使用模型时，我们可以从磁盘加载它，为其提供新数据，并返回预测。

对新数据使用经过训练的模型

在我们刚刚学习机器学习时，构建、训练然后使用模型是很常见的;但在现实世界中，我们不希望每次都要进行预测时就训练模型。

创建基本模型
将其保存到磁盘
从磁盘加载
使用它来预测不在训练数据集中的狗

正如我们之前所做的那样，创建一个简单的线性回归模型，并在我们的数据集上对其进行训练

import statsmodels.formula.api as smf

model = smf.ols(formula = "boot_size ~ harness_size", data = data).fit()

保存并加载模型

import joblib

model_filename = './avalanche_dog_boot_model.pkl'
joblib.dump(model, model_filename)

print("Model saved!")

Joblib: running Python functions as pipeline jobs — joblib 1.3.2 documentation

model_loaded = joblib.load(model_filename)

print("We have loaded a model with the following parameters:")
print(model_loaded.params)

We have loaded a model with the following parameters:

Intercept 5.719110

harness_size 0.585925

dtype: float64

实际应用场景

def load_model_and_predict(harness_size):
    '''
    This function loads a pretrained model. 
    It uses the model with the customer's dog's harness size 
    to predict the size of boots that will fit that dog.

    harness_size: The dog harness size, in cm 
    '''

    # Load the model from file and print basic information about it
    loaded_model = joblib.load(model_filename)
    print("We've loaded a model with the following parameters:")
    print(loaded_model.params)

    inputs = {"harness_size":[harness_size]} 

    # Use the model to make a prediction
    predicted_boot_size = loaded_model.predict(inputs)[0]

    return predicted_boot_size


def check_size_of_boots(selected_harness_size, selected_boot_size):
    '''
    Calculates whether the customer has chosen a pair of doggy boots that 
    are a sensible size. This works by estimating the dog's actual boot 
    size from their harness size.

    This returns a message for the customer that should be shown before
    they complete their payment 

    selected_harness_size: The size of the harness the customer wants to buy
    selected_boot_size: The size of the doggy boots the customer wants to buy
    '''

    # Estimate the customer's dog's boot size
    estimated_boot_size = load_model_and_predict(selected_harness_size)

    # Round to the nearest whole number because we don't sell partial sizes
    estimated_boot_size = int(round(estimated_boot_size))

    # Check if the boot size selected is appropriate
    if selected_boot_size == estimated_boot_size:
        return f"Great choice! We think these boots will fit your avalanche dog well."

    if selected_boot_size < estimated_boot_size:
        return "The boots you have selected might be TOO SMALL for a dog as "\
               f"big as yours. We recommend a doggy boots size of {estimated_boot_size}."

    if selected_boot_size > estimated_boot_size:
        return "The boots you have selected might be TOO BIG for a dog as "\
               f"small as yours. We recommend a doggy boots size of {estimated_boot_size}."

# Practice using our new warning system
check_size_of_boots(selected_harness_size=55, selected_boot_size=39)

在这里插入图片描述

posted @ 2024-01-24 01:36 泥烟阅读(8) 评论(0) 编辑收藏举报

刷新页面返回顶部

	boot_size	harness_size
0	39	58
1	38	58
2	37	52
3	39	58
4	38	57
5	35	52
6	37	55
7	36	53
8	35	49
9	40	54
10	40	59
11	36	56
12	38	53
13	39	58
14	42	57
15	42	58
16	36	56
17	36	51
18	35	50
19	41	59
20	42	59
21	38	59
22	37	55
23	35	50
24	40	55
25	36	52
26	35	53
27	39	54
28	41	61
29	37	56
30	35	55
31	41	60
32	39	57
33	41	56
34	42	61
35	42	58
36	36	53
37	37	57
38	37	57
39	39	55
40	42	60
41	35	51
42	36	52
43	41	56
44	41	55
45	41	57
46	39	58
47	39	57
48	35	51
49	39	59

	boot_size	harness_size
0	39	58
1	38	58
2	37	52
3	39	58
4	38	57
5	35	52
6	37	55
7	36	53
8	35	49
9	40	54
10	40	59
11	36	56
12	38	53
13	39	58
14	42	57
15	42	58
16	36	56
17	36	51
18	35	50
19	41	59
20	42	59
21	38	59
22	37	55
23	35	50
24	40	55
25	36	52
26	35	53
27	39	54
28	41	61
29	37	56
30	35	55
31	41	60
32	39	57
33	41	56
34	42	61
35	42	58
36	36	53
37	37	57
38	37	57
39	39	55
40	42	60
41	35	51
42	36	52
43	41	56
44	41	55
45	41	57
46	39	58
47	39	57
48	35	51
49	39	59

泥烟

借助一个例子简要了解机器学习

练习:训练一个模型, 基于适合狗的护具的大小来预测适合狗的靴子尺寸

输入和输出

练习:可视化输入和输出

Python 直接赋值、浅拷贝和深度拷贝

如何使用模型

对新数据使用经过训练的模型

实际应用场景

公告

练习:训练一个模型, `基于适合狗的护具的大小`来预测适合狗的靴子尺寸

	boot_size	harness_size
0	39	58
1	38	58
2	37	52
3	39	58
4	38	57
5	35	52
6	37	55
7	36	53
8	35	49
9	40	54
10	40	59
11	36	56
12	38	53
13	39	58
14	42	57
15	42	58
16	36	56
17	36	51
18	35	50
19	41	59
20	42	59
21	38	59
22	37	55
23	35	50
24	40	55
25	36	52
26	35	53
27	39	54
28	41	61
29	37	56
30	35	55
31	41	60
32	39	57
33	41	56
34	42	61
35	42	58
36	36	53
37	37	57
38	37	57
39	39	55
40	42	60
41	35	51
42	36	52
43	41	56
44	41	55
45	41	57
46	39	58
47	39	57
48	35	51
49	39	59