Jupyter Notebook and Machine Learning: Building Models with Scikit-Learn

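Introduction

Jupyter Notebook is an interactive development environment widely used in data science and machine learning. Scikit-Learn is a popular Python machine learning library that provides simple, efficient tools for data mining and data analysis. This tutorial walks through building a machine learning model with Scikit-Learn in Jupyter Notebook, covering data loading and preprocessing, model training and evaluation, hyperparameter tuning, and saving and loading the model.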

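Prerequisites

Basic Python programming knowledge
Familiarity with basic machine learning concepts
Jupyter Notebook and Scikit-Learn installed (Section 1.1 shows the command)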

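Tutorial Outline

Environment setup
Data loading and preprocessing
Splitting the dataset
Model selection and training
Model evaluation
Model optimization
Saving and loading the model
Summary and outlook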

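1. Environment Setup

1.1 Install Jupyter Notebook and Scikit-Learn

Run the following command in a terminal to install Jupyter Notebook, Scikit-Learn, and the other libraries imported later in this tutorial (pandas, matplotlib, and seaborn):

pip install jupyter scikit-learn pandas matplotlib seaborn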

1.2 Start Jupyter Notebook

Run the following command in a terminal to start Jupyter Notebook:

jupyter notebook
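Before going further, it can help to confirm that the libraries import cleanly inside the notebook. The following is a minimal sketch of such a check; the `versions` dictionary is an illustrative name, not part of any library.

```python
# Print the versions of the core libraries used in this tutorial.
# Run this in the first notebook cell; an ImportError here means the
# corresponding package is missing from the active environment.
import sys

import numpy as np
import pandas as pd
import sklearn

versions = {
    "python": sys.version.split()[0],
    "numpy": np.__version__,
    "pandas": pd.__version__,
    "scikit-learn": sklearn.__version__,
}

for name, version in versions.items():
    print(f"{name}: {version}")
```

If the notebook kernel uses a different environment from the terminal where pip was run, this cell is usually where the mismatch shows up.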

2. Data Loading and Preprocessing

2.1 Import the Required Libraries

Import the required Python libraries in a notebook cell:

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

2.2 Load the Dataset

This tutorial uses the Iris dataset that ships with Scikit-Learn:

iris = load_iris()
X = iris.data
y = iris.target
# Convert the dataset to a DataFrame for easier inspection
df = pd.DataFrame(data=np.c_[X, y], columns=iris.feature_names + ['target'])
df.head()
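Once the data is in a DataFrame, a quick look at its shape, class balance, and missing values is a reasonable sanity check before preprocessing. The sketch below rebuilds the DataFrame so that it runs on its own; the `species` column and the `missing_total` variable are additions for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target
# Map the integer labels (0, 1, 2) to readable species names.
df["species"] = iris.target_names[iris.target]

# Shape: number of samples and number of columns.
print(df.shape)
# Class balance: how many samples each species has.
print(df["species"].value_counts())
# Total number of missing values across all columns.
missing_total = int(df.isna().sum().sum())
print(f"Missing values: {missing_total}")
```

Iris is balanced and complete, so this check is uneventful here, but the same three lines tend to be useful on less tidy datasets.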

2.3 Data Preprocessing

Standardize the features so that each one has zero mean and unit variance:

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
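To make the standardization step concrete, the sketch below compares the output of `StandardScaler` with the same computation done by hand, (x - mean) / std per feature. The name `X_manual` is illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X = load_iris().data

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# The same transformation by hand. StandardScaler uses the population
# standard deviation (ddof=0), which is also NumPy's default.
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, X_manual))
# After scaling, each feature has mean close to 0 and standard deviation close to 1.
print(X_scaled.mean(axis=0).round(6))
print(X_scaled.std(axis=0).round(6))
```

The fitted statistics are stored on the scaler as `scaler.mean_` and `scaler.scale_`, which is what allows the same transformation to be applied to new data later.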

3. Splitting the Dataset

Split the dataset into a training set (80%) and a test set (20%):

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
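The split above operates on data that was scaled using statistics from all 150 samples, so a little information about the test set reaches the training step. A common alternative, sketched here as one option rather than as the tutorial's method, is to split the raw data first and fit the scaler on the training portion only. The sketch also passes `stratify=y` so that both subsets keep the original class proportions; the variable names ending in `_raw` and `_s` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Split the raw (unscaled) data; stratify=y preserves class proportions.
X_train_raw, X_test_raw, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the scaler on the training data only, then apply it to both subsets.
split_scaler = StandardScaler().fit(X_train_raw)
X_train_s = split_scaler.transform(X_train_raw)
X_test_s = split_scaler.transform(X_test_raw)

print(X_train_s.shape, X_test_s.shape)
# Number of test samples per class.
print(np.bincount(y_test_s))
```

On a small, clean dataset like Iris the effect on the reported accuracy is likely negligible, but the habit matters on real data.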

4. Model Selection and Training

4.1 Choose a Model

Start with a simple model such as logistic regression:

from sklearn.linear_model import LogisticRegression
model = LogisticRegression()

4.2 Train the Model

Fit the model on the training set:

model.fit(X_train, y_train)
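After `fit` returns, the learned parameters are available as attributes on the model. The following sketch repeats the earlier steps so that it runs on its own, then inspects the fitted estimator; `proba` is an illustrative variable name.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_scaled = StandardScaler().fit_transform(iris.data)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, iris.target, test_size=0.2, random_state=42
)

model = LogisticRegression()
model.fit(X_train, y_train)

# Class labels the model learned to distinguish.
print(model.classes_)
# One row of coefficients per class, one column per feature.
print(model.coef_.shape)
# Predicted class probabilities for the first three test samples;
# each row sums to 1.
proba = model.predict_proba(X_test[:3])
print(proba.round(3))
```

Because the features were standardized, the magnitudes in `model.coef_` are roughly comparable across features, which makes them easier to interpret.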

5. Model Evaluation

5.1 Predict and Evaluate

Make predictions on the test set and evaluate the model's performance:

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
y_pred = model.predict(X_test)
# Compute the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
# Print the classification report (per-class precision, recall, and F1 score)
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
# Plot the confusion matrix (rows are true classes, columns are predicted classes)
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.show()
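The accuracy and the per-class recall in the classification report can both be read directly off the confusion matrix. The sketch below, which repeats the training steps so that it runs on its own, shows the arithmetic; `accuracy_from_cm` and `recall_per_class` are illustrative names.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_scaled = StandardScaler().fit_transform(iris.data)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, iris.target, test_size=0.2, random_state=42
)
model = LogisticRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)

# Accuracy: correctly classified samples (the diagonal) over all samples.
accuracy_from_cm = np.trace(cm) / cm.sum()
# Recall per class: diagonal entry divided by the row total (true count).
recall_per_class = np.diag(cm) / cm.sum(axis=1)

print(cm)
print(f"Accuracy from confusion matrix: {accuracy_from_cm:.3f}")
print(f"Recall per class: {recall_per_class.round(3)}")
```

Precision per class follows the same pattern with column totals, `cm.sum(axis=0)`, in the denominator.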

6. Model Optimization

6.1 Hyperparameter Tuning

Use grid search with 5-fold cross-validation to tune the hyperparameters:

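from sklearn.model_selection import GridSearchCV
# C is the inverse of the regularization strength (smaller values regularize more)
param_grid = {
    'C': [0.1, 1, 10, 100],
    # 'lbfgs' is used in place of 'liblinear': liblinear handles three classes only
    # through a one-vs-rest scheme, which newer scikit-learn releases deprecate
    'solver': ['lbfgs', 'saga']
}
# max_iter is raised above the default of 100 to give 'saga' more room to converge
grid_search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best cross-validation accuracy: {grid_search.best_score_ * 100:.2f}%")
# With the default refit=True, GridSearchCV has already retrained the best
# configuration on the full training set, so no further fit call is needed
best_model = grid_search.best_estimator_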
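One refinement worth considering is to put the scaler and the classifier into a single `Pipeline` and run the grid search over that. Each cross-validation fold then fits the scaler on its own training portion only. This is a sketch of one possible setup, not the tutorial's original method; the step names `scaler` and `clf` and the variables `pipeline`, `pipe_grid`, and `pipe_search` are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
# Split the raw data; scaling happens inside the pipeline.
X_train_raw, X_test_raw, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Parameters of a pipeline step are addressed as <step name>__<parameter>.
pipe_grid = {"clf__C": [0.1, 1, 10, 100]}

pipe_search = GridSearchCV(pipeline, pipe_grid, cv=5, scoring="accuracy")
pipe_search.fit(X_train_raw, y_train)

print(f"Best parameters: {pipe_search.best_params_}")
print(f"Best cross-validation accuracy: {pipe_search.best_score_:.3f}")
# score() applies the refit best pipeline, scaler included, to the raw test data.
print(f"Test accuracy: {pipe_search.score(X_test_raw, y_test):.3f}")
```

The fitted search object can be used directly for prediction on unscaled inputs, since the scaler travels with the model.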

7. Saving and Loading the Model

7.1 Save the Model

Use the joblib library to save the trained model to disk:

import joblib
joblib.dump(best_model, 'logistic_regression_model.pkl')

7.2 Load the Model

Load the saved model from disk:

loaded_model = joblib.load('logistic_regression_model.pkl')
# Evaluate the loaded model on the test set
loaded_model_accuracy = loaded_model.score(X_test, y_test)
print(f"Loaded model accuracy: {loaded_model_accuracy * 100:.2f}%")
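A saved classifier on its own expects inputs that have already been standardized, so the scaler has to be available wherever the model is loaded. One way to handle this, sketched here under the assumption that a pipeline is acceptable for the use case, is to save the scaler and the model together. The file name `iris_pipeline.joblib` and the sample measurements in `new_samples` are made up for illustration, and a temporary directory is used so that the example leaves no files behind.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Bundle preprocessing and the classifier into one object.
full_pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
full_pipeline.fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "iris_pipeline.joblib"  # hypothetical file name
    joblib.dump(full_pipeline, model_path)
    restored = joblib.load(model_path)

# Raw, unscaled measurements in the original feature order:
# sepal length, sepal width, petal length, petal width (cm).
new_samples = np.array([[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]])

restored_pred = restored.predict(new_samples)
same_predictions = np.array_equal(full_pipeline.predict(new_samples), restored_pred)
print(restored_pred, same_predictions)
```

Files written by joblib are pickle-based, so they should only be loaded from trusted sources and, ideally, with the same scikit-learn version that produced them.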

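8. Summary and Outlook

This tutorial covered the complete workflow for building a machine learning model with Scikit-Learn in Jupyter Notebook: data loading and preprocessing, model selection and training, model evaluation, model optimization, and saving and loading the model. The same steps apply to other machine learning tasks, and a natural next step is to try them on more complex datasets and models.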

Original article: https://blog.csdn.net/weixin_41859354/article/details/140569905

Posted on 2024-11-18 20:04 by sunny123456
