GBDT Algorithm Notes - Data Mining

GBDT Algorithm - Gradient Boosting Decision Trees

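Gradient Boosting Decision Trees (GBDT) is an ensemble learning algorithm that combines many decision trees into a single strong predictor. GBDT performs well on many machine learning problems and is widely used in practice.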

The basic steps of the GBDT algorithm are:

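  1. Initialization: GBDT starts from a simple model, usually one that predicts a constant (for squared loss, the mean of the targets).
  2. Iterative training: GBDT trains decision trees one after another. Each new tree is fitted to the residuals of the current ensemble (more generally, to the negative gradient of the loss with respect to the current predictions; for squared loss these are exactly the residuals) and is then added to the ensemble.
  3. Combining the ensemble: GBDT is an additive model. The final prediction is the sum of the initial constant and the outputs of all trees.
  4. Loss function: GBDT typically uses squared loss for regression and log loss for classification to measure how well the model fits.
  5. Regularization: to prevent overfitting, GBDT commonly limits tree depth, requires a minimum number of samples per leaf, and applies a learning rate (shrinkage) that scales down each tree's contribution.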
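To make the steps above concrete, here is a minimal from-scratch sketch of GBDT for regression with squared loss, where the negative gradient is simply the residual. It uses scikit-learn's DecisionTreeRegressor as the base learner; the function names fit_gbdt and predict_gbdt and the toy sine data are illustrative assumptions, not part of any library, and production implementations such as LightGBM add many optimizations on top of this idea.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gbdt(X, y, n_trees=50, learning_rate=0.1, max_depth=2):
    """Minimal GBDT for squared loss (illustrative sketch, not a library API)."""
    # Step 1: start from a constant; the mean minimizes squared loss
    init = float(np.mean(y))
    pred = np.full(len(y), init)
    trees = []
    for _ in range(n_trees):
        # Step 2: for 0.5 * (y - pred)^2 the negative gradient is the residual
        residuals = y - pred
        # Limiting the depth is one of the regularizers from step 5
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        # Step 3: additive update, shrunk by the learning rate
        pred += learning_rate * tree.predict(X)
        trees.append(tree)
    return init, trees, learning_rate


def predict_gbdt(model, X):
    init, trees, learning_rate = model
    # Prediction = initial constant + sum of all scaled tree outputs
    pred = np.full(X.shape[0], init)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred


# Toy regression data: a noisy sine curve
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

model = fit_gbdt(X, y)
mse_constant = np.mean((y - np.mean(y)) ** 2)
mse_gbdt = np.mean((y - predict_gbdt(model, X)) ** 2)
print(f"MSE of the constant model: {mse_constant:.4f}")
print(f"MSE after 50 boosted trees: {mse_gbdt:.4f}")
```

Passing the learning rate along with the trees keeps training and prediction consistent; a smaller rate would need more trees to reach the same training error.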

Typical application areas include, but are not limited to:

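  1. Regression: GBDT is widely used for regression problems such as house-price and stock-price prediction. It can capture complex nonlinear relationships, which improves predictive accuracy.
  2. Classification: GBDT is also used for classification problems such as spam filtering and customer-churn prediction. It handles high-dimensional data well and, with measures such as class weighting, can cope with imbalanced datasets.
  3. Ranking: in search engines and recommender systems, GBDT can be used for ranking, to deliver personalized search results or recommendations.
  4. Anomaly detection: GBDT can be used to detect anomalous or fraudulent behavior, such as credit-card fraud.
  5. Natural language processing: GBDT is also applied to NLP tasks such as text classification, sentiment analysis, and named-entity recognition, typically on top of extracted text features.
  6. Image recognition: GBDT can be used for image classification and object detection, usually on pre-extracted features rather than raw pixels, and sees some use in computer vision.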

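In short, GBDT is a powerful machine learning algorithm that suits a wide range of prediction and classification tasks and has performed well in practice.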

Examples:

Sources of datasets suitable for GBDT:

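  1. Kaggle: a well-known data science competition platform that hosts a large number of datasets, including ones for classification and regression. You can browse datasets on Kaggle to find data suited to your GBDT project.
  2. UCI Machine Learning Repository: a classic repository of datasets for machine learning research. Its datasets span many domains and tasks, and many of them are suitable for GBDT.
  3. Scikit-learn built-in datasets: if you use Python's Scikit-learn library, it ships with several example datasets that you can load directly to train and test GBDT models.
  4. OpenML: a platform for sharing machine learning datasets and experiments. It offers a rich collection of datasets that you can filter by task type, such as classification or regression.
  5. Government and research institutions: many of them publish datasets on social, environmental, health, and other topics that suit different kinds of GBDT tasks.
  6. GitHub: some developers and researchers share datasets they have created on GitHub; you can search GitHub for datasets suitable for GBDT.
  7. Other competition platforms: besides Kaggle, other online competition platforms also provide datasets that you can use for training and testing by taking part in their competitions.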

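When choosing a dataset, make sure it is relevant to your research or project goals and matches your task type (classification, regression, etc.). Also, understand the dataset's features and preprocessing requirements so that you can use it effectively with a GBDT model.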
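As a quick illustration of option 3 above, scikit-learn's load_iris function loads the same Iris dataset used later in this post without any download. This is a sketch; the as_frame=True option, which returns pandas objects, requires a reasonably recent scikit-learn.

```python
from sklearn.datasets import load_iris

# as_frame=True returns the features as a pandas DataFrame and the target as a Series
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

print(X.shape)                        # (150, 4): 150 samples, 4 features
print(y.value_counts().sort_index())  # 50 samples per class
print(iris.target_names.tolist())     # ['setosa', 'versicolor', 'virginica']
```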

The following steps help you find a GBDT-suitable dataset in the UCI Machine Learning Repository:

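  1. Visit the UCI Machine Learning Repository website: open a web browser and go to the repository's official site (link: UCI Machine Learning Repository).
  2. Browse the dataset list: the repository's home page lists many datasets, which can be filtered by task, such as classification, regression, and clustering. GBDT is usually used for classification and regression, so choose datasets associated with those two tasks.
  3. Choose classification or regression: if you plan to perform classification, select "Classification"; for regression, select "Regression".
  4. Browse the filtered list: under the selected task you will see a list of dataset names. Click a name to see more details.
  5. View the dataset details: the details page describes the dataset, including the number of features and instances. Read the description to decide whether the dataset fits your project or research.
  6. Download the dataset: if you decide to use a dataset, look for its download link. Datasets are usually provided as data files that you can download by clicking the link.
  7. Unzip the dataset: after downloading, decompress the files if necessary. This gives you the raw data files for your machine learning task.
  8. Preprocess the data: before use, you may need to handle missing values, engineer features, or standardize the data. Make sure you understand the dataset's structure and the preprocessing it needs.
  9. Load the dataset: load it with Python or another suitable tool. Typically you load it into a Pandas DataFrame for subsequent analysis and modeling.
  10. Run experiments: use the loaded dataset for a GBDT classification or regression task. You can build and train the model with a GBDT library such as LightGBM or XGBoost.
  11. Evaluate model performance: use appropriate metrics, such as accuracy, precision, and recall for classification, or root mean squared error (RMSE) for regression.
  12. Analyze the results: study the model's performance and experimental results to gain insight into the dataset and task, and write up your research report or project documentation.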
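For step 11, the common metrics are all available in sklearn.metrics. The following sketch computes them on small made-up label and prediction arrays; the values are purely illustrative.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, mean_squared_error,
                             precision_score, recall_score)

# Classification: hypothetical true labels and predictions
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
acc = accuracy_score(y_true, y_pred)    # 4 of 5 correct -> 0.8
prec = precision_score(y_true, y_pred)  # 2 TP, 0 FP -> 1.0
rec = recall_score(y_true, y_pred)      # 2 TP, 1 FN -> 0.666...
f1 = f1_score(y_true, y_pred)           # 2PR / (P + R) -> 0.8
print(acc, prec, rec, f1)

# Regression: RMSE is the square root of the mean squared error
y_true_reg = [3.0, 5.0, 2.0]
y_pred_reg = [2.5, 5.0, 3.0]
rmse = np.sqrt(mean_squared_error(y_true_reg, y_pred_reg))
print(rmse)  # sqrt((0.25 + 0 + 1) / 3) ≈ 0.6455
```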

By following these steps, you can find a suitable dataset in the UCI Machine Learning Repository and use it for your classification or regression task.

An example of using GBDT for classification:

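Problem description: suppose we run an e-commerce website and want to predict, from user behavior data, whether a user will buy a given product (binary classification: buy / not buy).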

Steps:

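  1. Data collection: collect user behavior data, such as the number of page views, time spent on pages, search keywords, and the number of items in the shopping cart.
  2. Data preprocessing: clean the data and engineer features, turning the raw data into a format GBDT can use. This can include handling missing values, encoding categorical features (e.g. one-hot encoding), and standardizing the data (tree-based models generally do not require feature scaling).
  3. Data splitting: split the dataset into a training set, used to train the model, and a test set, used to evaluate it.
  4. Model training: train a classifier with a GBDT implementation such as LightGBM or XGBoost. The model predicts the probability that the user buys (1) rather than not (0).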
import lightgbm as lgb
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the data
data = pd.read_csv('user_behavior_data.csv')

# Feature engineering and preprocessing
# ...

# Separate features and label, then split into training and test sets
X = data.drop('purchase', axis=1)
y = data['purchase']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the GBDT model
params = {
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'metric': 'binary_logloss',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9
}

train_data = lgb.Dataset(X_train, label=y_train)
model = lgb.train(params, train_data, num_boost_round=100)

# Predict: the binary objective returns the probability of class 1.
# No early stopping is used, so all 100 trees take part in prediction.
y_pred = model.predict(X_test)
y_pred_binary = (y_pred >= 0.5).astype(int)

# Evaluate model performance
accuracy = accuracy_score(y_test, y_pred_binary)
print(f'Accuracy: {accuracy}')

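  5. Model evaluation: evaluate the model on the test set, typically with metrics such as accuracy, precision, recall, and F1 score.
  6. Model deployment: the trained model can be used in production to predict, from users' behavior data, whether they will buy a given product. This can help an e-commerce site improve its recommender system and offer personalized product recommendations, increasing sales.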

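This simple example shows how to use GBDT for a classification task. In practice, you can further improve the feature engineering, tune the model parameters, or try more sophisticated models to improve classification performance.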

Parameter configuration:

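LightGBM is an efficient gradient boosting framework with many tunable parameters that let you adapt it to your problem. The parameters used in the example above are explained below: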

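  1. boosting_type: the type of boosting. Here it is set to 'gbdt', i.e. traditional gradient boosting decision trees. Other options include 'dart' (Dropouts meet Multiple Additive Regression Trees) and 'goss' (Gradient-based One-Side Sampling); in LightGBM 4.0 and later, GOSS is configured with data_sample_strategy='goss' instead.
  2. objective: the objective function to optimize. Here 'binary' means binary classification. LightGBM supports many problem types, including binary classification, multiclass classification ('multiclass', which also requires num_class), and regression, so set it according to your task.
  3. metric: the metric used to evaluate the model. Here 'binary_logloss' is the log loss for binary classification. You can choose other metrics as needed, such as 'auc' (area under the ROC curve).
  4. num_leaves: the maximum number of leaves in each tree. Larger values increase model complexity but may cause overfitting; a suitable value is usually chosen by cross-validation.
  5. learning_rate: the shrinkage factor applied to each new tree's contribution. A smaller learning rate usually needs more boosting rounds but tends to train more stably and generalize better; a larger one fits faster but is more likely to overfit or miss a better solution. The learning rate is usually tuned together with the number of boosting rounds (num_boost_round).
  6. feature_fraction: controls feature subsampling, i.e. the fraction of features randomly selected in each iteration (for each tree). Smaller values can make the model more robust and reduce the risk of overfitting. Values between 0.8 and 0.9 are usually a good starting point.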

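Besides these, LightGBM has many other tunable parameters, including tree depth, regularization terms, and bagging parameters. Adjust them to your specific problem and dataset to get the best model performance; using cross-validation to search for the best parameter combination is good practice.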

Example using the Iris dataset:

1. Reading the dataset:

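If you downloaded the Iris dataset from the UCI Machine Learning Repository, the data is usually provided as a CSV or similar file. In that case, you can read it with the Pandas library, as in this example: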

import pandas as pd

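# Path to the Iris dataset file downloaded from UCI
file_path = 'path_to_iris_dataset.csv'  # replace 'path_to_iris_dataset.csv' with your own file path

# Read the CSV file with Pandas
iris_df = pd.read_csv(file_path)

# iris_df now gives access to the dataset's features and target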

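In the example above, replace file_path with the actual path of your downloaded Iris file. pd.read_csv then reads the CSV file into a Pandas DataFrame, and you can use iris_df to access the features and target for further analysis and machine learning.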

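Make sure you have downloaded the Iris data in the correct format and specified the file path correctly in the code. The actual implementation needs some adjustments: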

Related methods:

  1. To get all unique values in a column, use the unique() or nunique() method on that column of a Pandas DataFrame, depending on the result you want.

Use unique() to get the unique values:

unique_elements = df['ColumnName'].unique()
print(unique_elements)

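In the code above, replace "ColumnName" with the name of the column you want to inspect. unique() returns the column's unique values in order of first appearance, typically as a NumPy array.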

Use nunique() to get the number of unique values:

unique_count = df['ColumnName'].nunique()
print(unique_count)

This returns the number of unique values in the column (NaN is excluded by default) rather than the values themselves.

Choose whichever method you need, depending on whether you want the unique values themselves or just their count.
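A self-contained version of the two calls above, on a small hypothetical species column. value_counts() is included as well because it shows how many rows each class has, which is useful for checking class balance before training.

```python
import pandas as pd

# Toy DataFrame with a hypothetical "species" column
df = pd.DataFrame({"species": ["setosa", "versicolor", "setosa", "virginica", "versicolor"]})

unique_elements = df["species"].unique()  # values in order of first appearance
unique_count = df["species"].nunique()    # number of distinct values
counts = df["species"].value_counts()     # number of rows per value

print(unique_elements)
print(unique_count)  # 3
print(counts)
```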

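  2. If your CSV file has no header row (column names), set the header parameter to None when reading it with Pandas. This tells Pandas not to treat the first row as column names and to keep it as part of the data instead.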

For example:

import pandas as pd

# Read a CSV file that has no header row
data = pd.read_csv('your_csv_file.csv', header=None)

# Columns are now accessed by their default integer labels (0, 1, 2, ...)

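Here, header=None tells Pandas not to use the first row as column names but to keep it as data. The columns get default integer labels, so the first data row is no longer mistaken for a header.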
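The UCI Iris file has no header row, so header=None alone leaves its columns labeled 0 to 4. The names parameter of read_csv assigns column names at read time. To keep the sketch runnable anywhere, it reads two sample rows in the same headerless format from an in-memory string instead of a file; the column names are our own choice.

```python
import io

import pandas as pd

# Two sample rows in the headerless format of the UCI Iris file
raw = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
)

# header=None: there is no header row; names: the column names to use instead
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]
iris_df = pd.read_csv(raw, header=None, names=columns)

print(iris_df)
```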

  3. To find out whether each column has missing values, use Pandas' isnull() and any() methods:
import pandas as pd

# Read data from a CSV file
data = pd.read_csv('your_csv_file.csv')

# Check each column for missing values (one Boolean per column)
missing_values = data.isnull().any()

# Show only the columns that contain missing values
columns_with_missing_values = missing_values[missing_values]
print(columns_with_missing_values)

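Here, isnull() checks each cell and returns a Boolean DataFrame in which True marks a missing value and False a present one. any() then checks whether each column has at least one missing value and returns a Boolean Series, where True means the column contains at least one missing value. Finally, we select and print the columns that contain missing values.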

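This tells you which columns contain missing values. You can then handle them with Pandas methods, for example by filling them (fillna) or dropping them (dropna).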
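A small sketch of that follow-up handling on a made-up DataFrame: isnull().sum() counts missing values per column, dropna() removes incomplete rows, and fillna() fills gaps, here with each column's mean (one common choice among many). LightGBM can also handle NaN values natively, so imputation is not always required before training.

```python
import numpy as np
import pandas as pd

# Made-up data with some missing values
data = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, 2.0, 2.0, np.nan],
    "c": [1, 2, 3, 4],
})

print(data.isnull().sum())  # missing values per column: a=1, b=2, c=0

# Option 1: drop every row that has at least one missing value
dropped = data.dropna()

# Option 2: fill missing values with each column's mean
filled = data.fillna(data.mean())

print(dropped)
print(filled)
```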

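When a dataset contains columns that are not of int, bool, or float type, you can encode them into numeric values the model can use. This typically applies to categorical features or text data. Common encoding methods include: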

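  1. One-hot encoding: for categorical features, one-hot encoding turns each distinct category into its own binary feature that indicates whether a sample belongs to that category. It suits features whose categories have no natural order.
  2. Label encoding: for ordinal categorical features, map the categories to integers. For example, "low", "medium", and "high" can be encoded as 0, 1, and 2. This suits categories with a clear order.
  3. Text feature extraction: for features containing text, techniques such as bag-of-words or word embeddings convert the text into numeric features so that the model can use it.
  4. Custom encoding: sometimes you need your own encoding based on domain knowledge or the nature of the data, for example mapping each value of a feature to a meaningful number.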

The following example shows one-hot encoding with the Python pandas library:

import pandas as pd

# Create a DataFrame with a categorical feature
data = pd.DataFrame({'color': ['red', 'green', 'blue', 'red', 'blue']})

# Apply one-hot encoding to the 'color' column
data_encoded = pd.get_dummies(data, columns=['color'])

# Print the encoded result
print(data_encoded)

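In this example, the "color" feature is one-hot encoded into three new binary columns, color_blue, color_green, and color_red, one for each category. Recent pandas versions output these columns as True/False; pass dtype=int to get 0/1 instead.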

Choose the encoding method that fits your dataset and problem so that the model can correctly use data that is not of int, bool, or float type.

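In Python, you can use the LabelEncoder class of the scikit-learn library for label encoding, which maps category values to integers. Note that scikit-learn intends LabelEncoder for encoding target labels (such as the Iris species), and it assigns integers in sorted order rather than in any meaningful order of the categories. Here is a simple example: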

First, make sure scikit-learn is installed. If it is not, install it with:

pip install scikit-learn

Then label encoding works as follows:

from sklearn.preprocessing import LabelEncoder

# A list of category values
categories = ['low', 'medium', 'high', 'low', 'medium']

# Initialize the LabelEncoder
label_encoder = LabelEncoder()

# Fit the encoder and encode the categories
encoded_categories = label_encoder.fit_transform(categories)

# Print the encoded result
print(encoded_categories)

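In this example, we import the LabelEncoder class, create a list of category values, categories, initialize a LabelEncoder object, and encode the categories with its fit_transform method. Finally, we print the encoded result.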

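Note that LabelEncoder assigns integers in sorted (here alphabetical) order of the category values, not in order of appearance or meaning. In this example, "high" is encoded as 0, "low" as 1, and "medium" as 2, so the output is [1 2 0 1 2]; the encoder's classes_ attribute holds this mapping. Label encoding is commonly used to turn categorical labels into the integers that machine learning algorithms expect, but for ordinal features whose order matters, use an explicit mapping instead.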
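When the order of an ordinal feature matters (low < medium < high), one option is scikit-learn's OrdinalEncoder with the category order given explicitly; a plain dictionary with pandas map() works as well. This is a sketch; OrdinalEncoder expects 2-D input, hence the reshape.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

categories = ['low', 'medium', 'high', 'low', 'medium']

# OrdinalEncoder with an explicit order: low=0, medium=1, high=2
encoder = OrdinalEncoder(categories=[['low', 'medium', 'high']])
encoded = encoder.fit_transform(np.array(categories).reshape(-1, 1)).ravel()
print(encoded)  # [0. 1. 2. 0. 1.]

# Equivalent manual mapping with pandas
mapping = {'low': 0, 'medium': 1, 'high': 2}
encoded_map = pd.Series(categories).map(mapping)
print(encoded_map.tolist())  # [0, 1, 2, 0, 1]
```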

To convert a NumPy array (np.ndarray) to a Pandas DataFrame, use the Pandas DataFrame constructor, as in the following example.

First, make sure pandas and numpy are imported:

import pandas as pd
import numpy as np

Then convert the NumPy array to a DataFrame with the pd.DataFrame() constructor:

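# Create a NumPy array. Because it mixes numbers and strings,
# NumPy stores every element as a string.
numpy_data = np.array([[1, 'Alice', 25],
                       [2, 'Bob', 30],
                       [3, 'Charlie', 35]])

# Convert the NumPy array to a DataFrame and restore the numeric column types
df = pd.DataFrame(data=numpy_data, columns=['ID', 'Name', 'Age'])
df = df.astype({'ID': int, 'Age': int})

# Print the DataFrame
print(df)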

In this example, we create a NumPy array, numpy_data, and convert it to a DataFrame with pd.DataFrame(), passing the column names ('ID', 'Name', 'Age') through the columns parameter.

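A NumPy array holds a single data type, so mixing numbers and strings turns every value into a string; that is why the ID and Age columns need to be converted back to integers. Adjust the column names and data types as needed, then use the DataFrame for further processing and analysis.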

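The first run on the Iris data, which (as explained at the end) mistakenly treated it as a binary classification problem, produced the following output: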

[LightGBM] [Info] Number of positive: 80, number of negative: 40
[LightGBM] [Warning] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000982 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 91
[LightGBM] [Info] Number of data points in the train set: 120, number of used features: 4
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.666667 -> initscore=0.693147
[LightGBM] [Info] Start training from score 0.693147
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
Accuracy: 0.6333333333333333

This output consists of the training log of the LightGBM (gradient boosting decision tree) model followed by its evaluation result:

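  • [LightGBM] [Info] Number of positive: 80, number of negative: 40: the numbers of positive and negative samples in the training set. Note that this run uses the Iris data, not the purchase example: LightGBM's binary objective counts every label greater than 0 as positive, so classes 1 and 2 (80 samples) were counted as positive and class 0 (40 samples) as negative. This is the first sign that the task was set up incorrectly.
  • [LightGBM] [Warning] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000982 seconds. You can set force_row_wise=true to remove the overhead. And if memory is not enough, you can set force_col_wise=true.: LightGBM tested row-wise and column-wise multi-threading and automatically chose row-wise. Setting force_row_wise=true skips this small test overhead; if memory is insufficient, set force_col_wise=true to use column-wise multi-threading instead.
  • [LightGBM] [Info] Total Bins 91: LightGBM discretizes features into histogram bins to find splits; there are 91 bins in total across all features.
  • [LightGBM] [Info] Number of data points in the train set: 120, number of used features: 4: the training set has 120 data points and 4 features are used.
  • [LightGBM] [Info] [binary:BoostFromScore]: pavg=0.666667 -> initscore=0.693147: LightGBM initializes the score from the average positive rate pavg = 80/120 ≈ 0.666667. The initial score is its log-odds, ln(0.666667 / 0.333333) = ln 2 ≈ 0.693147.
  • [LightGBM] [Info] Start training from score 0.693147: training starts from this initial score of 0.693147.
  • [LightGBM] [Warning] No further splits with positive gain, best gain: -inf: this warning, repeated many times in the full log, means LightGBM could not find any further split with positive gain, so the tree stopped growing. On a dataset this small (120 samples) this is common, likely because each leaf must contain at least min_data_in_leaf samples (20 by default).
  • Accuracy: 0.6333333333333333: the model's evaluation result, a classification accuracy of about 63.33% on the test set.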

These log messages and warnings are useful for understanding how LightGBM trained the model and how it performed, and they can guide parameter tuning.
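The initial scores in these logs can be reproduced by hand. For the binary objective, the initial score is the log-odds of the positive rate; for the multiclass run shown next, each class starts from the log of its frequency in the training set. The per-class counts 40/41/39 in this sketch are inferred from the logged multiclass scores, not stated in the log.

```python
import math

# Binary run: 80 positive and 40 negative training samples
pavg = 80 / (80 + 40)
init_binary = math.log(pavg / (1 - pavg))  # log-odds of the positive rate
print(f"pavg={pavg:.6f} -> initscore={init_binary:.6f}")  # pavg=0.666667 -> initscore=0.693147

# Multiclass run: per-class counts (inferred) out of 120 training samples
class_counts = [40, 41, 39]
init_multiclass = [math.log(c / 120) for c in class_counts]
print([f"{s:.6f}" for s in init_multiclass])  # ['-1.098612', '-1.073920', '-1.123930']
```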

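Reviewing the code showed that I had set the task up as binary classification, while Iris is actually a three-class problem, so the parameters were wrong. After changing the prediction code and the relevant parameters, the model worked correctly and reached 100% accuracy on the validation set: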


[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000078 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 91
[LightGBM] [Info] Number of data points in the train set: 120, number of used features: 4
[LightGBM] [Info] Start training from score -1.098612
[LightGBM] [Info] Start training from score -1.073920
[LightGBM] [Info] Start training from score -1.123930
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
Accuracy: 1.0
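The long run of `No further splits with positive gain, best gain: -inf` warnings above is expected on a dataset this small and does not indicate a bug. The following is a rough back-of-the-envelope check. It assumes LightGBM's default `min_data_in_leaf = 20` is in effect, since the `params` dict in the code does not override it. With 120 training rows, each tree can have at most 6 leaves, which is far fewer than `num_leaves = 31`. Every tree therefore stops growing early, and LightGBM logs the warning at most once per tree. Multiclass boosting grows one tree per class per round, so 100 rounds times 3 classes gives up to 300 such lines.

```python
import math

# Values taken from the iris example and its params dict
n_samples = 150          # rows in iris.data
test_size = 0.2          # train_test_split(test_size=0.2)
num_boost_round = 100
num_class = 3
num_leaves = 31

# Assumption: LightGBM's default min_data_in_leaf (20) applies,
# because the params dict does not set it.
min_data_in_leaf = 20

# train_test_split rounds the test size up: ceil(0.2 * 150) = 30
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test                # 120

# Each leaf needs at least min_data_in_leaf rows, so a tree trained
# on all training rows can have at most this many leaves:
max_leaves = n_train // min_data_in_leaf    # 6, well below num_leaves = 31

# Multiclass boosting grows one tree per class per boosting round,
# and each tree that stops early logs the warning once.
n_trees = num_boost_round * num_class       # 300

print(n_train, max_leaves, n_trees)         # 120 6 300
```

Lowering `min_data_in_leaf` or `num_leaves` changes how the trees grow. Setting LightGBM's `verbose` parameter to -1 only hides the messages and leaves the model unchanged.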

Code:

import pandas as pd
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder

# Read the data; iris.data has no header row, so set header=None
iris_df = pd.read_csv("iris/iris.data", header=None)
# print(iris_df.head())

# # Check whether any column contains missing values
# missing_values = iris_df.isnull().any()
# columns_with_missing_values = missing_values[missing_values == True]
# print(columns_with_missing_values)

# Split into features (columns 0-3) and label (column 4, the species name)
X = iris_df.drop(4, axis=1)
y = iris_df[4]

# Encode the species strings as integers 0, 1, 2 and keep y as a 1-D array
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(y)

# print(X.head())
# print(np.unique(y))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# LightGBM parameters
params = {
    'objective': 'multiclass',
    'num_class': 3,  # three-class problem
    'boosting_type': 'gbdt',
    'metric': 'multi_logloss',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9,
    'verbose': -1  # suppress the repeated "No further splits with positive gain" warnings
}

train_data = lgb.Dataset(X_train, label=y_train)
model = lgb.train(params, train_data, num_boost_round=100)

# Predict: for multiclass, predict() returns one probability per class for each sample
# (no early stopping is used, so all 100 boosting rounds are applied)
y_pred = model.predict(X_test)
y_pred_class = np.argmax(y_pred, axis=1)

# Evaluate model performance
accuracy = accuracy_score(y_test, y_pred_class)
print(f'Accuracy: {accuracy}')
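Because `objective` is `multiclass`, `model.predict()` returns a matrix with one row per test sample and one probability column per class. `np.argmax(..., axis=1)` then picks the most likely class index. The sketch below shows that step and maps the indices back to species names with `LabelEncoder.inverse_transform`. The probability values are made up for illustration and are not output from the model above.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Fit the encoder on the three species names found in iris.data.
# LabelEncoder sorts the classes, so setosa -> 0, versicolor -> 1, virginica -> 2.
label_encoder = LabelEncoder()
label_encoder.fit(["Iris-setosa", "Iris-versicolor", "Iris-virginica"])

# Hypothetical output of model.predict() for three test rows:
# one row per sample, one probability column per class (each row sums to 1).
y_pred = np.array([
    [0.95, 0.03, 0.02],
    [0.10, 0.70, 0.20],
    [0.05, 0.15, 0.80],
])

# Pick the most probable class index for each row.
y_pred_class = np.argmax(y_pred, axis=1)

# Map the integer indices back to the original species names.
y_pred_names = label_encoder.inverse_transform(y_pred_class)

print(y_pred_class)   # [0 1 2]
print(y_pred_names)
```

Keeping the fitted `label_encoder` around is what makes this reverse mapping possible, so the same encoder object used for training should be reused at prediction time.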

The file structure is as follows:

[Figure: project file structure]

posted @ 2023-09-11 09:41  aondw