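Having published "Playing with Machine Learning on .NET Core" and "Predicting New York Taxi Fares with ML.NET", I trust that readers, even without fully understanding what was going on, were able to follow along, run the code, and get results, and so gained a first feel for .NET Core, ML.NET, and what machine learning can do. With that experience in hand, it is time to step back and summarize. This article again works through a sentiment analysis example and, from the perspective of a .NET developer new to machine learning, focuses on the basic concepts and steps for getting started with ML.NET.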
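1. Describe the scenario in which the problem arises
2. Collect data for that scenario
3. Preprocess the data
4. Choose a model (algorithm) and train it
5. Validate and tune the trained model
6. Use the model to make predictions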
UC Irvine Machine Learning Repository, from the University of California, Irvine
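This time I found a data set on UCI in which each line holds exactly one sentence plus a label, and the label already marks the sentence as positive or negative. It can be downloaded from the Sentiment Labelled Sentences Data Set page. The format looks like this: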
A very, very, very slow-moving, aimless movie about a distressed, drifting young man.	0
Not sure who was more lost - the flat characters or the audience, nearly half of whom walked out.	0
Attempting artiness with black & white and clever camera angles, the movie disappointed - became even more ridiculous - as the acting was poor and the plot and lines almost non-existent.	0
Very little music or anything to speak of.	0
The best scene in the movie was when Gerardo is trying to find a song that keeps running through his head.	1
The rest of the movie lacks art, charm, meaning... If it's about emptiness, it works I guess because it's empty.	0
Wasted two hours.	0
...
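Before handing the file to ML.NET, it can help to check that every line really does split into a sentence and a 0/1 label. The following is a minimal sketch in plain C#, independent of ML.NET. It assumes the two columns are separated by a tab, as the `separator: "tab"` argument used when loading the file suggests. The names `LabelledSentences`, `ParseLine`, and `CountLabels` are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class LabelledSentences
{
    // Splits one line into (text, label) at the LAST tab, so punctuation
    // or digits inside the sentence itself are left untouched.
    public static (string Text, int Label) ParseLine(string line)
    {
        int tab = line.LastIndexOf('\t');
        if (tab < 0)
            throw new FormatException($"No tab separator in line: {line}");

        string text = line.Substring(0, tab).Trim();
        int label = int.Parse(line.Substring(tab + 1).Trim());
        if (label != 0 && label != 1)
            throw new FormatException($"Unexpected label {label} in line: {line}");

        return (text, label);
    }

    // Counts negative (index 0) and positive (index 1) sentences.
    public static int[] CountLabels(IEnumerable<string> lines)
    {
        var counts = new int[2];
        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue; // skip blank lines
            counts[ParseLine(line).Label]++;
        }
        return counts;
    }

    public static void Main()
    {
        // Two lines taken from the sample above, with an explicit tab before the label.
        var sample = new[]
        {
            "Wasted two hours.\t0",
            "The best scene in the movie was when Gerardo is trying to find a song that keeps running through his head.\t1"
        };

        int[] counts = CountLabels(sample);
        Console.WriteLine($"negative: {counts[0]}, positive: {counts[1]}");
    }
}
```

To check a real file, the same method could be fed with `System.IO.File.ReadLines(path)`. A strongly lopsided count would be a hint that accuracy alone may paint too rosy a picture of the model.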
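// Training data (imdb_labelled.txt) and test data (yelp_labelled.txt) from the downloaded data set.
const string _dataPath = @".\data\sentiment labelled sentences\imdb_labelled.txt";
const string _testDataPath = @".\data\sentiment labelled sentences\yelp_labelled.txt";

// Input schema: column 0 is the sentence, column 1 is the 0/1 sentiment label.
public class SentimentData
{
    [Column(ordinal: "0")]
    public string SentimentText;

    [Column(ordinal: "1", name: "Label")]
    public float Sentiment;
}

// Load the tab-separated file, then turn the text column into a numeric "Features" column.
var pipeline = new LearningPipeline();
pipeline.Add(new TextLoader<SentimentData>(_dataPath, useHeader: false, separator: "tab"));
pipeline.Add(new TextFeaturizer("Features", "SentimentText"));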
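The featurizing step exists because a learning algorithm works on numbers, not on raw strings. To give a feel for what "turning text into features" means, here is a deliberately simplified bag-of-words sketch in plain C#. It is not a description of how `TextFeaturizer` works internally; the class `BagOfWords` and its methods are hypothetical names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BagOfWords
{
    // Lower-cases the text and splits it on anything that is not a letter or digit.
    public static string[] Tokenize(string text) =>
        new string(text.ToLowerInvariant()
                .Select(c => char.IsLetterOrDigit(c) ? c : ' ')
                .ToArray())
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // Assigns each distinct word an index, in order of first appearance.
    public static Dictionary<string, int> BuildVocabulary(IEnumerable<string> texts)
    {
        var vocabulary = new Dictionary<string, int>();
        foreach (string word in texts.SelectMany(Tokenize))
        {
            if (!vocabulary.ContainsKey(word))
                vocabulary.Add(word, vocabulary.Count);
        }
        return vocabulary;
    }

    // Maps a sentence to a vector of word counts; words outside the vocabulary are ignored.
    public static float[] Featurize(string text, Dictionary<string, int> vocabulary)
    {
        var features = new float[vocabulary.Count];
        foreach (string word in Tokenize(text))
        {
            if (vocabulary.TryGetValue(word, out int index))
                features[index] += 1f;
        }
        return features;
    }

    public static void Main()
    {
        var texts = new[] { "Wasted two hours.", "Very little music or anything to speak of." };
        Dictionary<string, int> vocabulary = BuildVocabulary(texts);

        // "two" and "hours" each occur twice, so their slots hold 2.
        float[] features = Featurize("Two hours, two hours!", vocabulary);
        Console.WriteLine(string.Join(" ", features));
    }
}
```

Every sentence ends up as a fixed-length vector, one slot per known word, which is the kind of input a classifier can train on.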
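// Output schema: the predicted label, true for positive and false for negative.
public class SentimentPrediction
{
    [ColumnName("PredictedLabel")]
    public bool Sentiment;
}

// Add the learning algorithm: a FastTree binary classifier with 5 trees of up to 5 leaves each.
pipeline.Add(new FastTreeBinaryClassifier() { NumLeaves = 5, NumTrees = 5, MinDocumentsInLeafs = 2 });

// Train the whole pipeline and get back a model that maps SentimentData to SentimentPrediction.
PredictionModel<SentimentData, SentimentPrediction> model =
    pipeline.Train<SentimentData, SentimentPrediction>();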
// Evaluate the trained model against the separate test file.
var testData = new TextLoader<SentimentData>(_testDataPath, useHeader: false, separator: "tab");
var evaluator = new BinaryClassificationEvaluator();
BinaryClassificationMetrics metrics = evaluator.Evaluate(model, testData);

Console.WriteLine();
Console.WriteLine("PredictionModel quality metrics evaluation");
Console.WriteLine("------------------------------------------");
Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}");
Console.WriteLine($"Auc: {metrics.Auc:P2}");
Console.WriteLine($"F1Score: {metrics.F1Score:P2}");
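Accuracy, Auc, and F1Score are common evaluation metrics that score the model on measures such as correctness and error. If the scores are low, adjust the parameter values chosen when defining the model in the previous step. For detailed explanations, see the Machine learning glossary.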
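To make two of these metrics less abstract, the sketch below computes accuracy and F1 score by hand from the four outcome counts of a binary classifier: true positives, false positives, true negatives, and false negatives. It uses the standard textbook definitions and is not taken from ML.NET's source. The class `BinaryMetrics` and the counts in `Main` are hypothetical, chosen only to keep the arithmetic easy to follow. AUC is left out because it needs the ranked prediction scores rather than just these four counts.

```csharp
using System;

public static class BinaryMetrics
{
    // Share of all predictions that were correct.
    public static double Accuracy(int tp, int fp, int tn, int fn) =>
        (double)(tp + tn) / (tp + fp + tn + fn);

    // Of everything predicted positive, how much really was positive.
    public static double Precision(int tp, int fp) => (double)tp / (tp + fp);

    // Of everything really positive, how much the model found.
    public static double Recall(int tp, int fn) => (double)tp / (tp + fn);

    // Harmonic mean of precision and recall, written in its count form.
    public static double F1(int tp, int fp, int fn) => 2.0 * tp / (2.0 * tp + fp + fn);

    public static void Main()
    {
        // Hypothetical counts for 100 test sentences.
        int tp = 40, fp = 10, tn = 35, fn = 15;

        Console.WriteLine($"Accuracy:  {Accuracy(tp, fp, tn, fn):F4}");
        Console.WriteLine($"Precision: {Precision(tp, fp):F4}");
        Console.WriteLine($"Recall:    {Recall(tp, fn):F4}");
        Console.WriteLine($"F1 score:  {F1(tp, fp, fn):F4}");
    }
}
```

With these counts, accuracy is 75/100 = 0.75, while F1 is 80/105, roughly 0.762. The two differ because F1 ignores true negatives and instead balances precision (0.8) against recall (about 0.727).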
// New sentences to classify. The Sentiment value here is only a placeholder; the model supplies the prediction.
IEnumerable<SentimentData> sentiments = new[]
{
    new SentimentData
    {
        SentimentText = "Contoso's 11 is a wonderful experience",
        Sentiment = 0
    },
    new SentimentData
    {
        SentimentText = "The acting in this movie is very bad",
        Sentiment = 0
    },
    new SentimentData
    {
        SentimentText = "Joe versus the Volcano Coffee Company is a great film.",
        Sentiment = 0
    }
};

IEnumerable<SentimentPrediction> predictions = model.Predict(sentiments);

Console.WriteLine();
Console.WriteLine("Sentiment Predictions");
Console.WriteLine("---------------------");

// Pair each input sentence with its prediction for display.
var sentimentsAndPredictions = sentiments.Zip(predictions, (sentiment, prediction) => (sentiment, prediction));
foreach (var item in sentimentsAndPredictions)
{
    Console.WriteLine($"Sentiment: {item.sentiment.SentimentText} | Prediction: {(item.prediction.Sentiment ? "Positive" : "Negative")}");
}
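Incidentally, Microsoft Azure also offers an online machine learning studio at https://studio.azureml.net/, with a gallery of related AI projects at https://gallery.azure.ai/browse. If you cannot set up a local machine learning environment for now, or cannot find a project to practice on, it is worth a try.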
using System;
using Microsoft.ML.Models;
using Microsoft.ML.Runtime;
using Microsoft.ML.Runtime.Api;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

namespace SentimentAnalysis
{
    class Program
    {
        const string _dataPath = @".\data\sentiment labelled sentences\imdb_labelled.txt";
        const string _testDataPath = @".\data\sentiment labelled sentences\yelp_labelled.txt";

        public class SentimentData
        {
            [Column(ordinal: "0")]
            public string SentimentText;

            [Column(ordinal: "1", name: "Label")]
            public float Sentiment;
        }

        public class SentimentPrediction
        {
            [ColumnName("PredictedLabel")]
            public bool Sentiment;
        }

        public static PredictionModel<SentimentData, SentimentPrediction> Train()
        {
            var pipeline = new LearningPipeline();
            pipeline.Add(new TextLoader<SentimentData>(_dataPath, useHeader: false, separator: "tab"));
            pipeline.Add(new TextFeaturizer("Features", "SentimentText"));
            pipeline.Add(new FastTreeBinaryClassifier() { NumLeaves = 5, NumTrees = 5, MinDocumentsInLeafs = 2 });

            PredictionModel<SentimentData, SentimentPrediction> model =
                pipeline.Train<SentimentData, SentimentPrediction>();
            return model;
        }

        public static void Evaluate(PredictionModel<SentimentData, SentimentPrediction> model)
        {
            var testData = new TextLoader<SentimentData>(_testDataPath, useHeader: false, separator: "tab");
            var evaluator = new BinaryClassificationEvaluator();
            BinaryClassificationMetrics metrics = evaluator.Evaluate(model, testData);

            Console.WriteLine();
            Console.WriteLine("PredictionModel quality metrics evaluation");
            Console.WriteLine("------------------------------------------");
            Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}");
            Console.WriteLine($"Auc: {metrics.Auc:P2}");
            Console.WriteLine($"F1Score: {metrics.F1Score:P2}");
        }

        public static void Predict(PredictionModel<SentimentData, SentimentPrediction> model)
        {
            IEnumerable<SentimentData> sentiments = new[]
            {
                new SentimentData
                {
                    SentimentText = "Contoso's 11 is a wonderful experience",
                    Sentiment = 0
                },
                new SentimentData
                {
                    SentimentText = "The acting in this movie is very bad",
                    Sentiment = 0
                },
                new SentimentData
                {
                    SentimentText = "Joe versus the Volcano Coffee Company is a great film.",
                    Sentiment = 0
                }
            };

            IEnumerable<SentimentPrediction> predictions = model.Predict(sentiments);

            Console.WriteLine();
            Console.WriteLine("Sentiment Predictions");
            Console.WriteLine("---------------------");

            var sentimentsAndPredictions = sentiments.Zip(predictions, (sentiment, prediction) => (sentiment, prediction));
            foreach (var item in sentimentsAndPredictions)
            {
                Console.WriteLine($"Sentiment: {item.sentiment.SentimentText} | Prediction: {(item.prediction.Sentiment ? "Positive" : "Negative")}");
            }
            Console.WriteLine();
        }

        static void Main(string[] args)
        {
            var model = Train();
            Evaluate(model);
            Predict(model);
        }
    }
}