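iOS - Machine Learning (Part 3)
Preface:
Picking up from the previous post: last time I only gathered some theory and code. I recently had some time, so I wrote a demo, and it crashed all over the place...
A quick word on this demo: the goal is still a model that automatically classifies movie reviews.
I. Building the training data
1. The data I originally prepared looks like this: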
[
  { "text": "这部电影真好看", "label": "好评" },
  { "text": "太烂了", "label": "差评" },
  { "text": "一般般,不算差也不算好", "label": "中评" },
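But this data cannot be used for training directly. As far as I can tell, ML on iOS does not currently support Chinese, so I convert the Chinese text to hexadecimal and turn the labels into strings: 差评 (negative) becomes "0", 中评 (neutral) becomes "1" and 好评 (positive) becomes "2".
2. Code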
// 1. Read the JSON file
NSMutableArray *textList = [[NSMutableArray alloc] init];
NSString *path = [[NSBundle mainBundle] pathForResource:@"MLData" ofType:@"json"];
NSData *JSONData = [NSData dataWithContentsOfFile:path];
if (!JSONData) {
    NSLog(@"Could not read MLData.json");
    return;
}
NSString *str = [[NSString alloc] initWithData:JSONData encoding:NSUTF8StringEncoding];
// Strip stray spaces and line breaks, then parse the cleaned string.
// Note that this also removes whitespace inside the review texts.
str = [self removeSpaceAndNewline:str];
NSData *cleanData = [str dataUsingEncoding:NSUTF8StringEncoding];
if (!cleanData) {
    NSLog(@"MLData.json is not valid UTF-8");
    return;
}
NSError *error = nil;
NSArray *dataPathList = [NSJSONSerialization JSONObjectWithData:cleanData options:0 error:&error];
if (![dataPathList isKindOfClass:[NSArray class]] || dataPathList.count == 0) {
    NSLog(@"JSON parsing failed: %@", error);
    return;
}

// 2. Hex-encode the Chinese text and map the labels to "0"/"1"/"2"
[dataPathList enumerateObjectsUsingBlock:^(NSDictionary *obj, NSUInteger idx, BOOL * _Nonnull stop) {
    NSString *movieContent = obj[@"text"];  // review text
    movieContent = [self hexStringFromString:movieContent];
    NSString *movieType = obj[@"label"];    // review label
    NSString *movieTypeNum = @"0";
    if ([movieType isEqualToString:@"好评"]) {
        movieTypeNum = @"2";
    } else if ([movieType isEqualToString:@"中评"]) {
        movieTypeNum = @"1";
    } else if ([movieType isEqualToString:@"差评"]) {
        movieTypeNum = @"0";
    }
    NSDictionary *dict = @{@"text": movieContent, @"label": movieTypeNum};
    [textList addObject:dict];
}];

// 3. Export the processed JSON file
NSData *writeData = [NSJSONSerialization dataWithJSONObject:textList options:NSJSONWritingPrettyPrinted error:&error];
BOOL written = [writeData writeToFile:@"/Users/sunjiaqi/Desktop/appsTrain.json" atomically:YES];
NSLog(@"%@", written ? @"File written successfully" : @"Failed to write file");
// Removes all Unicode whitespace and newline characters from the string.
- (NSString *)removeSpaceAndNewline:(NSString *)str {
    NSCharacterSet *blank = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    return [[str componentsSeparatedByCharactersInSet:blank] componentsJoinedByString:@""];
}

// Encodes the string's UTF-8 bytes as lowercase hexadecimal, two digits per byte.
- (NSString *)hexStringFromString:(NSString *)string {
    NSData *data = [string dataUsingEncoding:NSUTF8StringEncoding];
    const Byte *bytes = (const Byte *)data.bytes;
    NSMutableString *hexStr = [NSMutableString stringWithCapacity:data.length * 2];
    for (NSUInteger i = 0; i < data.length; i++) {
        [hexStr appendFormat:@"%02x", bytes[i]];
    }
    return [hexStr copy];
}
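Note: there is a pitfall here. My JSON file contained a lot of extra line breaks and spaces, so the data in it could not be read, and it took me a long time to sort out.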
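What would have shortened the debugging is looking at the parser's own error instead of only checking for an empty result. Here is a minimal Swift sketch of that idea, assuming the same `text`/`label` record shape as above. `Review`, `ReviewLoadError` and `loadReviews(from:)` are hypothetical names for illustration and not part of the demo.

```swift
import Foundation

/// One raw review record as it appears in the source JSON.
struct Review {
    let text: String
    let label: String
}

/// Hypothetical error for JSON that parses but has the wrong shape.
enum ReviewLoadError: Error {
    case notAnArrayOfObjects
}

/// Parses raw JSON data into review records and rethrows the parser's
/// error instead of silently returning an empty list.
func loadReviews(from data: Data) throws -> [Review] {
    let object = try JSONSerialization.jsonObject(with: data, options: [])
    guard let rows = object as? [[String: String]] else {
        throw ReviewLoadError.notAnArrayOfObjects
    }
    // Skip rows that lack either key rather than crashing on them.
    return rows.compactMap { row in
        guard let text = row["text"], let label = row["label"] else { return nil }
        return Review(text: text, label: label)
    }
}

let wellFormed = """
[ { "text": "太烂了", "label": "差评" } ]
"""
// Deliberately cut off after the first record.
let truncated = """
[ { "text": "太烂了", "label": "差评" },
"""

for sample in [wellFormed, truncated] {
    do {
        let reviews = try loadReviews(from: Data(sample.utf8))
        print("parsed \(reviews.count) review(s)")
    } catch {
        // The error says what the parser objected to.
        print("JSON parsing failed: \(error)")
    }
}
```

The same applies to the Objective-C code: logging the `NSError` filled in by `NSJSONSerialization` usually narrows the problem down faster than guessing.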
After processing, the JSON file looks like this:
[
  { "label" : "2", "text" : "e8bf99e983a8e794b5e5bdb1e79c9fe5a5bde79c8b" },
  { "label" : "0", "text" : "e5a4aae78382e4ba86" },
  { "label" : "1", "text" : "e4b880e888ace888acefbc8ce4b88de7ae97e5b7aee4b99fe4b88de7ae97e5a5bd" },
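A quick way to sanity-check the processed file is to decode a hex string back into text and compare it with the original review. The Swift sketch below does that round trip. `hexString(from:)` and `string(fromHex:)` are names I made up for this example; the encoding itself is meant to match the Objective-C helper (lowercase hex of the UTF-8 bytes).

```swift
import Foundation

/// Encodes a string as lowercase hex of its UTF-8 bytes, two digits per byte.
func hexString(from text: String) -> String {
    return text.utf8.map { byte -> String in
        let hex = String(byte, radix: 16)
        return hex.count == 1 ? "0" + hex : hex
    }.joined()
}

/// Decodes a hex string back into text.
/// Returns nil if the input is not valid hex or not valid UTF-8.
func string(fromHex hex: String) -> String? {
    guard hex.count % 2 == 0 else { return nil }
    var bytes: [UInt8] = []
    bytes.reserveCapacity(hex.count / 2)
    var index = hex.startIndex
    while index < hex.endIndex {
        let next = hex.index(index, offsetBy: 2)
        guard let byte = UInt8(hex[index..<next], radix: 16) else { return nil }
        bytes.append(byte)
        index = next
    }
    return String(bytes: bytes, encoding: .utf8)
}

let encoded = hexString(from: "太烂了")
print(encoded)  // e5a4aae78382e4ba86, the "0" record in the processed file
print(string(fromHex: encoded) ?? "decode failed")
```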
This data can now be used to train the model.
II. Generating the model
Open a playground and run the code directly:
import Cocoa
import CreateMLUI
import CreateML

var str = "Hello, playground"

//let builder = MLImageClassifierBuilder()
//builder.showInLiveView()

// Load the training data
let data = try MLDataTable(contentsOf: URL(fileURLWithPath: "/Users/sunjiaqi/Desktop/appsTrain.json"))

// Train the text classifier on the data
let sentimentClassifier = try MLTextClassifier(trainingData: data, textColumn: "text", labelColumn: "label")

// Evaluate the model's accuracy (here against the training data itself)
//let evaluationMetrics = sentimentClassifier.evaluation(on: data, textColumn: "text", labelColumn: "label")
//let evaluationAccuracy = (1.0 - evaluationMetrics.classificationError) * 100
//print("evaluationAccuracy:\(evaluationAccuracy)")

// Export the model
let metadata = MLModelMetadata(author: "命无双", shortDescription: "A model that classifies movie reviews", version: "1.0")
try sentimentClassifier.write(to: URL(fileURLWithPath: "/Users/sunjiaqi/Desktop/导出的模型/SentimentClassifier.mlmodel"), metadata: metadata)
Note: this is the final model. Import it into the demo and it is ready to use.
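The commented-out evaluation in the playground code scores the model on the same table it was trained on, which tends to look better than it really is. One way around that is to hold back part of the processed records and evaluate only on those. Below is a rough sketch of such a split in plain Swift. `Sample` and `holdOutSplit(_:every:)` are hypothetical names, and the records are placeholders rather than real data. Inside a playground, `MLDataTable`'s built-in `randomSplit(by:seed:)` is probably the shorter route.

```swift
import Foundation

/// A processed record: hex-encoded text plus its "0"/"1"/"2" label.
struct Sample: Codable, Equatable {
    let text: String
    let label: String
}

/// Holds out every `n`-th sample for testing and keeps the rest for training.
/// The split is deterministic so the sketch is reproducible; shuffle the
/// samples first if the file happens to be ordered by label.
func holdOutSplit(_ samples: [Sample], every n: Int) -> (train: [Sample], test: [Sample]) {
    precondition(n > 1, "n must be at least 2")
    var train: [Sample] = []
    var test: [Sample] = []
    for (index, sample) in samples.enumerated() {
        if (index + 1) % n == 0 {
            test.append(sample)
        } else {
            train.append(sample)
        }
    }
    return (train, test)
}

// Placeholder records standing in for the contents of appsTrain.json.
let samples = (1...10).map { Sample(text: "text\($0)", label: String($0 % 3)) }
let (train, test) = holdOutSplit(samples, every: 5)
print(train.count, test.count)  // 8 2

// Each part encodes to the same text/label JSON shape as the processed file.
let testJSON = try JSONEncoder().encode(test)
print(String(decoding: testJSON, as: UTF8.self))
```

Each part could then be written to its own file, with the model trained on one and evaluated on the other.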
III. Using the model
1. Import the model: just drag it into the project.
2. Build a helper class for the model
#import <Foundation/Foundation.h>

NS_ASSUME_NONNULL_BEGIN

@interface SentimentClassifierModel : NSObject

+ (NSString *)judgeMoviceContentWith:(NSString *)content;

@end

NS_ASSUME_NONNULL_END
#import "SentimentClassifierModel.h"
#import "SentimentClassifier.h"

@implementation SentimentClassifierModel

+ (SentimentClassifier *)model {
    NSBundle *bundle = [NSBundle bundleForClass:SentimentClassifier.class];
    // If the compiled model is already in the bundle, use the generated initializer
    NSURL *mlmodelcURL = [bundle URLForResource:@"SentimentClassifier" withExtension:@"mlmodelc"];
    if (mlmodelcURL) {
        return [SentimentClassifier new];
    }
    // Otherwise compile the raw .mlmodel at runtime
    NSString *modelPath = [bundle pathForResource:@"SentimentClassifier" ofType:@"mlmodel"];
    if (!modelPath) return nil;
    NSURL *modelURL = [NSURL fileURLWithPath:modelPath];
    mlmodelcURL = [MLModel compileModelAtURL:modelURL error:nil];
    if (!mlmodelcURL) return nil;
    return [[SentimentClassifier alloc] initWithContentsOfURL:mlmodelcURL error:nil];
}

+ (NSString *)judgeMoviceContentWith:(NSString *)content {
    // Model labels: 0 = 差评 (negative), 1 = 中评 (neutral), 2 = 好评 (positive)
    NSString *judgeResult = @"未识别";  // "not recognized", returned when no label matches
    SentimentClassifier *model = [self model];
    // Hex-encode the content the same way as the training data before predicting
    content = [self hexStringFromString:content];
    NSError *error = nil;
    SentimentClassifierOutput *result = [model predictionFromText:content error:&error];
    if (!result) {
        NSLog(@"Prediction failed: %@", error);
        return judgeResult;
    }
    NSLog(@"result:%@", result.label);
    if ([result.label isEqual:@"0"]) {
        judgeResult = @"差评";
    } else if ([result.label isEqual:@"1"]) {
        judgeResult = @"中评";
    } else if ([result.label isEqual:@"2"]) {
        judgeResult = @"好评";
    }
    return judgeResult;
}

// Encodes the string's UTF-8 bytes as lowercase hexadecimal, two digits per byte.
+ (NSString *)hexStringFromString:(NSString *)string {
    NSData *data = [string dataUsingEncoding:NSUTF8StringEncoding];
    const Byte *bytes = (const Byte *)data.bytes;
    NSMutableString *hexStr = [NSMutableString stringWithCapacity:data.length * 2];
    for (NSUInteger i = 0; i < data.length; i++) {
        [hexStr appendFormat:@"%02x", bytes[i]];
    }
    return [hexStr copy];
}

@end
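Note: one thing to watch here is that the review passed in must be converted to hexadecimal first, and only then classified.
IV. Summary
1. Testing with data from the training set gives correct results, but accuracy on any other data is rather poor. After all, there is no real algorithm behind it and very little data, so there is plenty of room for optimization.
2. Consider word segmentation to improve the model's accuracy: preprocess each review before the model sees it, looking for what the texts have in common.
3. The model only gives you a judgment. The criteria for that judgment, that is, the features, need to be clear, because they determine the model's accuracy.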
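As a first step toward point 2, here is one possible sketch, not something the demo does. It assumes the classifier splits its input on whitespace, which I have not verified for Create ML; if so, an unbroken hex string is a single token and two reviews only match when they are identical. Encoding each character separately and joining the pieces with spaces lets reviews that share characters also share tokens. This is a character-level split standing in for real word segmentation, and `hexToken(for:)` and `spacedHex(from:)` are hypothetical names.

```swift
/// Hex-encodes the UTF-8 bytes of a single character, e.g. "太" -> "e5a4aa".
func hexToken(for character: Character) -> String {
    return String(character).utf8.map { byte -> String in
        let hex = String(byte, radix: 16)
        return hex.count == 1 ? "0" + hex : hex
    }.joined()
}

/// Encodes each character separately and joins the tokens with spaces.
func spacedHex(from text: String) -> String {
    return text.map { hexToken(for: $0) }.joined(separator: " ")
}

let known = spacedHex(from: "太烂了")   // review from the training data
let unseen = spacedHex(from: "太好了")  // made-up review, not in the data
print(known)  // e5a4aa e78382 e4ba86

// Tokens the two reviews have in common.
let shared = Set(known.split(separator: " ")).intersection(unseen.split(separator: " "))
print(shared.count)  // 2 (the tokens for 太 and 了)
```

If this route is taken, the training data and the text passed to the model at prediction time both need the same spaced encoding.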