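Machine Learning Notes

I have recently been watching Andrew Ng's machine learning videos, and I am writing down my own understanding here.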

 

Part 1: Introduction

 

Definition

The earliest definition of machine learning was given by Arthur Samuel (1959):

Field of study that gives computers the ability to learn without being explicitly programmed. 

Another, more precise definition, from Tom Mitchell, is:

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. 

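For example, in a machine learning algorithm that predicts spam, E is the experience of learning from the subject lines of spam and non-spam emails, T is the task of classifying emails as spam or not spam, and P is the classification accuracy.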
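To make E, T, and P concrete, here is a minimal sketch in Python. The subject lines, the word-counting rule, and the function names `train`, `is_spam`, and `accuracy` are all made up for illustration; they are not from the course. The point is only to show performance P (accuracy) on task T (spam classification) as experience E (labeled subject lines) grows.

```python
from collections import Counter

# Hypothetical labeled subject lines (experience E): (subject, is_spam).
TRAIN = [
    ("win a free prize", True),
    ("meeting agenda for monday", False),
    ("free money click now", True),
    ("lunch on friday", False),
    ("cheap loans approved", True),
    ("project status report", False),
]

# Hypothetical held-out subject lines used to measure performance P.
TEST = [
    ("claim your free prize", True),
    ("monday project meeting", False),
    ("cheap money now", True),
    ("friday lunch report", False),
]


def train(examples):
    """Learn a weight per word: +1 for each spam occurrence, -1 for each non-spam one."""
    weights = Counter()
    for subject, spam in examples:
        for word in subject.split():
            weights[word] += 1 if spam else -1
    return weights


def is_spam(weights, subject):
    """Task T: call a subject spam when its summed word weights are positive."""
    return sum(weights.get(word, 0) for word in subject.split()) > 0


def accuracy(weights, examples):
    """Performance measure P: fraction of examples classified correctly."""
    correct = sum(is_spam(weights, subject) == spam for subject, spam in examples)
    return correct / len(examples)


# More experience E (a longer prefix of TRAIN) gives better performance P here.
for n in (0, 2, 4, 6):
    print(n, accuracy(train(TRAIN[:n]), TEST))
# prints:
# 0 0.5
# 2 0.75
# 4 1.0
# 6 1.0
```

With this toy data the accuracy rises from 0.5 to 1.0 as more labeled examples are seen. Real data will not improve this cleanly, but the roles of E, T, and P are the same.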

 

Uses

The course lists four broad categories:

- Database mining. Large datasets from growth of automation/web. E.g., Web click data, medical records, biology, engineering.

- Applications can’t program by hand. E.g., Autonomous helicopter, handwriting recognition, most of Natural Language Processing (NLP), Computer Vision.

- Self-customizing programs. E.g., Amazon, Netflix product recommendations.

- Understanding human learning (brain, real AI).

In short: data mining, programs that cannot be written by hand, product recommendation systems, and understanding human learning.

 

Algorithms

The course then divides the algorithms into two broad classes: supervised learning and unsupervised learning.

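Supervised learning means predicting the output for a given input by learning from a dataset in which the "correct" answers are provided. Predicting house prices is one example: the output is treated as a continuous value (even though prices are, strictly speaking, discrete), so this is a regression problem. Predicting whether a patient has breast cancer is another: the output comes from a discrete set with only a few possible answers, so this is a classification problem.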
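The two kinds of supervised problem can be sketched side by side. This is only an illustration under my own assumptions: the numbers are invented, the regression uses an ordinary least-squares line fit, and the classifier is a 1-nearest-neighbor rule that I picked for brevity, not a method taken from the course.

```python
# Regression: hypothetical house sizes (m^2) and prices (arbitrary units).
SIZES = [50.0, 80.0, 100.0, 120.0]
PRICES = [160.0, 250.0, 310.0, 370.0]

# Classification: hypothetical tumor sizes (cm) with labels 0 = benign, 1 = malignant.
TUMORS = [(1.0, 0), (1.5, 0), (2.0, 0), (3.5, 1), (4.0, 1), (5.0, 1)]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def predict_price(size, slope, intercept):
    """Regression output: a continuous value."""
    return slope * size + intercept


def classify_1nn(size, labeled=TUMORS):
    """Classification output: the label of the closest training example."""
    nearest = min(labeled, key=lambda pair: abs(pair[0] - size))
    return nearest[1]


slope, intercept = fit_line(SIZES, PRICES)
print(predict_price(90.0, slope, intercept))  # a price on a continuous scale
print(classify_1nn(1.2), classify_1nn(4.4))   # one of the discrete labels 0 or 1
```

Both functions learn from examples that come with the "correct" answer attached. The difference is only in the output: a number on a continuous scale for regression, one of a few labels for classification.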

 

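Unsupervised learning aims to group a given dataset without being told the "correct" answers. Examples include news aggregation, grouping genes, assigning target customers to market segments, and social network analysis. One interesting example from the course is the "cocktail party problem": by separating out the background music and the two languages being spoken, the algorithm pulls apart the voices of two people talking at the same time.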
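Grouping unlabeled data can be sketched with a small k-means routine. This is my own minimal illustration, not code from the course: the points are invented (think of two hypothetical customer features), and the starting centroids are passed in by hand so that the run is deterministic. It does not cover the cocktail party problem, which needs a source-separation method rather than clustering.

```python
# Hypothetical unlabeled points, e.g. two features per customer.
POINTS = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """For every point, return the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and moving centroids to cluster means."""
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(old)  # keep a centroid that attracted no points
        centroids = updated
    return assign(points, centroids), centroids


# Starting centroids chosen by hand (an assumption): the first and the last point.
labels, centroids = kmeans(POINTS, [POINTS[0], POINTS[-1]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

No label is ever given to the algorithm. The two groups come only from which points lie close together.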

 

Posted 2018-05-15 17:33 by andrew-chen