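Machine Learning Notes
I've recently been watching Andrew Ng's machine learning videos, and I'm writing down my own understanding here.
Part 1: Introduction
Definition
The earliest definition of machine learning was given by Arthur Samuel (1959):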
Field of study that gives computers the ability to learn without being explicitly programmed.
Another, more precise definition (Tom Mitchell, 1998) is:
A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
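For example, in a machine learning algorithm that predicts spam email, E is learning from the titles of emails labeled as spam or not spam, T is classifying emails as spam or not spam, and P is the fraction of emails classified correctly.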
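To make E, T and P concrete, here is a minimal toy sketch. The word-count scoring rule, the function names (`train`, `classify`, `accuracy`) and the email titles are all hypothetical and made up for illustration; this is not the course's algorithm and not a real spam filter. It only shows the three roles: counting words in labeled titles is the experience E, labeling a new title is the task T, and the fraction of correct labels is the measure P.

```python
from collections import Counter

# Hypothetical labeled email titles: (title, is_spam).
train_data = [
    ("win free money now", True),
    ("meeting agenda for monday", False),
    ("free prize claim now", True),
    ("project report draft", False),
]
test_data = [
    ("claim your prize", True),
    ("win free money", True),
    ("monday meeting", False),
    ("project report", False),
]


def train(examples):
    """Experience E: count how often each word appears in spam and non-spam titles."""
    spam_words, ham_words = Counter(), Counter()
    for title, is_spam in examples:
        (spam_words if is_spam else ham_words).update(title.lower().split())
    return spam_words, ham_words


def classify(model, title):
    """Task T: call a title spam if its words were seen more often in spam.
    Ties, including titles made only of unseen words, count as not spam."""
    spam_words, ham_words = model
    score = sum(spam_words[w] - ham_words[w] for w in title.lower().split())
    return score > 0


def accuracy(model, examples):
    """Performance measure P: fraction of titles classified correctly."""
    correct = sum(classify(model, title) == is_spam for title, is_spam in examples)
    return correct / len(examples)


# On this particular toy test set, more experience (more training titles) improves P.
for n in (2, 4):
    model = train(train_data[:n])
    print(f"trained on {n} titles: accuracy = {accuracy(model, test_data)}")
# trained on 2 titles: accuracy = 0.75
# trained on 4 titles: accuracy = 1.0
```

With only two training titles, "claim your prize" contains no word the model has seen, so it falls back to "not spam" and is misclassified; after two more titles it is recognized.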
Applications
The course lists four broad categories:
- Database mining. Large datasets from the growth of automation and the web. E.g., web click data, medical records, biology, engineering.
- Applications we can't program by hand. E.g., autonomous helicopters, handwriting recognition, most of Natural Language Processing (NLP), Computer Vision.
- Self-customizing programs. E.g., Amazon and Netflix product recommendations.
- Understanding human learning (brain, real AI).
In short: data mining, programs that cannot be written by hand, product recommendation systems, and understanding how humans learn.
Algorithms
The algorithms are then divided into two main categories: Supervised Learning and Unsupervised Learning.
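Supervised learning means learning from a given dataset in which the "right answers" are provided, in order to predict the output for new inputs. One example is predicting house prices: the output is treated as continuous (even though real prices are ultimately discrete), and this kind of problem is called regression. Another is predicting whether a patient has breast cancer, i.e. whether a tumor is malignant or benign: the output is discrete, with only a few possible values, and this kind of problem is called classification.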
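The difference between regression and classification can be sketched as follows, assuming made-up data: house size vs. price for regression, and tumor size vs. a malignant/benign label for classification. The least-squares line fit is a standard technique; the "best single threshold" classifier is a deliberately simple stand-in chosen for illustration, not the method used in the course. All names and numbers are hypothetical.

```python
# Hypothetical toy data (made-up numbers, arbitrary units).
house_sizes = [50, 80, 100, 120]      # e.g. square meters
house_prices = [150, 245, 295, 360]   # e.g. thousands
tumor_sizes = [1.0, 1.5, 2.0, 3.0, 3.5, 4.0]
malignant = [0, 0, 0, 1, 1, 1]        # 1 = malignant, 0 = benign


def fit_line(xs, ys):
    """Regression: least-squares fit of y = a + b * x for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var
    a = mean_y - b * mean_x
    return a, b


def fit_threshold(sizes, labels):
    """Classification: choose the training size t that minimizes training errors
    for the rule "predict malignant when size >= t"."""
    best_t, best_err = None, None
    for t in sorted(sizes):
        err = sum((s >= t) != bool(y) for s, y in zip(sizes, labels))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


a, b = fit_line(house_sizes, house_prices)
print("predicted price for size 90:", a + b * 90)  # a continuous value

t = fit_threshold(tumor_sizes, malignant)
for size in (2.5, 3.2):
    print(size, "->", int(size >= t))  # a discrete label: 2.5 -> 0, 3.2 -> 1
```

The regression model returns a real number (a price), while the classifier returns one of two labels (0 or 1), which is exactly the continuous vs. discrete distinction described above.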
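Unsupervised learning works on data without given answers and aims to find structure in it, for example by grouping the data into clusters: news aggregation, grouping genes, segmenting customers into markets, social network analysis, and so on. An interesting example in the course is the "cocktail party problem": given recordings in which sounds overlap (background music and a voice, or two people speaking different languages), separate them so that each person's voice can be heard on its own.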
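Clustering, the most common kind of unsupervised learning in the examples above, can be sketched with k-means, a standard clustering algorithm chosen here for illustration. The data points (imagined as customers for market segmentation) and the "first k points" initialization are assumptions made to keep the sketch deterministic. No labels are given; the algorithm finds the two groups on its own. The cocktail party problem needs a different kind of algorithm (source separation) and is not covered by this sketch.

```python
import math

# Hypothetical unlabeled data, e.g. customers as (monthly spend, visits per month).
points = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]


def kmeans(points, k, iterations=10):
    """Group points into k clusters with the k-means algorithm."""
    # Simplifying assumption: start from the first k points as centroids
    # (implementations usually use random or k-means++ initialization).
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):  # fixed number of rounds, no convergence check
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of the points assigned to it.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids, clusters


centroids, clusters = kmeans(points, k=2)
for centroid, members in zip(centroids, clusters):
    print([round(v, 2) for v in centroid], members)
# [1.17, 1.17] [(1, 1), (1.5, 2), (1, 0.5)]
# [8.5, 8.5] [(8, 8), (9, 8.5), (8.5, 9)]
```

Even though both starting centroids lie in the lower-left group, the alternating assign/update steps move one of them to the upper-right group within two rounds.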