Machine Learning: Basics of Supervised and Unsupervised Learning
1、Machine Learning
- Grew out of work in AI
- New capability for computers
2、Examples
- Database mining
Large databases from the growth of automation and the web
E.g. web click data, medical records, biology, engineering
- Applications that can't be programmed by hand
E.g. autonomous helicopter, handwriting recognition, most of Natural Language Processing (NLP), Computer Vision
- Self-customizing programs
E.g. Amazon, Netflix product recommendations
3、Definition
- Arthur Samuel: the field of study that gives computers the ability to learn without being explicitly programmed.
- Tom Mitchell: a computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E.
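As a rough illustration of the E/T/P definition, the sketch below uses a toy setup that is not from the original notes. The task T is labelling the integers 0..99 as "at least 50" or not, the experience E is a hand-picked stream of labelled examples, and the performance measure P is accuracy over all 100 integers. The learner is a 1-nearest-neighbour rule, chosen only because it is short. The examples were picked so that P rises with each new one; on real data improvement is typical but not guaranteed at every step.

```python
# Toy illustration of "performance on T, measured by P, improves with E".
# All data and names here are hypothetical, chosen for the example.

def predict(train, x):
    """1-nearest-neighbour: return the label of the closest training example."""
    _, label = min(train, key=lambda example: abs(example[0] - x))
    return label

def accuracy(train, test):
    """Performance measure P: fraction of test examples labelled correctly."""
    correct = sum(1 for x, label in test if predict(train, x) == label)
    return correct / len(test)

# Task T: decide whether an integer in 0..99 is at least 50 (label 1) or not (label 0).
test_set = [(x, int(x >= 50)) for x in range(100)]

# Experience E: labelled examples that arrive one at a time.
stream = [(40, 0), (69, 1), (53, 1), (48, 0)]

# P after seeing the first n examples, for n = 1..4.
scores = [accuracy(stream[:n], test_set) for n in range(1, len(stream) + 1)]
print(scores)  # [0.5, 0.95, 0.97, 0.99]
```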
4、Machine learning algorithms
- Supervised learning: the programmer teaches the computer how to learn.
The model or algorithm is presented with example inputs and their desired outputs, and finds patterns and connections between them. The goal is to learn a general rule that maps inputs to outputs. Training continues until the model achieves the desired level of accuracy on the training data. Typical applications:
  - Image classification
  - Market prediction / regression
- Unsupervised learning: the computer learns on its own.
No labels are given to the learning algorithm, leaving it on its own to find structure in its input. It is used, for example, to cluster a population into different groups. Unsupervised learning can be a goal in itself (discovering hidden patterns in data). Typical applications:
  - Clustering: you ask the computer to group similar data into clusters; this is essential in research and science.
  - High-dimensional visualization
  - Generative models
- Reinforcement learning
- Recommender systems
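The supervised training loop described above (show the model inputs with their desired outputs, adjust it, and stop once it is accurate enough on the training data) can be sketched with a perceptron. The notes do not name a specific algorithm; the perceptron, the toy points, and the `target_accuracy` and `max_epochs` parameters are assumptions made for this example.

```python
# Minimal perceptron sketch: learn a rule mapping inputs to outputs from
# labelled examples. Data, names, and stopping settings are illustrative.

def predict(weights, bias, x):
    """Return +1 or -1 depending on which side of the line x falls."""
    score = weights[0] * x[0] + weights[1] * x[1] + bias
    return 1 if score > 0 else -1

def training_accuracy(weights, bias, examples):
    """Fraction of training examples the current rule labels correctly."""
    correct = sum(1 for x, y in examples if predict(weights, bias, x) == y)
    return correct / len(examples)

def train(examples, target_accuracy=1.0, max_epochs=100):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        for x, y in examples:
            score = weights[0] * x[0] + weights[1] * x[1] + bias
            if y * score <= 0:  # misclassified (or on the boundary): nudge the line
                weights[0] += y * x[0]
                weights[1] += y * x[1]
                bias += y
        # Stop once the desired accuracy on the training data is reached.
        if training_accuracy(weights, bias, examples) >= target_accuracy:
            break
    return weights, bias

examples = [
    ((2, 2), 1), ((3, 1), 1), ((3, 3), 1),     # desired output +1
    ((0, 0), -1), ((1, 0), -1), ((0, 1), -1),  # desired output -1
]
weights, bias = train(examples)
print(training_accuracy(weights, bias, examples))  # 1.0
```

The learned rule can then be applied to inputs it has not seen, for example `predict(weights, bias, (4, 4))`. Reaching the target on the training data says nothing by itself about accuracy on new data.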
5、Supervised learning
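Every example in the dataset comes with a known value (label) for the variable being predicted.
- Regression problem: quantitative output; predicts a continuous variable
- Classification problem: qualitative output; predicts a discrete variable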
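To make the difference between the two problem types concrete, here is a small sketch on made-up data: a least-squares line fit returns a continuous number, while a nearest-centroid rule returns one of a fixed set of labels. Both methods, the house and tumour figures, and the function names are assumptions for illustration and do not come from the notes.

```python
# Regression vs. classification on hypothetical toy data.

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (simple linear regression)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def nearest_centroid(groups, x):
    """Classify x as the label whose examples have the closest mean."""
    centroids = {label: sum(values) / len(values) for label, values in groups.items()}
    return min(centroids, key=lambda label: abs(centroids[label] - x))

# Regression: continuous output (made-up house sizes and prices).
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]
slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 90 + intercept  # a real number, here about 270

# Classification: discrete output (made-up tumour sizes).
tumours = {"benign": [1, 2, 3], "malignant": [6, 7, 8]}
predicted_class = nearest_centroid(tumours, 5.5)  # one of the two labels

print(predicted_price, predicted_class)
```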
6、Unsupervised learning
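We are given a dataset without being told what it is for, and the computer is left to discover the features of the data on its own.
- Clustering algorithm: partitions the given dataset into clusters
E.g. scraped data (we do not know what the data is for or what types it contains): let the computer find the structure of the data and divide it into clusters
E.g. organizing computer clusters, social network analysis, market segmentation, astronomical data analysis
- Cocktail party algorithm
- Exercise
Options 2 and 3 are unsupervised learning; options 1 and 4 are supervised learning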
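The clustering idea in this section, where unlabelled points are split into groups, can be sketched with k-means. The notes do not name a specific clustering algorithm, so k-means is an assumption here, as are the toy points and the choice to start from the first k points as centroids (real implementations usually pick starting centroids more carefully).

```python
# Minimal k-means sketch on unlabelled 2-D points (hypothetical data).

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))

def kmeans(points, k, max_iters=100):
    centroids = [points[i] for i in range(k)]  # simplistic init: first k points
    labels = None
    for _ in range(max_iters):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

No labels are supplied at any point: the two groups come only from the distances between the points.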
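7、Programming software: Octave
GNU Octave is software built around a high-level programming language and intended mainly for numerical analysis. Octave helps solve linear and nonlinear problems numerically and supports other numerical experiments, using a language that is compatible with MATLAB. It can also be used as a batch-oriented language. Because it is part of the GNU Project, it is free software under the terms of the GNU General Public License.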