[Notes] Understanding Machine Learning in Three Infographics: Basic Concepts, Five Tribes, and Nine Common Algorithms

Contents

Chapter 1: A look at Machine learning
1.What is it?
2.How does machine learning relate to artificial intelligence?
3.How does machine learning work?
4.How does machine learning fit in?
5.Machine learning in practice

Chapter 2: A look at Machine learning evolution
What are the five tribes?

Chapter 3: A look at Machine learning methods
1.Decision trees
2.Support vector machines
3.Regression
4.Naive Bayes classification
5.Hidden Markov models
6.Random forest
7.Recurrent neural networks
8.Long short-term memory & gated recurrent unit neural networks
9.Convolutional neural networks




Chapter 1: A look at Machine learning

1.What is it?

Machines can “learn” by analyzing large amounts of data.

2.How does machine learning relate to artificial intelligence?

Machine learning is a category of research and algorithms focused on finding patterns in data and using those patterns to make predictions. Machine learning falls within the artificial intelligence (AI) umbrella, which in turn intersects with the broader field of knowledge discovery and data mining.

intersect: to cross; to cut across

3.How does machine learning work?

Select data

Split the data you have into three groups: training data, validation data, and test data.

Model data

Use the training data to build the model using the relevant features.

Validate model

Assess the model with your validation data.

Test model

Check performance of the validated model with your test data.

Use the model

Deploy the fully trained model to make predictions on new data.

Tune model

Improve performance of the algorithm with more data, different features, or adjusted parameters.
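
The "Select data" step above can be sketched in a few lines. This is only a minimal illustration using the standard library; the function name split_data and the 60/20/20 proportions are assumptions for the example, not something the infographic prescribes.

```python
import random

def split_data(rows, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle the rows and split them into training, validation, and test groups."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]    # whatever is left becomes the test set
    return train, valid, test

train, valid, test = split_data(range(100))
print(len(train), len(valid), len(test))
```

The model is then built on train, assessed on valid while tuning, and checked once on test at the end.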

validate: to confirm; to verify

deploy: to put into service; to make use of

tune: (noun) a melody; (verb) to adjust

4.How does machine learning fit in?

Traditional programming

The software engineer writes a program that solves a problem.

Data => Software engineer writes a procedure that tells the machine what to do to solve the problem. => Computer follows the procedure and generates a result.


Statistics

An analyst compares the relationships of variables.

Machine learning

A data scientist uses a training data set to teach the computer what to do, and the system carries out the tasks.

Big data => The machine learns to classify with the help of a training data set and tunes a specific algorithm to the desired classification. => The computer learns to identify relationships, trends, and patterns in the data.

Intelligent apps

Intelligent apps leverage the outputs of AI, as in this precision farming example that uses drone-based data collection.

carry out: to perform; to execute

leverage: use (something) to maximum advantage.

drone: an unmanned aerial vehicle

5.Machine learning in practice

For example:

Rapid 3D mapping and modeling

Enhanced profiling to mitigate risks

Predicting the top performers

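profiling: analysis of a person's psychological and behavioral traits, done to assess or predict their potential in some area or to identify a type of person

mitigate: to lessen; to ease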

Chapter 2: A look at Machine learning evolution

For decades, individual “tribes” of artificial intelligence researchers have vied with one another for dominance. Is the time ripe now for tribes to collaborate? They may be forced to, as collaboration and algorithm blending are the only ways to reach true artificial general intelligence (AGI). Here’s a look back at how machine learning methods have evolved and what the future may look like.

tribe: a tribe; here, a school of thought

vie: to compete; to contend

ripe: mature; right for (something)

blend: to mix; to merge

What are the five tribes?


Symbolists

Use symbols, rules, and logic to represent knowledge and draw logical inferences.

Favored algorithm: Rules and decision trees, inverse deduction

Bayesians

Assess the likelihood of occurrence for probabilistic inference.

Favored algorithm: Naive Bayes or Markov

Connectionists

Recognize and generalize patterns dynamically with matrices of probabilistic, weighted neurons.

Favored algorithm: Neural networks, backpropagation

Evolutionaries

Generate variations and then assess the fitness of each for a given purpose.

Favored algorithm: Genetic programs

Analogizers

Optimize a function in light of constraints (“going as high as you can while staying on the road”).

Favored algorithm: Support vectors

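inference: reasoning; drawing a conclusion from evidence

likelihood: how probable something is

occurrence: something that happens; an event; how often something happens; existence

probabilistic: based on or relating to probability

matrices: plural of matrix

neuron: a nerve cell; here, a unit of a neural network

generalize: to draw a general rule from specific cases

variation: a change; a variant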


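Pedro Domingos has summarized the open problems of each of the five tribes and their proposed solutions, but he also stresses that what we really need is a single unified algorithm that can solve all of these problems at once.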








Chapter 3: A look at Machine learning methods

Which machine learning algorithm should you use? A lot depends on the characteristics and the amount of the available data, as well as your training goals, in each particular use case. Avoid using the most complicated algorithms unless the end justifies more expensive means and resources. Here are some of the more common algorithms ranked by ease of use.

1.Decision trees

Decision tree analysis typically uses a hierarchy of variables or decision nodes that, when answered step by step, can classify a given customer as creditworthy or not, for example.


Decision trees are useful when evaluating lists of distinct features, qualities, or characteristics of people, places, or things.
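
To make the "answered step by step" idea concrete, here is a tiny hand-written tree for the creditworthiness example. The three questions, their thresholds, and the field names (income, missed_payments, years_employed) are all made up for illustration; a real tree would learn them from training data.

```python
def is_creditworthy(applicant):
    """Walk a small decision tree, answering one question per node."""
    if applicant["income"] < 30000:          # node 1: income threshold (hypothetical)
        return False
    if applicant["missed_payments"] > 2:     # node 2: payment history (hypothetical)
        return False
    return applicant["years_employed"] >= 1  # node 3: job stability (hypothetical)

print(is_creditworthy({"income": 50000, "missed_payments": 0, "years_employed": 3}))
```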

Use cases

Rule-based credit risk assessment, horse race performance prediction

distinct: distinguishable; different; clear; unmistakable

2.Support vector machines

Support vector machines classify groups of data with the help of hyperplanes.


Support vector machines are good for the binary classification of X versus other variables and are useful whether or not the relationship between variables is linear.
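
A rough sketch of how a hyperplane separates two groups once it has been found. It shows only the prediction step for the linear case; the weights and bias below are invented for the example, and the training procedure that an SVM uses to choose them is not shown.

```python
def classify(point, weights, bias):
    """Return +1 or -1 depending on which side of the hyperplane w.x + b = 0 the point falls."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if score >= 0 else -1

# Hypothetical hyperplane in 2D: the line x1 - x2 = 0
weights, bias = (1.0, -1.0), 0.0
print(classify((3, 1), weights, bias), classify((1, 3), weights, bias))
```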

Use cases

News categorization, handwriting recognition

hyperplane: a flat boundary with one dimension fewer than the space it divides (a line in 2D, a plane in 3D)


3.Regression

Regression maps the behavior of a dependent variable relative to one or more independent variables. In this example, logistic regression separates spam from non-spam text.


Regression is useful for identifying continuous (not necessarily distinct) relationships between variables.
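
For the spam example, logistic regression boils down to a weighted sum pushed through a sigmoid. The sketch below assumes two made-up features (how many times "free" appears and how many links the message contains) and made-up weights; in practice the weights are fitted from labeled training data.

```python
import math

def spam_probability(features, weights, bias):
    """Logistic regression: squash a weighted sum of the features into a value between 0 and 1."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

weights, bias = (1.2, 0.8), -2.0  # hypothetical fitted parameters
print(spam_probability((3, 2), weights, bias))  # many "free" and links: closer to 1
print(spam_probability((0, 0), weights, bias))  # neither: closer to 0
```

A message is then labeled spam when the probability crosses a chosen cutoff such as 0.5.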

Use cases

Traffic flow analysis, email filtering

map: (verb) to chart; to plot

dependent variable: the variable being predicted or explained (the outcome)

spam: junk email

4.Naive Bayes classification

Naive Bayes classifiers compute probabilities, given tree branches of possible conditions. Each individual feature is “naive”, that is, conditionally independent of the others, and therefore does not influence them. For example, what’s the probability you would draw two yellow marbles in a row, given a jar of five marbles in total, some yellow and some red? The probability, following the topmost branch of two yellow in a row, is one in ten, which is the result when two of the five marbles are yellow: 2/5 × 1/4 = 1/10. Naive Bayes classifiers compute the combined, conditional probabilities of multiple attributes.


Naive Bayes methods allow the quick classification of relevant items in small data sets that have distinct features.
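
The marble arithmetic can be checked with exact fractions. The note does not say how many of the five marbles are yellow; the sketch assumes two yellow and three red, which is the only split that gives one in ten.

```python
from fractions import Fraction

def prob_two_yellow(yellow, total):
    """Probability of drawing two yellow marbles in a row without putting the first one back."""
    first = Fraction(yellow, total)           # first draw is yellow
    second = Fraction(yellow - 1, total - 1)  # second draw is yellow, given the first was
    return first * second

print(prob_two_yellow(2, 5))  # 1/10
```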

Use cases

Sentiment analysis, consumer segmentation

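classification: the sorting of items into classes (a classifier is the model that does the sorting)

marble: a small glass ball used in children's games

in a row: consecutively

segmentation: division into parts or groups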

5.Hidden Markov models

An observable Markov process can be purely deterministic: one given state always follows another given state. Traffic light patterns are an example.

Hidden Markov models, by contrast, compute the probability of hidden states occurring by analyzing observable data, and then estimating the likely pattern of future observation with the help of the hidden state analysis. In this example, the probability of high or low pressure (the hidden state) is used to predict the likelihood of sunny, rainy, or cloudy weather.


Hidden Markov models tolerate data variability and are effective for recognition and prediction.
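
Here is a small sketch of the pressure and weather example using the standard forward algorithm. Every probability in the tables is invented for illustration, and the names forward and predict_next are just labels chosen for this note.

```python
states = ("high", "low")                      # hidden pressure states
weather = ("sunny", "cloudy", "rainy")        # observable weather
start = {"high": 0.6, "low": 0.4}             # all probabilities below are made up
trans = {"high": {"high": 0.7, "low": 0.3},
         "low": {"high": 0.4, "low": 0.6}}
emit = {"high": {"sunny": 0.7, "cloudy": 0.2, "rainy": 0.1},
        "low": {"sunny": 0.1, "cloudy": 0.3, "rainy": 0.6}}

def forward(observations):
    """Probability of each hidden state after the last observation (forward algorithm)."""
    belief = {s: start[s] * emit[s][observations[0]] for s in states}
    for obs in observations[1:]:
        belief = {s: emit[s][obs] * sum(belief[p] * trans[p][s] for p in states)
                  for s in states}
    total = sum(belief.values())
    return {s: v / total for s, v in belief.items()}

def predict_next(observations):
    """Probability of each kind of weather on the next day."""
    belief = forward(observations)
    next_state = {s: sum(belief[p] * trans[p][s] for p in states) for s in states}
    return {w: sum(next_state[s] * emit[s][w] for s in states) for w in weather}

print(forward(["rainy", "rainy"]))
print(predict_next(["sunny", "sunny"]))
```

After two rainy days the low-pressure state becomes the more likely one, and after two sunny days the forecast leans toward sun.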

Use cases

Facial expression analysis, weather prediction

observable: noticeable; explicit; able to be observed

deterministic: fully determined, with no randomness

6.Random forest

Random forest algorithms improve the accuracy of decision trees by using multiple trees with randomly selected subsets of data. This example reviews the expression levels of various genes associated with breast cancer relapse and computes a relapse risk.


Random forest methods prove useful with large data sets and items that have numerous and sometimes irrelevant features.
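
A very reduced sketch of the idea: many small trees, each trained on a random resample of the data, vote on the answer. To keep it short, each "tree" here is a one-split stump on a single randomly chosen feature, which is a simplification assumed for this note rather than how full random forest libraries build their trees. The toy data is also made up.

```python
import random
from collections import Counter

def train_stump(rows, feature):
    """Fit a one-split tree on one feature by picking the threshold with the fewest errors."""
    best = None
    for threshold in sorted({x[feature] for x, _ in rows}):
        for above_label in (0, 1):
            errors = sum((above_label if x[feature] >= threshold else 1 - above_label) != y
                         for x, y in rows)
            if best is None or errors < best[0]:
                best = (errors, threshold, above_label)
    _, threshold, above_label = best
    return lambda x: above_label if x[feature] >= threshold else 1 - above_label

def train_forest(rows, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample of the rows and a randomly chosen feature."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # random subset, drawn with replacement
        feature = rng.randrange(n_features)        # random feature for this tree
        forest.append(train_stump(sample, feature))
    return forest

def predict(forest, x):
    """Majority vote across all trees."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]

# Toy data: label is 1 when the first feature is at least 5; the second feature mirrors it.
rows = [((i, 9 - i), 1 if i >= 5 else 0) for i in range(10)]
forest = train_forest(rows)
print(predict(forest, (9, 0)), predict(forest, (0, 9)))
```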

Use cases

Customer churn analysis, risk assessment

subset: a set contained within another set

relapse: to worsen again; a recurrence

numerous: many

churn: to stir; to tumble; (of customers) attrition

7.Recurrent neural networks

Any neural network converts many inputs into outputs via one or more hidden layers of neurons, each of which turns many inputs into a single output. Recurrent neural networks (RNNs) additionally pass values from step to step, making step-by-step learning possible. In other words, RNNs have a form of memory, allowing previous outputs to affect subsequent inputs.


Recurrent neural networks have predictive power when used with large amounts of sequenced information.
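
The "memory" can be seen in a one-neuron sketch: the hidden value from the previous step is fed back in together with the new input. The weights here are arbitrary values picked for the example, not trained ones.

```python
import math

def rnn_forward(inputs, w_in=0.5, w_rec=0.8, bias=0.0):
    """Run a one-unit recurrent network over a sequence and return the hidden state at every step."""
    hidden = 0.0
    states = []
    for x in inputs:
        # the new state depends on the current input and on the previous state
        hidden = math.tanh(w_in * x + w_rec * hidden + bias)
        states.append(hidden)
    return states

print(rnn_forward([1.0, 0.0, 0.0]))
```

Only the first input is non-zero, yet the later states stay above zero: the first step is still being remembered, although it fades a little with each step.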

Use cases

Image classification and captioning, political sentiment analysis

caption: to add a title or description to (an illustration)

8.Long short-term memory & gated recurrent unit neural networks

Older forms of RNNs can be lossy. While these older recurrent neural networks only allow small amounts of older information to persist, newer long short-term memory (LSTM) and gated recurrent unit (GRU) neural networks have both long- and short-term memory. In other words, these newer RNNs have greater memory control, allowing previous values to persist or to be reset as necessary for many sequences of steps, avoiding “gradient decay” or eventual degradation of the values passed from step to step. LSTM and GRU networks make this memory control possible with memory blocks and structures called gates that pass or reset values as appropriate.


Long short-term memory and gated recurrent unit neural networks have the same advantages as other recurrent neural networks and are used more often because of their greater memory capabilities.
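
As a rough picture of what a gate does, here is a single-unit GRU step. The parameter names (wz, uz, bz and so on) and their values are assumptions for this sketch, and it uses the convention where an update gate near 0 keeps the old value and a gate near 1 replaces it; some texts and libraries write the gate the other way around.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x, h_prev, p):
    """One step of a single-unit GRU; p holds the (hypothetical) weights and biases."""
    update = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])  # how much of the state to overwrite
    reset = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])   # how much old state feeds the candidate
    candidate = math.tanh(p["wh"] * x + p["uh"] * (reset * h_prev) + p["bh"])
    return (1 - update) * h_prev + update * candidate

keep = {"wz": 0, "uz": 0, "bz": -10, "wr": 0, "ur": 0, "br": 0, "wh": 1, "uh": 1, "bh": 0}
overwrite = dict(keep, bz=10)

print(gru_step(1.0, 0.9, keep))       # gate closed: the old value 0.9 persists
print(gru_step(1.0, 0.9, overwrite))  # gate open: the value is replaced by the candidate
```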

Use cases

Natural language processing, translation

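lossy: (of compression) losing some information

persist: to persevere; to continue; to last

gradient: (math) gradient; slope

decay: to fade; attenuation

degradation: downgrading; deterioration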

9.Convolutional neural networks

Convolutions are blends of weights from a subsequent layer that are used to label the output layer.


Convolutional neural networks are most useful with very large data sets, large numbers of features, and complex classification tasks.
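
The core operation is easier to see in code: a small grid of weights (the kernel) slides over the input, and each output value is a weighted blend of the patch under it. The sketch below is a plain-Python version with no padding and a stride of 1; the image and the edge-detecting kernel are made up, and in a real network the kernel weights are learned.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the weighted values."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]          # dark on the left, bright on the right
edge_kernel = [[-1, 1],
               [-1, 1]]         # responds where brightness rises from left to right

print(convolve2d(image, edge_kernel))  # [[0, 2, 0], [0, 2, 0]]
```

The output is non-zero only in the middle column, which is where the vertical edge sits.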

Use cases

Image recognition, text to speech, drug discovery




