Early Diagnosis of Alzheimer's Disease using Deep Learning and Ensemble Learning: Research Log
This is the research log for Early Diagnosis of Alzheimer's Disease using Deep Learning and Ensemble Learning.
Week 1
Since I have no background in biology, I started by learning about Alzheimer's Disease (hereafter AD) itself.
I read the following articles to understand how AD is diagnosed and summarized the findings:
- Alzheimer’s Disease facts and figures
- Alzheimer's Disease: How It’s Diagnosed
- Early Detection of Alzheimer’s Disease Using Magnetic Resonance Imaging: A Novel Approach Combining Convolutional Neural Networks and Ensemble Learning
- AD-related articles on Wikipedia
A preliminary finding: the most difficult problem in diagnosing AD is that its root cause is unknown, so physicians cannot approach the diagnosis from any single fixed direction.
The detailed summary is in AD background reading.
Given this situation, we can analyze a large dataset to extract the common features of the disease, i.e., the symptoms and physical conditions typical of AD patients.
This is where deep learning and ensemble learning come in.
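As a purely illustrative sketch of the ensemble-learning idea (the dataset, features, and base classifiers below are made up, not from this project — a real study would use MRI/PET-derived features):

```python
# Hypothetical sketch of ensemble learning with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for patient feature vectors with binary labels (AD / not AD).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Majority ("hard") vote over two different base classifiers.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
print(ensemble.score(X_test, y_test))  # test-set accuracy of the vote
```

Combining classifiers with different inductive biases is the core idea: errors of the individual models partly cancel in the vote.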
To build an initial understanding of the project, I read A Deep Learning Model to Predict a Diagnosis of Alzheimer Disease by Using 18F-FDG PET of the Brain.
The paper trains a deep learning model on 18F-FDG PET brain images and evaluates the specificity and sensitivity of this radiotracer for predicting AD (reportedly an average of 75.8 months before the final diagnosis).
Given my limited grasp of the terminology and background knowledge, my reading of this paper is still incomplete and will require further study.
Week 2
This week I studied the parts of last week's paper that I did not fully understand.
From reading Evaluating Categorical Models II: Sensitivity and Specificity, we learned:
- Sensitivity
  - the metric that evaluates a model's ability to predict true positives of each available category
  - \(=\frac{\texttt{True Positives}}{\texttt{True Positives + False Negatives}}\)
- Specificity
  - the metric that evaluates a model's ability to predict true negatives of each available category
  - \(=\frac{\texttt{True Negatives}}{\texttt{True Negatives + False Positives}}\)
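A minimal sketch of computing both metrics from a confusion matrix with scikit-learn; the labels are made up for illustration (1 = AD, 0 = cognitively normal):

```python
# Sensitivity and specificity from a binary confusion matrix.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0, 0]  # hypothetical ground truth
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]  # hypothetical model predictions

# For binary labels {0, 1}, ravel() yields (tn, fp, fn, tp).
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)  # TP / (TP + FN)
specificity = tn / (tn + fp)  # TN / (TN + FP)
print(sensitivity, specificity)  # → 0.75 0.75
```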
The earlier paper also uses the concepts of F1 Score and ROC curve:
- F1 Score (F-score, F-measure)
  - a measure of a test's accuracy
  - Precision: True Positives divided by all positive results
    - \(=\frac{\texttt{True Positives}}{\texttt{True Positives+False Positives}}\)
  - Recall: True Positives divided by all results that should have been positive
    - \(=\frac{\texttt{True Positives}}{\texttt{True Positives+False Negatives}}\)
  - \(F_1\) Score is calculated from the precision and recall of the test by the equation:
    - \(F_1 = \frac{2}{\texttt{recall}^{-1}+\texttt{precision}^{-1}}=2\times \frac{\texttt{precision}\times\texttt{recall}}{\texttt{precision}+\texttt{recall}}=\frac{\text{tp}}{\text{tp}+\frac{1}{2}\text{(fp+fn)}}\)
  - \(F_\beta\) Score is a more general F score; \(\beta\) is chosen such that recall is considered \(\beta\) times as important as precision.
    - \(F_\beta=(1+\beta^2)\times\frac{\texttt{precision}\times\texttt{recall}}{(\beta^2\times\texttt{precision})+\texttt{recall}}\)
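These scores are available directly in scikit-learn; a small sketch on made-up binary labels (1 = positive class):

```python
# Precision, recall, F1 and F-beta with scikit-learn.
from sklearn.metrics import f1_score, fbeta_score, precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]  # hypothetical ground truth
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]  # hypothetical predictions

p = precision_score(y_true, y_pred)       # 3 / (3 + 1) = 0.75
r = recall_score(y_true, y_pred)          # 3 / (3 + 1) = 0.75
f1 = f1_score(y_true, y_pred)             # harmonic mean of p and r = 0.75
f2 = fbeta_score(y_true, y_pred, beta=2)  # recall weighted 2x; also 0.75 here since p == r
print(p, r, f1, f2)
```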
- ROC curve (receiver operating characteristic curve)
  - The curve plots two parameters: True Positive Rate (TPR) & False Positive Rate (FPR)
    - TPR \(=\frac{\text{TP}}{\text{TP+FN}}\)
    - FPR \(=\frac{\text{FP}}{\text{FP+TN}}\)
  - A ROC curve plots TPR vs. FPR at different classification thresholds. The closer a point is to the top-left corner, the better the prediction.
- AUC (Area Under the ROC Curve)
  - AUC provides an aggregate measure of performance across all possible classification thresholds. One way of interpreting AUC is as the probability that the model ranks a random positive example more highly than a random negative example.
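A quick sketch of computing the ROC curve and AUC with scikit-learn; the scores below are made-up predicted probabilities for the positive class:

```python
# ROC curve points and AUC for a toy binary-classification result.
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]  # hypothetical predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # (FPR, TPR) at each threshold
auc = roc_auc_score(y_true, y_score)
print(auc)  # → 0.75
```

Sweeping the threshold traces out the curve; plotting `fpr` against `tpr` would visualize it.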
- Transfer Learning (TL)
  - Generally, TL stores knowledge gained while solving one problem and applies it to a different but related problem.
  - Reusing data and transferring information from previously learned tasks helps improve the sample efficiency of, for example, a reinforcement learning agent.
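As a toy analogue of the "reuse knowledge" idea (deep transfer learning would instead reuse pretrained network layers; this small scikit-learn sketch only reuses a feature projection learned on a different set of classes):

```python
# Learn a PCA projection on a "source" problem (digits 0-4), then reuse it
# as the feature extractor for a different "target" task (5 vs 6).
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()
src = digits.data[digits.target < 5]                 # source-domain images
mask = (digits.target == 5) | (digits.target == 6)   # target-domain images
X_tgt, y_tgt = digits.data[mask], digits.target[mask]

pca = PCA(n_components=16).fit(src)  # "knowledge" learned on the source data only
X_train, X_test, y_train, y_test = train_test_split(X_tgt, y_tgt, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(pca.transform(X_train), y_train)
acc = clf.score(pca.transform(X_test), y_test)
print(acc)
```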
There is a Python library for machine learning algorithms called Scikit-Learn.
Mastering this package will be essential for the programming and research ahead.
Tutorial: Introducing-Scikit-Learn.ipynb
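The core Scikit-Learn workflow is estimator → fit → predict/score; a minimal sketch on the bundled Iris dataset (the choice of k-nearest-neighbors here is just for illustration):

```python
# Basic scikit-learn pattern: construct an estimator, fit it, score it.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)             # learn from the training split
accuracy = model.score(X_test, y_test)  # mean accuracy on held-out data
print(accuracy)
```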