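My graduation project compared several machine learning algorithms. I only started on it in the last few months (who knows what I was doing before that). The original topic involved more work, and when it became clear I couldn't finish it, I switched to this one. At least I put some real thought into it, and it ended up rated an outstanding graduation project, though I feel a little sheepish about that. Its strengths are probably all the comparisons: plenty of experimental data and lots of figures and tables, a few of which are quite intuitive.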
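Abstract:
Taking the classic problem of handwritten Arabic digit recognition as the application background, this thesis studies feature extraction and classification methods for handwritten characters based on linear classifiers. The main work focuses on the following aspects: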
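- Principal component analysis (PCA) is used to extract features from the datasets; the cumulative contribution ratio measures the effect of dimensionality reduction, and the relationship between the number of dimensions and classifier accuracy is analyzed.
- For multi-class problems, the one-against-one and one-against-all decomposition methods are compared in terms of their effect on the classification accuracy and time complexity of the Fisher linear classifier, the linear perceptron, and the linear-kernel support vector machine.
- The effects of the learning factor and the maximum number of iterations on the classification results of the linear perceptron are compared.
- The Fisher linear classifier, the linear perceptron, and the linear-kernel support vector machine are compared in terms of classification accuracy and training time on the USPS and MNIST datasets.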
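The cumulative contribution ratio is the fraction of total variance retained by the k largest eigenvalues of the covariance matrix, i.e. (λ_1 + ... + λ_k) / (λ_1 + ... + λ_d). A minimal sketch, assuming the eigenvalues have already been computed; the function names `cumulative_contribution` and `choose_dim`, the 0.9 threshold, and the toy eigenvalues are hypothetical illustrations, not values from the thesis:

```python
def cumulative_contribution(eigenvalues):
    """Cumulative contribution ratio for k = 1..d principal components.

    Entry k-1 is the sum of the k largest eigenvalues divided by the sum
    of all eigenvalues of the covariance matrix.
    """
    vals = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(vals)
    ratios, running = [], 0.0
    for v in vals:
        running += v
        ratios.append(running / total)
    return ratios


def choose_dim(eigenvalues, threshold=0.9):
    """Smallest k whose cumulative contribution ratio reaches `threshold`."""
    for k, ratio in enumerate(cumulative_contribution(eigenvalues), start=1):
        if ratio >= threshold:
            return k
    return len(eigenvalues)  # fallback if the threshold is never reached


# Hypothetical eigenvalues, for illustration only.
eig = [5.0, 3.0, 1.0, 0.5, 0.5]
print(cumulative_contribution(eig))  # [0.5, 0.8, 0.9, 0.95, 1.0]
print(choose_dim(eig, 0.9))          # 3
```

Raising the threshold keeps more dimensions, which is the trade-off between dimensionality and accuracy that the first item analyzes.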
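From the experimental results on the USPS and MNIST datasets, we draw the following conclusions:
(A) The Fisher linear classifier trains the fastest;
(B) Both the training time and the classification accuracy of the linear perceptron vary with the learning factor and the maximum number of iterations;
(C) The training time of the support vector machine depends strongly on the number of samples;
(D) Among the linear classifiers implemented, the linear-kernel support vector machine achieves the best classification performance.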
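Conclusion (B) follows from how the perceptron is trained: the learning factor scales each weight update, and the maximum number of iterations caps the number of passes over the data, so a run stopped early can leave samples misclassified. A minimal sketch of the classic online perceptron rule for labels in {-1, +1}, assuming plain Python lists; `train_perceptron`, `accuracy`, and the AND-style toy data are hypothetical and not the thesis implementation:

```python
def train_perceptron(samples, labels, eta=0.1, max_epochs=50):
    """Online perceptron for labels in {-1, +1}.

    eta: learning factor (step size of each update).
    max_epochs: maximum number of passes over the training set.
    Returns (weights, bias, epochs_run).
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified or on the boundary
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
                errors += 1
        if errors == 0:  # a full pass with no updates: training data separated
            return w, b, epoch
    return w, b, max_epochs  # stopped by the iteration cap


def accuracy(w, b, samples, labels):
    """Fraction of samples whose predicted sign matches the label."""
    hits = 0
    for x, y in zip(samples, labels):
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
        hits += pred == y
    return hits / len(labels)


# Hypothetical toy data: logical AND, which is linearly separable.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [-1, -1, -1, 1]
for cap in (1, 50):
    w, b, epochs = train_perceptron(X, Y, eta=0.5, max_epochs=cap)
    print(cap, epochs, accuracy(w, b, X, Y))
# prints "1 1 0.5" then "50 9 1.0"
```

With a zero initial weight vector, as here, changing the learning factor only rescales w and b and leaves the predictions unchanged; the learning factor changes the outcome when the weights start from nonzero values. The iteration cap, by contrast, visibly changes accuracy even in this toy case.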
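Keywords: handwritten character recognition, principal component analysis, Fisher linear classifier, linear perceptron, linear-kernel support vector machine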
Abstract
Taking classical handwritten digit recognition as the application background, this thesis studies a variety of feature extraction and linear classification methods. The main contributions of this work are as follows:
- We employ principal component analysis (PCA) for feature extraction, use the cumulative contribution ratio to measure the effectiveness of dimensionality reduction, and analyze the relationship between the number of dimensions and the recognition rates of the linear classifiers.
- We compare how the one-against-one (OAO) and one-against-all (OAA) decomposition methods for multi-class problems affect the classification accuracy and time complexity of Fisher linear discriminants (FLDs), linear perceptrons, and linear support vector machines (SVMs).
- We compare the influence of the learning factor and the maximum number of epochs on the performance of linear perceptrons.
- We compare the classification accuracy and training time of FLDs, linear perceptrons, and linear SVMs on the USPS and MNIST datasets.
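All three classifiers are binary, so a ten-digit task needs a decomposition. For k classes, OAA trains k classifiers (each class against all the rest) and predicts the class with the largest score, while OAO trains k(k-1)/2 classifiers, each on the samples of just two classes, and predicts by majority vote; for ten digits that is 10 versus 45 classifiers. The sketch below is generic over the binary learner; `train_nearest_mean` is a hypothetical stand-in (a minimum-distance linear rule), not one of the classifiers compared in the thesis, and the toy data are made up:

```python
from collections import Counter
from itertools import combinations


def train_nearest_mean(X, labels):
    """Hypothetical stand-in binary learner (labels in {-1, +1}).

    score(x) = w . (x - mid), with w = m_pos - m_neg and mid the midpoint
    of the two class means. A positive score means +1.
    """
    dim = len(X[0])
    pos = [x for x, t in zip(X, labels) if t == 1]
    neg = [x for x, t in zip(X, labels) if t == -1]
    m_pos = [sum(x[i] for x in pos) / len(pos) for i in range(dim)]
    m_neg = [sum(x[i] for x in neg) / len(neg) for i in range(dim)]
    w = [p - n for p, n in zip(m_pos, m_neg)]
    mid = [(p + n) / 2 for p, n in zip(m_pos, m_neg)]
    return lambda x: sum(wi * (xi - ci) for wi, xi, ci in zip(w, x, mid))


def train_oaa(X, y, classes, learner):
    """One-against-all: one classifier per class (that class vs. the rest)."""
    return {c: learner(X, [1 if t == c else -1 for t in y]) for c in classes}


def predict_oaa(models, x):
    """Pick the class whose classifier gives the largest score."""
    return max(models, key=lambda c: models[c](x))


def train_oao(X, y, classes, learner):
    """One-against-one: one classifier per pair, trained on that pair only."""
    models = {}
    for a, b in combinations(classes, 2):
        idx = [i for i, t in enumerate(y) if t in (a, b)]
        models[(a, b)] = learner([X[i] for i in idx],
                                 [1 if y[i] == a else -1 for i in idx])
    return models


def predict_oao(models, x):
    """Majority vote; ties go to the class that first received a vote."""
    votes = Counter(a if score(x) > 0 else b for (a, b), score in models.items())
    return votes.most_common(1)[0][0]


# Hypothetical toy data: three well-separated 2-D classes.
X = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (0.0, 10.0), (0.0, 11.0)]
y = [0, 0, 1, 1, 2, 2]
oaa = train_oaa(X, y, [0, 1, 2], train_nearest_mean)
oao = train_oao(X, y, [0, 1, 2], train_nearest_mean)
queries = [(0.5, 0.0), (10.5, 0.0), (0.0, 10.5)]
print([predict_oaa(oaa, p) for p in queries])  # [0, 1, 2]
print([predict_oao(oao, p) for p in queries])  # [0, 1, 2]
k = 10  # ten digit classes
print(k, len(list(combinations(range(k), 2))))  # 10 45: OAA vs OAO classifier counts
```

OAO trains more classifiers, but each sees only two classes' samples, so for learners whose cost grows quickly with the number of samples the total training time can still be lower than OAA.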
From the experimental results on these two datasets, we draw the following conclusions:
(A) FLDs train the fastest;
(B) The training time and accuracy of linear perceptrons vary with the learning factor and the maximum number of epochs;
(C) The training time of linear SVMs depends strongly on the number of samples;
(D) Linear SVMs achieve the best classification accuracy among the implemented linear classifiers.
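Conclusion (A) is consistent with the two-class Fisher discriminant having a closed-form solution, w ∝ S_w^{-1}(m_1 - m_2), where m_1 and m_2 are the class means and S_w is the within-class scatter matrix: training is one pass to compute means and scatter plus one linear solve, with no iterative optimization. A pure-Python sketch for two classes, assuming the threshold sits at the projected midpoint of the two means (one common choice; the thesis may use another). The helper names and toy data are hypothetical, and in the multi-class setting this two-class rule would be combined through OAO or OAA:

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def scatter(vectors, m):
    """Scatter matrix sum over x of (x - m)(x - m)^T for one class."""
    d = len(m)
    S = [[0.0] * d for _ in range(d)]
    for v in vectors:
        diff = [vi - mi for vi, mi in zip(v, m)]
        for i in range(d):
            for j in range(d):
                S[i][j] += diff[i] * diff[j]
    return S


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        p = max(range(col, n), key=lambda row: abs(M[row][col]))
        M[col], M[p] = M[p], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fisher_lda(class1, class2):
    """Two-class Fisher discriminant: w solves S_w w = m1 - m2.

    The bias places the threshold at the projected midpoint of the two
    class means (an assumed choice). w . x + w0 > 0 predicts class1.
    """
    m1, m2 = mean(class1), mean(class2)
    S1, S2 = scatter(class1, m1), scatter(class2, m2)
    Sw = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(S1, S2)]
    w = solve(Sw, [a - b for a, b in zip(m1, m2)])
    w0 = -sum(wi * (a + b) / 2 for wi, a, b in zip(w, m1, m2))
    return w, w0


def score(w, w0, x):
    """Linear discriminant value; positive means class1."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0


# Hypothetical toy data in two dimensions.
c1 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0)]
c2 = [(6.0, 5.0), (7.0, 8.0), (8.0, 6.0)]
w, w0 = fisher_lda(c1, c2)
print([score(w, w0, x) > 0 for x in c1 + c2])  # [True, True, True, False, False, False]
```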
Keywords: Handwritten digit recognition, principal component analysis, Fisher linear discriminants, linear perceptrons, linear support vector machines