Review for Ensemble Methods in Machine Learning, Thomas G. Dietterich

Summary

Ensemble learning is a method of combining the decisions of a set of classifiers in order to reach a more accurate prediction. For an ensemble to work better than any individual member, each individual hypothesis has to be accurate (better than random guessing, i.e., above 50% on a two-class problem) and the hypotheses have to be diverse, so that the errors made by different classifiers are largely uncorrelated and a majority vote can correct them.

Three fundamental reasons explain why ensemble methods help:
1. Statistical: with a finite number of training samples, a learning algorithm cannot resolve all uncertainty and may find many equally good hypotheses; voting over a combination of them reduces the risk of selecting a single bad hypothesis.
2. Computational: many learning algorithms can get stuck in local optima; running multiple searches through the hypothesis space increases the chance of approaching the global optimum.
3. Representational: a single hypothesis is limited by the knowledge representation of the learning algorithm, and a weighted combination of hypotheses can extend the representational power.

The author illustrates different methods for constructing ensembles: enumerating the hypotheses (combining the possible hypotheses to make a final decision), manipulating the training examples (resampling the training set to generate multiple hypotheses), manipulating the input features (selecting different feature subsets for multiple training runs), manipulating the output targets (creating multiple hypotheses with respect to differently grouped targets), and injecting randomness into the learning algorithms.

Comparisons of the performance of C4.5, AdaBoost, bagging, and randomized tree ensembles are shown. The results suggest that when the problem is simple enough that the three reasons above do not apply, a single classifier already performs very well; otherwise, the ensemble methods provide better results. In general, AdaBoost performs best when the training set contains little noise, but otherwise it overfits the noise. The author argues that because AdaBoost aggressively extends the margins of its decisions it should overfit easily, but its stage-wise nature prevents this from happening in most cases.

Critique

This article is more a survey than a research paper, although it presents some experimental results on the performance of different ensemble methods. A survey should give newcomers helpful information about a particular topic, and this article is presented in a well-organized layout that enhances its readability and informativeness. The introduction explains what ensemble methods in machine learning are and why they may work. The three fundamental reasons then motivate ensembles by pointing at problems shared by most machine learning algorithms, which underlines the importance of ensemble methods and draws the audience into further reading. The methods for constructing ensembles are then illustrated, giving the reader an overview of the research achievements in ensemble learning that has practical usefulness. Finally, the comparisons of ensemble methods indicate the limitations and advantages of the different kinds of ensembles. Thanks to this well-formed structure, readers can gain a more concrete understanding of ensemble learning.
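To make the bagging-plus-majority-vote idea described in the review concrete, here is a minimal sketch using scikit-learn decision trees as the base learners. The synthetic dataset and the number of estimators are illustrative assumptions of mine, not values from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Illustrative synthetic binary-classification data (not from the paper).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_estimators = 25
trees = []

# Bagging: each tree is trained on a bootstrap resample of the training set,
# which is one way of making the individual hypotheses diverse.
for _ in range(n_estimators):
    idx = rng.integers(0, len(X_train), size=len(X_train))
    trees.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))

# Majority vote: accurate but diverse trees correct each other's errors.
votes = np.stack([t.predict(X_test) for t in trees])      # shape (n_estimators, n_test)
ensemble_pred = np.round(votes.mean(axis=0)).astype(int)  # majority for 0/1 labels

print(f"single tree accuracy:    {trees[0].score(X_test, y_test):.3f}")
print(f"bagged ensemble accuracy: {accuracy_score(y_test, ensemble_pred):.3f}")
```

On a dataset like this the ensemble typically edges out any single tree, which is exactly the statistical argument the review summarizes.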
Review for Designing Efficient Cascaded Classifiers: Tradeoff between Accuracy and Cost, Vikas C. Raykar, Balaji Krishnapuram, Shipeng Yu

Summary

This research paper proposes a new method for training cascades of classifiers, called soft cascades, in contrast to traditional (hard) cascades. It states that the conventional method has three problems that the proposed method addresses. Joint training of all stages: a cascade is generally trained sequentially, but a soft cascade can be trained in one pass, and the threshold for each stage can be chosen as a post-processing step. Tradeoff between accuracy and cost: traditional cascade training does not explicitly account for accuracy and cost, whereas the proposed method can be tuned to stress either need. Computational cost of training: the post-processing step for adjusting thresholds is comparatively cheap, whereas a hard cascade has to be retrained for every new set of thresholds.

Section 2 of the paper gives background on cascades of classifiers, and then the key ideas of the soft cascade are introduced: a soft cascade rejects instances based on the posterior class probability produced by the classifier at each stage, and a positive instance can only be classified as positive after it has passed through all the stages. Because a soft cascade is trained only once, optimizing all stages at the same time requires each stage to emphasize different types of false positives so that the accuracy of the whole cascade is optimized. The authors then present the training method, which mainly consists of finding the maximum-likelihood estimate of the parameters of the linear classifiers; to obtain a better estimate, a maximum a-posteriori (MAP) formulation is used. To address cost, a term for the expected feature-acquisition cost is added to the MAP objective, and a term weighting accuracy is inserted in a similar way.

To show that their method is more efficient, the authors conduct several experiments on medical datasets, which typically have a high cost for feature acquisition. The results show that the accuracy of the soft cascade is generally only slightly lower than the best competitor, while the feature-acquisition cost can be reduced dramatically, by hundreds of times.

Critique

Several issues decrease the readability and comprehensibility of the paper. 1. The term soft cascade is not explained at its first occurrence, so the reader has to jump back and forth several times, which makes the paper harder to follow. 2. The authors claim that the computational cost problem of hard cascades is solved by the proposed method, which may not be the case: optimizing all stages simultaneously can require more complex computation, and the post-processing step for computing the thresholds does not exist in a hard cascade, so the sum of these could exceed the training cost of a hard cascade. I assume that the datasets contain little noise, because AdaBoost is very sensitive to noise yet the results show that it achieves high performance on these datasets. It would be nice to include a noisy dataset to show that the accuracy-cost tradeoff mechanism still behaves well in such situations, because in many cases accuracy is heavily affected by noise, so trading even a little accuracy for cost could result in a large drop. Although some issues exist, the paper is informative.
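As a reading aid, here is a minimal sketch of the soft cascade decision rule summarized above: each stage produces a posterior probability, an instance is rejected as soon as that probability falls below the stage's threshold, and it is labelled positive only after surviving every stage. The logistic-regression stages, the feature splits, and the thresholds are illustrative assumptions; in particular, the stages here are fit independently for simplicity, whereas the paper's contribution is to optimize them jointly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative data: pretend each stage sees a progressively larger
# (and therefore more expensive) block of features.
X, y = make_classification(n_samples=2000, n_features=12, random_state=1)
stage_features = [slice(0, 4), slice(0, 8), slice(0, 12)]   # cheap -> expensive
thresholds = [0.1, 0.3, 0.5]   # assumed values; the paper sets these in post-processing

# Each stage is a linear (logistic) classifier on its feature block.
stages = [LogisticRegression(max_iter=1000).fit(X[:, cols], y)
          for cols in stage_features]

def soft_cascade_predict(x):
    """Reject early when the posterior drops below a stage threshold;
    label positive only after passing every stage."""
    for clf, cols, thr in zip(stages, stage_features, thresholds):
        p_pos = clf.predict_proba(x[cols].reshape(1, -1))[0, 1]
        if p_pos < thr:
            return 0   # rejected here: the expensive later features were never acquired
    return 1           # survived all stages: classified positive

preds = np.array([soft_cascade_predict(x) for x in X])
print("training accuracy of the toy cascade:", (preds == y).mean())
```

The cost saving comes from the early returns: most negative instances are rejected after only the cheap feature block has been computed.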
The experimental datasets, chosen from a field in which feature cost really matters, make the paper's case for its importance more persuasive.

Tradeoff between Machine Learning and Pattern Recognition

Before discussing the tradeoff, the difference between machine learning and pattern recognition has to be identified. "Pattern recognition has its origins in engineering, whereas machine learning grew out of computer science. However, these activities can be viewed as two facets of the same field." (Christopher M. Bishop, textbook.)

Figure 1. Artificial Intelligence

From the figure, we can see that pattern recognition is a subfield of AI that applies machine learning and statistical methodology to the problem of finding hidden patterns in data, and it generally has broader applications than machine learning. Wikipedia describes pattern recognition as being based on probability theory; therefore most pattern recognition algorithms have a probabilistic nature, whereas the outcomes of many other machine learning algorithms are deterministic. Probability-based pattern recognition algorithms can output a result together with a confidence value that is mathematically grounded in probability theory, and this value can in turn be consumed by other probability-based algorithms. When the confidence value falls below some threshold, the algorithm can decline to provide an output at all. In contrast, a general machine learning algorithm will still provide its "best" decision, even if that decision is only marginally better than the worst guess. Because it is probability-based, pattern recognition can naturally handle the propagation of uncertainty, especially for large tasks containing many uncertainties. But as this probability is generated out of some distribution function, the searching
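A tiny sketch of the reject-option behaviour described above: a probabilistic classifier declines to answer when its posterior confidence is below a threshold, whereas a plain decision rule always returns its best guess. The threshold value, model, and data are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative data and model (assumed for the sketch).
X, y = make_classification(n_samples=500, n_features=10, random_state=2)
clf = LogisticRegression(max_iter=1000).fit(X, y)

CONFIDENCE_THRESHOLD = 0.8   # assumed value chosen for illustration

def predict_with_reject(x):
    """Return a label only when the posterior confidence is high enough;
    otherwise decline to answer (the 'reject option')."""
    probs = clf.predict_proba(x.reshape(1, -1))[0]
    if probs.max() < CONFIDENCE_THRESHOLD:
        return None               # decline to provide an output
    return int(probs.argmax())    # confident answer, like a plain classifier

answers = [predict_with_reject(x) for x in X]
rejected = sum(a is None for a in answers)
print(f"declined on {rejected} of {len(X)} instances")
```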