Source: www.stats.ox.ac.uk
Statistical Data Mining

B. D. Ripley

May 2002

© B. D. Ripley 1998–2002. Material from Ripley (1996) is © B. D. Ripley 1996. Material from Venables and Ripley (1999, 2002) is © Springer-Verlag, New York 1994–2002.

Introduction

This material is partly based on Ripley (1996), Venables & Ripley (1999, 2002) and the on-line complements available at

    http://www.stats.ox.ac.uk/pub/MASS4/

My copyright agreements allow me to use the material on courses, but no further distribution is allowed.

The S code in this version of the notes was tested with S-PLUS 6.0 for Unix/Linux and Windows, and S-PLUS 2000 release 3. With minor changes it works with R version 1.5.0.

The specific add-ons for the material in this course are available at

    http://www.stats.ox.ac.uk/pub/bdr/SDM2001/

All the other add-on libraries mentioned are available for Unix and for Windows. Compiled versions for S-PLUS 2000 are available from

    http://www.stats.ox.ac.uk/pub/SWin/

and for S-PLUS 6.x from

    http://www.stats.ox.ac.uk/pub/MASS4/Winlibs/

Contents

1 Overview of Data Mining
    1.1 Multivariate analysis
    1.2 Graphical methods
    1.3 Cluster analysis
    1.4 Kohonen's self organizing maps
    1.5 Exploratory projection pursuit
    1.6 An example of visualization
    1.7 Categorical data

2 Tree-based Methods
    2.1 Partitioning methods
    2.2 Implementation in rpart

3 Neural Networks
    3.1 Feed-forward neural networks
    3.2 Multiple logistic regression and discrimination
    3.3 Neural networks in classification
    3.4 A look at support vector machines

4 Near-neighbour Methods
    4.1 Nearest neighbour methods
    4.2 Learning vector quantization
    4.3 Forensic glass

5 Assessing Performance
    5.1 Practical ways of performance assessment
    5.2 Calibration plots
    5.3 Performance summaries and ROC curves
    5.4 Assessing generalization

References

Index