- Data Complexity Analysis: Linkage between Context and Solution in Classification
Pierre Devijver Award Lecture, The Joint IAPR International Workshops on
Structural and Syntactic Pattern Recognition (SSPR 2008)
and Statistical Techniques in Pattern Recognition (SPR 2008),
December 4-6, 2008, Orlando, FL, USA.
Slides (.ppt) |
Slides of an abbreviated talk at INFORMS 2009 (.ppt)
For a classification problem that is implicitly represented by a training data set, analysis of data complexity provides a linkage between context and solution. Instead of directly optimizing classification accuracy by tuning the learning algorithms, one may seek changes in the data sources and feature transformations to simplify the data geometry. Simplified class geometry benefits learning in a way common to many methods. We review some early results in data complexity analysis, compare these to recent advances in manifold learning, and suggest directions for further research.
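As a concrete illustration, one of the early data complexity measures is Fisher's discriminant ratio (often called F1), which quantifies how separable two classes are along individual features; a larger value indicates simpler class geometry. Below is a minimal sketch for the two-class case; the maximum-over-features form and the function name are choices made here for illustration, not code from the talk.

```python
import numpy as np

def fisher_discriminant_ratio(X, y):
    """Fisher's discriminant ratio (F1) for a two-class data set:
    max over features of (mu1 - mu2)^2 / (s1^2 + s2^2).
    Higher values indicate a geometrically simpler (more separable) problem."""
    classes = np.unique(y)
    assert len(classes) == 2, "this sketch handles two-class problems only"
    X1, X2 = X[y == classes[0]], X[y == classes[1]]
    # per-feature squared mean difference over summed class variances
    num = (X1.mean(axis=0) - X2.mean(axis=0)) ** 2
    den = X1.var(axis=0) + X2.var(axis=0)
    return float(np.max(num / den))
```

A feature transformation that increases this ratio simplifies the data geometry in the sense discussed above, independently of which learning algorithm is applied afterwards.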
- Mirage -- Interactive Pattern Discovery with Large
Invited lecture, Workshop on "Imaging and Optics: Research and Education", Montclair, November 19, 2004. Slides (.ppt) | Self-Running Demo (.ppt)
Advances in digital imaging technologies have led to accumulations of large data archives with rich multimedia contents, enabling both targeted pursuits and open-ended explorations of many kinds. A recent example is the Virtual Observatory that supports sharing of diverse and massive databases containing images, spectra, and catalogs among astronomical researchers. To maximize its advantages, flexible and effective data analysis tools that can handle large data volumes, diverse data types, a wide range of objectives, and highly variable demands on speed are critically needed. We discuss our experiences with Mirage (http://www.cs.bell-labs.com/who/tkh/mirage), a prototype software package for interactive pattern discovery, and its applications in the Virtual Observatory. We focus on how to organize the analysis tool to lay a solid foundation for meeting these requirements and enabling continuous growth.
- Learning with Random Guesses -- Principles of Stochastic Discrimination and Ensemble Learning
Tutorial, the 17th International Conference on Pattern Recognition, Cambridge, UK, August 22, 2004. Repeated at the 18th ICPR in 2006. (I) Principles (.ppt) | (II) Implementations (.ppt) | Notes (.pdf) | Example (.pdf)
Learning in everyday life is often accomplished by making many random guesses and synthesizing the feedback. Kleinberg's analysis of this process resulted in a new method for classifier design -- stochastic discrimination (SD). The method constructs an accurate classifier by combining a large number of very weak discriminators that are generated essentially at random. An important advantage is that classifiers designed this way are insensitive to overtraining.
SD is an ensemble learning method in an extreme form. Studies on other ensemble methods for classification have long suffered from the difficulty of modeling the complementary strengths of the components. The SD theory addresses this rigorously via mathematical notions of enrichment, uniformity, and projectability.
In this tutorial we explain these concepts via a simple numerical example, with a focus on a fundamental symmetry in point set covering that is the key observation leading to the foundation of the SD theory. We illustrate, step by step, how the SD principle operates in this example. We then describe and discuss more sophisticated implementations for practical uses. We believe a basic understanding of the SD method will open the way to exploration of a new classifier technology, and lead to the development of better tools for analyzing other ensemble methods.
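To convey the flavor of the method, the sketch below generates random half-plane "weak discriminators" in 2-D, keeps only the enriched ones (those covering a larger fraction of class 1 than of class 0), and classifies by averaging the normalized memberships -- the quantity whose concentration the SD theory analyzes. This is a simplified illustration under assumptions made here (half-planes as weak models, a fixed enrichment test), not Kleinberg's full construction, which also enforces uniformity and projectability.

```python
import random

def train_sd_like(points, labels, n_models=500, seed=0):
    """Generate random half-plane weak discriminators over 2-D points and
    keep only enriched ones: those covering a larger fraction of class 1
    training points (r1) than of class 0 training points (r0)."""
    rng = random.Random(seed)
    pos = [p for p, l in zip(points, labels) if l == 1]
    neg = [p for p, l in zip(points, labels) if l == 0]
    models = []
    while len(models) < n_models:
        # a random threshold on a random coordinate defines a weak region
        axis = rng.randrange(2)
        lo = min(p[axis] for p in points)
        hi = max(p[axis] for p in points)
        t = rng.uniform(lo, hi)
        side = rng.choice([True, False])
        member = lambda p, a=axis, t=t, s=side: (p[a] >= t) == s
        r1 = sum(member(p) for p in pos) / len(pos)  # class-1 coverage
        r0 = sum(member(p) for p in neg) / len(neg)  # class-0 coverage
        if r1 > r0:  # keep only enriched weak models
            models.append((member, r1, r0))
    return models

def classify(models, p):
    """Average the normalized memberships (member - r0) / (r1 - r0),
    which has mean 1 over class-1 and mean 0 over class-0 training points;
    a score above 0.5 votes for class 1."""
    score = sum((int(member(p)) - r0) / (r1 - r0)
                for member, r1, r0 in models) / len(models)
    return 1 if score > 0.5 else 0
```

Each individual half-plane is a very weak discriminator, yet the averaged, normalized score concentrates around 1 on class 1 and around 0 on class 0 as the ensemble grows -- the phenomenon the enrichment and uniformity conditions make rigorous.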