I believe deep insights that are critical for significant algorithmic advances can only come from intensive studies of real-world data originating in carefully chosen domains, and must undergo tests in serious applications. Over the years I have been following such a path.
On the algorithm side, I studied some geometrical and nonparametric statistical methods applicable to problems in very high dimensional feature spaces, and combinatorial approaches to classification. My topics include distribution maps, random decision forests, stochastic discrimination, and combination and coordination of multiple classifiers. These algorithms are applicable to classification tasks in any domain, such as image and speech processing, digital libraries, information retrieval, medical diagnosis, scientific data analysis, and financial engineering. Recently I have been looking into ways to characterize the complexity of classification problems and relate it to classifier behavior, and into unifying themes across various methods for unsupervised learning.
On the application side, I am looking into a number of data analysis problems covering several areas of science and engineering. The problems originated from network traffic analysis, telecommunication engineering, multimedia information processing, computational physics, and astronomy. Generally they involve modeling, visualization, and retrieval of numerical data in very high dimensional spaces. I seek to understand and meet the unique challenges imposed by each application area, and to develop algorithms and practical tools for both interactive and automated analyses.
Earlier, I worked on optical character recognition, concentrating on developing symbol classifiers and contextual analysis methods. I carried out several large-scale simulation studies to address issues like estimation of intrinsic error rate, asymptotic accuracy of classifiers, and systematic evaluation of classifiers. I also studied recognition strategies that exploit the contextual information in a text page. These include word-based recognition methods and image enhancement by clustering and averaging. Later I focused on adaptive methods, like font learning by identifying stop words, and text recognition without shape training. I also studied text categorization as a way to organize documents for an information retrieval system.
Complementary to analysis of real-world observations is the attempt to model and simulate the physical processes that generate the data. I pursue an interest in this as well. A recent integration of these interests led to my use of pattern recognition methods in analyzing the classical mathematical models that describe complex photonics systems.