Neural Networks and Other Numerical Learning Techniques for Pattern Recognition in Large Data Sets

Yann LeCun, Yoshua Bengio, Leon Bottou, Corinna Cortes, and Vladimir Vapnik (AT&T Labs Research, Red Bank, NJ)


Among all the machine learning algorithms proposed in recent years,
only a few can cope with high-dimensional input vectors and very large
numbers of samples. Learning processes can usually be described as
minimizing an objective function that measures the performance of the
machine. Learning algorithms are characterized by the type of the
objective function they minimize (discrete, continuous, convex), and
the method they use to find the minimum (combinatorial,
gradient-based, direct). Gradient-based learning algorithms such as
Neural Networks and Support Vector Machines have been successfully
applied to pattern recognition problems with hundreds of input
variables and hundreds of thousands of samples. Neural Networks
minimize non-convex objective functions with gradient-based
techniques, while Support Vector Machines minimize a quadratic
objective function with linear constraints. Both methods can learn
high-dimensional, non-linear decision surfaces with tens of thousands
of free parameters.  However, large databases are often very
redundant, and the relevant information is difficult to
extract. Emphasizing methods, such as boosting and active set
techniques, address that problem by increasing the relative
statistical weight of difficult or non-typical samples. Applications
of those methods to pattern recognition, fault prediction, and
database marketing will be presented.
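The abstract does not say which emphasizing algorithm is used. As one illustration, assuming AdaBoost (Freund and Schapire) as the boosting method, a single reweighting step can be sketched as follows. Samples the current weak classifier gets wrong have their weights multiplied by e^alpha, and correctly handled samples by e^-alpha, so difficult samples carry more statistical weight in the next round. The function name `emphasize` and its arguments are hypothetical and chosen only for this sketch.

```python
import math


def emphasize(weights, correct, eps=1e-12):
    """One AdaBoost-style reweighting step (illustrative sketch, hypothetical name).

    weights: current non-negative sample weights summing to 1.
    correct: booleans, True where the current weak classifier was right.
    Returns (alpha, new_weights), where new_weights again sum to 1.
    """
    # Weighted error of the weak classifier under the current distribution.
    err = sum(w for w, c in zip(weights, correct) if not c)
    # Clamp away from 0 and 1 so the logarithm stays finite.
    err = min(max(err, eps), 1.0 - eps)
    # Vote weight of this weak classifier in the final ensemble.
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Up-weight misclassified (difficult) samples, down-weight the rest.
    new = [w * math.exp(-alpha if c else alpha) for w, c in zip(weights, correct)]
    total = sum(new)
    return alpha, [w / total for w in new]


if __name__ == "__main__":
    w = [0.25, 0.25, 0.25, 0.25]
    correct = [True, True, True, False]  # the last sample is the "difficult" one
    alpha, w = emphasize(w, correct)
    print(round(alpha, 4), [round(x, 4) for x in w])
    # prints: 0.5493 [0.1667, 0.1667, 0.1667, 0.5]
```

With one of four equally weighted samples misclassified, that sample's weight rises from 0.25 to 0.5 after a single step. In AdaBoost the misclassified samples always end up holding exactly half of the total weight after reweighting, which forces the next weak learner to concentrate on them.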