Efficient Learning using Constrained Sufficient Statistics

Authors:

Nir Friedman
Institute of Computer Science
The Hebrew University
Ross building, Givat Ram
Jerusalem 91904 ISRAEL
E-mail: nir@cs.huji.ac.il
Phone: 972-2-658-4720
Fax: 972-2-658-5439

Lise Getoor
Computer Science Department
Gates Building, 1A-126
Stanford University
Stanford CA 94305-9010
E-mail: getoor@cs.stanford.edu
Phone: 650-725-8784
Fax: 650-725-1449

Abstract:

Learning Bayesian networks is a central problem for pattern recognition, density estimation and classification. In this paper, we propose a new method for speeding up the computational process of learning Bayesian network structure. This approach uses constraints imposed by the statistics already collected from the data to guide the learning algorithm. This allows us to reduce the number of statistics collected during learning and thus speed up the learning time. We show that our method is capable of learning structure from data more efficiently than traditional approaches. Our technique is of particular importance when the size of the datasets is large or when learning from incomplete data. The basic technique that we introduce is general and can be used to improve learning performance in many settings where sufficient statistics must be computed. In addition, our technique may be useful for alternate search strategies such as branch and bound algorithms.

Keywords:

learning probabilistic models, learning bayesian networks, minimum description length, bayesian scoring metrics, branch and bound, data mining, sufficient statistics

Availability:

PostScript