Efficient Mining of Statistical Dependencies

Authors:

Tim Oates
Experimental Knowledge Systems Lab
Department of Computer Science
University of Massachusetts, Amherst
E-mail: oates@cs.umass.edu
Phone: 413-545-3638
Fax: 413-545-1249

Matthew D. Schmill
Experimental Knowledge Systems Lab
Department of Computer Science
University of Massachusetts, Amherst
E-mail: schmill@cs.umass.edu
Phone: 413-545-3638
Fax: 413-545-1249

Paul R. Cohen
Experimental Knowledge Systems Lab
Department of Computer Science
University of Massachusetts, Amherst
E-mail: cohen@cs.umass.edu
Phone: 413-545-3638
Fax: 413-545-1249

Casey Durfee
Experimental Knowledge Systems Lab
Department of Computer Science
University of Massachusetts, Amherst
E-mail: durfee@cs.umass.edu
Phone: 413-545-3638
Fax: 413-545-1249

Abstract:

The Multi-Stream Dependency Detection (MSDD) algorithm finds rules that capture statistical dependencies between patterns in multivariate time series of categorical data. Rule strength is measured by the G statistic, and an upper bound on the value of G for the descendants of a node allows MSDD's search space to be pruned. In the worst case, however, the algorithm still explores exponentially many rules. This paper presents and empirically evaluates two ways of addressing this problem. The first is a set of three methods for reducing the size of MSDD's search space based on information collected during the search process. The second is an implementation of MSDD that distributes its computations over multiple machines on a network.
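
As a rough illustration of the rule-strength measure named above, the following sketch computes the standard G statistic, G = 2 * sum(O * ln(O / E)), for a contingency table of observed counts. The function name `g_statistic` and the table layout (rows for precursor present/absent, columns for successor present/absent) are assumptions made for this example; the abstract does not specify how MSDD builds its tables, and this is not the authors' implementation.

```python
import math


def g_statistic(table):
    """Return G = 2 * sum(O * ln(O / E)) over the cells of a contingency table.

    `table` is a list of rows of non-negative observed counts. Expected
    counts E are derived from the row and column totals under independence.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed == 0:
                continue  # the term O * ln(O / E) tends to 0 as O -> 0
            expected = row_totals[i] * col_totals[j] / n
            g += observed * math.log(observed / expected)
    return 2.0 * g


# Hypothetical counts: rows = precursor matched / not matched,
# columns = successor matched / not matched.
independent = [[10, 10], [10, 10]]
dependent = [[30, 10], [10, 30]]
print(g_statistic(independent), g_statistic(dependent))
```

A table whose cells match their expected counts yields G = 0, and G grows as the observed counts depart from independence, which is why larger values indicate stronger rules.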

Keywords:

data mining, distributed search, rule learning, statistical dependencies

Availability:

PostScript
