Modeling Decision Tree Performance with the Power Law

Lewis J. Frey and Douglas H. Fisher, Jr.

Lewis J. Frey
Computer Science Department
Vanderbilt University
Village at Vanderbilt
Nashville, TN 37212
E-mail: frey@vuse.vanderbilt.edu
Phone: 615-322-3233
Fax: 615-343-8006

Douglas H. Fisher
Computer Science Department
Vanderbilt University
Village at Vanderbilt
Nashville, TN 37212
E-mail: dfisher@vuse.vanderbilt.edu
Phone: 615-343-4111
Fax: 615-343-8006

Abstract:

This paper discusses the use of a power law to predict decision tree performance. Power laws are fit to the learning curves of decision trees trained on data sets from the UCI repository. The learning curves are generated by training C4.5 on training sets of different sizes. The power law predicts diminishing returns in error rate as training set size increases. By characterizing the learning curve with a power law, the error rate for a training set of a given size can be projected. This projection can be used to estimate the amount of data needed to achieve an acceptable error rate and the cost-effectiveness of further data collection.
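The fitting and projection steps described above can be sketched as follows. This is a minimal illustration, not the procedure used in the paper: it assumes the functional form error(n) = a * n^(-b), fits it by least squares in log-log space, and uses synthetic points generated from a known curve rather than measured C4.5 error rates. The function names and the parameters a and b are hypothetical.

```python
import numpy as np


def fit_power_law(sizes, errors):
    """Fit errors ~ a * sizes**(-b) by linear least squares in log-log space.

    Assumed form (not taken from the paper): log(error) = log(a) - b * log(n).
    """
    log_n = np.log(np.asarray(sizes, dtype=float))
    log_e = np.log(np.asarray(errors, dtype=float))
    slope, intercept = np.polyfit(log_n, log_e, 1)
    return float(np.exp(intercept)), float(-slope)


def predict_error(a, b, n):
    """Project the error rate for a training set of size n."""
    return a * n ** (-b)


def size_for_error(a, b, target_error):
    """Invert the fitted curve: training set size needed to reach target_error."""
    return (a / target_error) ** (1.0 / b)


# Synthetic learning curve generated from a known power law (illustrative only;
# these are not results from C4.5 or from any UCI data set).
sizes = np.array([50.0, 100.0, 200.0, 400.0, 800.0])
errors = 0.9 * sizes ** (-0.3)

a, b = fit_power_law(sizes, errors)
projected = predict_error(a, b, 1600.0)   # error projected beyond the observed sizes
needed = size_for_error(a, b, 0.08)       # size estimated to reach an 8% error rate
```

Under this assumed form, the marginal gain from one more training example is the derivative -a * b * n^(-b - 1), which shrinks as n grows. Comparing that gain against the cost of collecting an example is one way to judge whether further data collection is worthwhile.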

Keywords:

decision trees, C4.5, power law correlation

Availability:

PostScript
PDF