Wenxin Jiang
Department of Statistics
Northwestern University
Evanston, IL 60208, USA
E-mail: wjiang@nwu.edu
Phone: 847-467-4533
Fax: 847-491-4939
Martin A. Tanner
Department of Statistics
Northwestern University
Evanston, IL 60208, USA
E-mail: tanm@neyman.stats.nwu.edu
Phone: 847-491-2700
Fax: 847-491-4939
We investigate a class of hierarchical mixtures-of-experts (HME) models in which exponential family regression models with generalized linear mean functions of the form $\psi(a+x^T b)$ are mixed. Here $\psi(\cdot)$ is the inverse link function. Suppose the true response $y$ follows an exponential family regression model whose mean function belongs to a class of smooth functions of the form $\psi(h(x))$, where $h \in W_{2;K_0}^\infty$ (a Sobolev class over $[0,1]^{s}$) and $s$ is the dimension of the predictor $x$. It is shown that HME mean functions with $m$ experts can approximate the true mean function at a rate of $O(m^{-2/s})$ in $L_p$ norm. Moreover, HME probability density functions can approximate the true density at a rate of $O(m^{-2/s})$ in Hellinger distance and at a rate of $O(m^{-4/s})$ in Kullback-Leibler divergence. These rates can be achieved within the family of HME structures with a tree of binary splits, or within the family of structures with a single layer of experts. It is also shown that likelihood-based inference using HME is consistent in recovering the truth, in the sense that the mean square error of the estimated mean response goes to zero as the sample size $n$ and the number of experts $m$ both increase. Conditions for such results to hold are stated and discussed.
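To make the mixed mean function concrete, here is a minimal sketch of a single-layer mixture of $m$ experts, $\mu(x) = \sum_{j=1}^m g_j(x)\,\psi(a_j + x^T b_j)$. Two choices are assumptions for illustration only and are not fixed by the text above: the gating weights $g_j(x)$ take a softmax (multinomial logit) form with hypothetical parameters `gate_alpha` and `gate_beta`, and $\psi$ is taken to be the inverse logit link of a Bernoulli (logistic) expert.

```python
import math
from typing import Callable, List, Sequence


def logistic(t: float) -> float:
    """Inverse logit link psi(t) = 1 / (1 + exp(-t)); an assumed example of psi.
    (math.exp overflows for t below about -709, which is fine for a sketch.)"""
    return 1.0 / (1.0 + math.exp(-t))


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product x^T b for plain Python sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hme_mean(
    x: Sequence[float],
    gate_alpha: Sequence[float],          # hypothetical gating intercepts, one per expert
    gate_beta: Sequence[Sequence[float]],  # hypothetical gating slopes, one vector per expert
    expert_a: Sequence[float],            # expert intercepts a_j
    expert_b: Sequence[Sequence[float]],  # expert slope vectors b_j
    psi: Callable[[float], float] = logistic,
) -> float:
    """Single-layer mixture mean: sum_j g_j(x) * psi(a_j + x^T b_j),
    where g_j(x) is proportional to exp(gate_alpha[j] + x^T gate_beta[j]) (assumed gating form)."""
    m = len(expert_a)
    gates = softmax([gate_alpha[j] + dot(x, gate_beta[j]) for j in range(m)])
    return sum(gates[j] * psi(expert_a[j] + dot(x, expert_b[j])) for j in range(m))


# Usage: s = 1 predictor, m = 2 experts, with the gate strongly favouring expert 2.
mu = hme_mean([0.5], gate_alpha=[0.0, 50.0], gate_beta=[[0.0], [0.0]],
              expert_a=[10.0, 0.0], expert_b=[[1.0], [0.0]])
```

Because the gating weights are nonnegative and sum to one, $\mu(x)$ is a convex combination of the expert means, so with a logistic $\psi$ it always lies in $(0,1)$. A binary-split tree, the other structure mentioned above, would instead multiply gating probabilities along each root-to-leaf path; that variant is not shown here.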
Keywords: Approximation rate, exponential family, generalized linear models, Hellinger distance, Hierarchical Mixtures-of-Experts, Kullback-Leibler divergence, maximum likelihood estimation, mean square error.