The architectures of many neural networks rely heavily on an underlying grid associated with the variables, for instance, the lattice of pixels in an image. For general biomedical data without such a grid structure, the multi-layer perceptron (MLP) and the deep belief network (DBN) are often used. In these networks, however, the variables are treated homogeneously with respect to the network structure, and it is difficult to assess their individual importance. In this paper, we propose a novel neural network called Variable-block tree Net (VtNet), whose architecture is determined by an underlying tree in which each node corresponds to a subset of variables. The tree is learned from the data to best capture the causal relationships among the variables. VtNet contains a long short-term memory (LSTM)-like cell for every tree node. The input and forget gates of each cell control the information flow through the node and are used to define a significance score for the variables. To validate this score, VtNet is retrained on smaller trees from which low-scoring variables have been removed, and hypothesis tests show that variables with higher scores influence classification more strongly. We also compare the score with the variable importance measure of Random Forest from the perspective of variable selection. Our experiments demonstrate that VtNet is highly competitive in classification accuracy and that removing variables with low significance scores often improves accuracy.
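The abstract describes per-node LSTM-like cells whose input and forget gates drive a variable significance score. The following is a minimal illustrative sketch of that idea, not the paper's actual model: the gate parameterization, the `significance_score` definition, and all dimensions are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TreeNodeCell:
    """LSTM-like cell for one tree node over a block of d variables.

    Hypothetical sketch: the weight shapes and gate wiring here are
    illustrative assumptions, not VtNet's exact formulation.
    """
    def __init__(self, d, h):
        # Gates act on the concatenation of the node's variable block
        # and the hidden state passed up from a child node.
        self.Wi = rng.normal(0, 0.1, (h, d + h))  # input gate weights
        self.Wf = rng.normal(0, 0.1, (h, d + h))  # forget gate weights
        self.Wc = rng.normal(0, 0.1, (h, d + h))  # candidate state weights

    def forward(self, x_block, h_child, c_child):
        z = np.concatenate([x_block, h_child])
        i = sigmoid(self.Wi @ z)            # input gate: admits this node's variables
        f = sigmoid(self.Wf @ z)            # forget gate: retains the child's state
        c = f * c_child + i * np.tanh(self.Wc @ z)
        h = np.tanh(c)
        return h, c, i, f

def significance_score(i_gate, f_gate):
    """One plausible gate-based score (an assumption for illustration):
    how widely the cell opens its input gate for the node's variables,
    relative to how much it relies on inherited child state."""
    return float(np.mean(i_gate) / (np.mean(i_gate) + np.mean(f_gate)))

# Toy usage: a single node holding a 4-variable block, hidden size 8,
# fed with zero child state (i.e., a leaf of the tree).
cell = TreeNodeCell(d=4, h=8)
x = rng.normal(size=4)
h0, c0 = np.zeros(8), np.zeros(8)
h, c, i, f = cell.forward(x, h0, c0)
score = significance_score(i, f)
```

In a full model one such cell would sit at every tree node, with hidden and cell states flowing from children to parents, and scores like the one above would be aggregated per variable block to decide which variables to prune before retraining on a smaller tree.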
Original language: English (US)
State: Published - Dec 2021
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Statistics, Probability and Uncertainty