Self-organizing network for variable clustering

Gang Liu, Hui Yang

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

Advanced sensing and the Internet of Things generate big data, which provides an unprecedented opportunity for data-driven knowledge discovery. However, big data commonly involves a large number of variables (or predictors, features). Complex interdependence structures among variables pose significant challenges to the traditional framework of predictive modeling. This paper presents a new self-organizing network methodology to characterize the interrelationships among variables and cluster them into homogeneous subgroups for predictive modeling. Specifically, we develop a new approach, namely nonlinear coupling analysis, to measure variable-to-variable interdependence structures. Further, each variable is represented as a node in a complex network. Nonlinear-coupling forces move these nodes to derive a self-organizing topology of the network. As such, variables are clustered into sub-network communities. Results of simulation experiments demonstrate that the proposed method not only outperforms traditional variable clustering algorithms such as hierarchical clustering and oblique principal component analysis, but also effectively identifies interdependent structures among variables and further improves the performance of predictive modeling. Additionally, a real-world case study shows that the proposed method yields an average sensitivity of 96.80% and an average specificity of 92.62% in the identification of myocardial infarctions using sparse parameters of vectorcardiogram representation models. The proposed self-organizing network approach is generally applicable to predictive modeling in many disciplines that involve a large number of highly redundant variables.
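The pipeline described in the abstract (measure pairwise coupling, treat variables as nodes, let forces arrange the nodes, read off communities) can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract does not specify the nonlinear coupling measure or the force model, so the sketch below substitutes absolute Spearman rank correlation for the coupling, linear spring forces (attractive above a threshold `tau`, repulsive below it) with a QR rescaling step for the layout, and a distance-threshold grouping for the communities. All function names, parameters, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def coupling_matrix(X):
    """Stand-in coupling: absolute Spearman rank correlation between columns."""
    ranks = X.argsort(axis=0).argsort(axis=0)  # column-wise ranks (no ties assumed)
    return np.abs(np.corrcoef(ranks, rowvar=False))


def self_organize(W, tau=0.3, eta=0.1, n_iter=300, seed=0):
    """Move nodes in 2D under linear spring forces derived from the coupling W.

    Pairs with coupling above tau attract, pairs below tau repel. After each
    step the layout is re-centred and re-orthonormalised (QR) so that it
    neither collapses nor blows up. Assumed toy dynamics, not the paper's.
    """
    rng = np.random.default_rng(seed)
    A = W - tau
    np.fill_diagonal(A, 0.0)  # no self-interaction
    P = rng.normal(size=(W.shape[0], 2))  # random initial node positions
    for _ in range(n_iter):
        # force_i = sum_j A_ij * (P_j - P_i)
        force = A @ P - A.sum(axis=1, keepdims=True) * P
        P = P + eta * force
        P -= P.mean(axis=0)
        P, _ = np.linalg.qr(P)
    return P


def communities(P, radius=0.35):
    """Group nodes that are chained together by distances below `radius`."""
    n = len(P)
    dist = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over close neighbours
            i = stack.pop()
            for j in np.flatnonzero((dist[i] < radius) & (labels < 0)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels


def demo():
    """Synthetic data: 3 hidden factors, each driving 4 observed variables."""
    rng = np.random.default_rng(42)
    Z = rng.normal(size=(500, 3))
    X = np.repeat(Z, 4, axis=1) + 0.3 * rng.normal(size=(500, 12))
    X[:, 1::4] = np.exp(X[:, 1::4])  # monotone nonlinear distortions
    X[:, 2::4] = X[:, 2::4] ** 3
    W = coupling_matrix(X)
    P = self_organize(W)
    return communities(P)


labels = demo()
print(labels)  # one community label per variable
```

In this toy setting, variables 0-3, 4-7, and 8-11 share a hidden factor, so a sensible result assigns one label to each block of four. The `radius` and `tau` values are tuned to this small example and would need to be chosen differently for other data.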

Original language: English (US)
Pages (from-to): 119-140
Number of pages: 22
Journal: Annals of Operations Research
Volume: 263
Issue number: 1-2
DOIs: 10.1007/s10479-017-2442-2
State: Published - Apr 1 2018

Fingerprint

Clustering
Self-organizing
Predictive modeling
Node
Interdependence
Interrelationship
Complex networks
Internet of things
Myocardial infarction
Topology
Specificity
Hierarchical clustering
Methodology
Predictors
Clustering algorithm
Principal component analysis
Simulation experiment
Knowledge discovery

All Science Journal Classification (ASJC) codes

  • Decision Sciences(all)
  • Management Science and Operations Research

Cite this

Liu, Gang ; Yang, Hui. / Self-organizing network for variable clustering. In: Annals of Operations Research. 2018 ; Vol. 263, No. 1-2. pp. 119-140.
@article{df68a9eace7c4bbe862d6d0bd38253be,
title = "Self-organizing network for variable clustering",
abstract = "Advanced sensing and internet of things bring the big data, which provides an unprecedented opportunity for data-driven knowledge discovery. However, it is common that a large number of variables (or predictors, features) are involved in the big data. Complex interdependence structures among variables pose significant challenges on the traditional framework of predictive modeling. This paper presents a new methodology of self-organizing network to characterize the interrelationships among variables and cluster them into homogeneous subgroups for predictive modeling. Specifically, we develop a new approach, namely nonlinear coupling analysis to measure variable-to-variable interdependence structures. Further, each variable is represented as a node in the complex network. Nonlinear-coupling forces move these nodes to derive a self-organizing topology of the network. As such, variables are clustered into sub-network communities. Results of simulation experiments demonstrate that the proposed method not only outperforms traditional variable clustering algorithms such as hierarchical clustering and oblique principal component analysis, but also effectively identifies interdependent structures among variables and further improves the performance of predictive modeling. Additionally, real-world case study shows that the proposed method yields an average sensitivity of 96.80{\%} and an average specificity of 92.62{\%} in the identification of myocardial infarctions using sparse parameters of vectorcardiogram representation models. The proposed new idea of self-organizing network is generally applicable for predictive modeling in many disciplines that involve a large number of highly-redundant variables.",
author = "Gang Liu and Hui Yang",
year = "2018",
month = "4",
day = "1",
doi = "10.1007/s10479-017-2442-2",
language = "English (US)",
volume = "263",
pages = "119--140",
journal = "Annals of Operations Research",
issn = "0254-5330",
publisher = "Springer Netherlands",
number = "1-2",

}

Self-organizing network for variable clustering. / Liu, Gang; Yang, Hui.

In: Annals of Operations Research, Vol. 263, No. 1-2, 01.04.2018, p. 119-140.


TY - JOUR

T1 - Self-organizing network for variable clustering

AU - Liu, Gang

AU - Yang, Hui

PY - 2018/4/1

Y1 - 2018/4/1

AB - Advanced sensing and internet of things bring the big data, which provides an unprecedented opportunity for data-driven knowledge discovery. However, it is common that a large number of variables (or predictors, features) are involved in the big data. Complex interdependence structures among variables pose significant challenges on the traditional framework of predictive modeling. This paper presents a new methodology of self-organizing network to characterize the interrelationships among variables and cluster them into homogeneous subgroups for predictive modeling. Specifically, we develop a new approach, namely nonlinear coupling analysis to measure variable-to-variable interdependence structures. Further, each variable is represented as a node in the complex network. Nonlinear-coupling forces move these nodes to derive a self-organizing topology of the network. As such, variables are clustered into sub-network communities. Results of simulation experiments demonstrate that the proposed method not only outperforms traditional variable clustering algorithms such as hierarchical clustering and oblique principal component analysis, but also effectively identifies interdependent structures among variables and further improves the performance of predictive modeling. Additionally, real-world case study shows that the proposed method yields an average sensitivity of 96.80% and an average specificity of 92.62% in the identification of myocardial infarctions using sparse parameters of vectorcardiogram representation models. The proposed new idea of self-organizing network is generally applicable for predictive modeling in many disciplines that involve a large number of highly-redundant variables.

UR - http://www.scopus.com/inward/record.url?scp=85014035610&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85014035610&partnerID=8YFLogxK

U2 - 10.1007/s10479-017-2442-2

DO - 10.1007/s10479-017-2442-2

M3 - Article

AN - SCOPUS:85014035610

VL - 263

SP - 119

EP - 140

JO - Annals of Operations Research

JF - Annals of Operations Research

SN - 0254-5330

IS - 1-2

ER -