Semi-supervised dimensionality reduction for analyzing high-dimensional data with constraints

Su Yan, Sofien Bouaziz, Dongwon Lee, Jesse Barlow

Research output: Contribution to journal › Article

14 Citations (Scopus)

Abstract

In this paper, we present a novel semi-supervised dimensionality reduction technique to address the problems of inefficient learning and costly computation in coping with high-dimensional data. Our method, named dual subspace projections (DSP), embeds high-dimensional data in an optimal low-dimensional space, which is learned with a few user-supplied constraints and the structure of the input data. The method projects data into two different subspaces, namely the kernel space and the original input space. Each projection is designed to enforce one type of constraint, and the projections in the two subspaces interact with each other to satisfy the constraints maximally while preserving the intrinsic data structure. Compared to existing techniques, our method has the following advantages: (1) it benefits from constraints even when only a few are available; (2) it is robust and free from overfitting; and (3) it handles nonlinearly separable data yet learns a linear data transformation. As a result, our method can be easily generalized to new data points and is efficient in dealing with large datasets. An empirical study using real data validates our claims, showing that significant improvements in learning accuracy can be obtained after DSP-based dimensionality reduction is applied to high-dimensional data.
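To make the abstract's idea concrete, the following is a minimal illustrative sketch of constraint-guided linear dimensionality reduction: a projection is learned that pulls user-supplied must-link pairs together and pushes cannot-link pairs apart while retaining global structure. This is an assumption-laden toy (the scatter-matrix objective, the function name `constrained_projection`, and the weights `alpha`/`beta` are invented for illustration), not the paper's actual DSP algorithm, which additionally couples a kernel-space projection with the input-space one.

```python
import numpy as np

def constrained_projection(X, must_link, cannot_link, d=2, alpha=1.0, beta=1.0):
    """Illustrative semi-supervised linear projection (not the paper's DSP).

    Learns an orthonormal basis W maximizing
        w^T (S + alpha*Sc - beta*Sm) w,
    where S preserves global variance (PCA-like), Sc spreads cannot-link
    pairs, and Sm contracts must-link pairs.
    """
    n, D = X.shape
    S = np.cov(X, rowvar=False)          # global scatter of the data
    Sm = np.zeros((D, D))                # must-link scatter: to be minimized
    for i, j in must_link:
        diff = (X[i] - X[j])[:, None]
        Sm += diff @ diff.T
    Sc = np.zeros((D, D))                # cannot-link scatter: to be maximized
    for i, j in cannot_link:
        diff = (X[i] - X[j])[:, None]
        Sc += diff @ diff.T
    M = S + alpha * Sc - beta * Sm       # symmetric objective matrix
    vals, vecs = np.linalg.eigh(M)       # eigenvalues in ascending order
    W = vecs[:, np.argsort(vals)[::-1][:d]]  # top-d eigenvectors
    return X @ W, W                      # embedding and linear map
```

Because the learned map `W` is linear, new data points can be embedded by a single matrix multiplication (`X_new @ W`), which mirrors the out-of-sample generalization property the abstract claims for DSP.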

Original language: English (US)
Pages (from-to): 114-124
Number of pages: 11
Journal: Neurocomputing
Volume: 76
Issue number: 1
DOIs: 10.1016/j.neucom.2011.03.057
State: Published - Jan 15 2012

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Cognitive Neuroscience
  • Artificial Intelligence

Cite this

@article{fd0552ffa08e4cf0845dee8ebc40b48d,
title = "Semi-supervised dimensionality reduction for analyzing high-dimensional data with constraints",
abstract = "In this paper, we present a novel semi-supervised dimensionality reduction technique to address the problems of inefficient learning and costly computation in coping with high-dimensional data. Our method, named dual subspace projections (DSP), embeds high-dimensional data in an optimal low-dimensional space, which is learned with a few user-supplied constraints and the structure of the input data. The method projects data into two different subspaces, namely the kernel space and the original input space. Each projection is designed to enforce one type of constraint, and the projections in the two subspaces interact with each other to satisfy the constraints maximally while preserving the intrinsic data structure. Compared to existing techniques, our method has the following advantages: (1) it benefits from constraints even when only a few are available; (2) it is robust and free from overfitting; and (3) it handles nonlinearly separable data yet learns a linear data transformation. As a result, our method can be easily generalized to new data points and is efficient in dealing with large datasets. An empirical study using real data validates our claims, showing that significant improvements in learning accuracy can be obtained after DSP-based dimensionality reduction is applied to high-dimensional data.",
author = "Su Yan and Sofien Bouaziz and Dongwon Lee and Jesse Barlow",
year = "2012",
month = "1",
day = "15",
doi = "10.1016/j.neucom.2011.03.057",
language = "English (US)",
volume = "76",
pages = "114--124",
journal = "Neurocomputing",
issn = "0925-2312",
publisher = "Elsevier",
number = "1",

}

Semi-supervised dimensionality reduction for analyzing high-dimensional data with constraints. / Yan, Su; Bouaziz, Sofien; Lee, Dongwon; Barlow, Jesse.

In: Neurocomputing, Vol. 76, No. 1, 15.01.2012, p. 114-124.

Research output: Contribution to journal › Article

TY - JOUR

T1 - Semi-supervised dimensionality reduction for analyzing high-dimensional data with constraints

AU - Yan, Su

AU - Bouaziz, Sofien

AU - Lee, Dongwon

AU - Barlow, Jesse

PY - 2012/1/15

Y1 - 2012/1/15

N2 - In this paper, we present a novel semi-supervised dimensionality reduction technique to address the problems of inefficient learning and costly computation in coping with high-dimensional data. Our method, named dual subspace projections (DSP), embeds high-dimensional data in an optimal low-dimensional space, which is learned with a few user-supplied constraints and the structure of the input data. The method projects data into two different subspaces, namely the kernel space and the original input space. Each projection is designed to enforce one type of constraint, and the projections in the two subspaces interact with each other to satisfy the constraints maximally while preserving the intrinsic data structure. Compared to existing techniques, our method has the following advantages: (1) it benefits from constraints even when only a few are available; (2) it is robust and free from overfitting; and (3) it handles nonlinearly separable data yet learns a linear data transformation. As a result, our method can be easily generalized to new data points and is efficient in dealing with large datasets. An empirical study using real data validates our claims, showing that significant improvements in learning accuracy can be obtained after DSP-based dimensionality reduction is applied to high-dimensional data.

AB - In this paper, we present a novel semi-supervised dimensionality reduction technique to address the problems of inefficient learning and costly computation in coping with high-dimensional data. Our method, named dual subspace projections (DSP), embeds high-dimensional data in an optimal low-dimensional space, which is learned with a few user-supplied constraints and the structure of the input data. The method projects data into two different subspaces, namely the kernel space and the original input space. Each projection is designed to enforce one type of constraint, and the projections in the two subspaces interact with each other to satisfy the constraints maximally while preserving the intrinsic data structure. Compared to existing techniques, our method has the following advantages: (1) it benefits from constraints even when only a few are available; (2) it is robust and free from overfitting; and (3) it handles nonlinearly separable data yet learns a linear data transformation. As a result, our method can be easily generalized to new data points and is efficient in dealing with large datasets. An empirical study using real data validates our claims, showing that significant improvements in learning accuracy can be obtained after DSP-based dimensionality reduction is applied to high-dimensional data.

UR - http://www.scopus.com/inward/record.url?scp=80555131196&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80555131196&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2011.03.057

DO - 10.1016/j.neucom.2011.03.057

M3 - Article

AN - SCOPUS:80555131196

VL - 76

SP - 114

EP - 124

JO - Neurocomputing

JF - Neurocomputing

SN - 0925-2312

IS - 1

ER -