CARE: Finding local linear correlations in high dimensional data

Xiang Zhang, Feng Pan, Wei Wang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

14 Citations (Scopus)

Abstract

Finding latent patterns in high dimensional data is an important research problem with numerous applications. Existing approaches fall into three categories: feature selection, feature transformation (or feature projection) and projected clustering. Widely used in many applications, these methods aim to capture global patterns and are typically performed in the full feature space. In many emerging biomedical applications, however, scientists are interested in the local latent patterns held by feature subsets, which may be invisible via any global transformation. In this paper, we investigate the problem of finding local linear correlations in high dimensional data. Our goal is to find the latent pattern structures that may exist only in some subspaces. We formalize this problem as finding strongly correlated feature subsets which are supported by a large portion of the data points. Due to the combinatorial nature of the problem and the lack of monotonicity of the correlation measurement, it is prohibitively expensive to exhaustively explore the whole search space. In our algorithm, CARE, we utilize spectral properties and effective heuristics to prune the search space. Extensive experimental results show that our approach is effective in finding local linear correlations that may not be identified by existing methods.
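
To make the notion of a local linear correlation concrete, the following is a minimal sketch in Python with NumPy. It is not the CARE algorithm: the abstract does not spell out the correlation measure or the pruning rules, so the eigenvalue-ratio score, the function name `smallest_eigen_ratio`, and the synthetic data are all assumptions chosen for illustration. The sketch plants a linear relation among three of ten features on 60% of the data points, then scores that feature subset on the supporting points and on all points.

```python
import numpy as np


def smallest_eigen_ratio(X):
    """Smallest covariance eigenvalue divided by the total variance.

    Hypothetical score (an assumption, not taken from the paper): a value
    near 0 suggests the columns of X lie close to a common hyperplane,
    i.e. they are strongly linearly correlated.
    """
    cov = np.cov(X, rowvar=False)      # columns are features
    eigvals = np.linalg.eigvalsh(cov)  # eigenvalues in ascending order
    return eigvals[0] / eigvals.sum()


rng = np.random.default_rng(0)
n_points, n_features = 500, 10
data = rng.normal(size=(n_points, n_features))

# Assumed setup: only the first 300 points support the planted pattern.
support = np.arange(n_points) < 300
subset = [1, 4, 7]

# Plant x7 = 2*x1 - x4 (plus small noise) on the supporting points only.
noise = 0.01 * rng.normal(size=support.sum())
data[support, 7] = 2 * data[support, 1] - data[support, 4] + noise

# Score the planted subset locally (supporting points) and globally (all points),
# and score an unrelated feature subset for comparison.
local_score = smallest_eigen_ratio(data[support][:, subset])
global_score = smallest_eigen_ratio(data[:, subset])
unrelated_score = smallest_eigen_ratio(data[support][:, [0, 2, 3]])

print("planted subset, supporting points:", local_score)
print("planted subset, all points:       ", global_score)
print("unrelated subset:                 ", unrelated_score)
```

Under these assumptions the planted subset scores close to zero on its supporting points and noticeably higher once the non-supporting points are mixed in, which illustrates why a pattern held by a feature subset on part of the data can be hard to see in a global analysis. The sketch already knows which subset and which points to test; searching over all candidate subsets is the combinatorial problem the abstract describes.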

Original language: English (US)
Title of host publication: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE'08
Pages: 130-139
Number of pages: 10
DOIs: https://doi.org/10.1109/ICDE.2008.4497421
State: Published - Oct 1 2008
Event: 2008 IEEE 24th International Conference on Data Engineering, ICDE'08 - Cancun, Mexico
Duration: Apr 7 2008 to Apr 12 2008

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627

Other

Other: 2008 IEEE 24th International Conference on Data Engineering, ICDE'08
Country: Mexico
City: Cancun
Period: 4/7/08 to 4/12/08

Fingerprint

Feature extraction

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Information Systems

Cite this

Zhang, X., Pan, F., & Wang, W. (2008). CARE: Finding local linear correlations in high dimensional data. In Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE'08 (pp. 130-139). [4497421] (Proceedings - International Conference on Data Engineering). https://doi.org/10.1109/ICDE.2008.4497421
@inproceedings{d12c09026e2e4bf88e83f97fe0aa367a,
title = "CARE: Finding local linear correlations in high dimensional data",
author = "Xiang Zhang and Feng Pan and Wei Wang",
year = "2008",
month = "10",
day = "1",
doi = "10.1109/ICDE.2008.4497421",
language = "English (US)",
isbn = "9781424418374",
series = "Proceedings - International Conference on Data Engineering",
pages = "130--139",
booktitle = "Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE'08",

}

TY - GEN

T1 - CARE

T2 - Finding local linear correlations in high dimensional data

AU - Zhang, Xiang

AU - Pan, Feng

AU - Wang, Wei

PY - 2008/10/1

Y1 - 2008/10/1

UR - http://www.scopus.com/inward/record.url?scp=52649097914&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=52649097914&partnerID=8YFLogxK

U2 - 10.1109/ICDE.2008.4497421

DO - 10.1109/ICDE.2008.4497421

M3 - Conference contribution

AN - SCOPUS:52649097914

SN - 9781424418374

T3 - Proceedings - International Conference on Data Engineering

SP - 130

EP - 139

BT - Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE'08

ER -