TY - JOUR
T1 - Fast-RCM
T2 - Fast Tree-Based Unsupervised Rare-Class Mining
AU - Weng, Haiqin
AU - Ji, Shouling
AU - Liu, Changchang
AU - Wang, Ting
AU - He, Qinming
AU - Chen, Jianhai
N1 - Funding Information:
This work was supported in part by NSFC under Grant 61772466, Grant U1836202, and Grant 61472359, in part by the Zhejiang Provincial Natural Science Foundation for Distinguished Young Scholars under Grant LR19F020003, in part by the Provincial Key Research and Development Program of Zhejiang, China, under Grant 2017C01055, and in part by the Alibaba-ZJU Joint Research Institute of Frontier Technologies.
Publisher Copyright:
© 2013 IEEE.
PY - 2021/10/1
Y1 - 2021/10/1
N2 - Rare classes are usually hidden in an imbalanced dataset in which the majority of data examples belong to major classes. Rare-class mining (RCM) aims to extract all the data examples belonging to rare classes. Most existing approaches to RCM require a certain number of labeled data examples as input. However, such approaches are impractical, since obtaining label information from domain experts is time-consuming and labor-intensive. We therefore investigate the unsupervised RCM problem, which, to the best of our knowledge, is the first such attempt. To this end, we propose an efficient algorithm called Fast-RCM for unsupervised RCM, which has approximately linear time complexity with respect to both data size and data dimensionality. Given an unlabeled dataset, Fast-RCM mines the rare classes by first building a rare tree for the input dataset and then extracting the data examples of the rare classes based on this rare tree. Compared with existing approaches, which have quadratic or even cubic time complexity, Fast-RCM is much faster and can be applied to large-scale datasets. Experimental evaluation on both synthetic and real-world datasets demonstrates that our algorithm can effectively and efficiently extract the rare classes from an unlabeled dataset in the unsupervised setting, and is approximately five times faster than state-of-the-art methods.
AB - Rare classes are usually hidden in an imbalanced dataset in which the majority of data examples belong to major classes. Rare-class mining (RCM) aims to extract all the data examples belonging to rare classes. Most existing approaches to RCM require a certain number of labeled data examples as input. However, such approaches are impractical, since obtaining label information from domain experts is time-consuming and labor-intensive. We therefore investigate the unsupervised RCM problem, which, to the best of our knowledge, is the first such attempt. To this end, we propose an efficient algorithm called Fast-RCM for unsupervised RCM, which has approximately linear time complexity with respect to both data size and data dimensionality. Given an unlabeled dataset, Fast-RCM mines the rare classes by first building a rare tree for the input dataset and then extracting the data examples of the rare classes based on this rare tree. Compared with existing approaches, which have quadratic or even cubic time complexity, Fast-RCM is much faster and can be applied to large-scale datasets. Experimental evaluation on both synthetic and real-world datasets demonstrates that our algorithm can effectively and efficiently extract the rare classes from an unlabeled dataset in the unsupervised setting, and is approximately five times faster than state-of-the-art methods.
UR - http://www.scopus.com/inward/record.url?scp=85117393529&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85117393529&partnerID=8YFLogxK
U2 - 10.1109/TCYB.2019.2924804
DO - 10.1109/TCYB.2019.2924804
M3 - Article
C2 - 31331902
AN - SCOPUS:85117393529
SN - 2168-2267
VL - 51
SP - 5198
EP - 5211
JO - IEEE Transactions on Cybernetics
JF - IEEE Transactions on Cybernetics
IS - 10
ER -