Learning with Small Data

Huaxiu Yao, Xiaowei Jia, Vipin Kumar, Zhenhui Li

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

In the era of big data, data-driven methods have become increasingly popular in various applications, such as image recognition, traffic signal control, fake news detection. The superior performance of these data-driven approaches relies on large-scale labeled training data, which are probably inaccessible in real-world applications, i.e., "small (labeled) data" challenge. Examples include predicting emergent events in a city, detecting emerging fake news, and forecasting the progression of conditions for rare diseases. In most scenarios, people care about these small data cases most and thus improving the learning effectiveness of machine learning algorithms with small labeled data has been a popular research topic. In this tutorial, we will review the trending state-of-the-art machine learning techniques for learning with small (labeled) data. These techniques are organized from two aspects: (1) providing a comprehensive review of recent studies about knowledge generalization, transfer, and sharing, where transfer learning, multi-task learning, and meta-learning are discussed. Particularly, we will focus more on meta-learning, which improves the model generalization ability and has been proven to be an effective approach recently; (2) introducing the cutting-edge techniques which focus on incorporating domain knowledge into machine learning models. Different from model-based knowledge transfer techniques, in real-world applications, domain knowledge (e.g., physical laws) provides us with a new angle to deal with the small data challenge. Specifically, domain knowledge can be used to optimize learning strategies and/or guide the model design. In data mining field, we believe that learning with small data is a trending topic with important social impact, which will attract both researchers and practitioners from academia and industry.

Original languageEnglish (US)
Title of host publicationKDD 2020 - Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PublisherAssociation for Computing Machinery
Pages3539-3540
Number of pages2
ISBN (Electronic)9781450379984
DOIs
StatePublished - Aug 23 2020
Event26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2020 - Virtual, Online, United States
Duration: Aug 23 2020Aug 27 2020

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2020
CountryUnited States
CityVirtual, Online
Period8/23/208/27/20

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Fingerprint Dive into the research topics of 'Learning with Small Data'. Together they form a unique fingerprint.

Cite this