Comparative study on subject classification of academic videos using noisy transcripts

Hau Wen Chang, Hung Sik Kim, Shuyang Li, Jeongkyu Lee, Dongwon Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

With the advance of Web technologies, the number of "academic" videos available on the Web (e.g., online lectures, web seminars, conference presentations, and tutorial videos) has increased explosively. A fundamental task in managing such videos is to classify them into relevant subjects. For this task, most current content providers rely on keywords to perform the classification, while active techniques for automatic video classification focus on multi-modal features. In our setting, however, we argue that neither approach is sufficient to solve the problem effectively: the keyword-based method is very limited in accuracy, while the feature-based one lacks the semantics needed to represent academic subjects. To address this problem, we propose to transform the video subject classification problem into a text categorization problem by exploiting transcripts extracted from the videos. Using both real and synthesized data, we (1) extensively study the validity of the proposed idea, (2) analyze the performance of different text categorization methods, and (3) study the impact of transcript factors such as quality and length on academic video classification.

Original language: English (US)
Title of host publication: Proceedings - 2010 IEEE 4th International Conference on Semantic Computing, ICSC 2010
Pages: 67-72
Number of pages: 6
DOI: https://doi.org/10.1109/ICSC.2010.91
State: Published - Dec 1 2010
Event: 4th IEEE International Conference on Semantic Computing, ICSC 2010 - Pittsburgh, PA, United States
Duration: Sep 22 2010 to Sep 24 2010

Publication series

Name: Proceedings - 2010 IEEE 4th International Conference on Semantic Computing, ICSC 2010

Other

Other: 4th IEEE International Conference on Semantic Computing, ICSC 2010
Country: United States
City: Pittsburgh, PA
Period: 9/22/10 to 9/24/10

Fingerprint

Comparative Study
Text Categorization
Classification Problems
Technical presentations
Classify
Transform
Sufficient
Semantics
Presentation

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Theoretical Computer Science

Cite this

Chang, H. W., Kim, H. S., Li, S., Lee, J., & Lee, D. (2010). Comparative study on subject classification of academic videos using noisy transcripts. In Proceedings - 2010 IEEE 4th International Conference on Semantic Computing, ICSC 2010 (pp. 67-72). [5628857] (Proceedings - 2010 IEEE 4th International Conference on Semantic Computing, ICSC 2010). https://doi.org/10.1109/ICSC.2010.91
Chang, Hau Wen ; Kim, Hung Sik ; Li, Shuyang ; Lee, Jeongkyu ; Lee, Dongwon. / Comparative study on subject classification of academic videos using noisy transcripts. Proceedings - 2010 IEEE 4th International Conference on Semantic Computing, ICSC 2010. 2010. pp. 67-72 (Proceedings - 2010 IEEE 4th International Conference on Semantic Computing, ICSC 2010).
@inproceedings{0db5d269961f482694c818cb1ff70f0f,
title = "Comparative study on subject classification of academic videos using noisy transcripts",
author = "Chang, {Hau Wen} and Kim, {Hung Sik} and Shuyang Li and Jeongkyu Lee and Dongwon Lee",
year = "2010",
month = "12",
day = "1",
doi = "10.1109/ICSC.2010.91",
language = "English (US)",
isbn = "9780769541549",
series = "Proceedings - 2010 IEEE 4th International Conference on Semantic Computing, ICSC 2010",
pages = "67--72",
booktitle = "Proceedings - 2010 IEEE 4th International Conference on Semantic Computing, ICSC 2010",

}

Chang, HW, Kim, HS, Li, S, Lee, J & Lee, D 2010, Comparative study on subject classification of academic videos using noisy transcripts. in Proceedings - 2010 IEEE 4th International Conference on Semantic Computing, ICSC 2010., 5628857, Proceedings - 2010 IEEE 4th International Conference on Semantic Computing, ICSC 2010, pp. 67-72, 4th IEEE International Conference on Semantic Computing, ICSC 2010, Pittsburgh, PA, United States, 9/22/10. https://doi.org/10.1109/ICSC.2010.91

TY - GEN

T1 - Comparative study on subject classification of academic videos using noisy transcripts

AU - Chang, Hau Wen

AU - Kim, Hung Sik

AU - Li, Shuyang

AU - Lee, Jeongkyu

AU - Lee, Dongwon

PY - 2010/12/1

Y1 - 2010/12/1

UR - http://www.scopus.com/inward/record.url?scp=79952044115&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79952044115&partnerID=8YFLogxK

U2 - 10.1109/ICSC.2010.91

DO - 10.1109/ICSC.2010.91

M3 - Conference contribution

AN - SCOPUS:79952044115

SN - 9780769541549

T3 - Proceedings - 2010 IEEE 4th International Conference on Semantic Computing, ICSC 2010

SP - 67

EP - 72

BT - Proceedings - 2010 IEEE 4th International Conference on Semantic Computing, ICSC 2010

ER -

Chang HW, Kim HS, Li S, Lee J, Lee D. Comparative study on subject classification of academic videos using noisy transcripts. In Proceedings - 2010 IEEE 4th International Conference on Semantic Computing, ICSC 2010. 2010. p. 67-72. 5628857. (Proceedings - 2010 IEEE 4th International Conference on Semantic Computing, ICSC 2010). https://doi.org/10.1109/ICSC.2010.91