Extracting author meta-data from web using visual features

Shuyi Zheng, Ding Zhou, Jia Li, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

4 Citations (Scopus)

Abstract

Enriching a digital library's author meta-data can lead to valuable services and applications. This paper addresses the problem of extracting authors' information from their homepages, which is essentially a multiclass classification problem: a homepage can be treated as a group of information pieces that need to be classified into different fields, e.g., Name, Title, Affiliation, and Email. Not only can each information piece be viewed as a point in a feature space, but certain patterns can also be observed among the different fields on a page. To improve extraction accuracy, this paper argues that the visual features of information pieces on a homepage should be fully utilized. It also proposes an inter-fields probability model to capture the relations among different fields; this model can be combined with feature-space-based classification. Experimental results demonstrate that utilizing visual features and applying the inter-fields probability model can significantly improve extraction accuracy.
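The abstract does not spell out how the inter-fields probability model is combined with feature-space classification, so the following is only a minimal sketch of one plausible combination, not the authors' method. It assumes (hypothetically) that a per-piece classifier already yields a probability for each field, that the relation among fields is a first-order table giving the probability that one field follows another on a page, and that the two are combined by Viterbi-style decoding. The function name `decode_fields`, the field list, and every number below are illustrative assumptions.

```python
import math

# Hypothetical field set, taken from the examples named in the abstract.
FIELDS = ["Name", "Title", "Affiliation", "Email"]


def decode_fields(piece_probs, transition, fields=FIELDS):
    """Return the jointly most likely field label for each information piece.

    piece_probs: list of dicts, one per piece in page order, mapping
                 field -> P(field | features of that piece). These would come
                 from any feature-space classifier (assumed, not shown).
    transition:  dict mapping (previous_field, field) -> probability that
                 `field` follows `previous_field` (an assumed first-order
                 stand-in for the inter-fields model).
    """
    if not piece_probs:
        return []
    eps = 1e-12  # avoids log(0) for missing or zero probabilities
    # best[f] = (log-score of the best labelling ending in f, that labelling)
    best = {f: (math.log(piece_probs[0].get(f, 0.0) + eps), [f]) for f in fields}
    for probs in piece_probs[1:]:
        new_best = {}
        for f in fields:
            emit = math.log(probs.get(f, 0.0) + eps)
            # Best predecessor for field f under the transition table.
            score, path = max(
                (best[p][0] + math.log(transition.get((p, f), 0.0) + eps), best[p][1])
                for p in fields
            )
            new_best[f] = (score + emit, path + [f])
        best = new_best
    return max(best.values())[1]


# Toy input: the second piece is ambiguous from its own features alone.
piece_probs = [
    {"Name": 0.90, "Title": 0.05, "Affiliation": 0.03, "Email": 0.02},
    {"Name": 0.05, "Title": 0.45, "Affiliation": 0.48, "Email": 0.02},
    {"Name": 0.02, "Title": 0.08, "Affiliation": 0.85, "Email": 0.05},
]

# Made-up transition rows: Name tends to be followed by Title, and so on.
rows = {
    "Name": {"Name": 0.05, "Title": 0.70, "Affiliation": 0.20, "Email": 0.05},
    "Title": {"Name": 0.10, "Title": 0.10, "Affiliation": 0.70, "Email": 0.10},
    "Affiliation": {"Name": 0.10, "Title": 0.10, "Affiliation": 0.10, "Email": 0.70},
    "Email": {"Name": 0.25, "Title": 0.25, "Affiliation": 0.25, "Email": 0.25},
}
transition = {(p, f): pr for p, row in rows.items() for f, pr in row.items()}

# Labelling each piece on its own versus labelling the page jointly.
independent = [max(p, key=p.get) for p in piece_probs]
joint = decode_fields(piece_probs, transition)
print("independent:", independent)
print("joint:      ", joint)
```

On this toy input, labelling each piece independently assigns Affiliation to the second piece, while the joint decoding prefers Title because the assumed transition table favours the order Name, Title, Affiliation. This only illustrates the general idea that patterns among fields can correct per-piece decisions; the paper's actual model and features may differ.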

Original language: English (US)
Title of host publication: ICDM Workshops 2007 - Proceedings of the 17th IEEE International Conference on Data Mining Workshops
Pages: 33-38
Number of pages: 6
DOIs: https://doi.org/10.1109/ICDMW.2007.59
State: Published - Dec 1 2007
Event: 17th IEEE International Conference on Data Mining Workshops, ICDM Workshops 2007 - Omaha, NE, United States
Duration: Oct 28 2007 to Oct 31 2007

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Other

Other: 17th IEEE International Conference on Data Mining Workshops, ICDM Workshops 2007
Country: United States
City: Omaha, NE
Period: 10/28/07 to 10/31/07

Fingerprint

Metadata
Digital libraries
Electronic mail

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Cite this

Zheng, S., Zhou, D., Li, J., & Giles, C. L. (2007). Extracting author meta-data from web using visual features. In ICDM Workshops 2007 - Proceedings of the 17th IEEE International Conference on Data Mining Workshops (pp. 33-38). [4476643] (Proceedings - IEEE International Conference on Data Mining, ICDM). https://doi.org/10.1109/ICDMW.2007.59
Zheng, Shuyi ; Zhou, Ding ; Li, Jia ; Giles, C. Lee. / Extracting author meta-data from web using visual features. ICDM Workshops 2007 - Proceedings of the 17th IEEE International Conference on Data Mining Workshops. 2007. pp. 33-38 (Proceedings - IEEE International Conference on Data Mining, ICDM).
@inproceedings{d03fc98490bc4274b06b42b0f4ddcd26,
title = "Extracting author meta-data from web using visual features",
author = "Shuyi Zheng and Ding Zhou and Jia Li and Giles, {C. Lee}",
year = "2007",
month = "12",
day = "1",
doi = "10.1109/ICDMW.2007.59",
language = "English (US)",
isbn = "0769530192",
series = "Proceedings - IEEE International Conference on Data Mining, ICDM",
pages = "33--38",
booktitle = "ICDM Workshops 2007 - Proceedings of the 17th IEEE International Conference on Data Mining Workshops",

}

Zheng, S, Zhou, D, Li, J & Giles, CL 2007, Extracting author meta-data from web using visual features. in ICDM Workshops 2007 - Proceedings of the 17th IEEE International Conference on Data Mining Workshops., 4476643, Proceedings - IEEE International Conference on Data Mining, ICDM, pp. 33-38, 17th IEEE International Conference on Data Mining Workshops, ICDM Workshops 2007, Omaha, NE, United States, 10/28/07. https://doi.org/10.1109/ICDMW.2007.59

Extracting author meta-data from web using visual features. / Zheng, Shuyi; Zhou, Ding; Li, Jia; Giles, C. Lee.

ICDM Workshops 2007 - Proceedings of the 17th IEEE International Conference on Data Mining Workshops. 2007. p. 33-38 4476643 (Proceedings - IEEE International Conference on Data Mining, ICDM).

TY - GEN
T1 - Extracting author meta-data from web using visual features
AU - Zheng, Shuyi
AU - Zhou, Ding
AU - Li, Jia
AU - Giles, C. Lee
PY - 2007/12/1
Y1 - 2007/12/1
AB - Enriching digital library's author meta-data can lead to valuable services and applications. This paper addresses the problem of extracting authors' information from their homepages. This problem is actually a multiclass classification problem. A homepage can be treated as a group of information pieces which need to be classified to different fields, e.g., Name, Title, Affiliation, Email, etc. In this problem, not only each information piece can be viewed as a point in a feature space, but also certain patterns can be observed among different fields on a page. To improve the extraction accuracy, this paper argues that visual features of information pieces on a homepage should be sufficiently utilized. In addition, this paper also proposes an inter-fields probability model to capture the relation among different fields. This model can be combined with feature-space based classification. Experimental results demonstrate that utilizing visual features and applying the inter-fields probability model can significantly improve the extraction accuracy.
UR - http://www.scopus.com/inward/record.url?scp=49549101342&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=49549101342&partnerID=8YFLogxK
U2 - 10.1109/ICDMW.2007.59
DO - 10.1109/ICDMW.2007.59
M3 - Conference contribution
AN - SCOPUS:49549101342
SN - 0769530192
SN - 9780769530192
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 33
EP - 38
BT - ICDM Workshops 2007 - Proceedings of the 17th IEEE International Conference on Data Mining Workshops
ER -

Zheng S, Zhou D, Li J, Giles CL. Extracting author meta-data from web using visual features. In ICDM Workshops 2007 - Proceedings of the 17th IEEE International Conference on Data Mining Workshops. 2007. p. 33-38. 4476643. (Proceedings - IEEE International Conference on Data Mining, ICDM). https://doi.org/10.1109/ICDMW.2007.59