Efficient identification of web communities

Gary William Flake, Steve Lawrence, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

546 Citations (Scopus)

Abstract

We define a community on the web as a set of sites that have more links (in either direction) to members of the community than to non-members. Members of such a community can be efficiently identified in a maximum flow / minimum cut framework, where the source is composed of known members, and the sink consists of well-known non-members. A focused crawler that crawls to a fixed depth can approximate community membership by augmenting the graph induced by the crawl with links to a virtual sink node. The effectiveness of the approximation algorithm is demonstrated with several crawl results that identify hubs, authorities, web rings, and other link topologies that are useful but not easily categorized. Applications of our approach include focused crawlers and search engines, automatic population of portal categories, and improved filtering.
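The exact formulation in the abstract (known members as the source, known non-members as the sink) can be illustrated with a minimal sketch. The function name `find_community`, the unit capacity on each link, the infinite capacity on the virtual source and sink edges, and the toy graph are all assumptions made for illustration; they are not taken from the paper. The abstract does not say which crawled nodes are linked to the virtual sink in the approximate variant, so only the exact variant is sketched here, using a plain Edmonds-Karp maximum flow.

```python
from collections import defaultdict, deque
from itertools import combinations

INF = float("inf")


def find_community(links, seeds, non_members):
    """Return the source side of a minimum cut between seeds and non-members.

    links: iterable of (u, v) hyperlinks; each is treated as usable in
    either direction with unit capacity (an assumption of this sketch).
    """
    seeds, non_members = set(seeds), set(non_members)
    if seeds & non_members:
        raise ValueError("a node cannot be both a seed and a non-member")

    cap = defaultdict(lambda: defaultdict(int))  # residual capacities
    for u, v in links:
        cap[u][v] += 1
        cap[v][u] += 1

    source, sink = "__source__", "__sink__"  # virtual nodes (hypothetical names)
    for s in seeds:
        cap[source][s] = INF
    for t in non_members:
        cap[t][sink] = INF

    def reachable_from_source():
        # Breadth-first search over edges with remaining capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable_from_source()
        if sink not in parent:
            # No augmenting path left: the reached nodes form the source
            # side of a minimum cut, which is the identified community.
            return set(parent) - {source}
        # Find the bottleneck capacity along the augmenting path.
        flow, v = INF, sink
        while parent[v] is not None:
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        # Push the flow and update the residual capacities.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u


# Toy graph: two fully linked clusters joined by a single bridge link.
cluster_a = ["a1", "a2", "a3", "a4"]
cluster_b = ["b1", "b2", "b3", "b4"]
links = (list(combinations(cluster_a, 2))
         + list(combinations(cluster_b, 2))
         + [("a4", "b1")])

community = find_community(links, seeds={"a1"}, non_members={"b4"})
print(sorted(community))  # ['a1', 'a2', 'a3', 'a4']
```

In this toy graph the cheapest cut is the single bridge link, so every node in the first cluster has more links inside the returned set than outside it, matching the community definition given above.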

Original language: English (US)
Title of host publication: Proceeding of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Editors: R. Ramakrishnan, S. Stolfo, R. Bayardo, I. Parsa
Pages: 150-160
Number of pages: 11
State: Published - Dec 1 2000
Event: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2001) - Boston, MA, United States
Duration: Aug 20 2000 to Aug 23 2000

Fingerprint

Approximation algorithms
Search engines
Topology

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Cite this

Flake, G. W., Lawrence, S., & Giles, C. L. (2000). Efficient identification of web communities. In R. Ramakrishnan, S. Stolfo, R. Bayardo, & I. Parsa (Eds.), Proceeding of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 150-160).
@inproceedings{5ca58e3f491e4f4c867b65db7dca8459,
title = "Efficient identification of web communities",
author = "Flake, {Gary William} and Steve Lawrence and Giles, {C. Lee}",
year = "2000",
month = "12",
day = "1",
language = "English (US)",
isbn = "1581132336",
pages = "150--160",
editor = "R. Ramakrishnan and S. Stolfo and R. Bayardo and I. Parsa",
booktitle = "Proceeding of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

}

TY - GEN

T1 - Efficient identification of web communities

AU - Flake, Gary William

AU - Lawrence, Steve

AU - Giles, C. Lee

PY - 2000/12/1

Y1 - 2000/12/1

AB - We define a community on the web as a set of sites that have more links (in either direction) to members of the community than to non-members. Members of such a community can be efficiently identified in a maximum flow / minimum cut framework, where the source is composed of known members, and the sink consists of well-known non-members. A focused crawler that crawls to a fixed depth can approximate community membership by augmenting the graph induced by the crawl with links to a virtual sink node. The effectiveness of the approximation algorithm is demonstrated with several crawl results that identify hubs, authorities, web rings, and other link topologies that are useful but not easily categorized. Applications of our approach include focused crawlers and search engines, automatic population of portal categories, and improved filtering.

UR - http://www.scopus.com/inward/record.url?scp=0034592749&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0034592749&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:0034592749

SN - 1581132336

SP - 150

EP - 160

BT - Proceeding of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

A2 - Ramakrishnan, R.

A2 - Stolfo, S.

A2 - Bayardo, R.

A2 - Parsa, I.

ER -
