BASIL: Effective near-duplicate image detection using gene sequence alignment

Hung Sik Kim, Hau Wen Chang, Jeongkyu Lee, Dongwon Lee

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

16 Citations (Scopus)

Abstract

Finding near-duplicate images is a common task in Multimedia Information Retrieval (MIR). Toward this effort, we propose a novel idea that bridges two seemingly unrelated fields, MIR and Biology. Specifically, we propose to use BLAST, the popular gene sequence alignment algorithm in Biology, to detect near-duplicate images. Under this idea, we study how various image features and gene sequence generation methods (using gene alphabets such as A, C, G, and T in DNA sequences) affect the accuracy and performance of near-duplicate image detection. Our proposal, termed BLASTed Image Linkage (BASIL), is empirically validated using various real data sets. This work can be viewed as the "first" step toward bridging the MIR and Biology fields in the well-studied near-duplicate image detection problem.
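The general idea in the abstract (encode image features as a DNA-like string, then compare strings by sequence alignment) can be illustrated with a minimal sketch. This is not the authors' encoding, and it does not call BLAST: the four-bin quantization, the function names `features_to_dna` and `local_alignment_score`, the scoring parameters, and the sample feature values are all assumptions made for illustration. A plain Smith-Waterman local alignment score stands in for BLAST here.

```python
from typing import Sequence

ALPHABET = "ACGT"  # DNA alphabet mentioned in the abstract


def features_to_dna(features: Sequence[float], lo: float = 0.0, hi: float = 1.0) -> str:
    """Quantize each feature value into four equal-width bins and emit A/C/G/T.

    Hypothetical encoding for illustration only; the paper studies several
    sequence generation methods that are not reproduced here.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    width = (hi - lo) / len(ALPHABET)
    symbols = []
    for value in features:
        clamped = min(max(value, lo), hi)  # keep out-of-range values in [lo, hi]
        index = min(int((clamped - lo) / width), len(ALPHABET) - 1)
        symbols.append(ALPHABET[index])
    return "".join(symbols)


def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman local alignment score (a simple stand-in for BLAST)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(
                0,                      # start a new local alignment
                previous[j - 1] + step,  # align a[i-1] with b[j-1]
                previous[j] + gap,       # gap in b
                current[j - 1] + gap,    # gap in a
            )
            best = max(best, current[j])
        previous = current
    return best


# Made-up feature vectors, e.g. normalized histogram bins of two images.
original = [0.10, 0.40, 0.80, 0.95, 0.30, 0.60]
near_dup = [0.12, 0.38, 0.79, 0.97, 0.55, 0.60]

seq_a = features_to_dna(original)
seq_b = features_to_dna(near_dup)
print(seq_a, seq_b, local_alignment_score(seq_a, seq_b))
```

A higher score relative to the sequence length suggests a closer match. In practice a threshold on a normalized score would be needed, and the choice of features and encoding is exactly what the paper evaluates.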

Original language: English (US)
Title of host publication: Advances in Information Retrieval - 32nd European Conference on IR Research, ECIR 2010, Proceedings
Pages: 229-240
Number of pages: 12
DOI: https://doi.org/10.1007/978-3-642-12275-0-22
State: Published - May 20 2010
Event: 32nd European Conference on Information Retrieval, ECIR 2010 - Milton Keynes, United Kingdom
Duration: Mar 28 2010 to Mar 31 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5993 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 32nd European Conference on Information Retrieval, ECIR 2010
Country: United Kingdom
City: Milton Keynes
Period: 3/28/10 to 3/31/10

Fingerprint

Sequence Alignment
Information retrieval
Linkage
Genes
Gene
Information Retrieval
Biology
Multimedia
DNA sequences
DNA Sequence

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Kim, H. S., Chang, H. W., Lee, J., & Lee, D. (2010). BASIL: Effective near-duplicate image detection using gene sequence alignment. In Advances in Information Retrieval - 32nd European Conference on IR Research, ECIR 2010, Proceedings (pp. 229-240). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5993 LNCS). https://doi.org/10.1007/978-3-642-12275-0-22
@inproceedings{2a38599486f94aaca9c5d8c8801898d5,
title = "BASIL: Effective near-duplicate image detection using gene sequence alignment",
author = "Kim, {Hung Sik} and Chang, {Hau Wen} and Jeongkyu Lee and Dongwon Lee",
year = "2010",
month = "5",
day = "20",
doi = "10.1007/978-3-642-12275-0-22",
language = "English (US)",
isbn = "3642122744",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "229--240",
booktitle = "Advances in Information Retrieval - 32nd European Conference on IR Research, ECIR 2010, Proceedings",

}

TY  - GEN
T1  - BASIL
T2  - Effective near-duplicate image detection using gene sequence alignment
AU  - Kim, Hung Sik
AU  - Chang, Hau Wen
AU  - Lee, Jeongkyu
AU  - Lee, Dongwon
PY  - 2010/5/20
Y1  - 2010/5/20
UR  - http://www.scopus.com/inward/record.url?scp=77952318432&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=77952318432&partnerID=8YFLogxK
U2  - 10.1007/978-3-642-12275-0-22
DO  - 10.1007/978-3-642-12275-0-22
M3  - Conference contribution
AN  - SCOPUS:77952318432
SN  - 3642122744
SN  - 9783642122743
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 229
EP  - 240
BT  - Advances in Information Retrieval - 32nd European Conference on IR Research, ECIR 2010, Proceedings
ER  -
