A pipeline for identifying integration sites of mobile elements in the genome using next-generation sequencing

Raunaq Malhotra, Daniel Elleder, Le Bao, David Russell Hunter, Mary Poss, Raj Acharya

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Next-generation sequencing (NGS) reads obtained by sequencing the junction of a mobile element and the host flanking region from individuals in a population are typically mapped to a reference genome to determine the location of the mobile element-host junction. We propose a clustering pipeline for grouping such NGS data into clusters corresponding to the locations of integration sites in the genome. Our pipeline relies on the UCLUST clustering software, which clusters reads into groups using a clustering threshold, to group the integration-site NGS reads by their site of origin. An optimal clustering threshold is chosen based on a proposed clustering measure, the I-index. We evaluate our pipeline on simulated integration-site data from the human genome and compare its performance to UCLUST clustering. Our pipeline is more accurate than UCLUST clustering in recovering both the number and the correct sequence of the integration sites. This pipeline can be beneficial in detecting the mobile element-host junctions in a population for species with no reference genome.
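The threshold-selection idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the greedy centroid clustering below is only a toy stand-in for UCLUST, `identity` is a naive position-wise comparison without alignment, and `validity_score` is a hypothetical placeholder for a cluster-validity measure, since the I-index is not defined on this page. All function names and the example reads are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Clusters = List[List[str]]


def identity(a: str, b: str) -> float:
    """Position-wise identity of two reads (toy measure, no alignment)."""
    if not a and not b:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(reads: Sequence[str], threshold: float) -> Clusters:
    """A read joins the first cluster whose centroid (its first member) it
    matches at >= threshold; otherwise it seeds a new cluster."""
    clusters: Clusters = []
    for read in reads:
        for cluster in clusters:
            if identity(read, cluster[0]) >= threshold:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return clusters


def validity_score(clusters: Clusters) -> float:
    """HYPOTHETICAL stand-in for a validity measure (not the I-index):
    mean identity of reads to their own centroid minus the highest
    identity between any two centroids. Assumes at least one read."""
    members = [(read, c[0]) for c in clusters for read in c]
    within = sum(identity(r, cen) for r, cen in members) / len(members)
    centroids = [c[0] for c in clusters]
    between = max(
        (identity(p, q)
         for i, p in enumerate(centroids)
         for q in centroids[i + 1:]),
        default=0.0,
    )
    return within - between


def choose_threshold(
    reads: Sequence[str],
    thresholds: Sequence[float],
    score: Callable[[Clusters], float] = validity_score,
) -> Tuple[float, Clusters]:
    """Cluster at every candidate threshold and keep the best-scoring result."""
    candidates = [(t, greedy_cluster(reads, t)) for t in thresholds]
    return max(candidates, key=lambda tc: score(tc[1]))


# Made-up reads from two imaginary integration sites, with sequencing errors.
reads = [
    "AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATA",
    "CCCCCCCCCC", "CCCCCCCCCG", "CCCCCCCGCC",
]
best_t, best_clusters = choose_threshold(reads, [0.0, 0.5, 0.9, 1.0])
```

In this toy data a threshold that is too strict splits every distinct read into its own cluster, while one that is too loose merges both sites; scoring each candidate clustering is what lets the sweep settle on two groups.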

Original language: English (US)
Title of host publication: Proceedings of the 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016
Editors: Nurit Haspel, Thomas Ioerger
Publisher: The International Society for Computers and Their Applications (ISCA)
Pages: 63-68
Number of pages: 6
ISBN (Electronic): 9781943436033
State: Published - Jan 1 2016
Event: 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016 - Las Vegas, United States
Duration: Apr 4 2016 to Apr 6 2016

Publication series

Name: Proceedings of the 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016

Other

Other: 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016
Country: United States
City: Las Vegas
Period: 4/4/16 to 4/6/16

Fingerprint

Cluster Analysis
Pipelines
Genes
Genome
Human Genome
Population
Software

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computational Theory and Mathematics
  • Information Systems
  • Biomedical Engineering
  • Electrical and Electronic Engineering
  • Health Informatics

Cite this

Malhotra, R., Elleder, D., Bao, L., Hunter, D. R., Poss, M., & Acharya, R. (2016). A pipeline for identifying integration sites of mobile elements in the genome using next-generation sequencing. In N. Haspel, & T. Ioerger (Eds.), Proceedings of the 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016 (pp. 63-68). (Proceedings of the 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016). The International Society for Computers and Their Applications (ISCA).
Malhotra, Raunaq ; Elleder, Daniel ; Bao, Le ; Hunter, David Russell ; Poss, Mary ; Acharya, Raj. / A pipeline for identifying integration sites of mobile elements in the genome using next-generation sequencing. Proceedings of the 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016. editor / Nurit Haspel ; Thomas Ioerger. The International Society for Computers and Their Applications (ISCA), 2016. pp. 63-68 (Proceedings of the 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016).
@inproceedings{59510145afb947c590d1699ffaf72352,
title = "A pipeline for identifying integration sites of mobile elements in the genome using next-generation sequencing",
abstract = "Next Generation Sequencing (NGS) reads obtained by sequencing of the junction of a mobile element and the host flanking region from individuals in a population are typically mapped to a reference genome to determine the location of the mobile element-host junction. We propose a clustering pipeline for grouping such NGS data into clusters corresponding to the locations of integration sites in the genome. Our pipeline relies on the UCLUST clustering software, which clusters reads into groups using a clustering threshold, to cluster the integration sites NGS reads into groups based on their site of origin. An optimal clustering threshold is chosen based on a proposed clustering measure, I - index. We evaluate our pipeline on simulated integration sites data from the human genome and compare its performance to UCLUST clustering. Our pipeline is more accurate in recovering both the number and the correct sequence of the integration sites when compared to the other method. This pipeline can be beneficial in detecting the mobile element-host junctions in a population for species with no reference genome.",
author = "Raunaq Malhotra and Daniel Elleder and Le Bao and Hunter, {David Russell} and Mary Poss and Raj Acharya",
year = "2016",
month = "1",
day = "1",
language = "English (US)",
series = "Proceedings of the 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016",
publisher = "The International Society for Computers and Their Applications (ISCA)",
pages = "63--68",
editor = "Nurit Haspel and Thomas Ioerger",
booktitle = "Proceedings of the 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016",

}

Malhotra, R, Elleder, D, Bao, L, Hunter, DR, Poss, M & Acharya, R 2016, A pipeline for identifying integration sites of mobile elements in the genome using next-generation sequencing. in N Haspel & T Ioerger (eds), Proceedings of the 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016. Proceedings of the 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016, The International Society for Computers and Their Applications (ISCA), pp. 63-68, 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016, Las Vegas, United States, 4/4/16.

A pipeline for identifying integration sites of mobile elements in the genome using next-generation sequencing. / Malhotra, Raunaq; Elleder, Daniel; Bao, Le; Hunter, David Russell; Poss, Mary; Acharya, Raj.

Proceedings of the 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016. ed. / Nurit Haspel; Thomas Ioerger. The International Society for Computers and Their Applications (ISCA), 2016. p. 63-68 (Proceedings of the 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016).


TY - GEN
T1 - A pipeline for identifying integration sites of mobile elements in the genome using next-generation sequencing
AU - Malhotra, Raunaq
AU - Elleder, Daniel
AU - Bao, Le
AU - Hunter, David Russell
AU - Poss, Mary
AU - Acharya, Raj
PY - 2016/1/1
Y1 - 2016/1/1

N2 - Next Generation Sequencing (NGS) reads obtained by sequencing of the junction of a mobile element and the host flanking region from individuals in a population are typically mapped to a reference genome to determine the location of the mobile element-host junction. We propose a clustering pipeline for grouping such NGS data into clusters corresponding to the locations of integration sites in the genome. Our pipeline relies on the UCLUST clustering software, which clusters reads into groups using a clustering threshold, to cluster the integration sites NGS reads into groups based on their site of origin. An optimal clustering threshold is chosen based on a proposed clustering measure, I - index. We evaluate our pipeline on simulated integration sites data from the human genome and compare its performance to UCLUST clustering. Our pipeline is more accurate in recovering both the number and the correct sequence of the integration sites when compared to the other method. This pipeline can be beneficial in detecting the mobile element-host junctions in a population for species with no reference genome.

AB - Next Generation Sequencing (NGS) reads obtained by sequencing of the junction of a mobile element and the host flanking region from individuals in a population are typically mapped to a reference genome to determine the location of the mobile element-host junction. We propose a clustering pipeline for grouping such NGS data into clusters corresponding to the locations of integration sites in the genome. Our pipeline relies on the UCLUST clustering software, which clusters reads into groups using a clustering threshold, to cluster the integration sites NGS reads into groups based on their site of origin. An optimal clustering threshold is chosen based on a proposed clustering measure, I - index. We evaluate our pipeline on simulated integration sites data from the human genome and compare its performance to UCLUST clustering. Our pipeline is more accurate in recovering both the number and the correct sequence of the integration sites when compared to the other method. This pipeline can be beneficial in detecting the mobile element-host junctions in a population for species with no reference genome.

UR - http://www.scopus.com/inward/record.url?scp=84973594525&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84973594525&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84973594525
T3 - Proceedings of the 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016
SP - 63
EP - 68
BT - Proceedings of the 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016
A2 - Haspel, Nurit
A2 - Ioerger, Thomas
PB - The International Society for Computers and Their Applications (ISCA)
ER -

Malhotra R, Elleder D, Bao L, Hunter DR, Poss M, Acharya R. A pipeline for identifying integration sites of mobile elements in the genome using next-generation sequencing. In Haspel N, Ioerger T, editors, Proceedings of the 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016. The International Society for Computers and Their Applications (ISCA). 2016. p. 63-68. (Proceedings of the 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016).