Toward fast and accurate SNP genotyping from whole genome sequencing data for bedside diagnostics

Chen Sun, Paul Medvedev

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Motivation: Genotyping a set of variants from a database is an important step for identifying known genetic traits and disease-related variants within an individual. The growing size of variant databases as well as the high depth of sequencing data poses an efficiency challenge. In clinical applications, where time is crucial, alignment-based methods are often not fast enough. To fill the gap, Shajii et al. propose LAVA, an alignment-free genotyping method which is able to more quickly genotype single nucleotide polymorphisms (SNPs); however, there remains large room for improvements in running time and accuracy. Results: We present the VarGeno method for SNP genotyping from Illumina whole genome sequencing data. VarGeno builds upon LAVA by improving the speed of k-mer querying as well as the accuracy of the genotyping strategy. We evaluate VarGeno on several read datasets using different genotyping SNP lists. VarGeno performs 7-13 times faster than LAVA with similar memory usage, while improving accuracy. Availability and implementation: VarGeno is freely available at: https://github.com/medvedevgroup/vargeno. Supplementary information: Supplementary data are available at Bioinformatics online.

Original language: English (US)
Pages (from-to): 415-420
Number of pages: 6
Journal: Bioinformatics
Volume: 35
Issue number: 3
DOI: 10.1093/bioinformatics/bty641
State: Published - Feb 1 2019

Fingerprint

Single nucleotide Polymorphism
Nucleotides
Polymorphism
Sequencing
Diagnostics
Genome
Genes
Alignment
Databases
Inborn Genetic Diseases
Bioinformatics
Computational Biology
Genotype
Motivation
Availability
Efficiency
Data storage equipment
Evaluate

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

@article{3396d90b24704058a35ede5247a354a3,
title = "Toward fast and accurate SNP genotyping from whole genome sequencing data for bedside diagnostics",
abstract = "Motivation: Genotyping a set of variants from a database is an important step for identifying known genetic traits and disease-related variants within an individual. The growing size of variant databases as well as the high depth of sequencing data poses an efficiency challenge. In clinical applications, where time is crucial, alignment-based methods are often not fast enough. To fill the gap, Shajii et al. propose LAVA, an alignment-free genotyping method which is able to more quickly genotype single nucleotide polymorphisms (SNPs); however, there remains large room for improvements in running time and accuracy. Results: We present the VarGeno method for SNP genotyping from Illumina whole genome sequencing data. VarGeno builds upon LAVA by improving the speed of k-mer querying as well as the accuracy of the genotyping strategy. We evaluate VarGeno on several read datasets using different genotyping SNP lists. VarGeno performs 7-13 times faster than LAVA with similar memory usage, while improving accuracy. Availability and implementation: VarGeno is freely available at: https://github.com/medvedevgroup/vargeno. Supplementary information: Supplementary data are available at Bioinformatics online.",
author = "Chen Sun and Paul Medvedev",
year = "2019",
month = "2",
day = "1",
doi = "10.1093/bioinformatics/bty641",
language = "English (US)",
volume = "35",
pages = "415--420",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "3",

}

Toward fast and accurate SNP genotyping from whole genome sequencing data for bedside diagnostics. / Sun, Chen; Medvedev, Paul.

In: Bioinformatics, Vol. 35, No. 3, 01.02.2019, p. 415-420.

TY - JOUR

T1 - Toward fast and accurate SNP genotyping from whole genome sequencing data for bedside diagnostics

AU - Sun, Chen

AU - Medvedev, Paul

PY - 2019/2/1

Y1 - 2019/2/1

AB - Motivation: Genotyping a set of variants from a database is an important step for identifying known genetic traits and disease-related variants within an individual. The growing size of variant databases as well as the high depth of sequencing data poses an efficiency challenge. In clinical applications, where time is crucial, alignment-based methods are often not fast enough. To fill the gap, Shajii et al. propose LAVA, an alignment-free genotyping method which is able to more quickly genotype single nucleotide polymorphisms (SNPs); however, there remains large room for improvements in running time and accuracy. Results: We present the VarGeno method for SNP genotyping from Illumina whole genome sequencing data. VarGeno builds upon LAVA by improving the speed of k-mer querying as well as the accuracy of the genotyping strategy. We evaluate VarGeno on several read datasets using different genotyping SNP lists. VarGeno performs 7-13 times faster than LAVA with similar memory usage, while improving accuracy. Availability and implementation: VarGeno is freely available at: https://github.com/medvedevgroup/vargeno. Supplementary information: Supplementary data are available at Bioinformatics online.

UR - http://www.scopus.com/inward/record.url?scp=85061119596&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85061119596&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/bty641

DO - 10.1093/bioinformatics/bty641

M3 - Article

C2 - 30032192

AN - SCOPUS:85061119596

VL - 35

SP - 415

EP - 420

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 3

ER -