Deciphering highly similar multigene family transcripts from Iso-Seq data with IsoCon

Kristoffer Sahlin, Marta Hoover, Kateryna Dmytrivna Makova, Paul Medvedev

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

A significant portion of genes in vertebrate genomes belongs to multigene families, with each family containing several gene copies whose presence/absence, as well as isoform structure, can be highly variable across individuals. Existing de novo techniques for assaying the sequences of such highly-similar gene families fall short of reconstructing end-to-end transcripts with nucleotide-level precision or assigning alternatively spliced transcripts to their respective gene copies. We present IsoCon, a high-precision method using long PacBio Iso-Seq reads to tackle this challenge. We apply IsoCon to nine Y chromosome ampliconic gene families and show that it outperforms existing methods on both experimental and simulated data. IsoCon has allowed us to detect an unprecedented number of novel isoforms and has opened the door for unraveling the structure of many multigene families and gaining a deeper understanding of genome evolution and human diseases.

Original language: English (US)
Article number: 4601
Journal: Nature Communications
Volume: 9
Issue number: 1
DOI: 10.1038/s41467-018-06910-x
State: Published - Dec 1 2018

Fingerprint

Multigene Family
genes
Genes
genome
Protein Isoforms
Y-Linked Genes
assaying
Human Genome
vertebrates
chromosomes
nucleotides
Vertebrates
Nucleotides
Genome
Chromosomes

All Science Journal Classification (ASJC) codes

  • Chemistry(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Physics and Astronomy(all)

Cite this

@article{b325d9c37570482692200e1d32c980ca,
title = "Deciphering highly similar multigene family transcripts from Iso-Seq data with IsoCon",
abstract = "A significant portion of genes in vertebrate genomes belongs to multigene families, with each family containing several gene copies whose presence/absence, as well as isoform structure, can be highly variable across individuals. Existing de novo techniques for assaying the sequences of such highly-similar gene families fall short of reconstructing end-to-end transcripts with nucleotide-level precision or assigning alternatively spliced transcripts to their respective gene copies. We present IsoCon, a high-precision method using long PacBio Iso-Seq reads to tackle this challenge. We apply IsoCon to nine Y chromosome ampliconic gene families and show that it outperforms existing methods on both experimental and simulated data. IsoCon has allowed us to detect an unprecedented number of novel isoforms and has opened the door for unraveling the structure of many multigene families and gaining a deeper understanding of genome evolution and human diseases.",
author = "Kristoffer Sahlin and Marta Hoover and Makova, {Kateryna Dmytrivna} and Paul Medvedev",
year = "2018",
month = "12",
day = "1",
doi = "10.1038/s41467-018-06910-x",
language = "English (US)",
volume = "9",
journal = "Nature Communications",
issn = "2041-1723",
publisher = "Nature Publishing Group",
number = "1",

}

Deciphering highly similar multigene family transcripts from Iso-Seq data with IsoCon. / Sahlin, Kristoffer; Hoover, Marta; Makova, Kateryna Dmytrivna; Medvedev, Paul.

In: Nature Communications, Vol. 9, No. 1, 4601, 01.12.2018.

TY - JOUR

T1 - Deciphering highly similar multigene family transcripts from Iso-Seq data with IsoCon

AU - Sahlin, Kristoffer

AU - Hoover, Marta

AU - Makova, Kateryna Dmytrivna

AU - Medvedev, Paul

PY - 2018/12/1

Y1 - 2018/12/1

AB - A significant portion of genes in vertebrate genomes belongs to multigene families, with each family containing several gene copies whose presence/absence, as well as isoform structure, can be highly variable across individuals. Existing de novo techniques for assaying the sequences of such highly-similar gene families fall short of reconstructing end-to-end transcripts with nucleotide-level precision or assigning alternatively spliced transcripts to their respective gene copies. We present IsoCon, a high-precision method using long PacBio Iso-Seq reads to tackle this challenge. We apply IsoCon to nine Y chromosome ampliconic gene families and show that it outperforms existing methods on both experimental and simulated data. IsoCon has allowed us to detect an unprecedented number of novel isoforms and has opened the door for unraveling the structure of many multigene families and gaining a deeper understanding of genome evolution and human diseases.

UR - http://www.scopus.com/inward/record.url?scp=85056075328&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85056075328&partnerID=8YFLogxK

U2 - 10.1038/s41467-018-06910-x

DO - 10.1038/s41467-018-06910-x

M3 - Article

C2 - 30389934

AN - SCOPUS:85056075328

VL - 9

JO - Nature Communications

JF - Nature Communications

SN - 2041-1723

IS - 1

M1 - 4601

ER -