Biofilter as a functional annotation pipeline for common and rare copy number burden

Dokyoon Kim, Anastasia Lucas, Joseph Glessner, Shefali S. Verma, Yuki Bradford, Ruowang Li, Alex T. Frase, Hakon Hakonarson, Peggy Peissig, Murray Brilliant, Marylyn Deriggi Ritchie

Research output: Contribution to journal › Conference article

Abstract

Recent studies on copy number variation (CNV) have suggested that an increasing burden of CNVs is associated with susceptibility or resistance to disease. A large number of genes or genomic loci contribute to complex diseases such as autism. Thus, total genomic copy number burden, as an accumulation of copy number change, is a meaningful measure of genomic instability to identify the association between global genetic effects and phenotypes of interest. However, no systematic annotation pipeline has been developed to interpret biological meaning based on the accumulation of copy number change across the genome associated with a phenotype of interest. In this study, we develop a comprehensive and systematic pipeline for annotating copy number variants into genes/genomic regions and subsequently pathways and other gene groups using Biofilter – a bioinformatics tool that aggregates over a dozen publicly available databases of prior biological knowledge. Next, we conduct enrichment tests of biologically defined groupings of CNVs including genes, pathways, Gene Ontology, or protein families. We applied the proposed pipeline to a CNV dataset from the Marshfield Clinic Personalized Medicine Research Project (PMRP) for a quantitative trait phenotype derived from the electronic health record – total cholesterol. We identified several significant pathways such as the toll-like receptor signaling pathway and the hepatitis C pathway, gene ontologies (GOs) of nucleoside triphosphatase activity (NTPase) and response to virus, and protein families such as cell morphogenesis that are associated with the total cholesterol phenotype based on CNV profiles (permutation p-value < 0.01). Based on the copy number burden analysis, it follows that the more and larger the copy number changes, the more likely it is that one or more target genes that influence disease risk and phenotypic severity will be affected.
Thus, our study suggests the proposed enrichment pipeline could improve the interpretability of copy number burden analysis where hundreds of loci or genes contribute toward disease susceptibility via biological knowledge groups such as pathways. This CNV annotation pipeline with Biofilter can be used for CNV data from any genotyping or sequencing platform and to explore CNV enrichment for any trait or phenotype. Biofilter continues to be a powerful bioinformatics tool for annotating, filtering, and constructing biologically informed models for association analysis – now including copy number variants.
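
The workflow described in the abstract (map CNVs to genes, roll genes up into knowledge groups, then test each group against a quantitative trait by permutation) can be illustrated with a minimal sketch. This is not the authors' implementation and does not use Biofilter's interface. The gene coordinates, the group `PATHWAY_X`, all function names, and the choice of absolute Pearson correlation as the test statistic are assumptions made purely for illustration; the abstract does not state which statistic was used.

```python
import random
from math import sqrt

# Hypothetical gene coordinates (chromosome, start, end); not real annotations.
GENES = {
    "GENE_A": ("chr1", 100, 200),
    "GENE_B": ("chr1", 500, 650),
    "GENE_C": ("chr2", 50, 120),
}
# Hypothetical knowledge group (e.g. a pathway) given as a set of gene names.
PATHWAY_X = {"GENE_A", "GENE_C"}


def genes_hit(cnv, genes):
    """Return the names of genes whose closed interval overlaps the CNV."""
    chrom, start, end = cnv
    return {
        name
        for name, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and start <= g_end and g_start <= end
    }


def group_burden(sample_cnvs, genes, group):
    """Count the CNVs in one sample that overlap at least one gene in the group."""
    return sum(1 for cnv in sample_cnvs if genes_hit(cnv, genes) & group)


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def permutation_pvalue(burden, trait, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the burden-trait association.

    Trait values are shuffled across samples to build the null distribution.
    The add-one correction keeps the estimate strictly above zero.
    """
    rng = random.Random(seed)
    observed = abs(pearson(burden, trait))
    shuffled = list(trait)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(burden, shuffled)) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_perm + 1)
```

In use, `group_burden` would be computed once per sample for each group, and the resulting per-sample burden vector passed to `permutation_pvalue` together with the trait values (for example, total cholesterol). Testing many groups this way would also call for a multiple-testing correction, which this sketch omits.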

Original language: English (US)
Pages (from-to): 357-368
Number of pages: 12
Journal: Pacific Symposium on Biocomputing
State: Published - Jan 1 2016
Event: 21st Pacific Symposium on Biocomputing, PSB 2016 - Big Island, United States
Duration: Jan 4 2016 to Jan 8 2016

Fingerprint

Biofilters
Pipelines
Genes
Phenotype
Gene Ontology
Computational Biology
Nucleoside-Triphosphatase
Cholesterol
Bioinformatics
Precision Medicine
Disease Resistance
Ontology
Electronic Health Records
Genomic Instability
Toll-Like Receptors
Disease Susceptibility
Hepatitis C
Autistic Disorder
Morphogenesis
Proteins

All Science Journal Classification (ASJC) codes

  • Medicine(all)

Cite this

Kim, D., Lucas, A., Glessner, J., Verma, S. S., Bradford, Y., Li, R., ... Ritchie, M. D. (2016). Biofilter as a functional annotation pipeline for common and rare copy number burden. Pacific Symposium on Biocomputing, 357-368.
Kim, Dokyoon ; Lucas, Anastasia ; Glessner, Joseph ; Verma, Shefali S. ; Bradford, Yuki ; Li, Ruowang ; Frase, Alex T. ; Hakonarson, Hakon ; Peissig, Peggy ; Brilliant, Murray ; Ritchie, Marylyn Deriggi. / Biofilter as a functional annotation pipeline for common and rare copy number burden. In: Pacific Symposium on Biocomputing. 2016 ; pp. 357-368.
@article{a14804ec793447eb9b99e7b2af67d9fe,
title = "Biofilter as a functional annotation pipeline for common and rare copy number burden",
author = "Dokyoon Kim and Anastasia Lucas and Joseph Glessner and Verma, {Shefali S.} and Yuki Bradford and Ruowang Li and Frase, {Alex T.} and Hakon Hakonarson and Peggy Peissig and Murray Brilliant and Ritchie, {Marylyn Deriggi}",
year = "2016",
month = "1",
day = "1",
language = "English (US)",
pages = "357--368",
journal = "Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing",
issn = "2335-6936",

}

Kim, D, Lucas, A, Glessner, J, Verma, SS, Bradford, Y, Li, R, Frase, AT, Hakonarson, H, Peissig, P, Brilliant, M & Ritchie, MD 2016, 'Biofilter as a functional annotation pipeline for common and rare copy number burden', Pacific Symposium on Biocomputing, pp. 357-368.

TY - JOUR

T1 - Biofilter as a functional annotation pipeline for common and rare copy number burden

AU - Kim, Dokyoon

AU - Lucas, Anastasia

AU - Glessner, Joseph

AU - Verma, Shefali S.

AU - Bradford, Yuki

AU - Li, Ruowang

AU - Frase, Alex T.

AU - Hakonarson, Hakon

AU - Peissig, Peggy

AU - Brilliant, Murray

AU - Ritchie, Marylyn Deriggi

PY - 2016/1/1

Y1 - 2016/1/1

UR - http://www.scopus.com/inward/record.url?scp=85012180600&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85012180600&partnerID=8YFLogxK

M3 - Conference article

C2 - 26776200

AN - SCOPUS:85012180600

SP - 357

EP - 368

JO - Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

JF - Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

SN - 2335-6936

ER -

Kim D, Lucas A, Glessner J, Verma SS, Bradford Y, Li R et al. Biofilter as a functional annotation pipeline for common and rare copy number burden. Pacific Symposium on Biocomputing. 2016 Jan 1;357-368.