Measuring the reproducibility and quality of Hi-C data

Galip Gürkan Yardımcı, Hakan Ozadam, Michael E.G. Sauria, Oana Ursu, Koon Kiu Yan, Tao Yang, Abhijit Chakraborty, Arya Kaul, Bryan R. Lajoie, Fan Song, Ye Zhan, Ferhat Ay, Mark Gerstein, Anshul Kundaje, Qunhua Li, James Taylor, Feng Yue, Job Dekker, William S. Noble

Research output: Contribution to journal › Article

Abstract

Background: Hi-C is currently the most widely used assay to investigate the 3D organization of the genome and to study its role in gene regulation, DNA replication, and disease. However, Hi-C experiments are costly to perform and involve multiple complex experimental steps; thus, accurate methods for measuring the quality and reproducibility of Hi-C data are essential to determine whether the output should be used further in a study.

Results: Using real and simulated data, we profile the performance of several recently proposed methods for assessing reproducibility of population Hi-C data, including HiCRep, GenomeDISCO, HiC-Spector, and QuASAR-Rep. By explicitly controlling noise and sparsity through simulations, we demonstrate the deficiencies of performing simple correlation analysis on pairs of matrices, and we show that methods developed specifically for Hi-C data produce better measures of reproducibility. We also show how to use established measures, such as the ratio of intra- to interchromosomal interactions, and novel ones, such as QuASAR-QC, to identify low-quality experiments.

Conclusions: In this work, we assess reproducibility and quality measures by varying sequencing depth, resolution and noise levels in Hi-C data from 13 cell lines, with two biological replicates each, as well as 176 simulated matrices. Through this extensive validation and benchmarking of Hi-C data, we describe best practices for reproducibility and quality assessment of Hi-C experiments. We make all software publicly available at http://github.com/kundajelab/3DChromatin_ReplicateQC to facilitate adoption in the community.
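As a rough illustration of two baseline ideas named in the abstract, the sketch below computes a ratio of intra- to interchromosomal read counts and a plain Pearson correlation between two contact matrices. It is not the code from 3DChromatin_ReplicateQC, and it does not implement HiCRep, GenomeDISCO, HiC-Spector, or QuASAR. The `Contact` record, the function names, and the toy numbers are all assumptions made for this example.

```python
from collections import namedtuple
from math import sqrt

# Hypothetical record for one Hi-C contact: two genomic bins and a read count.
Contact = namedtuple("Contact", ["chrom1", "bin1", "chrom2", "bin2", "count"])


def intra_inter_ratio(contacts):
    """Return intrachromosomal reads divided by interchromosomal reads."""
    intra = sum(c.count for c in contacts if c.chrom1 == c.chrom2)
    inter = sum(c.count for c in contacts if c.chrom1 != c.chrom2)
    if inter == 0:
        raise ValueError("no interchromosomal contacts; ratio is undefined")
    return intra / inter


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def naive_matrix_correlation(a, b):
    """Correlate the upper triangles (diagonal included) of two symmetric,
    equally sized contact matrices given as lists of lists."""
    n = len(a)
    xs = [a[i][j] for i in range(n) for j in range(i, n)]
    ys = [b[i][j] for i in range(n) for j in range(i, n)]
    return pearson(xs, ys)


# Toy data, invented for illustration only.
toy_contacts = [
    Contact("chr1", 0, "chr1", 1, 30),
    Contact("chr1", 0, "chr2", 0, 10),
    Contact("chr2", 1, "chr2", 2, 20),
]
ratio = intra_inter_ratio(toy_contacts)

rep1 = [[4, 2], [2, 4]]
rep2 = [[8, 4], [4, 8]]  # same pattern as rep1 at twice the depth
r = naive_matrix_correlation(rep1, rep2)
```

In the toy case the second matrix is the first one scaled by two, so the naive correlation is maximal even though the two differ in depth. A score this simple ignores the structure of Hi-C matrices, which is the kind of limitation the abstract attributes to simple correlation analysis.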

Original language: English (US)
Article number: 57
Journal: Genome Biology
Volume: 20
Issue number: 1
DOIs: https://doi.org/10.1186/s13059-019-1658-7
State: Published - Mar 19 2019

Fingerprint

reproducibility
Noise
Benchmarking
DNA Replication
Practice Guidelines
Software
Organizations
Genome
Cell Line
matrix
experiment
Population
Genes
methodology
measuring
assay

All Science Journal Classification (ASJC) codes

  • Ecology, Evolution, Behavior and Systematics
  • Genetics
  • Cell Biology

Cite this

Yardımcı, G. G., Ozadam, H., Sauria, M. E. G., Ursu, O., Yan, K. K., Yang, T., ... Noble, W. S. (2019). Measuring the reproducibility and quality of Hi-C data. Genome biology, 20(1), [57]. https://doi.org/10.1186/s13059-019-1658-7
Yardımcı, Galip Gürkan ; Ozadam, Hakan ; Sauria, Michael E.G. ; Ursu, Oana ; Yan, Koon Kiu ; Yang, Tao ; Chakraborty, Abhijit ; Kaul, Arya ; Lajoie, Bryan R. ; Song, Fan ; Zhan, Ye ; Ay, Ferhat ; Gerstein, Mark ; Kundaje, Anshul ; Li, Qunhua ; Taylor, James ; Yue, Feng ; Dekker, Job ; Noble, William S. / Measuring the reproducibility and quality of Hi-C data. In: Genome biology. 2019 ; Vol. 20, No. 1.
@article{e60d7e0a653045ae940b10a45fe44ab4,
title = "Measuring the reproducibility and quality of Hi-C data",
abstract = "Background: Hi-C is currently the most widely used assay to investigate the 3D organization of the genome and to study its role in gene regulation, DNA replication, and disease. However, Hi-C experiments are costly to perform and involve multiple complex experimental steps; thus, accurate methods for measuring the quality and reproducibility of Hi-C data are essential to determine whether the output should be used further in a study. Results: Using real and simulated data, we profile the performance of several recently proposed methods for assessing reproducibility of population Hi-C data, including HiCRep, GenomeDISCO, HiC-Spector, and QuASAR-Rep. By explicitly controlling noise and sparsity through simulations, we demonstrate the deficiencies of performing simple correlation analysis on pairs of matrices, and we show that methods developed specifically for Hi-C data produce better measures of reproducibility. We also show how to use established measures, such as the ratio of intra- to interchromosomal interactions, and novel ones, such as QuASAR-QC, to identify low-quality experiments. Conclusions: In this work, we assess reproducibility and quality measures by varying sequencing depth, resolution and noise levels in Hi-C data from 13 cell lines, with two biological replicates each, as well as 176 simulated matrices. Through this extensive validation and benchmarking of Hi-C data, we describe best practices for reproducibility and quality assessment of Hi-C experiments. We make all software publicly available at http://github.com/kundajelab/3DChromatin_ReplicateQC to facilitate adoption in the community.",
author = "Yardımcı, {Galip G{\"u}rkan} and Hakan Ozadam and Sauria, {Michael E.G.} and Oana Ursu and Yan, {Koon Kiu} and Tao Yang and Abhijit Chakraborty and Arya Kaul and Lajoie, {Bryan R.} and Fan Song and Ye Zhan and Ferhat Ay and Mark Gerstein and Anshul Kundaje and Qunhua Li and James Taylor and Feng Yue and Job Dekker and Noble, {William S.}",
year = "2019",
month = "3",
day = "19",
doi = "10.1186/s13059-019-1658-7",
language = "English (US)",
volume = "20",
journal = "Genome Biology",
issn = "1474-7596",
publisher = "BioMed Central",
number = "1",
}

Yardımcı, GG, Ozadam, H, Sauria, MEG, Ursu, O, Yan, KK, Yang, T, Chakraborty, A, Kaul, A, Lajoie, BR, Song, F, Zhan, Y, Ay, F, Gerstein, M, Kundaje, A, Li, Q, Taylor, J, Yue, F, Dekker, J & Noble, WS 2019, 'Measuring the reproducibility and quality of Hi-C data', Genome biology, vol. 20, no. 1, 57. https://doi.org/10.1186/s13059-019-1658-7

Measuring the reproducibility and quality of Hi-C data. / Yardımcı, Galip Gürkan; Ozadam, Hakan; Sauria, Michael E.G.; Ursu, Oana; Yan, Koon Kiu; Yang, Tao; Chakraborty, Abhijit; Kaul, Arya; Lajoie, Bryan R.; Song, Fan; Zhan, Ye; Ay, Ferhat; Gerstein, Mark; Kundaje, Anshul; Li, Qunhua; Taylor, James; Yue, Feng; Dekker, Job; Noble, William S.

In: Genome biology, Vol. 20, No. 1, 57, 19.03.2019.

TY - JOUR
T1 - Measuring the reproducibility and quality of Hi-C data
AU - Yardımcı, Galip Gürkan
AU - Ozadam, Hakan
AU - Sauria, Michael E.G.
AU - Ursu, Oana
AU - Yan, Koon Kiu
AU - Yang, Tao
AU - Chakraborty, Abhijit
AU - Kaul, Arya
AU - Lajoie, Bryan R.
AU - Song, Fan
AU - Zhan, Ye
AU - Ay, Ferhat
AU - Gerstein, Mark
AU - Kundaje, Anshul
AU - Li, Qunhua
AU - Taylor, James
AU - Yue, Feng
AU - Dekker, Job
AU - Noble, William S.
PY - 2019/3/19
Y1 - 2019/3/19
AB - Background: Hi-C is currently the most widely used assay to investigate the 3D organization of the genome and to study its role in gene regulation, DNA replication, and disease. However, Hi-C experiments are costly to perform and involve multiple complex experimental steps; thus, accurate methods for measuring the quality and reproducibility of Hi-C data are essential to determine whether the output should be used further in a study. Results: Using real and simulated data, we profile the performance of several recently proposed methods for assessing reproducibility of population Hi-C data, including HiCRep, GenomeDISCO, HiC-Spector, and QuASAR-Rep. By explicitly controlling noise and sparsity through simulations, we demonstrate the deficiencies of performing simple correlation analysis on pairs of matrices, and we show that methods developed specifically for Hi-C data produce better measures of reproducibility. We also show how to use established measures, such as the ratio of intra- to interchromosomal interactions, and novel ones, such as QuASAR-QC, to identify low-quality experiments. Conclusions: In this work, we assess reproducibility and quality measures by varying sequencing depth, resolution and noise levels in Hi-C data from 13 cell lines, with two biological replicates each, as well as 176 simulated matrices. Through this extensive validation and benchmarking of Hi-C data, we describe best practices for reproducibility and quality assessment of Hi-C experiments. We make all software publicly available at http://github.com/kundajelab/3DChromatin_ReplicateQC to facilitate adoption in the community.

UR - http://www.scopus.com/inward/record.url?scp=85063156719&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85063156719&partnerID=8YFLogxK
U2 - 10.1186/s13059-019-1658-7
DO - 10.1186/s13059-019-1658-7
M3 - Article
C2 - 30890172
AN - SCOPUS:85063156719
VL - 20
JO - Genome Biology
JF - Genome Biology
SN - 1474-7596
IS - 1
M1 - 57
ER -

Yardımcı GG, Ozadam H, Sauria MEG, Ursu O, Yan KK, Yang T et al. Measuring the reproducibility and quality of Hi-C data. Genome biology. 2019 Mar 19;20(1):57. https://doi.org/10.1186/s13059-019-1658-7