TY - JOUR
T1 - Toward a Three-Dimensional Chromosome Shape Alphabet
AU - Soto, Carlos
AU - Bryner, Darshan
AU - Neretti, Nicola
AU - Srivastava, Anuj
N1 - Funding Information:
This work was supported by the NIH Common Fund Program, grant U01CA200147, as a Transformative Collaborative Project Award (TCPA-2017-NERETTI) to N.N. and A.S.; NIH R01 GM126558 to A.S.; NSF CDS&E DMS 1953087 to A.S.; and NIH/NIA R01AG050582-01A1 to N.N.
Publisher Copyright:
© Copyright 2021, Mary Ann Liebert, Inc., publishers.
PY - 2021/6/1
Y1 - 2021/6/1
N2 - The study of the three-dimensional (3D) structure of chromosomes - the largest macromolecules in biology - is one of the most challenging problems to date in structural biology. Here, we develop a novel representation of 3D chromosome structures as sequences of shape letters from a finite shape alphabet, which provides a compact and efficient way to analyze ensembles of chromosome shape data, akin to analyzing texts in a language by their letters. We construct a Chromosome Shape Alphabet from an ensemble of chromosome 3D structures inferred from Hi-C data - via SIMBA3D or other methods - by segmenting curves at topologically associating domain (TAD) boundaries and by clustering all TADs' 3D structures into groups of similar shapes. The median shapes of these groups, with some pruning and processing, form the Chromosome Shape Letters (CSLs) of the alphabet. We provide a proof of concept for these CSLs by reconstructing independent test curves using only CSLs (and corresponding transformations) and comparing these reconstructions with the original curves. Finally, we demonstrate how CSLs can be used to summarize shapes in an ensemble of chromosome 3D structures by using generalized sequence logos.
AB - The study of the three-dimensional (3D) structure of chromosomes - the largest macromolecules in biology - is one of the most challenging problems to date in structural biology. Here, we develop a novel representation of 3D chromosome structures as sequences of shape letters from a finite shape alphabet, which provides a compact and efficient way to analyze ensembles of chromosome shape data, akin to analyzing texts in a language by their letters. We construct a Chromosome Shape Alphabet from an ensemble of chromosome 3D structures inferred from Hi-C data - via SIMBA3D or other methods - by segmenting curves at topologically associating domain (TAD) boundaries and by clustering all TADs' 3D structures into groups of similar shapes. The median shapes of these groups, with some pruning and processing, form the Chromosome Shape Letters (CSLs) of the alphabet. We provide a proof of concept for these CSLs by reconstructing independent test curves using only CSLs (and corresponding transformations) and comparing these reconstructions with the original curves. Finally, we demonstrate how CSLs can be used to summarize shapes in an ensemble of chromosome 3D structures by using generalized sequence logos.
UR - http://www.scopus.com/inward/record.url?scp=85108238966&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85108238966&partnerID=8YFLogxK
U2 - 10.1089/cmb.2020.0383
DO - 10.1089/cmb.2020.0383
M3 - Article
C2 - 33720766
AN - SCOPUS:85108238966
SN - 1066-5277
VL - 28
SP - 601
EP - 618
JO - Journal of Computational Biology
JF - Journal of Computational Biology
IS - 6
ER -