Every cell in the body carries the same genetic sequence, yet each cell type, such as a brain or skin cell, uses only a specific set of genes. Which genes are used is guided by the 3D structure of the genetic material, which controls how accessible each gene is.
MIT chemists have developed a faster method to determine 3D genome structures using generative AI, predicting thousands of structures in minutes. This advancement allows researchers to study how 3D genome organization affects gene expression and cell functions more easily.
Bin Zhang, the study’s senior author, believes this could lead to many new opportunities in genomic research.
Inside the cell nucleus, DNA and proteins form chromatin, which allows cells to fit 2 meters of DNA into a tiny space. DNA strands wrap around proteins called histones, creating a “beads on a string” structure.
Epigenetic modifications are chemical tags on DNA that vary by cell type and affect chromatin folding and gene accessibility. These differences determine gene expression in different cells or at different times.
Over the past twenty years, techniques like Hi-C have been developed to determine chromatin structures. Hi-C links DNA strands that lie close together in the nucleus, then sequences the linked fragments to identify which segments are near each other. It can be used on large populations of cells or on single cells, but it is labor-intensive and takes about a week to generate data from one cell.
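To make "which segments are near each other" concrete, the sketch below computes a contact matrix from a toy 3D chain. It runs in the opposite direction from a real experiment, which measures contacts rather than coordinates. The function name `contact_map`, the `cutoff` value, and the five-bead chain are illustrative assumptions, not part of Hi-C or of the study.

```python
import math

def contact_map(coords, cutoff=1.5):
    """Return a symmetric 0/1 matrix marking bead pairs closer than `cutoff`."""
    n = len(coords)
    contacts = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # math.dist gives the Euclidean distance between two points.
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts[i][j] = contacts[j][i] = 1
    return contacts

# Hypothetical 5-bead chain that folds back so the two ends sit side by side.
chain = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (2.0, 0.0, 0.0),
    (1.0, 1.0, 0.0),
    (0.0, 1.0, 0.0),
]
cmap = contact_map(chain)
```

In this toy chain the first and last beads are in contact even though they are far apart along the sequence, which is the kind of long-range proximity a contact map is meant to reveal.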
To overcome these challenges, Zhang and his team created ChromoGen, an AI model that quickly and accurately predicts chromatin structures. ChromoGen has two parts:
- A deep learning model that reads the genome and chromatin accessibility data.
- A generative AI model trained on over 11 million chromatin conformations to predict accurate structures.
The first part helps the generative model understand the cell type-specific environment and capture sequence-structure relationships. Because DNA is highly disordered, a single DNA sequence can fold into many different conformations, so the model generates many possible structures for each sequence.
This is what makes predicting the genome’s structure hard: there is no single answer, only a distribution of possible structures. Greg Schuette, one of the study’s authors, describes the difficulty as predicting a complicated, high-dimensional statistical distribution.
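The two-part design can be sketched as a conditioning step followed by repeated sampling. This is a toy illustration only: the published model is a diffusion model, while the sampler here is a plain random walk, and every name (`encode_region`, `sample_structure`, `sample_ensemble`) and the step-size rule are hypothetical stand-ins rather than ChromoGen's code.

```python
import random

def encode_region(sequence, accessibility):
    """Stage 1 stand-in: reduce sequence and accessibility to a conditioning tuple."""
    gc_fraction = sum(base in "GC" for base in sequence) / len(sequence)
    mean_accessibility = sum(accessibility) / len(accessibility)
    return (gc_fraction, mean_accessibility)

def sample_structure(condition, n_beads, rng):
    """Stage 2 stand-in: draw one random-walk chain of 3D bead positions."""
    _, mean_accessibility = condition
    # Arbitrary toy rule (an assumption, not biology): higher accessibility, larger steps.
    step = 0.5 + mean_accessibility
    position = (0.0, 0.0, 0.0)
    chain = [position]
    for _ in range(n_beads - 1):
        position = tuple(p + rng.gauss(0.0, step) for p in position)
        chain.append(position)
    return chain

def sample_ensemble(sequence, accessibility, n_structures, n_beads=8, seed=0):
    """Encode the region once, then sample many structures from the same condition."""
    rng = random.Random(seed)
    condition = encode_region(sequence, accessibility)
    return [sample_structure(condition, n_beads, rng) for _ in range(n_structures)]

# Hypothetical inputs: a short sequence and three accessibility readings.
ensemble = sample_ensemble("ACGTGGCCATAT", [0.2, 0.9, 0.4], n_structures=5)
```

The point of the sketch is the shape of the pipeline: the inputs are encoded once, and each call to the sampler returns a different structure for the same sequence.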
Once trained, the model generates predictions far faster than experimental methods like Hi-C. It can produce a thousand structures in 20 minutes on one GPU, whereas current methods can take six months to yield a few dozen structures.
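A rough back-of-the-envelope comparison shows the scale of that gap. The article gives only "a few dozen" and "six months", so the values 36 structures and 182 days below are assumptions chosen for illustration, and the result is an order-of-magnitude estimate rather than a measured figure.

```python
# Figures stated in the article.
model_structures = 1000
model_minutes = 20

# Assumed readings of "a few dozen" and "six months" (not from the article).
experiment_structures = 36
experiment_minutes = 182 * 24 * 60

model_rate = model_structures / model_minutes                 # structures per minute
experiment_rate = experiment_structures / experiment_minutes  # structures per minute
speedup = model_rate / experiment_rate

print(f"model: {model_rate:.0f} structures per minute")
print(f"estimated speedup: about {speedup:.1e}x")
```

Under these assumptions the model is faster by a factor on the order of hundreds of thousands, though the comparison sets GPU time against laboratory time.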
The researchers tested their model on over 2,000 DNA sequences and found that the predictions matched experimental data. The model showed a wide range of possible structures for each sequence, reflecting the diversity in different cells.
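One common way to check predicted structures against experimental data is to correlate contact frequencies for every pair of segments. The sketch below shows that idea with made-up 4x4 matrices. The article does not say which metric the authors used, so the Pearson correlation and the helper names `upper_triangle` and `pearson` are assumptions.

```python
import math

def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical contact frequencies: the fraction of structures (or cells)
# in which each pair of segments is in contact.
predicted = [
    [0.0, 0.9, 0.4, 0.1],
    [0.9, 0.0, 0.8, 0.3],
    [0.4, 0.8, 0.0, 0.9],
    [0.1, 0.3, 0.9, 0.0],
]
experimental = [
    [0.0, 0.8, 0.5, 0.2],
    [0.8, 0.0, 0.7, 0.2],
    [0.5, 0.7, 0.0, 0.9],
    [0.2, 0.2, 0.9, 0.0],
]

r = pearson(upper_triangle(predicted), upper_triangle(experimental))
```

Only the upper triangle is compared because the matrices are symmetric and the diagonal carries no information.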
The model also accurately predicted data from other cell types, suggesting it could help study how chromatin structures vary between cell types and affect their function. It could also be used to explore different chromatin states within a single cell and to examine how mutations change chromatin conformation, potentially revealing how those mutations cause disease.
According to Zhang, there are many exciting questions this model can help address.
Journal Reference:
- Greg Schuette, Zhuohan Lao and Bin Zhang. ChromoGen: Diffusion model predicts single-cell chromatin conformations. Science Advances. DOI:10.1126/sciadv.adr8265
Source: Tech Explorist