Published in HAL Thesis, 2024
Recommended citation: Sébastien Gradit. Using statistical profiling to decipher hidden chromatin contacts resulting from repeated sequences. Genetics. Sorbonne Université, 2024. English. ⟨NNT : 2024SORUS494⟩. ⟨tel-05010244⟩
Published in Pending, 2025
This paper introduces Hicberg, a new method to more accurately map the 3D organization of genomes by resolving ambiguities caused by repetitive DNA sequences in Hi-C data, revealing potentially hidden structural features in various organisms.
Published:
Increasingly detailed investigations of the spatial organization of genomes reveal that chromosome folding influences or regulates dynamic processes such as transcription, DNA repair and segregation. The Hi-C approach is commonly used to characterize genome architecture by quantifying the frequency of physical contacts between pairs of loci through high-throughput sequencing. Repeated sequences pose a challenge during the alignment step of the analysis, because sequencing reads originating from them can be assigned to multiple plausible positions. These unknown parts of the genome architecture, which may contain biological information, remain hidden throughout downstream functional analysis. To overcome these limitations, we have developed HiC-BERG, a method that combines statistical inference with characteristics of DNA polymer behavior and features of the Hi-C protocol to assign repeated reads in a genome with robust confidence and "fill in" empty vectors in contact maps. HiC-BERG is intended to be applicable to different types of organisms. We will present the program and key validation tests, before applying it to unveil hidden parts of the genomes of E. coli, Saccharomyces cerevisiae and Plasmodium falciparum. HiC-BERG shows that repeated sequences may be involved in singular genomic architectures. Our method can provide an alternative visualization of genomic contacts under a wide variety of biological conditions, allowing a more complete view of genome plasticity.
During their evolution, the genomes of micro-organisms can acquire large numbers of repeated elements such as retrotransposons, duplicated genes or tandem repeats. Such sequences cannot be processed directly by NGS technologies, because these technologies generate short reads that cannot be located unambiguously on reference genomes. This information is filtered out by most current pipelines, leading to incomplete genomic tracks and a significant loss of information on the biological functions, processes and genomic structures that involve repeated elements. We developed Hicberg, an algorithm that uses statistical inference and pseudo-random generators to predict the positions of reads from repeated sequences in different paired-end omics data (including Hi-C, MNase-seq or ChIP-seq). After developing the method and calibrating it on a test bench, we explored during this PhD project how it improves the interpretability of genomic data from various species, starting with microbial ones such as Saccharomyces cerevisiae. Reconstruction of Hi-C and ChIP-seq genomic tracks with Hicberg revealed how some retrotransposons in this model organism contribute to the positioning of cohesin, a molecular motor involved in the formation of chromatin loops. A new role for retrotransposon sequences as contact points for the elusive yeast 2-micron episomal molecule was also identified. Overall, these results underline the power of the approach to discover novel molecular relationships, and the interest in applying this tool more widely to larger genomes with greater quantities of repeats. The proposed method can therefore provide an alternative visualization of genomic signals in a wide variety of biological conditions and allow a more comprehensive view of genome organization and plasticity. Importantly, existing datasets can be revisited using the approach to unveil overlooked features.
Repeated genomic elements (like retrotransposons) are poorly mapped by standard NGS pipelines, resulting in a significant loss of information regarding biological functions and genomic structures. To overcome this, we developed Hicberg, an algorithm that uses statistical inference trained on unambiguous genome regions to predict the positions of reads from these repeated sequences. Hicberg successfully reconstructs complete genomic tracks from various paired-end omics data, including Hi-C and ChIP-seq, significantly improving data completeness and interpretability.
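The idea of learning from unambiguous regions and then drawing a position for each repeated read can be illustrated with a minimal sketch. This is not the Hicberg implementation: it assumes a single feature (genomic distance between the two mates of a pair), a fixed bin width, and a simple pseudocount, and every name in it (`BIN_SIZE`, `distance_profile`, `assign_position`, the toy coordinates) is hypothetical.

```python
import random
from collections import Counter

BIN_SIZE = 10_000  # hypothetical width of a genomic-distance bin, in base pairs


def distance_profile(unambiguous_pairs, bin_size=BIN_SIZE):
    """Count uniquely mapped read pairs per genomic-distance bin."""
    return Counter(abs(a - b) // bin_size for a, b in unambiguous_pairs)


def assign_position(anchor, candidates, profile, rng, bin_size=BIN_SIZE, pseudocount=1):
    """Draw one candidate position for the repeated mate of a read pair.

    Each candidate is weighted by how often its distance to the uniquely
    mapped mate (anchor) was seen among unambiguous pairs, plus a pseudocount
    so that unseen distances keep a non-zero chance of being drawn.
    """
    weights = [profile[abs(anchor - c) // bin_size] + pseudocount for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Toy data (assumed): pairs whose two mates both map uniquely.
unambiguous_pairs = [(1_000, 4_000), (2_000, 9_000), (5_000, 8_500), (10_000, 95_000)]
profile = distance_profile(unambiguous_pairs)

# One mate maps uniquely at 10,000; the other hits three copies of a repeat.
rng = random.Random(0)  # seeded pseudo-random generator for reproducibility
candidates = [12_000, 250_000, 900_000]
draws = Counter(assign_position(10_000, candidates, profile, rng) for _ in range(1_000))
print(draws)
```

In this toy setting the nearby candidate is drawn most often because short-range pairs dominate the unambiguous data, while the distant copies still receive some reads rather than none. Drawing at random instead of always picking the most likely position is a design choice of the sketch that avoids piling every ambiguous read onto a single repeat copy.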
During their evolution, the genomes of micro-organisms can acquire large numbers of repeated elements such as retrotransposons, duplicated genes or tandem repeats. Such sequences cannot be processed directly using high-throughput sequencing technologies, because these technologies generate short sequences that cannot be located unambiguously on reference genomes. This type of data is filtered out by most current standard pipelines, resulting in a significant loss of information on the biological functions, processes and genomic structures that involve repeated elements. We developed Hicberg, an algorithm that uses statistical inference and pseudo-random generators to predict the positions of reads from repeated sequences in different paired-end omics data (including contact technologies such as Hi-C, and nucleosome or protein positioning technologies such as MNase-seq or ChIP-seq). After developing the method and calibrating it on a test bench, we explored during this PhD project how it improves the interpretability of genomic data from various species, starting with microbial ones such as Saccharomyces cerevisiae.