Talks, presentations and posters

Using statistical profiling to decipher hidden chromatin contacts resulting from repeated sequences

November 26, 2024

Talk, Institut Pasteur, Paris, France

During their evolution, the genomes of micro-organisms can acquire large numbers of repeated elements such as retrotransposons, duplicated genes or tandem repeats. These sequences cannot be processed directly with high-throughput sequencing technologies, because these technologies generate short reads that cannot be located unambiguously on reference genomes. Such reads are filtered out by most current standard pipelines, resulting in a significant loss of information on biological functions, processes and genomic structures involving repeated elements. We developed Hicberg, an algorithm that uses statistical inference and pseudo-random generators to predict the positions of reads from repeated sequences in different paired-end omics data (including contact technologies like Hi-C, and nucleosome or protein positioning technologies like MNase-seq or ChIP-seq). After developing the method and calibrating it on a test bench, we explored during this PhD project how it improves the interpretability of genomic data from various species, starting with microbial ones such as Saccharomyces cerevisiae.

Hicberg: Prediction of omics signals from repeated elements

June 26, 2024

Poster, Paul Sabatier University, Toulouse, France

Repeated genomic elements (like retrotransposons) are poorly mapped by standard NGS pipelines, resulting in a significant loss of information regarding biological functions and genomic structures. To overcome this, we developed Hicberg, an algorithm that uses statistical inference trained on unambiguous genome regions to predict the positions of reads from these repeated sequences. Hicberg successfully reconstructs complete genomic tracks from various paired-end omics data, including Hi-C and ChIP-seq, significantly improving data completeness and interpretability.

Hicberg – Hi-C Biological Estimation of Repeated elements in Genomes

April 22, 2024

Poster, Palais des Congrès, Saint-Malo, France

During their evolution, the genomes of micro-organisms can acquire large numbers of repeated elements such as retrotransposons, duplicated genes or tandem repeats. These sequences cannot be processed directly by NGS technologies, because these technologies generate short reads that cannot be located unambiguously on reference genomes. Such reads are filtered out by most current pipelines, leading to incomplete genomic tracks and a significant loss of information on biological functions, processes and genomic structures involving repeated elements. We developed Hicberg, an algorithm that uses statistical inference and pseudo-random generators to predict the positions of reads from repeated sequences in different paired-end omics data (including Hi-C, MNase-seq or ChIP-seq). After developing the method and calibrating it on a test bench, we explored during this PhD project how it improves the interpretability of genomic data from various species, starting with microbial ones such as Saccharomyces cerevisiae. Reconstruction of Hi-C and ChIP-seq genomic tracks with Hicberg revealed how some retrotransposons in this model organism contribute to the positioning of cohesin, a molecular motor involved in the formation of chromatin loops. A new role for retrotransposon sequences as contact points for the elusive yeast 2-micron episomal molecule was also identified. Overall, these results underline the power of the approach to discover novel molecular relationships, and the interest in applying this tool more widely to larger genomes with greater quantities of repeats. The proposed method can therefore provide an alternative visualization of genomic signals in a wide variety of biological conditions and allow a more comprehensive view of genome organization and plasticity. Importantly, existing datasets can be revisited using the approach to unveil overlooked features.

Statistical inference of repeated sequence contacts in Hi-C maps

July 25, 2023

Conference talk, Intelligent Systems for Molecular Biology (ISMB) / European Conference on Computational Biology (ECCB) 2023, Lyon, Auvergne-Rhône-Alpes, France

Increasingly detailed investigations of the spatial organization of genomes reveal that chromosome folding influences or regulates dynamic processes such as transcription, DNA repair and segregation. The Hi-C approach is commonly used to characterize genome architecture by quantifying the frequency of physical contacts between pairs of loci through high-throughput sequencing. Repeated sequences, however, cause challenges during the alignment step of the analysis, because sequencing reads originating from them can be assigned to multiple plausible positions. These unknown parts of the genome architecture, which may contain biological information, remain hidden throughout downstream functional analysis. To overcome these limitations, we have developed HiC-BERG, a method combining statistical inference with input from DNA polymer behavior characteristics and features of the Hi-C protocol to assign, with robust confidence, reads from repeated sequences in a genome and “fill in” empty vectors in contact maps. HiC-BERG is intended to be applicable to different types of organisms. We will present the program and key validation tests, before applying it to unveil hidden parts of the genomes of E. coli, Saccharomyces cerevisiae and Plasmodium falciparum. HiC-BERG shows that repeated sequences may be involved in singular genomic architectures. Our method can provide an alternative visualization of genomic contacts under a wide variety of biological conditions, allowing a more complete view of genome plasticity.
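
As a rough illustration of the general idea of assigning an ambiguous read pair by statistical inference and a pseudo-random draw, the toy Python sketch below weights every candidate pair of positions by a power-law decay of contact likelihood with genomic distance, then samples one pair. This is not the HiC-BERG implementation: the function names, the single-chromosome assumption, the power-law form and the exponent value are all assumptions made for the example.

```python
import random


def contact_weight(distance, exponent=-1.0):
    """Toy polymer-like prior (assumed): likelihood decays as a power of distance."""
    return float(max(distance, 1)) ** exponent


def assign_pair(candidates_fwd, candidates_rev, rng, exponent=-1.0):
    """Draw one (forward, reverse) position pair among all candidate combinations.

    Positions are assumed to lie on the same chromosome (hypothetical simplification).
    """
    pairs = [(f, r) for f in candidates_fwd for r in candidates_rev]
    weights = [contact_weight(abs(f - r), exponent) for f, r in pairs]
    # Weighted pseudo-random draw: likelier pairs are chosen more often,
    # but unlikely ones are not excluded.
    return rng.choices(pairs, weights=weights, k=1)[0]


# The forward read maps uniquely; the reverse read has two candidate positions.
rng = random.Random(42)  # seeded so that the draws are reproducible
draws = [assign_pair([1_000], [2_000, 500_000], rng) for _ in range(1_000)]
near = sum(1 for _, r in draws if r == 2_000)
print(f"{near} of {len(draws)} draws chose the nearby candidate")
```

Under this toy prior, the nearby candidate is selected in the large majority of draws, while the distant one keeps a small, non-zero chance of being chosen.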