C40: Enhancing Gene Expression Quantification through Non-Reference Structural Variant Identification

The completion of the human reference genome in 2003 was a significant milestone in genetics, offering potential advancements in understanding and treating genetic diseases and traits. Despite two decades of gradual improvement, the current reference assembly, GRCh38, still has limitations for measuring gene expression, mainly because it lacks the diversity contributed by structural variants (SVs). SVs are large DNA differences between genomes, usually defined as at least 50 base pairs in length, and include insertions, deletions, and rearrangements. These variations contribute to greater diversity among human populations. Gene structures consist of introns (non-coding sequences) and exons (the protein-coding regions of the gene). SVs can alter gene structure and have a significant effect on phenotypes by disrupting gene function and regulation or by modifying gene dosage. Historically, SVs have been challenging to detect because of their complex nature. However, recent technological breakthroughs have made it possible to survey SVs with greater precision and sensitivity, at the level of fully reconstructed haplotypes. In this study, we aim to identify SVs that cause non-reference gene structures, using high-quality haplotype-resolved genomes and corresponding RNA sequencing data from a diverse sample group, to address this limitation of the reference genome. To this end, the project used high-performance computing to process raw sequence reads, transfer gene annotation and SV information from GRCh38 to each haplotype-resolved genome, and identify candidate SVs. The RNA sequencing analysis identified 202 candidate SVs that potentially change the reference gene structure in protein-coding exons.
Our findings show that SVs contribute to a more diverse gene structure, suggesting that diversity is needed in the annotation as well as in the genome to better represent the landscape of gene expression across human populations.
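
The candidate-identification step described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Interval` record, the `candidate_svs` function, and all coordinates and names are hypothetical, invented for illustration. It assumes SVs and exons are already expressed in the same coordinate system, applies the 50 base pair size threshold mentioned in the abstract, and reports SVs that overlap a protein-coding exon. Length is taken from the coordinates, which suits the toy deletions here; insertions would need an explicit length field.

```python
from dataclasses import dataclass

MIN_SV_LENGTH = 50  # bp; the size threshold the abstract gives for SVs


@dataclass(frozen=True)
class Interval:
    """Half-open interval [start, end) on one contig (hypothetical record type)."""
    contig: str
    start: int
    end: int
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True when both intervals share a contig and at least one base."""
    return a.contig == b.contig and a.start < b.end and b.start < a.end


def candidate_svs(svs, exons, min_length=MIN_SV_LENGTH):
    """Return (SV name, exon name) pairs for SVs of at least min_length bp
    that overlap a protein-coding exon."""
    hits = []
    for sv in svs:
        if sv.end - sv.start < min_length:
            continue  # too short to count as a structural variant
        for exon in exons:
            if overlaps(sv, exon):
                hits.append((sv.name, exon.name))
    return hits


# Toy coordinates, invented for illustration only.
exons = [
    Interval("chr1", 1000, 1200, "GENE_A:exon1"),
    Interval("chr1", 5000, 5150, "GENE_A:exon2"),
]
svs = [
    Interval("chr1", 1150, 1400, "DEL_1"),  # 250 bp, overlaps exon1
    Interval("chr1", 5100, 5120, "DEL_2"),  # 20 bp, below the threshold
    Interval("chr2", 1000, 2000, "DEL_3"),  # different contig, no overlap
]

print(candidate_svs(svs, exons))  # [('DEL_1', 'GENE_A:exon1')]
```

The nested loop is adequate for a toy example; at genome scale, a sorted sweep or an interval tree would be the usual choice.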
Author(s): Grace Koo; Kwondo Kim, PhD; Charles Lee, PhD, FACMG

Mentor: Donghyung Lee, Statistics
