Transformer-based modeling to study repetitive sequences of the human genome

Andres D. Chamorro Parejo, Jaime Seguel, Kenneth S. Ramos

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Technological breakthroughs in high-throughput sequencing platforms have triggered a revolution in genomics, leading to exponential growth in the volume and size of genomic datasets. This rapid increase in data poses significant challenges for processing and analysis: given the scale and complexity of the data, traditional alignment and mapping methods are no longer sufficient or optimal for certain bioinformatics tasks. An illustrative example is the identification of LINE-1 elements in the genome, where accurately detecting the associated variations within a sample is difficult. There is therefore a pressing need for methodologies that can handle the intricacies of large-scale genomic datasets, such as transformer-based models for improving the identification and analysis of transposable elements. This chapter describes the basic computational aspects of developing a transformer-based model for a genomics task. In particular, it shows the power of transformer-based models in classifying sequences by the presence or absence of LINE-1.

Original language: English (US)
Title of host publication: Comprehensive Precision Medicine, First Edition, Volume 1-2
Publisher: Elsevier
Pages: 75-82
Number of pages: 8
Volume: 1
ISBN (Electronic): 9780128240106
State: Published - 2024

Keywords

  • Human genome
  • LINE-1
  • Machine learning models
  • Natural language processing (NLP)
  • RNA-binding protein
  • Single nucleotide polymorphisms
  • Transformers
  • Transposable elements

ASJC Scopus subject areas

  • General Agricultural and Biological Sciences
  • General Biochemistry, Genetics and Molecular Biology
