Sequence analysis tasks (e.g., read alignment, genome and transcriptome assembly, sequence sketching and indexing) have traditionally relied on combinatorial optimization and discrete algorithmic approaches. In recent years, machine learning (ML) and artificial intelligence (AI) have been applied to these fundamental problems with increasing success. Notable advances include the learning of hashing functions and sketches, ML-based variant calling, transcript discovery and scoring, and the design of functional sequence elements, to name just a few. With the exponential growth of sequencing data and the rapid development of ML/AI techniques capable of learning efficiently from large-scale data, we anticipate continued breakthroughs in applying ML/AI to both existing and emerging problems in sequence analysis.
This Collection solicits work that develops ML/AI methods (e.g., deep learning, reinforcement learning, and generative AI) for sequence analysis.
We particularly welcome submissions that address themes including, but not limited to, the following:
- Formulating sequence analysis tasks as learnable problems
- Designing new ML models for various sequence analysis applications
- Integrating machine learning with combinatorial optimization to enhance performance
- Training new models or adapting/tuning pre-trained models for sequence analysis tasks
We encourage submissions across the full spectrum of sequence analysis, including (but not limited to) the following topics:
- Sequence indexing, sketching, seeding, compression, and storage
- Sequence alignment and alignment-free sequence comparison
- Sequence similarity search and classification
- Phylogenetic reconstruction from sequencing data
- Long-read data analysis and error correction
- Genome assembly
- Variant calling
- Transcript and isoform reconstruction and quantification
- Alternative splicing and gene fusion analysis
- Sequence design (e.g., codon optimization, RNA design, regulatory element engineering)