I had the pleasure of attending the Nextflow Summit: Boston 2024 last week. It was a fantastic experience catching up with friends in the Boston bioinformatics community and finally meeting in person some of my collaborators from the nf-core community whom I mostly know only by their GitHub handles. There were a number of excellent talks, but here are some key details and themes that stood out to me.
1) Nextflow adoption and impact are accelerating
In his welcome address, Evan Floden shared some convincing data from telemetry and other sources showing that Nextflow is the fastest-growing workflow management system for biology. One piece of data came from a recent bioRxiv preprint by the EuroFAANG and nf-core teams:
This chart is exciting to me for two reasons:
In comparison to download stats alone, scientific publications likely represent successful scientific output.
Publications are likely a lagging indicator. If the trend in Nextflow citations from 2022 to 2023 holds, we may be seeing a transformational shift in the adoption of containerized workflows for science.
To me, it seems like Nextflow's foundational hypothesis that better software enables better science is increasingly being validated by the numbers.
2) Imaging analysis has a strong and growing presence in the nf-core community
Although the early years of Nextflow saw the strongest adoption in processing next-generation sequencing (NGS) data, a growing range of scientific disciplines is now taking advantage of it. As someone with a strong scientific interest in image analysis (see pycytominer), I was thrilled to see several talks focused specifically on new Nextflow pipelines designed for large-scale processing of various forms of imaging data. The pipelines demoed covered multiple imaging modalities, including:
Whole slide imaging of tissue sections (Parallelization of computer vision over large corpora of gigapixel biomedical images)
Single-cell spatial omics (SpaFlow: A Nextflow pipeline for single cell phenotyping in spatial omics)
Diffusion MRI Processing (Mining diamonds with wooden hammers : getting diffusion MRI processing into the age of steel)
The recently launched combinatorial fluorescence in-situ hybridization analysis pipeline nf-core/molkart also had a prominent role in Seqera Labs' demo of their new Data Studios feature:
3) Multiple companies are improving their Nextflow infrastructure offerings
As expected, Seqera Labs announced and demoed several new features for Seqera Platform, including the previously mentioned Data Studios. One exciting free-to-everyone feature is Seqera Containers, which provides a dead-simple interface for building on-demand, multi-architecture Docker images for any combination of PyPI and Conda packages.
Beyond that, I was excited to see other companies expanding their Nextflow infrastructure tooling.
MemVerge discussed their Memory Machine Cloud offering, which can act as a control plane for launching Nextflow pipelines. It has the very clever capability of pausing a job mid-processing on a spot instance and resuming it on a different instance when the spot allocation is reclaimed. MemVerge also discussed their effort to address the I/O bottleneck of many data-intensive bioinformatics pipelines with their work on the open-source JuiceFS file system.
Rescale announced their support for an executor plugin that allows Nextflow to orchestrate tightly coupled jobs on Rescale's cloud-based High-Performance Computing (HPC) or High-Throughput Computing (HTC) platforms.
Finally, Colby Ford from Tuple discussed ahab, which can manage Nextflow-, Snakemake-, WDL-, and CWL-based workflows in Kubernetes clusters deployed in Azure.
Conclusions
It's clear that the Nextflow ecosystem is rapidly evolving, with growing adoption, expanding use cases in imaging analysis, and significant advancements in infrastructure tooling. It's an exciting time to be part of the Nextflow community!
Changelog:
- 2024-06-14 - Updated cover image to picture of me meeting up with nf-core collaborator Maxime Garcia.