<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Ken Brewer's BioDev Blog]]></title><description><![CDATA[Musings and ideas from the intersection of biology, software development, and machine learning.

Opinions shared here are strictly my own.]]></description><link>https://kenbrewer.com</link><generator>RSS for Node</generator><lastBuildDate>Thu, 16 Apr 2026 10:30:50 GMT</lastBuildDate><atom:link href="https://kenbrewer.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Nextflow, nf-core, Seqera and more...]]></title><description><![CDATA[I am excited to announce that starting next week I will be joining Seqera as a Senior Developer Advocate supporting the Nextflow, nf-core and Seqera Platform user communities. For this role, I'm going to be relocating to the San Francisco Bay Area to...]]></description><link>https://kenbrewer.com/nextflow-nf-core-seqera-and-more</link><guid isPermaLink="true">https://kenbrewer.com/nextflow-nf-core-seqera-and-more</guid><category><![CDATA[seqera]]></category><category><![CDATA[nf-core]]></category><category><![CDATA[#nextflow]]></category><category><![CDATA[bioinformatics]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[openscience]]></category><category><![CDATA[biotechnology]]></category><category><![CDATA[workflow-orchestration]]></category><dc:creator><![CDATA[Ken Brewer]]></dc:creator><pubDate>Mon, 12 Aug 2024 12:05:52 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1723481629594/b0f30a0b-1e52-477d-b8b8-a142265db87d.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I am excited to announce that starting next week I will be joining <a target="_blank" href="https://seqera.io/">Seqera</a> as a Senior Developer Advocate supporting the Nextflow, nf-core and Seqera Platform user communities. For this role, I'm going to be relocating to the San Francisco Bay Area to help build community within the vibrant BioPharma and Tech scenes there. 
As part of introducing myself to the broader community, I wanted to share a little about my journey as a scientist and engineer and why I am so passionate about the powerful open-source and commercial tools that are being built by Seqera.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723213349245/3149ae8f-ac01-46f7-96f9-e2e34a6f1fe0.jpeg" alt class="image--center mx-auto" /></p>
<h2 id="heading-complexity-of-scientific-workflows">Complexity of scientific workflows</h2>
<p>As a PhD student in the <a target="_blank" href="https://breaker.yale.edu/"><strong>Breaker Lab at Yale</strong></a>, I focused my research on building computational pipelines for the discovery of novel noncoding RNA motifs. This involved computationally intensive sifting through terabytes of bacterial genomes, looking for RNA motifs with certain structural homologies and gene associations. As the number of different tools I wanted to incorporate into my research pipelines grew, I quickly began dealing with the <strong>complexity of programming large, multi-step pipelines</strong>.</p>
<p>While iterating on a complex, multi-step workflow, minor matters like the naming of intermediate files and discrepancies between my laptop and HPC cluster compute environments can quickly become sources of major frustration. I never found Nextflow during that period, but I became strongly interested in software engineering best practices. I was convinced there had to be better ways to address these challenges than bash spaghetti with Python meatballs.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723212582932/893af285-9c12-4053-938b-cd3d7b1cf9b3.png" alt="Bash spaghetti with Python meatballs" class="image--center mx-auto" /></p>
<h2 id="heading-moving-to-biotech-finding-nextflow">Moving to biotech, finding Nextflow</h2>
<p>For my first role out of academia, I joined ProFound Therapeutics in 2021. ProFound was then a stealth-mode biotech startup in the Flagship Pioneering family of companies. I was their second computational hire, and my first project was to scale certain computational analyses to a massive collection of novel proteins they were studying. Before I started building pipelines, I took time to do a careful assessment of modern tooling for orchestrating complex scientific workflows.</p>
<p>There are a number of excellent bio-specific and general-purpose tools for building reproducible workflows, but <a target="_blank" href="https://www.nextflow.io">Nextflow</a> stood out to me for two key reasons:</p>
<ul>
<li><p>Nextflow had excellent portability: the same pipeline logic could be executed on a local computer, in the AWS cloud, or on an HPC cluster.</p>
</li>
<li><p><a target="_blank" href="https://nf-co.re/">nf-core</a>, a vibrant global community of Nextflow users, was collaborating on a collection of gold-standard, open-source pipelines for all kinds of common bioinformatic analyses.</p>
</li>
</ul>
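<p>To make the portability point concrete, here is a small configuration sketch. The profile names, queue, and bucket below are hypothetical, but the pattern is standard Nextflow: the pipeline logic stays untouched, and a profile selects where it runs.</p>
<pre><code>// nextflow.config (illustrative profiles; names are made up)
profiles {
    standard {
        process.executor = 'local'       // run on a laptop
    }
    aws {
        process.executor = 'awsbatch'    // run in the AWS cloud
        process.queue    = 'my-batch-queue'
        workDir          = 's3://my-bucket/work'
    }
    hpc {
        process.executor = 'slurm'       // run on an HPC cluster
    }
}
</code></pre>
<p>Switching environments is then just a matter of <code>nextflow run main.nf -profile aws</code>, with no changes to the pipeline code itself.</p>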
<p>After bringing my analysis of Nextflow's advantages to my computational lead, <strong>we decided to make Nextflow a core technology in our platform</strong> and were off to the races... sort of.</p>
<h2 id="heading-building-with-seqera-platform">Building with Seqera Platform</h2>
<p>It turned out that there was a lot more to setting up a scalable bioinformatic platform than simply choosing to use Nextflow. We ran into headaches configuring our AWS Batch executors, setting up automations, and training scientists who wanted to analyze their own data on the complexities of AWS.</p>
<p>Luckily, Seqera had recently arrived on the scene with a solution that seemed tailor-made for many of our biggest headaches. Seqera was founded by the original developers of Nextflow, and their flagship product was a platform we could deploy into our own AWS account. Not only could Seqera Platform quickly deploy and modify some of the tricky parts of AWS infrastructure, it could also set up automations, provide observability, and offer a user-friendly GUI for non-technical folks to run our Nextflow pipelines.</p>
<p>Thanks to my advocacy and the buy-in of my technical lead, we signed up with Seqera as one of their earliest commercial customers. <strong>The rock-solid combination of Nextflow and Seqera ended up fully living up to its promise and more</strong> as the Seqera team continued to add game-changing new features to both Nextflow and Seqera Platform over the nearly two years I worked with their team as a customer.</p>
<h2 id="heading-super-scaling-bioinformatics">Super-scaling bioinformatics</h2>
<p>For my next role after ProFound Tx, I joined GeneDx as a Senior Software Engineer on the bioinformatics platform team. The opportunity at GeneDx appealed to me for two reasons:</p>
<ul>
<li><p>With hundreds of patients' genetic sequencing tests passing through bioinformatics pipelines every day, the data quality and reliability of my team's work were going to be critically important from Day 1.</p>
</li>
<li><p>I had the opportunity to act as technical lead for parts of the "Cloudflow" project: a major migration of bioinformatic pipelines from WDL running on-prem to Nextflow running in the cloud.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723217345875/6b0dd966-766b-4712-a890-7d793a21116e.jpeg" alt class="image--center mx-auto" /></p>
<p>While the original migration plan involved setting up an in-house Nextflow orchestrator and building a variant-calling pipeline from scratch, we made two changes to the plan not long after I joined:</p>
<ul>
<li><p>Instead of building a Nextflow pipeline from scratch, we decided to try <strong>configuring the open-source</strong> <a target="_blank" href="https://nf-co.re/sarek"><strong>nf-core/sarek</strong></a> <strong>pipeline as the starting point</strong> for our planned production pipeline.</p>
</li>
<li><p>We started a proof-of-concept agreement with Seqera to explore <strong>using Seqera Platform as our pipeline orchestration</strong> solution.</p>
</li>
</ul>
<p>With the combination of Seqera Platform and a gold-standard nf-core pipeline to build on, our small Cloudflow project team began delivering on project milestones at an incredible pace. Within a few short months of our change in direction, we had set up scalable bioinformatics infrastructure connected to Seqera Platform in not one, but two public clouds. We also developed proof-of-concept durable automations that handled the entire process, from data coming off the sequencer to processed genomic variants being uploaded to our data portal.</p>
<h2 id="heading-what-next">What next?</h2>
<p>Given the passion I've developed for Nextflow and Seqera products over the past 3+ years as a user and customer, it was a natural fit for me to join Seqera's community team when a position opened up. While the specifics of my role as developer advocate are still to be determined, here are a few topics I'm incredibly passionate about that you'll likely hear me talk about in the coming months:</p>
<ul>
<li><p>Highlighting the benefits for biopharma and healthcare companies that choose to build on and contribute to open-source projects like nf-core pipelines.</p>
</li>
<li><p>Expanding the range of scientific disciplines that are choosing to build sharable, reproducible pipelines using Nextflow.</p>
</li>
<li><p>Diversifying the bioinformatics talent and leadership pool by offering training and support to individuals from marginalized and underrepresented communities.</p>
</li>
<li><p>Bringing the power of modern software best practices like continuous integration and continuous deployments to complex data pipelines.</p>
</li>
</ul>
<p>I'm thrilled to be getting more deeply involved in the incredibly vibrant open-source science communities that are built around Nextflow, nf-core, Seqera, and more! I'm also very much looking forward to meeting more of you in person and virtually over the coming months!</p>
]]></content:encoded></item><item><title><![CDATA[Personal highlights from Nextflow Summit: Boston 2024]]></title><description><![CDATA[I had the pleasure of attending the Nextflow Summit: Boston 2024 last week. It was a fantastic experience catching up with friends in the Boston bioinformatics community and finally meeting up in person with some of my collaborators from the nf-core ...]]></description><link>https://kenbrewer.com/personal-highlights-from-nextflow-summit-boston-2024</link><guid isPermaLink="true">https://kenbrewer.com/personal-highlights-from-nextflow-summit-boston-2024</guid><category><![CDATA[microscopy]]></category><category><![CDATA[#nextflow]]></category><category><![CDATA[image processing]]></category><category><![CDATA[MRI ]]></category><category><![CDATA[containers]]></category><category><![CDATA[infrastructure]]></category><dc:creator><![CDATA[Ken Brewer]]></dc:creator><pubDate>Mon, 27 May 2024 14:51:57 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1718369313756/dddddc65-2539-43f9-a1b4-178eac74f111.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I had the pleasure of attending the Nextflow Summit: Boston 2024 last week. It was a fantastic experience catching up with friends in the Boston bioinformatics community and finally meeting up in person with some of my collaborators from the nf-core community whom I previously knew mostly by their GitHub handles. There were a number of excellent talks, but here are some key details and themes that stood out to me.</p>
<h2 id="heading-1-nextflow-adoption-and-impact-is-accelerating">1) Nextflow adoption and impact are accelerating</h2>
<p>In Evan Floden's <a target="_blank" href="https://youtu.be/mvhMnNl9lsQ?si=sX3ntWZZTmRNIvs0&amp;t=82">welcome address</a>, he shared convincing data from telemetry and other sources showing that <strong>Nextflow is the fastest-growing workflow manager for biology</strong>. One piece of data came from <a target="_blank" href="https://www.biorxiv.org/content/10.1101/2024.05.10.592912v1.full.pdf+html">a recent bioRxiv preprint</a> by the EuroFAANG and nf-core teams:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716635894940/a18ecdd0-4c88-43e6-ad22-6394923b33df.png" alt class="image--center mx-auto" /></p>
<p>This chart is exciting to me for two reasons:</p>
<ol>
<li><p>In comparison to download stats alone, scientific publications likely represent <strong>successful scientific output</strong>.</p>
</li>
<li><p>Publications are likely a lagging indicator. If the trend in Nextflow citations from 2022 to 2023 holds, we may be seeing a <strong>transformational shift in the adoption of containerized workflows for science</strong>.</p>
</li>
</ol>
<p>To me, it seems like Nextflow's foundational hypothesis that better software enables better science is increasingly being validated by the numbers.</p>
<h2 id="heading-2-imaging-analysis-has-a-strong-and-growing-presence-in-the-nf-core-community">2) Imaging analysis has a strong and growing presence in the nf-core community.</h2>
<p>Although Nextflow's early adoption was strongest for processing next-generation sequencing (NGS) data, a growing diversity of scientific disciplines is taking advantage of the utility it provides. As someone with a strong scientific interest in image analysis (see <a target="_blank" href="https://github.com/cytomining/pycytominer">pycytominer</a>), I was thrilled to see several talks focused specifically on new Nextflow pipelines designed for large-scale processing of imaging data. Pipelines were demoed for multiple imaging modalities, including:</p>
<ul>
<li><p>Whole slide imaging of tissue sections (<a target="_blank" href="https://summit.nextflow.io/2024/boston/agenda/05-23--parallelization-of-computer-vision-over/">Parallelization of computer vision over large corpora of gigapixel biomedical images</a>)</p>
</li>
<li><p>Single-cell spatial omics (<a target="_blank" href="https://summit.nextflow.io/2024/boston/agenda/05-23--spaflow-a-nextflow-pipeline-for/">SpaFlow: A Nextflow pipeline for single cell phenotyping in spatial omics</a>)</p>
</li>
<li><p>Diffusion MRI Processing (<a target="_blank" href="https://summit.nextflow.io/2024/boston/agenda/05-23--mining-diamonds-with-wooden-hammers/">Mining diamonds with wooden hammers : getting diffusion MRI processing into the age of steel</a>)</p>
</li>
</ul>
<p>The recently launched combinatorial fluorescence in-situ hybridization analysis pipeline <a target="_blank" href="https://nf-co.re/molkart/1.0.0">nf-core/molkart</a> also had a prominent role in <a target="_blank" href="https://summit.nextflow.io/2024/boston/agenda/05-24--practical-data-studios/">Seqera Labs' demo of their new Data Studios feature</a>:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716651568058/990534d3-a8f9-44cb-8d33-bce65a3d0be8.png" alt="Rob Syme from Seqera demoing nf-core/molkart in Data Studios" class="image--center mx-auto" /></p>
<h2 id="heading-3-multiple-companies-are-improving-their-nextflow-infrastructure-offerings">3) Multiple companies are improving their Nextflow infrastructure offerings</h2>
<p>As expected, Seqera Labs announced and demoed several new features for Seqera Platform, including the previously mentioned Data Studios. One exciting free-to-everyone feature is <a target="_blank" href="https://seqera.io/containers/">Seqera Containers</a>, which provides a dead-simple interface for building on-demand, multi-architecture Docker images for any combination of PyPI and conda packages.</p>
<p>Beyond that, I was excited to see other companies expanding their Nextflow infrastructure tooling.</p>
<p><a target="_blank" href="https://summit.nextflow.io/2024/boston/agenda/05-23--cloud-empowerment-unleashing-the-potential/">MemVerge discussed</a> their <a target="_blank" href="https://www.mmcloud.io/solutions/nextflow">Memory Machine Cloud</a> offering, which can act as a control plane for launching Nextflow pipelines. It has the very clever capability of letting a spot instance pause mid-processing and resume on a different instance when the spot allocation is reclaimed. MemVerge also discussed their efforts to address the I/O bottleneck of many data-intensive bioinformatics pipelines with <a target="_blank" href="https://www.mmcloud.io/blog/juiceflow-a-next-generation-solution-for-nextflow">JuiceFlow</a>, built on the open-source JuiceFS file system.</p>
<p><a target="_blank" href="https://summit.nextflow.io/2024/boston/agenda/05-23--using-nextflow-to-orchestrate-tightly/">Rescale announced</a> their support for an <a target="_blank" href="https://rescale.com/blog/automating-scientific-workflows-with-nextflow-on-rescale-for-accelerated-rd-processes/">executor plugin</a> that allows Nextflow to orchestrate tightly coupled jobs on Rescale's cloud-based High-Performance Computing (HPC) or High-Throughput Computing (HTC) platforms.</p>
<p>Finally, <a target="_blank" href="https://summit.nextflow.io/2024/boston/agenda/05-24--building-enterprise-grade-bioinformatics-pipelines/">Colby Ford from Tuple discussed</a> <a target="_blank" href="https://tuple.xyz/solutions/ahab/index.html">ahab</a>, which can manage Nextflow, Snakemake, WDL-, and CWL-based pipelines in Kubernetes clusters deployed in Azure.</p>
<h2 id="heading-conclusions">Conclusions</h2>
<p>It's clear that the Nextflow ecosystem is rapidly evolving, with growing adoption, expanding use cases in imaging analysis, and significant advancements in infrastructure tooling. It's an exciting time to be part of the Nextflow community!</p>
<p><em>Changelog:</em></p>
<ul>
<li>2024-06-14 - Updated cover image to picture of me meeting up with nf-core collaborator Maxime Garcia.</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Frictionless development using platforms]]></title><description><![CDATA[When I first started this blog I did so with the idea of trying to exemplify some of the DevOps principles I was excited about within the building and publishing of the website itself. Although neat in principle, this ran into a couple of major heada...]]></description><link>https://kenbrewer.com/frictionless-development-using-platforms</link><guid isPermaLink="true">https://kenbrewer.com/frictionless-development-using-platforms</guid><category><![CDATA[Blogging]]></category><category><![CDATA[Platform Engineering ]]></category><category><![CDATA[development]]></category><category><![CDATA[Developer Tools]]></category><category><![CDATA[Hashnode]]></category><category><![CDATA[Strategy]]></category><dc:creator><![CDATA[Ken Brewer]]></dc:creator><pubDate>Sun, 14 Apr 2024 19:00:52 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/s4d_ESS0ylA/upload/bdfee42f5fa9d8502e689405d014ba9c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When I first started this blog I did so with the idea of trying to exemplify some of the DevOps principles I was excited about within the building and publishing of the website itself. Although neat in principle, this ran into a couple of major headaches in practice. Chief among them was the high maintenance burden and the high activation energy required for writing content.</p>
<h3 id="heading-maintenance-burden">Maintenance Burden</h3>
<p>Over the past year I've ended up tinkering a great deal with the underlying deployment pattern of my website. This included:</p>
<ul>
<li><p>Migrating from Github Pages to Cloudflare Pages for content hosting</p>
</li>
<li><p>Migrating my DNS configuration (and then separately my domain registration) from Google Domains to Cloudflare</p>
</li>
<li><p>Setting up previews of development branches in Cloudflare</p>
</li>
<li><p>Troubleshooting issues with all of the above.</p>
</li>
</ul>
<p>These were all necessary and/or useful improvements, and I had a long list of additional things I wanted to improve further around setting up edge caching and improving load times, but none of these improvements resulted in me writing more content.</p>
<h3 id="heading-activation-energy">Activation Energy</h3>
<p>Besides the maintenance burden I described above, the other reason I didn't write a lot of content was the high activation energy involved in adding new content. Each new article required:</p>
<ul>
<li><p>Creating a new branch in my Github repo</p>
</li>
<li><p>Writing the content</p>
</li>
<li><p>Making a commit</p>
</li>
<li><p>Pushing the commit</p>
</li>
<li><p>Creating a PR</p>
</li>
<li><p>Previewing the PR build</p>
</li>
<li><p>Merging the PR</p>
</li>
</ul>
<p>While the version-control and CI/CD process provides substantial value in the context of building software, for a personal blog it felt like unnecessary overhead. When I had an idea or thought I wanted to share, I would often get mentally blocked by all the little things that needed to happen first.</p>
<p>In the end this high maintenance burden and high activation energy made it too hard to do the thing that my blog was intended to accomplish: provide a platform for me to share my ideas and interests.</p>
<h2 id="heading-going-with-a-platform">Going with a Platform</h2>
<p>In the end, I decided to replace my burdensome self-hosted publishing solution with a good commercial platform: <a target="_blank" href="https://hashnode.com/">Hashnode</a>. I'm not going to do a breakdown of the features and rationale for choosing this platform in particular, but I do want to mention a bit of the "strategic" value of this decision because I think there are some valuable learnings here for teams building computational platforms in biopharma.</p>
<p>Choosing a good-quality commercial platform to handle undifferentiated parts of your software stack can mean massive time savings, because the platform has already done the work of:</p>
<ul>
<li><p>building the features you need now</p>
</li>
<li><p>building features that you don't know you need, but will need in the future</p>
</li>
<li><p>visual tooling and an optimized user interface for routine tasks</p>
</li>
</ul>
<p>That last bullet point is always particularly hard to justify building for an internal tool, but it can provide massive time savings when routine tasks become less burdensome, less frustrating, or fully automated.</p>
<p>I've got a backlog of article ideas that I've been kicking around for the last year, and I expect this new platform-based approach will make it a lot easier to get them out into the world.</p>
]]></content:encoded></item><item><title><![CDATA[A simpler Nextflow template]]></title><description><![CDATA[Nextflow is the go-to tool for many people in the bioinformatics community who are working on developing data pipelines. Unfortunately, there is a pretty steep learning curve as there is a whole new Groovy-based syntax and framework for code orga...]]></description><link>https://kenbrewer.com/2023-04-07-simple-nextflow-template</link><guid isPermaLink="true">https://kenbrewer.com/2023-04-07-simple-nextflow-template</guid><category><![CDATA[nf-core]]></category><category><![CDATA[#nextflow]]></category><category><![CDATA[template]]></category><dc:creator><![CDATA[Ken Brewer]]></dc:creator><pubDate>Fri, 07 Apr 2023 16:00:00 GMT</pubDate><content:encoded><![CDATA[<p>Nextflow is the go-to tool for many people in the bioinformatics community who are working on developing data pipelines. Unfortunately, there is a pretty steep learning curve as there is a whole new Groovy-based syntax and framework for code organization to learn. The steepest part of this learning curve in my experience happened when I tried to move from a simple pipeline structure like those present in the main Nextflow documentation to the fully-featured, best-practice templates used by the open-source nf-core community. To address this steep learning curve, I've created a new, slimmed-down Nextflow <a target="_blank" href="https://github.com/kenibrewer/simplenextflow">project template</a> based on nf-core's main template. I hope it can be a stepping stone for intermediate Nextflow developers looking to learn best practices for pipeline development, and for experienced Nextflow developers looking for a leaner codebase that can start generating outputs more quickly than the full template.</p>
<h2 id="heading-what-is-nextflow">What is Nextflow?</h2>
<p><a target="_blank" href="https://www.nextflow.io/">Nextflow</a> is a powerful workflow management tool that I frequently use to build, execute, and automate complex scientific pipelines. Its main strength lies in the ability to modularize virtually any program or custom code. Those modular units of compute (called processes) can then be strung together into a workflow that can be run identically on a variety of computing infrastructures, including local computers, cloud-based platforms, and high-performance computing clusters. That modularity and portability are certainly two of the features that make Nextflow so popular among the bioinformatics community, but the most useful aspect of working with Nextflow is the open-source community that has developed around it.</p>
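<p>To make the idea of processes and workflows concrete, here is a minimal sketch of a Nextflow (DSL2) pipeline. The process name and parameter are illustrative rather than taken from any real pipeline:</p>
<pre><code>// main.nf (a minimal illustrative pipeline)
process COUNT_LINES {
    input:
    path infile

    output:
    stdout

    script:
    """
    wc -l ${infile}
    """
}

workflow {
    // Feed every matching input file through the process and print the results
    Channel.fromPath(params.input) | COUNT_LINES | view
}
</code></pre>
<p>Because the process declares only its inputs, outputs, and command, Nextflow can schedule it unchanged on a laptop, a cluster, or the cloud.</p>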
<h2 id="heading-the-nf-core-community">The nf-core community</h2>
<p><a target="_blank" href="https://nf-co.re/">nf-core</a> is a community-driven project that provides a set of standardized Nextflow pipelines for some of the most common bioinformatics analyses. Some of these pipelines are very complex, so much so that they are visualized with metro map inspired diagrams like this one from <a target="_blank" href="https://nf-co.re/rnaseq">nf-core/rnaseq</a>:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1713108984111/ba715711-93f1-4c1e-842b-2cbb4ac0d1e6.png" alt class="image--center mx-auto" /></p>
<p>To manage these complex pipelines, nf-core has also developed a powerful python package called <a target="_blank" href="https://nf-co.re/tools">nf-core/tools</a> that provides a set of command-line tools for creating, linting, testing, and syncing pipelines that adhere to nf-core standards.</p>
<h2 id="heading-challenges-of-working-with-the-default-nf-core-template">Challenges of working with the default nf-core template</h2>
<p>nf-core provides a <a target="_blank" href="https://nf-co.re/tools#creating-a-new-pipeline">template</a> that can be used to create a new pipeline from scratch. This template is a great foundation for building a complex multi-step pipeline, but it can be overwhelming for beginning-to-intermediate Nextflow developers who are just trying to get a simple pipeline up and running while familiarizing themselves with Nextflow best practices. Even for experienced Nextflow developers, the nf-core template can be a bit of a pain to work with because of the many different areas of the template that need to be configured/modified to get some outputs for a new process.</p>
<h2 id="heading-a-simpler-nf-core-based-template">A simpler nf-core-based template</h2>
<p>To address some of the challenges of working with the default nf-core template, I created a <a target="_blank" href="https://github.com/kenibrewer/simplenextflow">simplenextflow</a> template based on the nf-core template that is much simpler to work with. Here are some of the main changes I made:</p>
<ol>
<li><p><strong>Fewer files to search for relevant code</strong></p>
<p> To ensure that the nf-core template is as flexible as possible, it is broken up into several different files that are used to configure various aspects of the pipeline. When I started developing pipelines using nf-core best practices, this was one of the most confusing aspects of the template. In <code>simplenextflow</code>, I have moved the vast majority of configuration logic back into the <code>nextflow.config</code> file instead of having it imported from other files. Additionally, I have moved all of the workflow and subworkflow logic back into the <code>main.nf</code> file.</p>
</li>
<li><p><strong>Instructions for adapting the pipeline</strong></p>
<p> At the top of the README file, I added basic instructions covering the core pieces of the pipeline that need to be modified to change the template from the default fastqc example to a new pipeline.</p>
</li>
<li><p><strong>Added config profile and templating for Wave containers</strong></p>
<p> One of the most exciting new features Seqera Labs has introduced is the ability to use <a target="_blank" href="https://www.nextflow.io/docs/latest/wave.html">Wave containers</a> to run processes in a containerized environment without having to build a new container image for each process. Instead, you can simply include a <code>Dockerfile</code> or <code>environment.yml</code> file in the process directory and Wave will build a container image for that process on the fly. This is a great feature for developing pipelines because it allows you to quickly test out new code without having to build a new container image for each change. You can access this feature immediately in this template by running the pipeline with the <code>-profile wave</code> flag.</p>
</li>
<li><p><strong>Removal of check_versions and MultiQC</strong></p>
<p> The <code>check_versions</code> process and the <code>MultiQC</code> process are both great tools for ensuring that version information is captured accurately and that results can be visualized. However, they also fall into the category of features that are overwhelming for most new Nextflow developers. I've chosen to remove them from the template to keep the logic in <code>main.nf</code> simpler. Capturing version information is still a very important best practice, and it should be added back for any pipeline intended for production use.</p>
</li>
<li><p><strong>Keeping samplesheet logic</strong></p>
<p> The concept of using a samplesheet to define the inputs was something that I considered cutting, because modifying the <code>bin/check_samplesheet.py</code> script to work with a new pipeline is one of the most time-consuming parts of adapting the template for a new pipeline. However, I think that setting up proper associations between files and their metadata is one of the most important practices in good pipeline development, so I decided to keep it in the template. I've been thinking about how to make the process of customizing the <code>bin/check_samplesheet.py</code> script easier, but I haven't come up with a good solution yet. Let me know if you have any ideas!</p>
</li>
<li><p><strong>Things that are still in the template</strong></p>
<p> Many of the wonderful quality-of-life features included in nf-core pipelines are still present in this slimmed-down version. These are all features of the nf-core template that work essentially out of the box, with no additional configuration that would slow down generating outputs. Some of my favorite features I was able to keep in the template are:</p>
<ul>
<li><p>Reproducible development environments using Codespaces and Gitpod</p>
</li>
<li><p>Colorful logging and output of non-default parameters</p>
</li>
<li><p>Email and Slack notifications</p>
</li>
</ul>
</li>
</ol>
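<p>For anyone curious what the <code>-profile wave</code> flag mentioned above amounts to, the configuration behind it is only a few lines. The snippet below is a sketch based on Nextflow's Wave documentation, not necessarily the template's exact contents:</p>
<pre><code>// nextflow.config (sketch of a Wave profile; details may differ from the template)
profiles {
    wave {
        wave.enabled   = true
        wave.strategy  = ['dockerfile', 'conda']  // build from a Dockerfile or environment.yml
        docker.enabled = true
    }
}
</code></pre>
<p>With a profile like this, Wave builds the container for each process on the fly instead of requiring a pre-built image.</p>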
<h2 id="heading-conclusion">Conclusion</h2>
<p>I hope this template is useful to anyone looking to familiarize themselves with nf-core best practices in more bite-sized chunks, or to anyone just looking for a simple template to get to their desired outputs more quickly. Let me know if you have any suggestions for improving the template in the comments below or by opening an issue or feature request in the <a target="_blank" href="https://github.com/kenibrewer/simplenextflow">GitHub repo</a>.</p>
<h2 id="heading-references">References</h2>
<ul>
<li><p><a target="_blank" href="https://www.nextflow.io/">Nextflow</a></p>
</li>
<li><p><a target="_blank" href="https://nf-co.re/">nf-core</a></p>
</li>
<li><p><a target="_blank" href="https://nf-co.re/docs/contributing/tutorials/creating_with_nf_core">Creating pipelines with nf-core</a></p>
</li>
</ul>
<h3 id="heading-blog-post-changelog">Blog post changelog</h3>
<ul>
<li>2023-04-08 - Added new intro section</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Setting up a personal blog with CI/CD]]></title><description><![CDATA[Why blog?
Based on the recommendation of multiple colleagues in the Flagship Pioneering informatics community, I recently listened to The Phoenix Project on audiobook. This novel focuses on a struggling IT department within the fictional company Part...]]></description><link>https://kenbrewer.com/setting-up-a-personal-blog-with-cicd</link><guid isPermaLink="true">https://kenbrewer.com/setting-up-a-personal-blog-with-cicd</guid><category><![CDATA[Devops]]></category><category><![CDATA[biotechnology]]></category><category><![CDATA[GitHubPages]]></category><category><![CDATA[jekyll]]></category><category><![CDATA[ci-cd]]></category><category><![CDATA[GitHub Actions]]></category><dc:creator><![CDATA[Ken Brewer]]></dc:creator><pubDate>Sun, 02 Apr 2023 16:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1713108073520/53d75293-d214-4748-8da2-cc01b246d792.avif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-why-blog">Why blog?</h2>
<p>Based on the recommendation of <em>multiple</em> colleagues in the Flagship Pioneering informatics community, I recently listened to <a target="_blank" href="https://itrevolution.com/product/the-phoenix-project/">The Phoenix Project</a> on audiobook. This novel focuses on a struggling IT department within the fictional company Parts Unlimited, and how the main character, Bill, turns things around by implementing the core principles of the DevOps movement. It is an engaging and educational read, and I highly recommend it to anyone interested in DevOps.</p>
<p>Although Bill's story is set in a medium-sized manufacturing company, I found many of the challenges he faced relatable as a computational biologist working in a small biotech startup. His frantic efforts to ensure the integrity of critical data for his colleagues in HR and finance reminded me of the urgency I feel when delivering bioinformatic analyses to bench scientists who need them for their experiments. The novel inspired me to consider how I could apply DevOps principles to the specific problems I face at work.</p>
<p>As I started diving deeper into DevOps, I was somewhat surprised to discover that I already possessed all the core technical skills. I had simply lacked a framework to apply those technical skills in an integrated, synergistic way. That led to my decision to set up a blog focused on applying the principles of DevOps to specific problems faced by computational biologists, bioinformaticians, machine learning engineers and data scientists who work in the biotech and pharmaceutical industries.</p>
<h2 id="heading-criteria-for-the-blog">Criteria for the blog</h2>
<p>Since this blog would, in part, focus on DevOps, I wanted to set up the website using the same principles I planned to write about. The DevOps principles I wanted to apply were:</p>
<ul>
<li>Automate everything</li>
<li>Use version control</li>
<li>Use a CI/CD pipeline to deploy the site</li>
</ul>
<p>I also had a few other criteria for the blog:</p>
<ul>
<li>Use a static site generator to make the blog easy to maintain.</li>
<li>Be able to write new posts in markdown.</li>
<li>Keep costs as low as possible, preferably free.</li>
<li>Be able to use a custom domain name.</li>
<li>Have sufficient flexibility to customize the look and feel of the site.</li>
</ul>
<h2 id="heading-static-site-generator">Static site generator</h2>
<p>I decided to use <a target="_blank" href="https://jekyllrb.com/">Jekyll</a> as the static site generator for the blog. Jekyll is a popular static site generator and is well supported by GitHub Pages. Because this was my first time using Jekyll and I don't believe in re-inventing the wheel, I decided to use a pre-built theme to get started. I chose <a target="_blank" href="https://beautifuljekyll.com/">Beautiful Jekyll</a> because it is well documented and has a clean, modern look.</p>
<h2 id="heading-setting-up-the-blog-using-cicd">Setting up the blog using CI/CD</h2>
<p>I largely followed the instructions on the Beautiful Jekyll website to set up the blog, but I made a few modifications in-line with this blog's focus on DevOps and automation.</p>
<h3 id="heading-setting-up-a-devcontainer">Setting up a DevContainer</h3>
<p>Earlier this year, I led an effort at ProFound to set up standardized development environments. I'll write more about that effort in a future post, but I wanted to use the same approach for this blog. VS Code makes it trivially easy to <a target="_blank" href="https://code.visualstudio.com/docs/devcontainers/create-dev-container#_automate-dev-container-creation">set up a DevContainer</a> for a project, even one that uses a framework or language you are not familiar with. Using the "Add Development Container Configuration Files" command, I was able to quickly generate a <code>devcontainer.json</code> file that would launch a Docker container with all the necessary dependencies for Beautiful Jekyll.</p>
<pre><code class="lang-json">{ 
    <span class="hljs-attr">"name"</span>: <span class="hljs-string">"Jekyll"</span>,
    <span class="hljs-attr">"image"</span>: <span class="hljs-string">"mcr.microsoft.com/devcontainers/jekyll:0-buster"</span>,
    <span class="hljs-attr">"features"</span>: { <span class="hljs-attr">"ghcr.io/devcontainers/features/node:1"</span>: {} },
    <span class="hljs-attr">"forwardPorts"</span>: [<span class="hljs-number">4000</span>],
    <span class="hljs-attr">"postCreateCommand"</span>: <span class="hljs-string">"bundle exec jekyll serve --watch"</span> 
}
</code></pre>
<p>As soon as this file was saved, VSCode automatically prompted me to re-open the project in the container.</p>
<h3 id="heading-setting-up-a-cloud-ide">Setting up a Cloud IDE</h3>
<p>One of the benefits of using a DevContainer is that you can use the same container to develop locally or in the cloud using <a target="_blank" href="https://github.com/features/codespaces">Github Codespaces</a>. Github Codespaces has a generous free tier, and is a great way to reduce the friction of setting up a new project. You can read more about how to set up a Github Codespace in the <a target="_blank" href="https://docs.github.com/en/codespaces/developing-in-codespaces/creating-a-codespace">Github documentation</a>.</p>
<h2 id="heading-setting-up-a-cicd-pipeline">Setting up a CI/CD pipeline</h2>
<p>The original repo for Beautiful Jekyll included a simple <a target="_blank" href="https://github.com/daattali/beautiful-jekyll/blob/e1facea35a0a8ee81bc204db10039d5b53837a39/.github/workflows/ci.yml">GitHub Actions workflow</a>. However, while enabling the Github Pages feature in the repo settings, I found a template Github Actions pipeline that builds and deploys the site, so I used that instead:</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># Sample workflow for building and deploying a Jekyll site to GitHub Pages</span>
<span class="hljs-attr">name:</span> <span class="hljs-string">Deploy</span> <span class="hljs-string">Jekyll</span> <span class="hljs-string">site</span> <span class="hljs-string">to</span> <span class="hljs-string">Pages</span>

<span class="hljs-attr">on:</span>
  <span class="hljs-attr">push:</span>
    <span class="hljs-attr">branches:</span> [<span class="hljs-string">"main"</span>]
  <span class="hljs-attr">workflow_dispatch:</span>

<span class="hljs-comment"># Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages</span>
<span class="hljs-attr">permissions:</span>
  <span class="hljs-attr">contents:</span> <span class="hljs-string">read</span>
  <span class="hljs-attr">pages:</span> <span class="hljs-string">write</span>
  <span class="hljs-attr">id-token:</span> <span class="hljs-string">write</span>

<span class="hljs-comment"># Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.</span>
<span class="hljs-comment"># However, do NOT cancel in-progress runs as we want to allow these production deployments to complete.</span>
<span class="hljs-attr">concurrency:</span>
  <span class="hljs-attr">group:</span> <span class="hljs-string">"pages"</span>
  <span class="hljs-attr">cancel-in-progress:</span> <span class="hljs-literal">false</span>

<span class="hljs-attr">jobs:</span>
  <span class="hljs-comment"># Build job</span>
  <span class="hljs-attr">build:</span>
    <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
    <span class="hljs-attr">steps:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Checkout</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/checkout@v3</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Setup</span> <span class="hljs-string">Ruby</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">ruby/setup-ruby@ee2113536afb7f793eed4ce60e8d3b26db912da4</span> <span class="hljs-comment"># v1.127.0</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">ruby-version:</span> <span class="hljs-string">'3.1'</span> <span class="hljs-comment"># Not needed with a .ruby-version file</span>
          <span class="hljs-attr">bundler-cache:</span> <span class="hljs-literal">true</span> <span class="hljs-comment"># runs 'bundle install' and caches installed gems automatically</span>
          <span class="hljs-attr">cache-version:</span> <span class="hljs-number">0</span> <span class="hljs-comment"># Increment this number if you need to re-download cached gems</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Setup</span> <span class="hljs-string">Pages</span>
        <span class="hljs-attr">id:</span> <span class="hljs-string">pages</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/configure-pages@v3</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Build</span> <span class="hljs-string">with</span> <span class="hljs-string">Jekyll</span>
        <span class="hljs-comment"># Outputs to the './_site' directory by default</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">bundle</span> <span class="hljs-string">exec</span> <span class="hljs-string">jekyll</span> <span class="hljs-string">build</span> <span class="hljs-string">--baseurl</span> <span class="hljs-string">"$<span class="hljs-template-variable">{{ steps.pages.outputs.base_path }}</span>"</span>
        <span class="hljs-attr">env:</span>
          <span class="hljs-attr">JEKYLL_ENV:</span> <span class="hljs-string">production</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Upload</span> <span class="hljs-string">artifact</span>
        <span class="hljs-comment"># Automatically uploads an artifact from the './_site' directory by default</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/upload-pages-artifact@v1</span>

  <span class="hljs-comment"># Deployment job</span>
  <span class="hljs-attr">deploy:</span>
    <span class="hljs-attr">environment:</span>
      <span class="hljs-attr">name:</span> <span class="hljs-string">github-pages</span>
      <span class="hljs-attr">url:</span> <span class="hljs-string">${{</span> <span class="hljs-string">steps.deployment.outputs.page_url</span> <span class="hljs-string">}}</span>
    <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
    <span class="hljs-attr">needs:</span> <span class="hljs-string">build</span>
    <span class="hljs-attr">steps:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Deploy</span> <span class="hljs-string">to</span> <span class="hljs-string">GitHub</span> <span class="hljs-string">Pages</span>
        <span class="hljs-attr">id:</span> <span class="hljs-string">deployment</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/deploy-pages@v2</span>
</code></pre>
<h2 id="heading-setting-up-a-custom-domain">Setting up a custom domain</h2>
<p>I wanted to use a custom domain name for the blog, so I followed the instructions in the <a target="_blank" href="https://docs.github.com/en/pages/configuring-a-custom-domain-for-your-github-pages-site/managing-a-custom-domain-for-your-github-pages-site">Github Pages documentation</a> to set one up.
In order to use a custom domain, you need to create a CNAME record in your DNS settings that points to <code>&lt;username&gt;.github.io</code>.
I did this manually in the Cloudflare console, but I plan to integrate these settings into a Terraform configuration file that I'll build into this repo in the future.
I could have waited until the Terraform setup was ready, but getting a minimum viable product up and running quickly is a key part of the DevOps philosophy.</p>
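<p>To illustrate that CNAME requirement, here is a small Python sketch that checks an exported list of DNS records for the expected GitHub Pages target. The record format (dicts with <code>type</code>, <code>name</code>, and <code>content</code> keys, loosely modeled on what a DNS provider's API might return) is a hypothetical assumption for illustration, not Cloudflare's actual schema.</p>

```python
def check_pages_cname(records, domain, github_username):
    """Return True if `domain` has a CNAME record targeting
    `<github_username>.github.io`, as GitHub Pages expects.

    `records` is a list of dicts with hypothetical keys
    "type", "name", and "content".
    """
    expected = f"{github_username}.github.io"
    for record in records:
        if (
            record.get("type") == "CNAME"
            and record.get("name") == domain
            # DNS targets are sometimes written with a trailing dot.
            and record.get("content", "").rstrip(".") == expected
        ):
            return True
    return False
```

<p>A check like this could run in CI after a Terraform apply to confirm the DNS side of the deployment, in the same spirit of automating everything.</p>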
<h2 id="heading-conclusion">Conclusion</h2>
<p>I'm really happy with how simple it was to set up this blog using DevOps principles.
It only took a few hours to set up the blog, and I have something simple and robust enough for me to write posts and iterate on with little-to-no overhead.
I'm looking forward to writing more posts in the future, and I hope you'll join me on this journey!</p>
<h2 id="heading-references">References</h2>
<ul>
<li><a target="_blank" href="https://beautifuljekyll.com/">Beautiful Jekyll</a></li>
<li><a target="_blank" href="https://jekyllrb.com/">Jekyll</a></li>
<li><a target="_blank" href="https://pages.github.com/">Github Pages</a></li>
<li><a target="_blank" href="https://docs.github.com/en/actions">Github Actions</a></li>
</ul>
<h4 id="heading-image-credits">Image credits</h4>
<p>Cover image by <a target="_blank" href="https://unsplash.com/@silvawebdesigns">Nathan da Silva</a> on <a target="_blank" href="https://unsplash.com/photos/k-rKfqSm4L4">Unsplash</a></p>
]]></content:encoded></item></channel></rss>