I made these notes in 2024, I think, and polished them into a writeup in 2026-01.
Most biotech software is low quality
Most data formats are not yet standardised across the field. There is not sufficient incentive for people to coordinate and agree on data formats.
Biotech developers often don't know how to write technical docs, and assume that everyone using their software is non-technical. For example, biopython's docs are low quality, while scikit-bio's are better. I think providing guides for non-technical people is fine. But you should also write docs for technical people, and a spec for any file formats or standards you invent.
At least looking at the code, I get the sense that biotech developers often don't understand concepts like why modularity is good, why complexity is bad, or why complex abstractions should be built on top of simpler ones. Many biotech tools are basically one command-line tool with something like 15 different flags that have nothing to do with each other. Calling any flag runs something like 5 different pipeline steps in sequence, and the entire process is a black box. This is just a bad way to design an API or a CLI tool. (I picked 15 and 5 as arbitrary numbers, but the actual numbers are similar.)
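To make the modularity point concrete, here is a minimal sketch of the alternative design: each pipeline step gets its own subcommand with only the options relevant to that step, so steps can be run, composed, and tested independently. All command and function names here (`seqtool`, `trim`, `align`) are invented for illustration, not taken from any real tool.

```python
# Sketch: one subcommand per pipeline step, instead of one command with
# many unrelated flags that each trigger an opaque multi-step pipeline.
import argparse

def trim(args):
    # Placeholder for one self-contained pipeline step.
    print(f"trimming reads in {args.input}")

def align(args):
    # Another independent step, with only the options it actually needs.
    print(f"aligning {args.input} against {args.reference}")

def main(argv=None):
    parser = argparse.ArgumentParser(prog="seqtool")
    sub = parser.add_subparsers(dest="command", required=True)

    p_trim = sub.add_parser("trim", help="trim adapters from reads")
    p_trim.add_argument("input")
    p_trim.set_defaults(func=trim)

    p_align = sub.add_parser("align", help="align reads to a reference")
    p_align.add_argument("input")
    p_align.add_argument("--reference", required=True)
    p_align.set_defaults(func=align)

    args = parser.parse_args(argv)
    args.func(args)

if __name__ == "__main__":
    main()
```

With this shape, `seqtool trim reads.fastq` and `seqtool align trimmed.fastq --reference ref.fa` are separate, inspectable steps rather than side effects of one flag.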
A lot of tools are just written to process raw input from a handful of big machines, often DNA/RNA/protein sequencers (rnaseq, nanopore, NGS etc).
Similarly, a lot of tools are basically implementing the same alignment algorithms over and over. I would not be surprised if github has at least 100 repeated implementations of the same alignment algorithm, and not even one is good enough to become universal in the field.
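For context, the kind of algorithm being re-implemented is often a dynamic-programming alignment like Needleman-Wunsch. A minimal score-only sketch (the match/mismatch/gap values are an arbitrary illustration, not any tool's defaults):

```python
# Minimal Needleman-Wunsch global alignment (score only) -- the kind of
# textbook algorithm that gets re-implemented across many biotech repos.
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of sequences a and b."""
    n, m = len(a), len(b)
    # dp[i][j] = best score aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap          # align a[:i] against all gaps
    for j in range(1, m + 1):
        dp[0][j] = j * gap          # align b[:j] against all gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag,                 # match/mismatch
                           dp[i - 1][j] + gap,   # gap in b
                           dp[i][j - 1] + gap)   # gap in a
    return dp[n][m]

print(needleman_wunsch("GATTACA", "GCATGCU"))  # -> 0
```

A production version also needs traceback, affine gap penalties, and careful memory handling for long sequences, which is exactly the part most one-off implementations skimp on.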
Money rules everything
Most biotech work is not profitable yet. Hence software developers don't get paid well.
Therefore, it is funded by academic grantmaking bodies. These bodies are not willing to pay software developers well to write good-quality codebases or maintain them. (This is obviously a bad decision IMO. Even paying 10-30 software developers at a high skill level would be enough to accelerate the whole field.)
As a result, often one grad student writes a codebase, publishes their paper or thesis or whatever, and then stops maintaining the code. Grantmaking bodies don't respect developers who publish code that is used by thousands of researchers. But they will respect peer review or citation counts on academic papers.
Academic papers famously have lower replicability than codebases. This whole system makes it easier to hide behind bad results IMO.
Companies manufacturing these machines are often in a comfortable monopolistic position, and sometimes don't really care to make it easier for developers to work with their tools.
I don't know how companies manufacturing sequencers negotiate contracts with large academic grantmaking bodies. But I am assuming usability of their software is not the primary criterion.
(This is surprising to me. My guess would have been that if there are two companies manufacturing the exact same equipment with the exact same specs, then usability of the software should be number one thing they compete on. I don't understand this industry well enough to comment on why this is not the case.)
Gene cloning is expensive. Hence limited data gets collected. Hence theoreticians who come up with good research hypotheses are more important for progress than experimentalists or the software developers supporting the experimentalists.
Gene cloning costs at least $100 in reagents, plus multiple days of a researcher's manual labour.
Most biotech experiments today are manual labour done by researchers who have practice with many of the failure modes of that type of experiment. In spite of this, many experiments fail.
Most experiments are just pipetting liquids from one vial to another vial, plus a few big machines (PCR machine, DNA sequencer, electron microscope, etc)
I wouldn't be surprised if AI can be used to automate a lot of biotech experiments. If biotech experiments got automated, we would be able to collect a lot more data every time someone had some hypothesis to test.
If we were collecting more data per research iteration, I can see why software developers would be more valuable in analysing that data.
As of today, many research breakthroughs still happen in biotech without relying on a big dataset.
I also no longer think accelerating biotech research in a generic way is a good idea. (This is a change from when I first looked at this topic.) I would want humanity to figure out some answers for how human genetic engineering, genetically engineered bioweapons, and gene drives are governed before I support general-purpose acceleration of biotech research.