Omic.ly Weekly 83

July 14, 2025

Hey There!

Thanks for spending part of your week with Omic.ly!


This Week's Headlines

1) Fuzzy Sequencing: Sometimes close enough is good enough

2) The "Microbiome" isn't just one thing

3) How Linus Pauling used electrophoresis to characterize sickle cell anemia in 1949

4) Weekly Reading List


DNA is coded by 4 bases, but no one said that's how we have to sequence it. Ready to have your mind bent?

My brain was never more broken than when I tried to learn about SOLiD's color space but "Fuzzy Sequencing" is a close contender!

If you remember the ABI SOLiD sequencer, it operated VERY differently than the sequencing by synthesis technology that has come to dominate sequencing for the past 2 decades.

SOLiD was a "sequencing by ligation" technology that ligated fluorescently labeled nucleotide pairs together to determine the sequence of a target DNA molecule.

It also employed an incomprehensible color encoding scheme where 4 dyes were used to label the 16 (4x4) nucleotide pairs, and the sequence was determined by deconvoluting the reads obtained using 5 different sequencing primers (each shifted by one base).

If that sounds confusing, it is - but it helped introduce the concept that just because there are 4 bases, it doesn't mean you have to sequence them individually in a strictly linear fashion.
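
To make the color space idea a little less incomprehensible, here is a minimal Python sketch. It assumes the commonly described two-base encoding in which each base gets a 2-bit code (A=0, C=1, G=2, T=3) and the dye color of a pair is the XOR of the two codes. The function names are made up for illustration and this is not the instrument's actual software.

```python
# Toy model of SOLiD-style two-base ("color space") encoding.
# Assumption: bases map to 2-bit codes and each adjacent pair is
# reported as the XOR of its two codes, giving 4 colors for 16 pairs.
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = {code: base for base, code in BASE_TO_CODE.items()}


def encode_colors(seq):
    """Return the color (0-3) of each overlapping pair of bases."""
    return [BASE_TO_CODE[a] ^ BASE_TO_CODE[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base, colors):
    """Rebuild the base sequence from a known first base plus colors."""
    bases = [first_base]
    for color in colors:
        previous = BASE_TO_CODE[bases[-1]]
        bases.append(CODE_TO_BASE[previous ^ color])
    return "".join(bases)


colors = encode_colors("ATGGCA")
print(colors)
print(decode_colors("A", colors))
```

Each color is shared by four different pairs, so a color on its own is ambiguous. It only becomes a base call once you know the base that came before it, which is why a single known starting base is enough to unlock the whole read.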

Oxford Nanopore (ONT) has done something similar with its sequence detection scheme, where its pores aren't actually sequencing individual bases at any given time...they're predicting what k-mer (a ~6 base sequence) is present in the pore.

It's much easier to try to detect a k-mer than to detect the signal of a single base as it occludes a nanopore!
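
A quick sketch of what "predicting the k-mer in the pore" means in terms of states, assuming k = 6 as above. Real basecallers work on raw current signal, which this toy example does not attempt to model.

```python
from itertools import product


def kmers(seq, k=6):
    """Slide a window of length k along seq, one base at a time."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Every possible 6-mer is a distinct state the pore could contain.
all_states = ["".join(p) for p in product("ACGT", repeat=6)]
print(len(all_states))

# Consecutive k-mers overlap by k-1 bases as the strand moves through.
print(kmers("ACGTACGTA"))
```

Because neighboring k-mers share all but one base, each base gets observed several times in different contexts as the strand ratchets through.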

But both SOLiD and ONT use(d) their sequencing schemes to try to identify an accurate sequence of the individual bases contained in the DNA fragment being sequenced.

What if you didn't care what the sequence actually was?

Are there more efficient ways to identify a target molecule than by sequencing it at single base resolution?

Those might sound like insane questions, but some of the most lucrative applications of sequencing technology DO NOT require single base resolution to give us the answers we're seeking.

For example, most pathogen detection or microbial identification tests, non-invasive prenatal testing for chromosomal anomalies, or whole transcriptome profiling - basically anything where we're counting/detecting fragments - don't need single base resolution to accurately identify those things!

So, it could make sense to create a new sequencing technology that encodes genetic information more cost effectively and efficiently for those kinds of applications.

Enter Fuzzy Sequencing!

Instead of reading out individual bases, Fuzzy Sequencing provides a "Fuzzier" read out which is depicted in the figure above not as bases but as an encoded flowgram! (see d)

a) It uses one (BitSeq) or two (SuperBitSeq) fluorogenic dyes

b) tagged bases are flowed together in degenerate pairs each cycle (eg K = G/T and M = A/C), and the resulting signals are recorded as 'flowgrams'. Unlike Illumina sequencing, this chemistry does not use reversible terminators, so multiple bases can be added every cycle (similar to how Ion Proton semiconductor sequencing or 454 pyrosequencing works), but that's ok because the number of bases added can be determined from the fluorescence intensity of the signal.

c) the sequencing reaction is based on bead emulsion PCR where beads capture a single sequence, amplify it on the bead surface and then that bead is deposited in a nanowell for the sequencing reaction - here the sequencing cycles proceed by sealing and unsealing the wells with oil to prevent reagent diffusion or signal mixing

d) compares the sequencing efficiency of a single base addition per cycle to that of BitSeq and SuperBitSeq - which is to say that adding multiple bases per cycle is faster and cheaper, with a readout not of A, C, T, and G but of K and M!
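
The encoding described in (b) and (d) can be sketched in a few lines of Python. This is a simplified illustration, not the researchers' pipeline: it assumes flows alternate between the K mix (G/T) and the M mix (A/C), that each flow extends through every consecutive base in its class, and that signal intensity reports the run length exactly. The function name is hypothetical.

```python
from itertools import groupby

# Degenerate classes from the text: K = G or T, M = A or C.
BASE_CLASS = {"G": "K", "T": "K", "A": "M", "C": "M"}


def fuzzy_flowgram(seq):
    """Collapse a sequence into (class, run_length) pairs.

    Each pair stands for one flow: which mix was added and how many
    bases were incorporated (read from the signal intensity).
    """
    classes = [BASE_CLASS[base] for base in seq]
    return [(label, len(list(run))) for label, run in groupby(classes)]


seq = "GTTACAGGC"
flows = fuzzy_flowgram(seq)
print(flows)
print(f"{len(seq)} bases read in {len(flows)} flows")
```

In this toy case the read finishes in fewer flows than it has bases, which is the efficiency argument in a nutshell: you trade base identity for speed.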

And if you're having trouble understanding how this encoding works to identify a genetic sequence, the researchers provided this fun diagram:

Obviously encoding information this way means that there will be base ambiguity at some positions (see 'missed' above), but for the applications where you'd be using this, you don't care!
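
To see why the ambiguity doesn't matter for identification, here is a small sketch that assumes a simple K (G/T) versus M (A/C) readout. Reads and references are both projected into the two-letter alphabet and compared there. The reference names and sequences are invented for this example.

```python
# Map each base to its degenerate class: G/T -> K, A/C -> M.
BASE_CLASS = str.maketrans("GTAC", "KKMM")


def to_fuzzy(seq):
    """Project a base sequence onto the K/M alphabet."""
    return seq.translate(BASE_CLASS)


# Two different sequences give the same fuzzy read: base identity is lost.
print(to_fuzzy("GATTACA"), to_fuzzy("TCGGCAC"))

# Hypothetical references, invented for this example.
references = {
    "organism_1": "GATTACAGGCTTAACG",
    "organism_2": "CCATGGTACCAATTGG",
}
fuzzy_refs = {name: to_fuzzy(seq) for name, seq in references.items()}


def identify(read):
    """Return the names of references that contain the fuzzy read."""
    fuzzy_read = to_fuzzy(read)
    return [name for name, ref in fuzzy_refs.items() if fuzzy_read in ref]


print(identify("TACAGGCT"))
```

A fuzzy read of length n is compatible with 2^n different base sequences, but as long as the pattern only shows up in one reference, you still know where the fragment came from.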

The researchers show in follow-up experiments that Fuzzy Sequencing is able to accurately detect large copy number variations in prenatal testing cases (including chromosomal microdeletions) and identify microbes from throat and anal swabs.
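
Counting-based tests like the prenatal example come down to comparing read fractions. Here is a hedged sketch using a z-score of the chromosome 21 read fraction against a set of unaffected reference samples. This is a commonly described approach for counting-based prenatal screening, not necessarily what these researchers did, and every number below is invented for illustration.

```python
from statistics import mean, stdev


def chromosome_z_score(sample_fraction, reference_fractions):
    """Z-score of a sample's read fraction vs. unaffected references."""
    mu = mean(reference_fractions)
    sigma = stdev(reference_fractions)
    return (sample_fraction - mu) / sigma


# Invented fractions of reads assigned to chr21 in unaffected samples.
reference = [0.0130, 0.0131, 0.0129, 0.0130, 0.0132, 0.0128]

# Invented test sample with a slight excess of chr21 reads.
z = chromosome_z_score(0.0137, reference)
print(f"z = {z:.1f}")
```

Nothing in that calculation needs to know the bases in a read, only which chromosome it belongs to, and a fuzzy read can answer that.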

While I don't see Fuzzy Sequencing taking the sequencing community by storm, it's an interesting thought experiment and if developed commercially could be a nice alternative to multiplex digital or real-time PCR.

It could also make prenatal testing, transcriptomics, and even oncology testing significantly more cost effective!


“The microbiome” isn't just one thing. It's lots of things and each habitat has its own.

To say the human microbiome is complicated is an understatement.

It's generally agreed that a microbiome is the community of organisms (bacteria, fungi, viruses, archaea) that live and interact in a particular habitat inclusive of the environmental, physical and chemical properties of that habitat.

This means that we're surrounded by microbiomes and if we're talking about them from the human perspective, there are at least 6 that we need to discuss:

Respiratory tract - The upper airway, your nose and the cavity behind it are colonized by specific microbes that thrive in the mucosa that helps to warm, humidify and filter the air that we breathe. Similarly, the lower respiratory tract and lungs host their own crew of microbes. The microbiomes in these regions change significantly during respiratory infections or after the development of respiratory diseases like asthma.

Eye - Anyone who has ever suffered from conjunctivitis (inflammation or infection of the conjunctiva) understands the importance of having a healthy eye microbiome. But the eye has three distinct habitats which include the outer eye, the conjunctiva (the membrane that covers the eye), and the meibum (the fatty, slippery, liquid that lubricates our eyes!). Each of these is inhabited by specific microbes that thrive in these ocular environments!

Oral - Our mouth is one of the microbiomes we’re most conscious of. Dental hygiene is extremely important for human health, and a dysfunctional oral microbiome is typified by unpleasant odors as a result of infections of our gums and teeth.

Skin - This is one of the more complicated microbiomes to classify since “skin” is present on every outer surface of our body. The skin microbiome can differ significantly depending on whether we’re talking about our armpit, scalp, or face! But be aware, this surface that you scrub clean every day is host to trillions of microbes!

Urogenital - This primarily refers to the vaginal microbiome which changes throughout a woman’s cycle and during pregnancy. The microbial community here is sensitive to changes in pH (and can cause changes in pH!). Dysfunction can lead to infertility or even miscarriages.

Digestive - The “gut” microbiome is probably the one that we’ve heard the most about. But the digestive tract is made up of the stomach, small intestine, and large intestine (which includes the colon). Each of these has a distinct physical and chemical environment, and you guessed it, each supports its own unique microbial community!

While we contain multitudes of microbes that form communities on and in various parts of our body, we’re still at the earliest stages of trying to tease out the cause and effect relationships here that impact human health.

While it’s pretty easy to define an unhealthy microbiome when we have an infection, identifying the factors that create and maintain a healthy one is still a huge work in progress!


Sickle Cell Anemia was the first inherited disease to be molecularly characterized. It was done in 1949 using a revolutionary new method: electrophoresis.

Sickle cell anemia affects 4.4 million people, and 43 million are carriers of the trait.

It is characterized by the crescent, or sickle shape, of the red blood cells of those affected by the disease.

James Herrick first discovered sickle-shaped blood cells in a patient suffering from severe anemia in 1910.

Through subsequent observation it was realized that there was an asymptomatic form of the disease, sickle cell trait.

In those individuals it appeared that they had a mixture of normal and sickle blood cells.

Further study within the families of these individuals in 1923 revealed that sickle cell was hereditary or passed down from parents to their offspring.

And because those with sickle cell trait appeared to have a 50/50 mix of sickle/normal blood cells, it was determined that this was a recessive Mendelian disease.
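
The Mendelian bookkeeping behind that conclusion is easy to sketch. Assuming a single gene with a normal allele (written A here) and a sickle allele (written S), this enumerates the equally likely offspring of two parents. The labels are shorthand chosen for this example.

```python
from collections import Counter
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Count equally likely offspring genotypes from two parents."""
    crosses = ("".join(sorted(pair)) for pair in product(parent1, parent2))
    return Counter(crosses)


# Two carriers of sickle cell trait (one A allele and one S allele each).
counts = offspring_genotypes("AS", "AS")
total = sum(counts.values())
for genotype, n in sorted(counts.items()):
    print(genotype, n / total)
```

For two carriers, one in four children would be expected to inherit two sickle alleles (the disease), two in four would carry the trait, and one in four would inherit neither.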

Linus Pauling, a titan of early molecular biology, was no stranger to blood or the protein hemoglobin and spent many years in the 1930s studying hemoglobin’s interactions with oxygen.

Pauling had a suspicion that the structure of proteins played a vital role in their function and was first introduced to sickle cell anemia in 1945.

He hypothesized that the sickling of cells could be related to a change in the structure of hemoglobin since red blood cells are literally just bags that contain a boat load of hemoglobin protein.

So he and his team, Harvey Itano and S. Jonathan Singer, tried to figure out a way that they could show that a difference in the structure of hemoglobin was the cause of sickle cell anemia.

After a bit of trial and error, they stumbled on the use of electrophoresis, a brand-new technique at the time, that allowed for the separation of molecules based on their electrical charge.

The results of their experiments can be seen in the figure below.

They separated and quantified 4 sets of blood samples using Longsworth scanning diagrams. A) shows normal hemoglobin, B) is hemoglobin from a sickle cell patient, C) is hemoglobin from a patient with sickle cell trait, and D) is a mixture of A and B. The arrow denotes a point of reference for comparing the diagrams.

This work demonstrates that there is a molecular basis for sickle cell anemia and that changes to a gene can alter the structure of a protein.

In the case of sickle cell, this functional relationship extends further because in the 1950s, the trait was shown to be protective against malaria.

This explains, evolutionarily, why this disease is most common in individuals of African descent; however, that association fueled an unfounded fear of 'black blood' throughout the early 1900s.

###

Pauling L et al. 1949. Sickle Cell Anemia, a Molecular Disease. Science. DOI: 10.1126/science.110.2865.543


Weekly Reading List

Skeletal editing: How close are we to true cut-and-paste chemistry?
Reactions that alter organic scaffolds by a single atom are already proving useful, but time will tell if they’ll fundamentally change how molecules are made
Florida Sunshine Genetics Act Becomes Law, Funding Newborn Genome Sequencing Pilot
Florida Gov. Ron DeSantis signed the Sunshine Genetics Act into law this week.
A class of benzofuranoindoline-bearing heptacyclic fungal RiPPs with anticancer activities - Nature Chemical Biology
Here the authors report asperigimycins, fungal ribosomally synthesized and post-translationally modified peptides with a heptacyclic scaffold. After chemically modifying them for nanomolar anticancer activity, CRISPR screening identifies SLC46A3 as a key transporter for their uptake in cells.
‘It’s a nightmare.’ U.S. funding cuts threaten academic science jobs at all levels
“There is a lot of pressure to essentially leave the country or not pursue research,” one Ph.D. student says
Prior Authorization Demo in Fee for Service Medicare !!
CMS has announced a regional, pilot program to test Prior Authorization of targeted services in Fee for Service Medicare. While none of th…
Inside the staff exodus and tanking morale that threaten Makary’s FDA
FDA staff are struggling and missing deadlines, and the future consequences could be even more dire.
Scalable emulation of protein equilibrium ensembles with generative deep learning
Following the sequence and structure revolutions, predicting functionally relevant protein structure changes at scale remains an outstanding challenge. We introduce BioEmu, a deep learning system that emulates protein equilibrium ensembles by generating thousands of statistically independent structures per hour on a single GPU. BioEmu integrates over 200 milliseconds of molecular dynamics (MD) simulations, static structures and experimental protein stabilities using novel training algorithms.
mRNA vaccines make an elephant-sized advance
New immunization shows success against a deadly herpesvirus threatening young, endangered elephants
What the VAF? A guide to the interpretation of variant allele fraction, percent mosaicism, and copy number in cancer - Molecular Cytogenetics
The evolution of techniques used to identify structural variants (SVs) and copy number variants (CNVs) in genomes have seen significant development in the last decade. With the growing use of more technologies including chromosomal microarray, genome sequencing and genome mapping in clinical cytogenetics laboratories, reporting the frequency of SVs and CNVs has increased the complexity of genomic results. In conventional testing (e.g. karyotype or FISH) individual cells are analyzed and abnormalities are reported at the single cell level directly as a proportion of the analyzed cells. Whereas for bulk genome assays structural and sequence changes are often reported as variant allele frequencies and fractional copy number states. The International System of Cytogenomic Nomenclature (ISCN) recommends converting these values into a “proportion of the sample”, which requires different calculations and underlying assumptions based on the data type. This review illustrates how the different methods of interpreting and reporting data are performed and identifies challenges in the conversion of these values to a proportion of the sample. We stress the need for careful interpretation of data with consideration for factors that may alter how proportions are reported including overlapping SVs and CNVs or regions with acquired homozygosity. We also demonstrate, using validation data of SVs and CNVs tested by multiple techniques how results are largely consistent across methodologies, but can show dramatic differences in rare circumstances. This review focuses on illustrating many of the challenges with aligning reporting using different techniques and their underlying assumptions. As hematologic disease classifications start to incorporate numeric limits (e.g. VAF defining thresholds), it is important for laboratory geneticists, pathologists and clinicians to appreciate the differences in methodologies, potential pitfalls and the nuances when comparing bulk genome analyses to the more conventional single cell techniques.
The Scientists Who Got Ghosted by the NIH
Under Trump, their grant applications disappeared. What now?
How Republicans sidelined the health care industry and pushed through historic Medicaid cuts
The passage of President Trump’s signature tax cut bill shows how the health care industry’s influence is waning.
RNA liquid biopsy via nanopore sequencing for novel biomarker discovery and cancer early detection
Liquid biopsies detect disease noninvasively by profiling cell-free nucleic acids that are secreted into the circulation. However, existing methods exhibit low sensitivity for detecting early stages of diseases such as cancer. Here we show that long-read nanopore sequencing of full-length cell-free RNA in plasma from healthy individuals, precancerous Barrett’s esophagus patients with high-grade dysplasia, or patients with esophageal adenocarcinoma reveals a diverse cell-free RNA transcriptome that can be leveraged for detecting and treating disease. We discovered 270,679 novel, intergenic cell-free RNAs, which we used to build a custom transcriptome reference for quantification, feature selection, and machine learning to accurately classify both precancer and cancer. Moreover, we found potential therapeutic targets, including metabolic, signaling, and immune checkpoint pathways, that are highly upregulated in both precancer and cancer patients. Our findings highlight the utility of our RNA liquid biopsy platform technology for discovering and targeting early stages of disease with molecular precision.
A 37,000-Year Chronicle of What Once Ailed Us
In a new genetic study, scientists have charted the rise of 214 human diseases across ancient Europe and Asia.
Genome sequencing is critical for forecasting outcomes following congenital cardiac surgery - Nature Communications
The authors use artificial intelligence approaches to explore the predictive value of whole exome sequencing in forecasting clinical outcomes following surgery for congenital heart defects. Findings include that damaging genotypes in chromatin-modifying and cilia-related genes are associated with an increased risk of adverse post-operative outcomes such as mortality, cardiac arrest, and prolonged mechanical ventilation.
Quality of scientific papers questioned as academics ‘overwhelmed’ by the millions published
Widespread mockery of AI-generated rat with giant penis in one paper brings problem to public attention
AAP suing HHS over vaccine policy ‘because we believe children deserve better’
The AAP and other leading medical groups are suing Health and Human Services Secretary Robert F. Kennedy Jr. for making unilateral, unscientific changes to federal vaccine policy they say are an “assault on science, public health and evidence-based medicine.”
Man’s ghastly festering ulcer stumps doctors—until they cut out a wedge of flesh
The man made a full recovery, but this tale is not for the faint of heart.
China's GeneMind Eyes International Market With Fleet of Sequencing Platforms
Chinese sequencing technology company GeneMind Biosciences aims to go global with a fleet of sequencing platforms that it has launched in China over the last several years.
GREmLN: A Cellular Regulatory Network-Aware Transcriptomics Foundation Model
The ever-increasing availability of large-scale single-cell profiles presents an opportunity to develop foundation models to capture cell properties and behavior. However, standard language models such as transformers are best suited for sequentially structured data with well defined absolute or relative positional relationships, while single cell RNA data have orderless gene features. Molecular-interaction graphs, such as gene regulatory networks (GRN) or protein-protein interaction (PPI) networks, offer graph structure-based models that effectively encode both non-local gene-gene dependencies, as well as potential causal relationships. We introduce GREmLN (Gene Regulatory Embedding-based Large Neural model), a foundation model that leverages graph signal processing to embed gene-regulatory network structure directly within its attention mechanism, producing biologically informed single cell specific gene embeddings. Our model faithfully captures transcriptomics landscapes and achieves superior performance relative to state-of-the-art baselines on both cell type annotation and graph structure understanding tasks. It offers a unified and interpretable framework for learning high-capacity foundational representations that capture complex, long-range regulatory dependencies from high-dimensional single-cell transcriptomic data. Moreover, the incorporation of graph-structured inductive biases enables more parameter-efficient architectures and accelerates training convergence.
Decomposition of phenotypic heterogeneity in autism reveals underlying genetic programs - Nature Genetics
Classes of autism are uncovered with a generative mixture modeling approach leveraging matched phenotypic and genetic data from a large cohort, revealing different genetic programs underlying their phenotypic and clinical traits.
Proteins with cognition-associated structural changes in a rat model of aging exhibit reduced refolding capacity
Cognitive decline during aging represents a major societal burden, causing both personal and economic hardship in an increasingly aging population. Many studies have found that the proteostasis network, which functions to keep proteins properly folded, is impaired with age, suggesting that there may be many proteins that incur structural alterations with age. Here, we used limited proteolysis mass spectrometry, a structural proteomic method, to globally interrogate protein conformational changes in a rat model of cognitive aging.
Scientists hide messages in papers to game AI peer review
Some studies containing instructions in white text or small font — visible only to machines — will be withdrawn from preprint servers.
U.S. drops charges against doctor accused of throwing away Covid shots, selling fake vaccine cards
The doctor allegedly destroyed $28,000 of Covid shots and provided fake vaccine cards.
Foreign medical residents fill critical positions at US hospitals, but are running into visa issues
Some hospitals in the U.S. are without essential staff because international medical residents set to start their training this week were delayed by the Trump administration’s travel and visa restrictions.
