The Proteome: It's a bit more complex than any of us would like

Proteoforms: This is where things start to get complicated.

You might be asking yourself, “WTF is a proteoform?”

The term ‘proteoform’ was coined in 2013 by Lloyd Smith and Neil Kelleher.

It’s used to describe all of the different ‘forms’ of proteins that are created through the process of gene expression.

Our genomes contain information to code for approximately 20,000 genes.

That sounds like a big number, right?

Except, before we sequenced the human genome, we estimated that there should be about 100,000 genes to perform all of the complex functions that we see within our cells and tissues.


So, how do we get 100,000 genes worth of effort out of 20,000 actual genes?

1) Alleles:

Our cells contain two copies of our genomes. One from our mom, one from our dad.

The two versions of a gene that we carry are referred to as alleles, and they can produce slightly different proteins depending on the variants contained in each allele.

2) Alternative Splicing, Start, and Stop Sites:

Each gene is composed of exons and introns.

After a gene's DNA is transcribed into RNA, the introns are cut out and thrown away, and the exons are connected together (spliced) to create messenger RNA.

However, ‘alternative splicing,’ or the combination of different exons from the same gene, can create new isoforms.

This leads to the production of multiple different proteins from the same gene!

Over 90% of human genes undergo alternative splicing and about 80% of genes have a minor isoform that represents 15% or more of expression from that gene!

Additional combinations of exons can be made by starting or stopping transcripts at different places, which increases the combinatorial options even further!
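To get a feel for how quickly alternative splicing multiplies the options, here is a minimal Python sketch. It assumes a made-up gene with five exons (E1 to E5), where the first and last exons are always kept and each of the three middle exons can be independently included or skipped. The gene, the exon names and the `splice_isoforms` helper are all hypothetical and only there for illustration; real splicing is regulated and not every combination is actually produced.

```python
from itertools import combinations

def splice_isoforms(exons, optional):
    """Enumerate isoforms made by including or skipping each optional exon.

    `exons` is the ordered list of exon names in a hypothetical gene.
    `optional` is the set of exons that may be skipped.
    Exon order is always preserved in the resulting isoforms.
    """
    optional_in_order = [e for e in exons if e in optional]
    isoforms = []
    # Skip 0 optional exons, then every choice of 1, then 2, and so on.
    for k in range(len(optional_in_order) + 1):
        for skipped in combinations(optional_in_order, k):
            isoforms.append(tuple(e for e in exons if e not in skipped))
    return isoforms

# Hypothetical gene: E1 and E5 are always present, E2-E4 are optional.
gene = ["E1", "E2", "E3", "E4", "E5"]
isoforms = splice_isoforms(gene, optional={"E2", "E3", "E4"})

for iso in isoforms:
    print("-".join(iso))
print(len(isoforms), "isoforms from one gene")
```

With three optional exons this toy gene already gives 2 x 2 x 2 = 8 possible isoforms, and every extra optional exon doubles that number.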

3) Post-Translational Modifications (PTMs):

The process of converting that RNA message into protein is called translation.

Once a protein chain is created it folds into its final structure and goes off to perform whatever function it needs to do in the cell.

God, I wish it was that simple...

Proteins are often post-translationally modified to control how they function, when they function, where they function or how well they function.

The most common of these modifications include:


But this is by no means an exhaustive list and proteins can be modified with multiple different combinations of these (and other) PTMs.

‘Got it, but I'm still not clear on WTF a proteoform is?’

A proteoform is each unique version of a protein, taking into consideration differences in the protein sequence (alleles), alternative splicing and start/stop sites, and PTMs.

Theoretically, there are trillions of proteoforms for each gene (mostly due to all of the possible PTMs!).

But, practically (and in the proteins we’ve looked at closely), it’s in the tens to hundreds of proteoforms per gene.

This makes sense, because only the functional ones are going to be useful to a cell.
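The ‘trillions’ figure comes from simple combinatorics, and a short Python sketch makes the arithmetic concrete. It assumes the simplest possible model: every modifiable site on a protein is either modified or unmodified, independently of all the others. The site counts below and the `theoretical_proteoforms` helper are hypothetical, chosen only to show how fast the number grows.

```python
def theoretical_proteoforms(n_sites, states_per_site=2):
    """Upper bound on PTM combinations for one protein sequence.

    Assumes each of the `n_sites` sites independently takes one of
    `states_per_site` states (by default: modified or unmodified).
    """
    return states_per_site ** n_sites

# Hypothetical numbers of modifiable sites on a single protein.
for n in (10, 20, 40):
    print(f"{n} sites -> {theoretical_proteoforms(n):,} possible combinations")
```

Under this assumption, 40 independent on/off sites already gives 2^40, which is a little over one trillion combinations from a single sequence. Compare that with the tens to hundreds of proteoforms per gene that are actually observed, and you can see how small a slice of the theoretical space a cell really uses.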