Why These Amino Acids, Anyway?

Here’s a new paper that tries to answer the sort of question that anyone who thinks about the evolution of living systems has asked: why do we have the biochemistry that we have? And by “we”, I mean “every single living thing on Earth”. You can draw all sorts of phylogenetic trees with different arrangements of entire kingdoms of living creatures, and these are very valuable schemes. But when you get down to fundamentals, every creature on Earth – archaeans, fungi, segmented worms, spirochetes, slime molds, corals, crustaceans, jellyfish, salamanders, pillbugs, liverworts, lemurs, the lot – all share the same basic chemistry, and we seem to have shared it for the duration of life on Earth. At least as far as we can tell. We have the canonical amino acids, carbohydrates, and nucleotide bases, and we use them in the same ways. So where did that setup come from, and could it be otherwise?

In this new paper, the authors look at some of the “unused” prebiotic amino acids (and I’ll have some recently published work that bears on those to discuss later on this week!). There are plenty of simple amino acids and other organic compounds that you can find in meteorites and in many other astronomical sources, indicating that these compounds form abiotically as the simple chemical compounds of the universe are mixed and heated. You can do that experiment yourself, and starting with the Miller-Urey electric-discharge work, we’ve found that all sorts of plausible abiotic soups can produce biomolecules, including amino acids, under many different conditions (although not all!). But we Earthlings only use about twenty of those amino acids, and many others that look superficially reasonable have ended up totally ignored in protein construction. There’s also a huge overarching question of why we only use the “handedness” (chirality) that we do with these molecules, which has inspired a correspondingly huge amount of speculation that is nowhere near coming to a conclusion.

But that set of building blocks is probably even more simple than it looks. As the paper notes, there’s increasing evidence that the earlier set of amino acids could have had as few as ten members, with the other ten coming on later as the result of biosynthetic pathways once life had already taken off. It’s an interesting set, because as far as we can tell, it didn’t include any positively charged ones, for one thing, and it included branched compounds like valine, leucine, and isoleucine, but not their very similar unbranched alternatives such as aminobutyric acid, norvaline, and norleucine. Those are apparently much more abundant in prebiotic mixtures than stuff like valine. So why those ten?

In the paper, a huge number of 25-mer test peptides are prepared as a series of combinatorial libraries through solid-phase synthesis. There’s a set of all the modern canonical ones (minus Cys for simplicity), which is designated 19F (F for full, not for fluorine, confusingly), a set of the 10 early ones (10E), a set with those ten but with the branched ones substituted by the closely related unbranched ones (10U), and sets with the ten early ones plus Arg (as a modern positively-charged addition, 11R) or diaminobutyric acid (as an available ancient one that didn’t get picked up, 11D), or the ten early ones plus Tyr (as a modern aromatic amino acid that also didn’t appear back in the day, 11Y). The resulting libraries were assayed for solubility at different pH and different ionic strengths, and then with the addition of trifluoroethanol, which is a known trick to induce folding and secondary structure.
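If it helps to see that library scheme laid out, here is a rough Python sketch of the six alphabets. To be clear about what is assumed: the membership of the "early ten" is not spelled out above, so the list below uses the commonly cited consensus set, and the abbreviations (Abu, Nva, Nle, Dab), the names `LIBRARIES`, `random_peptide`, and `sequence_space`, and the uniform random sampling are all just illustrative choices. This is not a description of how the authors actually composed their libraries.

```python
import random

# Assumption: the commonly cited consensus "early" set, not taken from the post.
EARLY_10 = ["Ala", "Asp", "Glu", "Gly", "Ile", "Leu", "Pro", "Ser", "Thr", "Val"]
LATE_10 = ["Arg", "Asn", "Cys", "Gln", "His", "Lys", "Met", "Phe", "Trp", "Tyr"]

BRANCHED = {"Val", "Leu", "Ile"}
# Unbranched stand-ins: aminobutyric acid, norvaline, norleucine.
UNBRANCHED = ["Abu", "Nva", "Nle"]

LIBRARIES = {
    # Full modern alphabet minus Cys.
    "19F": [aa for aa in EARLY_10 + LATE_10 if aa != "Cys"],
    # The ten early residues.
    "10E": list(EARLY_10),
    # Early ten with branched residues swapped for unbranched ones.
    "10U": [aa for aa in EARLY_10 if aa not in BRANCHED] + UNBRANCHED,
    # Early ten plus one extra residue each.
    "11R": EARLY_10 + ["Arg"],
    "11D": EARLY_10 + ["Dab"],  # diaminobutyric acid
    "11Y": EARLY_10 + ["Tyr"],
}


def random_peptide(library, length=25, rng=None):
    """Draw one peptide from a library, assuming uniform residue frequencies."""
    rng = rng or random.Random()
    return [rng.choice(LIBRARIES[library]) for _ in range(length)]


def sequence_space(library, length=25):
    """Count the distinct sequences of the given length over a library's alphabet."""
    return len(LIBRARIES[library]) ** length


if __name__ == "__main__":
    for name in LIBRARIES:
        print(name, len(LIBRARIES[name]), sequence_space(name))
    print("-".join(random_peptide("10E", rng=random.Random(0))))
```

Even the ten-letter alphabets allow 10^25 distinct 25-mers, which is why these have to be handled as mixed combinatorial libraries and assayed in bulk rather than peptide by peptide.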

The results are striking. The ten-unbranched library, for example, was very soluble indeed under pretty much all conditions. But that solubility is paid for by a distinct lack of secondary structure. The likely explanation is that the side chains have too many degrees of freedom, and settling down into a defined structure is too entropically costly. The 11D library with the floppy diaminobutyric acid shows very similar behavior (high solubility, low secondary structure). Meanwhile, adding in arginine (11R) or tyrosine (11Y) increases the amount of secondary structure as compared to the original 10E set. But the 19F (modern) set has the worst solubility but the highest propensity to form structured proteins (some of which are clearly so damn structured that they start coming out of solution).

As for positively charged residues, the hypothesis is that the presence of these along with things like aspartic and glutamic acid (part of the early ten) will naturally lead to a lot of salt-bridge formation, which could be hard to square with the hydrophobic residues of the other members of the set. Longer-chain positively charged residues (like the Lys and Arg we have today) may well be able to salt-bridge in more useful ways, whereas the prebiotic ones were either too floppy (diaminobutyric acid) or too short, and were avoided entirely. Lys and Arg came in later via evolved biosynthetic pathways and landed in a protein world that already had its main features coming together.

So the authors propose that “foldability” may well have been the evolutionary constraint that led us to the amino acids that we have today, both at the “primordial ten” stage and in the additions to the library in the eons since then. Now, it’s true that there are disordered proteins that are extremely important in cell biology as we know it today, but this work makes it seem as if those are rather carefully titrated into an intrinsically more orderly protein world. They are the hot sauce of the current protein world, but you cannot make a usable stew out of nothing but umpteen types of hot sauce, can you?