The Other Guys

Writing the other day about the lipid formulations used in the current mRNA vaccines makes me want to highlight something else that I hit on from time to time around here. When you learn in school about the major classes of biomolecules, you hear about proteins, lipids, carbohydrates, and nucleic acids. That’s a reasonable classification, but you hear a lot more about the first and last items on that list than you ever hear about the middle two.

That’s understandable, because after all, nucleic acids (as DNA) code (through the intermediacy of RNA) for proteins. And it’s those proteins that are involved in either synthesizing the lipids or carbohydrates (through a long list of enzymatic pathways) or bringing them in from our diets, as transporter proteins and the like. So proteins really do take the lead, but those other two categories are still their own worlds. Lipids, of course, make up the cellular membranes that surround (that have to surround) bacteria, archaea, and every single cell with a nucleus. And they’re used on a large scale for things like myelin sheaths around nerve tissue, and on a molecular scale as energy feedstocks (two carbons at a time, being peeled off all those fatty acids) and as starting materials for the synthesis of a long list of hormones and other signaling molecules. The carbohydrates are of course also at the center of that cellular metabolism energy landscape, with glucose feeding (through glycolysis) into the Krebs cycle and cranking out the common currency of ATP (itself a carbohydrate derivative). But they also decorate the surfaces of a huge number of proteins in the body, and are intimately involved in regulating their localization and function. The surfaces of entire cells are no different – the immune system’s recognition of self depends on complex carbohydrates. And I haven’t even mentioned how ribose and 2-deoxyribose are crucial to the structures of all those nucleic acids.
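The "two carbons at a time" bookkeeping of fatty acid breakdown (beta-oxidation) is simple enough to write down. This is a minimal sketch that assumes a saturated fatty acid with an even number of carbons; odd-chain and unsaturated fatty acids need extra enzymatic steps that are ignored here, and the function name is just an illustrative label.

```python
def beta_oxidation_yield(carbons: int) -> tuple[int, int]:
    """Return (cycles, acetyl_coa) for an even-chain saturated fatty acid.

    Each cycle clips one two-carbon acetyl-CoA off the chain. The last cycle
    splits a four-carbon unit into two acetyl-CoA, so there is one fewer
    cycle than there are acetyl-CoA molecules produced.
    Assumption: even chain length, no double bonds.
    """
    if carbons < 4 or carbons % 2:
        raise ValueError("this sketch only handles even chains of 4+ carbons")
    acetyl_coa = carbons // 2
    cycles = acetyl_coa - 1
    return cycles, acetyl_coa


for name, n in [("palmitate", 16), ("stearate", 18)]:
    cycles, acetyl = beta_oxidation_yield(n)
    print(f"{name} (C{n}): {cycles} cycles -> {acetyl} acetyl-CoA")
```

For palmitate that gives seven turns of the cycle and eight acetyl-CoA, each of which can then go on into the Krebs cycle.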

No, neither of these compound classes is exactly a junior partner. But they don’t get the attention that the others do, and that goes for the laboratories as well as the popular imagination. The biggest problem is that both lipids and carbohydrates are just intrinsically harder to work with than are proteins and nucleic acids. Both of those biopolymers are hooked up (for the most part) in repetitive ways from a rather small list of common building blocks. That’s given us the chance both to work out ways to read off those sequences and to build machines that will synthesize them for us. Those either co-opt the exquisite cellular machinery (which we’ve adapted to work in vitro) or use automated organic chemistry reactions that we’ve come up with ourselves. The list of Nobels and other awards for all the discoveries in those fields is a very long one indeed, as it should be.

But look at carbohydrates. Those also form long polymeric structures, and the list of building blocks is no shorter than it is for the proteins. But the ways in which they can be linked – now, that’s a headache. Good ol’ glucose has five OH groups (counting the anomeric center at the first carbon) that can be involved in such linkages. In addition, that anomeric center can be “up” or “down” (alpha and beta glycosides), and glucose can tie itself into either a six-membered ring or a five-membered one, depending on conditions.
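To make the linkage headache concrete, here is a rough enumeration of the glucose-glucose disaccharides. It’s a sketch under simplifying assumptions: both residues are six-membered rings (pyranoses, no furanoses), the free anomeric center at the reducing end is not counted as a separate compound because it interconverts in solution, and 1,1 links (the trehalose type) use both anomeric centers at once. The labels are ad hoc strings in a condensed "Glc(a1-4)Glc" style, not output from any nomenclature library.

```python
from itertools import combinations_with_replacement

ANOMERS = ("a", "b")        # alpha / beta configuration at C1 of the donor
ACCEPTOR_OH = (2, 3, 4, 6)  # non-anomeric OH groups on a glucopyranose acceptor


def glucose_disaccharides() -> list[str]:
    """Enumerate Glc-Glc disaccharides, pyranose forms only (assumption)."""
    names = []
    # Reducing disaccharides: donor C1 (alpha or beta) to one acceptor OH.
    for anomer in ANOMERS:
        for pos in ACCEPTOR_OH:
            names.append(f"Glc({anomer}1-{pos})Glc")
    # Non-reducing 1,1-linked disaccharides: both anomeric centers are used,
    # and a/b is the same molecule as b/a, so take unordered pairs.
    for first, second in combinations_with_replacement(ANOMERS, 2):
        names.append(f"Glc({first}1-1{second})Glc")
    return names


linkages = glucose_disaccharides()
print(len(linkages))  # 8 reducing + 3 non-reducing
for name in linkages:
    print(name)
```

That’s eleven distinct compounds from a single monosaccharide, and several of them are familiar: maltose (a1-4), cellobiose (b1-4), isomaltose (a1-6), and trehalose (a1-1a). Let the five-membered ring forms back in and the count goes up again.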

Now run through the list of all the six-carbon and five-carbon sugars (and the smaller carbohydrates too, can’t forget those), the various oxidation state changes like the sugar alcohols and acids, the keto sugars, and the important variants where one or more of those OH groups they’re all decorated with is missing or replaced with an amine and. . .well, you’re looking at a lot of building blocks that can be fitted together into an insanely large number of different structures. The number of possible proteins that can be assembled out of twenty amino acids gets out of hand pretty quickly, but it’s nothing compared to how fast the number of possible complex carbohydrates will take off on you.
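A back-of-the-envelope comparison shows where the extra growth comes from. This sketch assumes, purely for illustration, a carbohydrate "alphabet" of 20 building blocks (the same size as the amino acid set, so that only the linkages differ), and that every glycosidic bond independently picks one of 2 anomeric configurations and one of 4 acceptor positions, as on a glucopyranose. It counts linear chains only; branching, ring size, and modifications are all left out, so it undercounts the real possibilities.

```python
def linear_peptides(n: int, monomers: int = 20) -> int:
    """Linear sequences of length n: one choice of residue per position."""
    return monomers ** n


def linear_glycans(n: int, monomers: int = 20, positions: int = 4, anomers: int = 2) -> int:
    """Linear glycans of length n under a deliberately simplified model.

    Each residue is one of `monomers` building blocks, and each of the n - 1
    glycosidic bonds independently picks an anomeric configuration and an
    acceptor position. No branching, ring-size, or modification variants.
    """
    if n < 1:
        raise ValueError("need at least one residue")
    return monomers ** n * (anomers * positions) ** (n - 1)


for n in (2, 4, 6):
    p, g = linear_peptides(n), linear_glycans(n)
    print(f"n={n}: peptides {p:.2e}, glycans {g:.2e}, ratio {g // p}")
```

Even in this stripped-down model, the glycans outrun the peptides by a factor of 8 for every additional bond: a linear hexamer gives 6.4 × 10^7 peptides against about 2.1 × 10^12 glycans. Allowing a residue to carry more than one branch makes the gap far larger.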

And as you’d figure, the chemistry needed to make those things is all over the place, too. With proteins, it’s just amide bond, amide bond, amide bond in an endless series. Don’t get me wrong, that’s bad enough, what with the reactive side chains and the chances of scrambling the stereocenter at each alpha carbon if you use synthetic conditions that are too severe. But even just putting a single substituent at C1 of a given sugar (glycoside formation). . .well, there are entire books full of conditions for the Koenigs-Knorr reaction and Fischer glycosidation, and those are the two simplest and most direct techniques we have. There’s been a lot of work (and a lot of progress) over the years, but there is still no general set-it-and-forget-it carbohydrate synthesis machine, the way that there is for peptides and for oligonucleotides. The number of different chemistries (and the number of ways in which they can go wrong) is still too big a problem. That goes for sequencing them, too: if you isolate some new complex carbohydrate and want to figure out its structure, you do not pop it into some sort of sequencer and go to lunch. No, you settle down for weeks, months (possibly years) of work. The problem is exacerbated by the huge molecular weights of some of the biological polysaccharides. And as for three-dimensional structures, from what I can see the field is just barely lifting itself up off the ground compared to what we know about protein and oligonucleotide tertiary structure.

Lipids suffer from many of the same problems. They don’t have as much of the multiple-sites-of-attachment thing going, until you consider that large, important classes of them are hooked up to (you guessed it) the OH groups of various carbohydrates. So that plunges you right back into the structure determination problems and the synthetic problems we were just talking about. But another tricky part of lipids is that they can be just beastly to handle and characterize via the usual tools. We have a lot of chromatographic tricks to separate out polar compounds (and that includes the proteins and the carbohydrates). But making fine distinctions between different varieties of, well, grease is something else again. The physical interactions are just not as well-defined as you have with (say) hydrogen bonds and charged groups, so you end up chromatographically with broader peaks and worse resolution. Techniques like NMR suffer as well: the long carbon chains of many lipid molecules can be hard to distinguish and assign. Steroid backbones, now those are well worked out and you can find your way around them. But a 22-carbon chain with a double bond somewhere in the middle of it looks an awful lot like a 24-carbon chain with that double bond moved a couple of spaces down, and that goes for NMR, for HPLC, and for many other techniques.
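The "double bond moved a couple of spaces down" problem shows up even at the level of bare composition. The sketch below builds the molecular formula of a straight-chain fatty acid by filling carbon valences, assuming an unbranched chain whose only unsaturation is C=C double bonds (no rings, hydroxyls, or triple bonds, and no cis/trans information). The helper is hypothetical, not from any cheminformatics library.

```python
def fatty_acid_formula(carbons: int, double_bonds: tuple[int, ...] = ()) -> dict[str, int]:
    """Count atoms in a straight-chain fatty acid by filling valences.

    C1 is the carboxyl carbon. Each entry in `double_bonds` is the lower-
    numbered carbon of a C=C, so 13 means C13=C14 (a delta-13 bond).
    """
    # Bond order between chain carbons i and i+1 (1-indexed), keyed by i.
    order = {i: 1 for i in range(1, carbons)}
    for start in double_bonds:
        if not 2 <= start < carbons:
            raise ValueError(f"double bond at {start} is outside the chain")
        order[start] = 2
    hydrogens = 1  # the O-H of the carboxylic acid
    for c in range(1, carbons + 1):
        # Valence already used by bonds to the neighbouring chain carbons.
        used = order.get(c - 1, 0) + order.get(c, 0)
        if c == 1:
            used += 3  # C=O plus C-OH on the carboxyl carbon
        hydrogens += 4 - used  # the rest of carbon's four bonds go to H
    return {"C": carbons, "H": hydrogens, "O": 2}


delta13 = fatty_acid_formula(22, (13,))  # erucic acid is a delta-13 docosenoic acid
delta11 = fatty_acid_formula(22, (11,))  # the same chain, double bond shifted
print(delta13, delta11, delta13 == delta11)
```

Both come out as C22H42O2: shifting the double bond (or flipping its geometry) leaves the formula untouched, so any method that effectively reads composition alone gets no help telling those isomers apart.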

It doesn’t necessarily look so similar to your body, though. Think about the state of human nutritional advice, with all the conflicting evidence about the desirability of various saturated, unsaturated, monounsaturated, and polyunsaturated fats. Biochemically, our cells make a lot of fine distinctions between structures that we ourselves have to stare at closely to see any differences at all. I bring up nutrition to emphasize that we don’t understand a lot of these biochemical effects very well, at either the micro or the macro level.

There are drugs that target lipid and carbohydrate pathways, to be sure, but these are generally small-molecule inhibitors of the enzymes that process those compounds. Making compounds that bind to protein sites like those is something we’ve learned a lot about over the years, but compounds that bind directly to complex carbohydrates or to lipids are far less common.

So we have a bit of a blind spot when it comes to both classes of compounds. That blindness is in no small part technological, because it’s been easier to develop tools to analyze, modify, and synthesize proteins and nucleic acids. But we shouldn’t let that make us think that lipids and carbohydrates are somehow ancillary, just because we can’t manipulate them as well. That’s our problem – not theirs.