A Nobel for CRISPR

The 2020 Chemistry Nobel has gone to Jennifer Doudna and Emmanuelle Charpentier for developing CRISPR into a method for genome editing. An award in this area has been expected for some time – it’s obviously worthy – so the main thing people have been waiting for is to see when it would happen and who would be on it. We’ll get to that second part, but let’s start off with a CRISPR explainer. What is it, how does it work, and why has everyone been so sure that it’s a lock for a Nobel?

The short answer is that CRISPR is the easiest, most versatile method for gene editing that has yet been discovered. It’s important to note that those discoveries are still coming; the fireworks have not stopped going off by any means. We’ll do “basic CRISPR” first, though, since everything builds off of that. The story of the discovery is actually a very good illustration of how science works – the good, the bad, and the baffling – and with that in mind, I’m going to spend more time on that part.

The acronym is not going to help very much, I fear: it stands for Clustered Regularly Interspaced Short Palindromic Repeats, which refers to some odd features found in the DNA sequences of many single-celled organisms. In 1987 these were discovered by Yoshizumi Ishino and his group at Osaka. They were cloning a completely unrelated bacterial gene and found these weirdo repeated short stretches of DNA all clustered together (but still separated by unrelated sequences), and no one had seen anything quite like them. This clearly wasn’t an accidental feature, but no one really knew what to make of them, either. It wasn’t until 1993 that the story was really picked up again, when J. D. A. van Embden and co-workers in the Netherlands were looking at repetitive DNA sequences as a way to tell different strains of M. tuberculosis bacteria apart, and noted the same sort of odd patterns.
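If the "repeats separated by unrelated sequences" layout is hard to picture, here is a minimal Python sketch of it. Everything in it is made up for illustration: the repeat, the two spacers, the flanking sequence, and the function names are toy assumptions, not sequences or methods from any of the papers mentioned. Real repeats are also often imperfect copies, which this exact-match toy ignores.

```python
# Toy illustration of a CRISPR-style array: one short repeat occurring at
# regular intervals, with unrelated "spacer" sequences in between.
# All sequences here are invented for the example.

REPEAT = "GTCGCAC"                      # hypothetical repeat unit
SPACERS = ["AAATTTAAAT", "CCCTTTCCCT"]  # hypothetical spacers
ARRAY = "TTTT" + REPEAT + SPACERS[0] + REPEAT + SPACERS[1] + REPEAT + "AAAA"


def repeat_positions(array: str, repeat: str) -> list[int]:
    """Return the start index of every non-overlapping copy of `repeat`."""
    positions = []
    start = array.find(repeat)
    while start != -1:
        positions.append(start)
        start = array.find(repeat, start + len(repeat))
    return positions


def extract_spacers(array: str, repeat: str) -> list[str]:
    """Return the sequences lying between consecutive copies of `repeat`."""
    pieces = array.split(repeat)
    # The first and last pieces are flanking sequence outside the array.
    return pieces[1:-1]


positions = repeat_positions(ARRAY, REPEAT)
gaps = [b - a for a, b in zip(positions, positions[1:])]
print(positions)                       # [4, 21, 38]
print(gaps)                            # [17, 17] -> regularly interspaced
print(extract_spacers(ARRAY, REPEAT))  # ['AAATTTAAAT', 'CCCTTTCCCT']
```

The equal gaps are the "regularly interspaced" part of the acronym, and the stretches between the repeats are the spacers that become important later in the story.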

That same year, Francisco Mojica (a grad student at the time) and co-workers at Alicante in Spain reported the same sort of thing in a really unrelated organism, Haloferax mediterranei. That’s an archaeon, one of the weird non-bacterial single-cell creatures that are found in many extreme environments, and Mojica was looking at gene transcription changes in that organism under various high-salt conditions (H. mediterranei is the sort of creature that gets stressed out when the salt concentration gets too low, to give you the idea). He was actually doing that (as this retrospective notes!) because he got totally scooped on his original project with the organism and then set out for the less-studied parts of its genome. Out there in what looked like a non-coding region, he found the same sort of DNA repeats, 14 of them, each about 30 base pairs long and regularly spaced along the organism’s genome. Bacterial sequencing was pretty strenuous work back in 1992, and these things were first thought to be an artifact of something that had gone wrong. But no, they were real. It was starting to become clear that this stuff (whatever it was) might have broader implications, but no one knew what those were.

Mojica kept digging into the problem: here’s a 2000 note that showed that these features (which he and his co-authors were then calling SRSRs, for Short Regularly Spaced Repeats) were found in dozens of bacteria and archaea and were probably the most widely distributed repeat sequences in prokaryotes in general. When they transcribed such a repeat region, they got an oddly wide array of RNA products, which suggested that there was a lot of RNA processing going on downstream (confirmed by this 2002 work from another team on a different archaeon species).

Meanwhile, Ruud Jansen at Utrecht, along with van Embden and others, worked out another piece of the puzzle. These repeats had been noted next to the open reading frames (ORFs) of proteins of unknown function, and they found that the two were in fact always associated with each other across different organisms. That paper suggested the “CRISPR” acronym to clear up the number of different terms that were appearing in the literature, and it stuck, as did the term for those “CRISPR-associated” genes: cas. But the origin and function of the repeats and their associated proteins were still a mystery. All of this was still confined to rather specialized microbiology journals, and it was Just One of Those Things that no one had a handle on.

That changed in 2005. Three groups (including Mojica’s) found that those spacers between the repeat elements actually came (in some cases) from bacteriophage sequences or plasmids from other organisms. That rearranged people’s thinking, because it suggested that these weirdo repeat things were somehow involved in infection and defense mechanisms for the bacteria and Archaea themselves. But there were some hair-pulling difficulties along the way to getting the word out, because the discovery itself had been made some time before. Eric Lander’s history of the field (an article not without controversies of its own) illustrates what happened:

Mojica went out to celebrate with colleagues over cognac and returned the next morning to draft a paper. So began an 18-month odyssey of frustration. Recognizing the importance of the discovery, Mojica sent the paper to Nature. In November 2003, the journal rejected the paper without seeking external review; inexplicably, the editor claimed the key idea was already known. In January 2004, the Proceedings of the National Academy of Sciences decided that the paper lacked sufficient “novelty and importance” to justify sending it out to review. Molecular Microbiology and Nucleic Acid Research rejected the paper in turn. By now desperate and afraid of being scooped, Mojica sent the paper to Journal of Molecular Evolution. After 12 more months of review and revision, the paper reporting CRISPR’s likely function finally appeared on February 1, 2005. 

Here’s Mojica’s view on the history, for reference. This sort of thing has happened many a time in the history of science, in case you had any doubts. The other two groups trying to publish such results ran into similar difficulties: Gilles Vergnaud and his co-workers had their paper rejected by four journals in a row, and Alexander Bolotin’s paper lost months with a slow rejection as well. But by 2007, this idea had been nailed down: if you challenged bacteria with various types of virus (phage), repeats showed up in their genomes with spacers between them based on chunks of that phage DNA. And in turn, if you went in and messed with those repeats and spacers, you altered the resistance profile of the bacterium to different phage infections. There was no doubt: this was part of an adaptive immune system in bacteria and Archaea.

And it was one that actually rewrote their genomes in order to work – that was the startling thing. There was some sort of mechanism that chopped up bacteriophage DNA and inserted pieces of it into the bacterial sequence in order to remember it for the next time it might show up. People had thought originally that such a system might work at the RNA level, but here it was operating on the DNA sequence instead. The number of papers in the field was taking off at this point – there was something new under the sun, and the uses for a completely new genome-editing tool were becoming apparent to anyone who spent a few minutes looking out the window and thinking about the possibilities.

Some of those cas proteins, in fact, turned out to be the endonucleases that did the double-stranded DNA cutting needed for these splices to occur. They needed more than one RNA species to guide them in that job, but Jennifer Doudna and Emmanuelle Charpentier re-engineered one (Cas9) from the bacterium S. pyogenes to make it simpler. Now it just needed one “guide RNA”, the sequence of which determined where in the genome the DNA would be cut. At the same time, Virginijus Šikšnys in Vilnius was working out the same sorts of details. He submitted his own paper to Cell, but it was rejected without review (!). It then spent months in review at PNAS, during which time Doudna and Charpentier’s work appeared in Science, and many are the people who will tell you that had preprint servers been more of a thing back then, his name might be on today’s Nobel citation as well.
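To make the "guide sequence determines the cut site" idea concrete, here is a rough Python sketch. It leans on details that are not spelled out in the text above and should be read as assumptions of the example: a 20-letter guide, a requirement that the matching site be followed immediately by an "NGG" motif (the PAM usually quoted for S. pyogenes Cas9), and a cut three bases upstream of that motif. The guide, the toy genome, and the function name are all invented, and the sketch searches only one strand and allows no mismatches, which real target prediction has to deal with.

```python
# Toy model: a guide sequence picks out where a Cas9-like nuclease would cut.
# Assumptions (not from the text above): 20-letter guide, NGG motif right
# after the match, cut 3 bases upstream of that motif, forward strand only.

GUIDE = "GATTACAGATTACAGATTAC"  # hypothetical 20-letter guide, written as DNA

# Hypothetical genome with two copies of the target; only the first copy
# is followed by an NGG motif.
GENOME = "TTTTTTTT" + GUIDE + "TGG" + "TTTTTTTT" + GUIDE + "TAA" + "TTTT"


def find_cut_sites(genome: str, guide: str) -> list[int]:
    """Return 0-based indices of the first base after each predicted cut."""
    sites = []
    start = genome.find(guide)
    while start != -1:
        end = start + len(guide)
        motif = genome[end:end + 3]
        # Require a full 3-letter motif of the form NGG after the match.
        if len(motif) == 3 and motif.endswith("GG"):
            sites.append(end - 3)
        start = genome.find(guide, start + 1)
    return sites


print(find_cut_sites(GENOME, GUIDE))  # [25]
```

Changing the guide string is all it takes to send the search somewhere else in the toy genome, which is a crude picture of why a single programmable guide RNA made the system so much easier to use.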

This refashioning of the CRISPR system was very significant. The native bacterial system works, but it’s quite complex. The Doudna/Charpentier work opened up its use as a research tool, and molecular biology has never been the same. So there’s a story about the recognition of something strange in bacteria, a story about figuring out what that was and how it might work, and then a story about extending that into something new that could be used in other organisms entirely.

But there’s more – there’s generally more. The first people to get this to work in mammalian and indeed human cells (as opposed to bacteria and the like) were (in basically simultaneous publications) Feng Zhang and co-workers at the Broad Institute and George Church and co-workers at Harvard. Neither of them is on today’s citation, of course, and not everyone is happy about that (especially considering that there was a third slot left open). But that slot could also have been filled by Mojica, by Šikšnys, or by. . .well, you pick. Making the jump out of bacterial systems was non-trivial, and there were plenty of people who weren’t sure that it would even be possible. A human genome is a lot more complicated than a bacterial one, and its DNA is packaged and sequestered in totally different ways. Getting a bacterial enzyme into the right place in a human cell nucleus at the right time and in the right concentrations took a lot of work – just getting Cas9 and a guide RNA reliably into cells in the first place took a lot of work. But in the end, it could be done.

All these discoveries lead one to thoughts on the patent situation in this area, but it is too complex for human summary. I mean that nearly literally. In various jurisdictions, there are filings from multiple different institutions and companies (up to maybe eight at once) all fighting it out over the claim language and scope, and I just refuse to try to sort it out in my head. To add to the confusion, new variations and improvements on the technique are emerging constantly, leading a person to wonder what the eventual most valuable IP rights may turn out to be. There have already been a lot of twists and reversals in this area, but I have deliberately decided not to give it space in my head, for fear of crowding out something else.

But the reasons that such patent rights are so valuable, and that this area was Nobel-worthy to start with, are perfectly clear. CRISPR is the slickest way yet found to edit genome sequences in living organisms. You can silence particular genes, you can increase their expression, you can stick completely new things in pretty much wherever you feel like it. You can (in later variations) swap individual nucleotides around with extreme selectivity, and so on – it’s really like having magic powers. There are a lot of different Cas enzymes, and some of them do double-strand DNA breaking, while others do single-strand nicking and all sorts of things. They work with varying degrees of efficiency, selectivity, and fidelity, and the hunt is still very much on for improved versions.

So like any other molecular biology technique, CRISPR also has its hidden limitations that are still being worked out. That’s something to keep in mind when you hear about CRISPR babies, such as the hideously unethical human experiments in China in 2018. We are already trying to use CRISPR techniques to attack inborn genetic diseases such as sickle cell anemia, but think about that one: all those defective red blood cells come from a single tissue (the bone marrow) and we have already worked out techniques to transplant it (and, along the way, to kill off the original tissue in preparation for the new cells). That means that we have a much better chance of doing a clean swap, with cells that have been edited ex vivo and carefully sequenced to make sure that they’re what we think they are – and this on a disease whose genetic profile has been exhaustively studied for decades (and indeed was the first genetic disease ever characterized). The difference between that and stepping in to rewrite human embryos is huge, and we’re not going to be safely leaping that gap for a while yet.

That’s not least because in many cases we’re not sure what to rewrite. Inborn protein errors like sickle cell are clearly the place to start, but in many (most) cases the instructions are not quite so clear. Then you think about the genetic basis for (say) height, and it’s time to look out the window again. No, it’s going to be a while before we start cranking out the designer babies to order.

But where we’re using CRISPR every single day of the week is back in the research labs. It’s an astonishingly useful tool for producing new cell lines and looking at the phenotypes in organisms when you do such selective editing – you can accomplish things that were nearly (or completely) impossible before. These new abilities have accelerated molecular biology noticeably, and it’s not like the field was lounging around much before. No, it’s hard to overstate the importance of CRISPR to basic research, and that’s where the clinical breakthroughs are born.

This was, then, one of those fields that has been recognized for years now as Definitely Going to Get a Nobel, No Doubt About It. And that day has come! Congratulations to Jennifer Doudna and Emmanuelle Charpentier, who are very deserving indeed and part of one of the great discoveries of 21st century biology so far.