Thread Your Proteins Through

One of the big advances in genetic sequencing over the years has been the refinement of nanopore technologies. Back in the Ancient Days, you sequenced a bulk DNA or RNA sample, and what you got was the majority opinion on what the sequence read out as. Nanopores go after individual oligonucleotide strands, and read off the sequence as each one passes through the channel. There are several ways to do this; the most developed (from what I can see) are the techniques that do a sort of single-molecule electrophoresis. An electric potential is applied across a nanoporated membrane, and local changes in current density are measured as the oligonucleotide strand moves through it. This calls for getting several things to work simultaneously – the speed at which the sample threads through versus your ability to measure the current changes, your ability to interpret those for each base in the sequence, the stability of the pore itself, and the stability of the oligo chain under these conditions.
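To put the speed-versus-measurement trade-off in concrete terms, here is a minimal sketch. The function name and every number in it are hypothetical, picked only for illustration and not taken from any real instrument: the faster the strand moves, the fewer current samples you get per base.

```python
def samples_per_base(sampling_rate_hz, bases_per_second):
    """Average number of current samples recorded while one base sits in the pore."""
    if bases_per_second <= 0:
        raise ValueError("translocation speed must be positive")
    return sampling_rate_hz / bases_per_second


# Hypothetical sampling rate and translocation speeds, for illustration only.
for speed in (100, 1000, 10000):
    print(speed, "bases/s ->", samples_per_base(5000, speed), "samples per base")
```

Once that ratio drops toward one sample per base or below, there is nothing left to average over, which is one reason slowing the strand down matters so much.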

All of these have changed substantially in just the past few years, and are continuing to evolve. Solving these problems gives you the ability to do “long reads” of DNA or RNA sequences, as opposed to reading off the short pieces that other sequencing methods tend to rely on. Current technologies average in the low tens of thousands of bases per read, with the record being up in the low million range. And that gives you a richer data set to work with, as long as you pay careful attention to error correction. The “next generation sequencing” tools that have been the standard for some years now have better accuracy, and the nanopore systems have been trying relentlessly to creep up on them.

How about doing this for proteins, then? Plenty of people have been working on that, and there are a couple of new papers that illustrate how that’s going. Try this one out: it details a rather startling assembly of proteins from several different sources to produce a functional protein-sequencing nanopore. That red part poking through the membrane in the (c) part of the drawing is from anthrax protective antigen. That’s mated with the REG part, a mouse proteasome activator protein (PA28 alpha). Proteasome alpha- and beta-subunits from T. acidophilum are then brought in, and up at the top of the whole contraption is an “unfoldase” enzyme from the same archaeon, which feeds proteins in as a single strand.

I am amazed that this works, and you can be sure that a lot of fiddling was required to put all this together. Keep in mind, though, that no one is designing any of these proteins from scratch. We don’t know how to make our own unfoldases or anything of the kind. What’s going on here is in the best traditions of molecular and chemical biology: using the amazing tools provided by a couple of billion years of evolution, breaking them off from their moorings, and gluing them together in new ways to serve our own purposes. I mean, these things are just sitting there, waiting for our bright ideas, right?

And it does work, albeit roughly. For their “thread and read” approach, the authors fed in a model 123-amino-acid protein (with the correct C-terminal tag for the system to recognize it), one that had residue repeats selected to make it likely to move through the channel and interact with the proteasome subunits. They also tried a variety of everyone’s favorite Green Fluorescent Protein, which is a challenge, since those things tend to be pretty resistant to unfolding. In each case, they inactivated the key proteasome residues so that they wouldn’t rip the proteins up as they passed. Both threaded through, though with different behaviors. The GFP, for example, seemed to want to partially refold as it went through, while the model protein didn’t show as much evidence of that behavior. Meanwhile, the model protein was at times a bit too happy in the proteasome region, and blocked things up. All of these behaviors could be modified by feeding the unfoldase more ATP, changing the voltage across the system, or adding urea to assist with unfolding. But it has to be noted that the current readings, while real, were still too noisy to distinguish individual amino acids.

They also looked at another possible sequencing mode, the so-called “chop and drop”. With that one, you use a functional proteasome that will chop the peptide strands up into smaller pieces as they come through. Under those conditions, the big blockage events in the proteasome largely disappeared (as well they might), and the size of the peptides coming out varied along with the amount of ATP driving the unfoldase – the slower the proteins went through the shredder, the smaller the pieces. The hope is that these can be used in (for example) mass spectrometry protein ID methods, which already rely on breaking proteins into chunks.

And then there’s this approach. The authors take the existing DNA nanopore technology and try to adapt it to proteins, using a “click” linker to attach a protein strand to one end of a DNA strand, to see what the nanopore will make of things when the protein part starts threading through. You can see a schematic of this at right – DNA is on top, peptide below, and the click linker is in green. The transmembrane nanopore is MspA, a porin protein from mycobacteria that’s already used in commercial DNA nanopore equipment, and the light blue blob is the DNA-binding helicase protein Hel308.

This idea shows promise, too: the 80-nucleotide DNA part went through just fine, producing an ion-current step with each nucleotide, as it should. When the 26-amino-acid peptide hit the pore, there were distinct current steps as well, with even higher average ion current. Multiple reads were done to see whether things were occasionally back-stepping at the helicase ratchet motor, etc., along with sequence alterations of the peptide chain. After getting their bearings, the team could distinguish single-amino-acid changes between glutamate, glycine, and tryptophan. As is often the case with nanopore experiments, such point mutations don’t just alter one current step in the data, but change the ones just upstream and downstream of it as well, so you really have to see what you’re working with in defined systems at first.
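One way to picture why a single substitution smears across neighboring steps is a toy model in which the pore "feels" a short window of residues at once, so each current level is a blend of several neighbors. To be clear, this is an illustrative assumption on my part and not the paper's model: the window size, the BLOCKADE values, and the function names below are all made up.

```python
# Hypothetical per-residue "blockade" values; invented for illustration, not measured.
BLOCKADE = {"A": 0.3, "G": 0.2, "E": 0.5, "W": 0.9}


def step_levels(seq, window=3):
    """One toy current level per position of a sliding window over the sequence."""
    return [
        sum(BLOCKADE[residue] for residue in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def changed_steps(seq_a, seq_b, window=3):
    """Indices of the current steps that differ between two same-length sequences."""
    levels_a = step_levels(seq_a, window)
    levels_b = step_levels(seq_b, window)
    return [i for i, (a, b) in enumerate(zip(levels_a, levels_b)) if abs(a - b) > 1e-9]


# A single G -> W swap at position 3 of a seven-residue toy peptide.
print(changed_steps("AAAGAAA", "AAAWAAA"))  # [1, 2, 3]
```

With a three-residue window, one swapped residue shifts three consecutive steps, which is the sort of spillover that makes calibration on defined sequences necessary before trying anything de novo.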

And by ingeniously using two helicases instead of one, the paper demonstrates re-sequencing of the same peptide. You let the first helicase do its thing, then dissociate it and let the second one settle in (it would be located “above” that blue Hel308 in the scheme). That sends the protein back down through the narrow nanopore, and the second helicase now pulls it back up for a repeat read. This “rewind” is very useful for error correction, as you would imagine – the average was about 17 amino acid residues that would get resequenced. Turning this into a mighty high-speed sequencer is still going to take a lot of refinements, but the principle is demonstrated very convincingly here. And there are applications where you might just need to “fingerprint” different proteins rather than read them off completely de novo, and this system might well be ready to try out in the MspA-containing nanopore equipment right now.
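To see why rereading helps, here is a minimal sketch of the simplest possible error correction: a per-position majority vote over repeated reads. It assumes the reads have already been turned into residue calls and aligned to the same length, which is a big assumption and not how the paper processes its current traces; the consensus function and the example reads are hypothetical.

```python
from collections import Counter


def consensus(reads):
    """Majority vote per position across pre-aligned, equal-length reads."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be pre-aligned to the same length")
    called = []
    for column in zip(*reads):
        # most_common(1) gives the top (symbol, count) pair for this position.
        symbol, _count = Counter(column).most_common(1)[0]
        called.append(symbol)
    return "".join(called)


# Three made-up rereads of the same stretch, each with at most one miscall.
print(consensus(["GEWGE", "GEWGG", "GDWGE"]))  # GEWGE
```

Each read here carries at most one error, but since the errors land in different places, the vote recovers the underlying sequence. More rewinds mean more votes per position.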

So there’s definitely progress here. One way or another, protein nanopore sequencing looks like it can happen, which pretty much means that it will happen, because the potential uses are very interesting indeed. It’ll be a few years, but we’ll have more options, more protein data, and more experiments to run that we can’t even think about running now – that much seems clear. Bring on the nanopores!