Virtual Screening’s Latest

This is a really interesting look at the current state of virtual screening, and it shows what I think it’s best at doing right now: making the best of what we already know and what’s new. It’s from a large team of academic researchers (several departments across UCSF, Yale, UNC, Yonsei, Stanford, Duke, Washington U, Wisconsin, and Minas Gerais), and it’s an attempt to find new subtype-selective serotonin receptor ligands de novo. Well, sort of de novo, because one thing you have going for you in small-amine G-protein-coupled-receptor space is a huge amount of screening data and reported ligands. Now, that can be working against you, too, because you don’t want to recapitulate things that people have already discovered (and you might be interested in patentable chemical matter as well, who knows). So what’s the best strategy?

In this case, the plan was to start with some of the structures that we know tend to hit such receptors (basic amine heterocycles) but focus on lesser-used ones. Tetrahydropyridines (piperidine with one double bond in the ring) are a good choice: there are plenty of natural products that have this structure in them, and we know it can be part of physiologically active compounds. But it’s not a ring system that’s well represented in screening libraries, certainly not as compared to piperidines, pyridines, and piperazines. So you’re starting in favorable chemical space, but you still have room to maneuver. The team looked into the most robust synthetic methods for producing these compounds and chose the condensation between an alkyne, an amine, and an unsaturated carbonyl synthon (done in two steps – first imine formation and then Rh-catalyzed reaction with the alkyne). That gives you plenty of commercially available starting materials and provides you with all the virtual screening you can possibly handle (as you’ll see).

As a target, they chose 5-HT2A, which is famous as a site of action for psychedelic drugs such as LSD. But it’s widely believed that you can have ligands for that receptor subtype without bringing on hours of intense hallucinations (for example, by setting off different downstream signaling pathways), and such compounds might be quite valuable in treatment of depression and other psychiatric disorders. But it’s not easy finding these things; if it were, we’d have a pile of them already. Selectivity against 5-HT2B is not trivial, and there’s the serotonin transporter to worry about, too, as is often the case with the whole 5-HT class. Plus, low-molecular-weight amine heterocycles like this can easily set off fireworks among the dopamine and muscarinic receptor subtypes, and more.

The library generation was done with achiral starting materials only, and things like nitro groups were excluded from the start. Allowances were made for TMS protecting groups on the alkynes and N-protecting groups on the amines, and the final compounds were all filtered to have a logP of 3.5 or lower. If you set the cutoff for MW at 400, the virtual library of tetrahydropyridines comes in at about four billion compounds, which is a bit much to handle even if you have really good funding (!), so they set that back to 350 to provide a 75 million member library. Docking started with a structure built from the X-ray data of LSD bound to 5-HT2B (there were no 2A structures of the sort available at the time), and this was reality-checked by seeing if it could pull known ligands out of a test set and recapitulate what were believed to be key ligand-protein interactions. The 75 million compounds were docked, each in 92 conformations and approximately 23,000 orientations, which (you’re already doing this part in your head, I’ll bet) led to scoring of about 7.5 trillion complexes. Not a typo. That took close to 9 hours of grinding away on a 1000-core computing cluster, and we can assume that the air conditioning was turned up to full blast.
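For a feel of what those numbers mean per molecule, here is a back-of-the-envelope sketch. It uses only the rounded figures quoted above (75 million compounds, about 7.5 trillion complexes, roughly 9 hours on 1000 cores); the variable names are mine, and the results are rough averages, not anything reported by the authors.

```python
# Back-of-the-envelope docking throughput from the rounded figures in the text.
# All inputs are approximations, so the outputs are order-of-magnitude estimates.
LIBRARY_SIZE = 75_000_000      # docked tetrahydropyridines
COMPLEXES_SCORED = 7.5e12      # "about 7.5 trillion complexes"
WALL_HOURS = 9                 # "close to 9 hours"
CORES = 1000                   # cluster size

core_hours = WALL_HOURS * CORES
core_seconds = core_hours * 3600

# Average number of scored complexes per library molecule (total / library size).
complexes_per_molecule = COMPLEXES_SCORED / LIBRARY_SIZE

# Average scoring rate per core, and average core time spent per molecule.
complexes_per_core_second = COMPLEXES_SCORED / core_seconds
core_seconds_per_molecule = core_seconds / LIBRARY_SIZE

print(f"core-hours:                {core_hours:,}")
print(f"complexes per molecule:    {complexes_per_molecule:,.0f}")
print(f"complexes per core-second: {complexes_per_core_second:,.0f}")
print(f"core-seconds per molecule: {core_seconds_per_molecule:.3f}")
```

Note that the per-molecule average here is just the quoted total divided by the library size; it is not obtained by multiplying the conformation and orientation counts together. Either way, each molecule gets well under a second of core time, which is what makes a library this size tractable at all.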

The top 300,000 molecules were reduced to about 15,000 structurally similar clusters, and the top 4,000 clusters were inspected for “unfavorable features” that docking programs are often willing to overlook, but medicinal chemists and structural biologists aren’t (weirdly strained ligands, dangling H-bond opportunities, and more). Of the best remaining ones, 205 were then filtered for dissimilarity to existing ligands. That must have cleared out a lot of stuff, because it led to the synthesis of a mere 17 compounds. And testing those against a panel of the three 5-HT2 receptor subtypes led to four active compounds.
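The attrition through that funnel is easier to appreciate with the ratios worked out. This is a small sketch using only the counts quoted above; the stage labels are my own shorthand, and note that the stages mix units (molecules early on, clusters in the middle, individual compounds at the end), so the step-to-step percentages are only a rough guide.

```python
# Screening funnel from the counts quoted in the text.
# Stage labels are informal; units change between stages (molecules vs. clusters).
funnel = [
    ("top-scoring molecules", 300_000),
    ("structural clusters", 15_000),
    ("clusters inspected", 4_000),
    ("candidates after inspection", 205),
    ("compounds synthesized", 17),
    ("active compounds", 4),
]

# Fraction retained at each step, relative to the previous stage.
retention = [
    (prev_name, name, n / prev_n)
    for (prev_name, prev_n), (name, n) in zip(funnel, funnel[1:])
]
for prev_name, name, frac in retention:
    print(f"{prev_name} -> {name}: {frac:.2%}")

# Two summary numbers: average cluster size and the experimental hit rate.
molecules_per_cluster = funnel[0][1] / funnel[1][1]
hit_rate = funnel[-1][1] / funnel[-2][1]
print(f"average molecules per cluster: {molecules_per_cluster:.0f}")
print(f"hit rate among synthesized compounds: {hit_rate:.1%}")
```

Four actives out of 17 synthesized is a hit rate of a bit under one in four.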

Then it was time for good ol’ chemical optimization, because this isn’t one of those stories where the computational approach spit out ready-to-use ligands. All of the four starting points had activity against all three 5-HT2 subtypes, for starters. I won’t go into the detail of the structures that were worked through, except to note that two of the four starting points had pyrazole rings substituted onto the tetrahydropyridine. The team went back to the 4.3 billion compound list looking for other similar pyrazole-containing compounds, but nothing seemed to hit when these were synthesized. Trying it the other way around (keeping the tetrahydropyridines while looking for pyrazole replacements) worked out better, with azaindoles replacing the pyrazole hits in the final best compounds.

The two best ones were about 40 and 110 nM EC50 at the 5-HT2A receptor subtype, partial agonists that were about 5-fold selective against 5-HT2B and 30- to 50-fold against 5-HT2C. Encouragingly, neither compound hit in a panel of 318 other GPCRs, at the hERG ion channel, or at 45 other off-target possibilities. And these compounds were biased towards G-protein signaling as opposed to arrestin signaling (the latter being more characteristic of LSD and other hallucinogens).

The team was able to get a cryo-EM structure of one of these ligands in the 5-HT2A receptor, which is good to see. The azaindole fit exactly as predicted, although the tetrahydropyridine ring was in a somewhat different conformation, interestingly. And both compounds had good CNS penetration in mouse dosing experiments, with strong indications (lack of head-twitch response) that they were not hallucinogenic. In fact, preadministration blocked the head-twitch response brought on by LSD itself and that compound’s tendency to cause rapid movement around the cage. They also tested one of the compounds for antidepressant activity in mouse assays and compared these to ketamine, but I have to say that I find rodent antidepressant assays generally unconvincing. (It looks like there wasn’t enough of the other compound to test by that point!) For what it’s worth, it showed signs of positive activity, as well as signs of anxiolytic effects.

The authors note several limitations, one of the first ones being that there are not that many really well-behaved multicomponent convergent synthetic routes to such compounds. So you’re not going to be able to rip out relevant billion-compound libraries in every direction. They also point out that while they were able to predict the binding modes quite well, this took a lot of extra work past the dock-and-screen step, with computationally intense free-energy perturbation work needed for a real answer. And of course they make the distinction that these compounds are not yet drug candidates, although they are certainly interesting leads and ideas for tool compounds. And there are still off-target possibilities (God knows, there always are).

I would add some other considerations: even with the FEP work, the conformation of the tetrahydropyridine ring in the cryo-EM structure seems to have been a bit of a surprise. Their best compounds did not come directly out of the initial docking-screening-scoring-clustering workflow, but rather (as mentioned above) were the product of deliberately dipping back into the larger collection with an eye to what came out of the virtual screening cascade. (That’s pretty common in these approaches – virtual screening is still an art form.) The selectivity for G-protein signaling versus beta-arrestin signaling appears to have been a fortunate result, because (to the best of my knowledge) there’s no good way to predict this or bias the screen for compounds in that direction. Finally, working in the GPCR space really does give you a leg up compared to many other areas because of our extensive body of knowledge, but that’s why the authors chose it in the first place: they realized what the odds against them would have been if they’d started in a more ex nihilo drug discovery space. But overall, this paper represents a lot of solid work and presents some really interesting results. I think it’s an excellent example of the current state of the art in reducing virtual screening to practice!