SciBite’s James Malone Talks Future of Biocuration on Conference Panel

Biocurators are among the unsung heroes of the life sciences world, performing the essential task of translating and integrating biomedical information into interoperable databases. Of course, there is a lot more to it than that, as attendees of the International Society for Biocuration’s 14th Annual Biocuration Conference already know.

At the first
session of this year’s conference, held virtually on April 13 and chaired by
Genentech’s Rama Balakrishnan, panelists had a lively discussion on “The Future
of Biocuration,” but started by sharing some of their feelings about what the
job itself entails. “When I think about it, it’s applying semantic standards to
ensure data findability and aggregation,” explained Carol Bult of the Jackson
Laboratory in Bar Harbor, Maine. Kambiz Karimi of Myriad Women’s Health in San
Francisco talked about how it involves structuring content and using a
controlled vocabulary, also noting that “cleaning up” the content is a
significant part of the effort.

Maintaining data quality

The clean-up aspect of data curation connects to the issue of data quality. Balakrishnan posed the question: What does good quality data mean, and what are the key metrics to ensure that quality? For James Malone, CTO of the semantic AI company SciBite (acquired by Elsevier last year, a combination aimed at delivering applied AI in a scalable, structured and repeatable way), the answer was “Testing, testing, testing.” He explained that SciBite uses comprehensive, gold-standard tests, and also pointed out that the end product should be “in the right form so you can consume it and use it.”

Bult also clarified that “data quality and annotation accuracy are two different things” that are approached with different processes. Some of the panelists described methods of ensuring accuracy that are very hands-on, including directly contacting authors, publishers and laboratories to fact-check information.

On the related matter of quality control, Karimi explained that his company, which specializes in genetic testing and personalized medicine, has a peer review process and a team of 30 curators who check and double-check for errors and omissions, supplemented by some automated processes.

AI as a helper, not a replacement

In any discussion of the future of biocuration, AI and machine learning loom large. “It will enhance our work by making some bottom-level decisions for us,” suggested Sandra Orchard of the European Bioinformatics Institute (EBI), who doesn’t envision machine learning replacing manual curation. Although she can certainly imagine it becoming increasingly important as ML becomes more powerful, she thinks papers will continue to require human interpretation to understand what the humans who wrote them meant.

Malone said that
at SciBite they are putting a lot of effort into developing techniques and
building models, and that those models can work quite well—but there is still a
lot of work to do to really start exploiting deep learning advances. He does
see a future in training AI models to help with curation, but predicts that,
“It will become an assistant; it will not replace subject matter experts.”  

Shaping the perception of biocuration

Carol Bult flagged
one particular danger of AI and ML, which is a broad misunderstanding about
what they can actually do. “Trying to get funding for biocuration is
challenging because of the perception that ML can do most of it. We’re working
on the technology, but it’s not going to replace biocurators.” She feels that biocurators
need to tackle this misperception and articulate a framework for how AI and
biocuration go hand in hand.

Importantly,
James Malone highlighted the link between data science and curation, noting
that much of a data scientist’s work is data wrangling and cleaning, and
so a big chunk of what they do is, essentially, curation. Bult argued that
biocurators need to make sure people realize how important their discipline is
to data science, because the value of data science is already recognized across
industries.

“If we frame
biocuration in the context of data science, I think that will help,” she said. “We
have to get better at explaining what the ROI is. What can you do—because data
are quality controlled and curated—that you wouldn’t be able to do if it wasn’t
curated? We have to do a better job of explaining and telling the stories.”

Malone believes the prevalence of AI will actually shine a light on the value of curation. The more commonplace AI becomes, and the more it, along with approaches such as data lakes and knowledge graphs, is at the forefront of decision makers’ minds, the more they will appreciate the value of well-labelled data. After all, he says, your models are only as good as your data, and the same is true for data lakes and knowledge graphs, which is why biocuration has never been more relevant.

Learn more about upcoming sessions of the Biocuration Conference, and read James Malone’s blog post on why curators are the heroes of bioinformatics.
