Pharmaceutical Market Europe • June 2023 • 16-17

DIGITAL LIFE SCIENCES

Using knowledge graphs to discover powerful new insights

Looking at how multidimensional knowledge graphs work and their potential within the life sciences industry

By Alexander Jarasch

Whether via new standards adoption or the digital transformation of regulatory processes, data-based ambitions have featured heavily on the pharma agenda in recent years. But it could be knowledge graphs, and their ability to draw advanced insights from complex data correlations, that yield the real benefits.

Pharma’s appetite for good data, and the ability to do more with it, is growing at an accelerating pace. If necessity is the mother of creative thinking, then the pandemic and continuing supply issues, on top of a large-scale shift to next-generation therapies, are causing life sciences players to think laterally about how they can make more of what they know. And indeed of what they might know, if only they could combine and distil meaningful insights from the diverse yet interconnected information sitting around their organisations.

It’s in this context that the life sciences industry is waking up to knowledge graphs in a big way. While work continues in earnest in standardising and fine-tuning the quality of everyday data, knowledge graphs are providing a means for companies to convert the rich but highly unstructured information they already have into actionable intelligence on a vast scale. The potential spans the entire drug development and marketing life cycle, too.

Taking data to another dimension

So what are knowledge graphs, and why the excitement? The technology first entered mainstream consciousness a few years ago, when it was harnessed to crack the Panama Papers scandal. Knowledge graphs are multi-dimensional and work on the basis that every data set is a connected element. It is in the complex interconnections between different data sources that the breakthrough insights lie. Traditional SQL databases can’t represent that complexity or make those connections, because they were not designed to do that.

In the context of biological science, information about a disease is inextricably linked to information about genes, the environment, a person’s diet and behaviour, and so on. The more these interrelationships and correlations can be analysed, the richer the knowledge and the faster important deductions can be made. Modern native graph databases have made mass-scale cross-comparisons – involving billions of connections – more viable, changing the game in complex fields such as medicine.
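To make the idea of connected elements concrete, the following is a minimal sketch in plain Python rather than a graph database. It stores a handful of relationships as (source, relationship, target) triples and walks outwards from a disease to find everything linked to it within a set number of hops. All node names, relationship types and function names here are invented for illustration and do not come from any real data set or product.

```python
from collections import defaultdict, deque

# Hypothetical relationships, invented purely for illustration.
EDGES = [
    ("Disease X", "ASSOCIATED_WITH", "Gene A"),
    ("Disease X", "INFLUENCED_BY", "Diet pattern B"),
    ("Gene A", "EXPRESSED_IN", "Tissue C"),
    ("Diet pattern B", "LINKED_TO", "Behaviour D"),
    ("Tissue C", "AFFECTED_BY", "Environmental factor E"),
]


def build_graph(edges):
    """Index every relationship from both ends so it can be walked either way."""
    graph = defaultdict(list)
    for source, relation, target in edges:
        graph[source].append((relation, target))
        graph[target].append((relation, source))
    return graph


def neighbours_within(graph, start, max_hops):
    """Breadth-first search: map each reachable node to its distance in hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _relation, other in graph[node]:
            if other not in seen:
                seen[other] = seen[node] + 1
                queue.append(other)
    del seen[start]  # the starting node is not its own neighbour
    return seen


graph = build_graph(EDGES)
related = neighbours_within(graph, "Disease X", 2)
for name, hops in sorted(related.items(), key=lambda item: (item[1], item[0])):
    print(f"{name}: {hops} hop(s) from Disease X")
```

A native graph database stores and traverses relationships like these directly; the sketch only illustrates why following connections, rather than joining tables, is the natural operation.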

The potential is still largely untapped. Existing use cases in life sciences only scratch the surface of what’s possible. Currently, nine of the top ten pharma companies are using knowledge graphs, but in the majority of cases the technology accounts for only around one in a hundred of their databases. While organisations often feel held back by their legacy data, the reality is that they could be doing a great deal more with it – and knowledge graphs hold the key.

To date, the three main use cases have been:

  1. Identifying novel drug targets for new therapies
  2. Transforming clinical trials
  3. Managing supply chains more dynamically.

Once companies provide for greater interconnectivity between data from the outset, they’ll be able to supplement these areas and approach high throughput screening/compound registrations in smarter and more efficient ways.

Harnessing heterogeneous, unstructured data

In December 2022, Neo4j hosted an event exploring the wider possibilities of knowledge graphs in life sciences, and participants were enthralled by some of the emerging use cases. These ranged from drug discovery management to AstraZeneca’s application of the technology to chemical reactions, to predict new reactions and molecular synthesis – with the potential to circumvent or attack existing patents.

Beyond AstraZeneca’s emerging work with reaction and synthesis prediction in the drug discovery process, and my own presentation detailing progress in classifying diabetes patients – by combining patient data with a Graph Data Science library and out-of-the-box machine learning – ‘data innovators’ at GlaxoSmithKline explored the scope for knowledge graphs to improve clinical reporting workflows. Applying the technology here is helping to overcome the traditional problems of labour intensity, multiple handoffs and all manner of necessary data transformations, by way of a single, contextualised knowledge graph.

What’s especially enticing about knowledge graphs is that they aren’t dependent on data sources having been prepared or formatted according to a particular data schema. They can work with the native data structure, and queries can be performed by asking meaningful questions. Queries can be performed at hyper speed, too: typically 3,000 times faster than an SQL database query, even across dense networks of knowledge. One such question might be which clinicians are best to approach for a clinical trial to be successful – based not only on their areas of expertise, but also their current capacity, whether they have access to the right equipment and whether they may be working with a competitor.
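As a rough sketch of what such a question looks like once the data is connected, the plain-Python example below shortlists trial investigators by expertise, capacity, equipment and competitor involvement. It is an illustration under stated assumptions, not a real query against a graph database: the doctors, clinics, relationship names, the OPEN_TRIAL_SLOTS table and the candidate_investigators function are all hypothetical.

```python
# Hypothetical (subject, relationship, object) triples; every name is invented.
TRIPLES = {
    ("Dr Alpha", "SPECIALISES_IN", "Diabetes"),
    ("Dr Alpha", "WORKS_AT", "Clinic North"),
    ("Clinic North", "HAS_EQUIPMENT", "MRI scanner"),
    ("Dr Beta", "SPECIALISES_IN", "Diabetes"),
    ("Dr Beta", "WORKS_AT", "Clinic South"),
    ("Clinic South", "HAS_EQUIPMENT", "MRI scanner"),
    ("Dr Beta", "RUNS_TRIAL_FOR", "Competitor Co"),
    ("Dr Gamma", "SPECIALISES_IN", "Diabetes"),
    ("Dr Gamma", "WORKS_AT", "Clinic East"),
    ("Clinic East", "HAS_EQUIPMENT", "Ultrasound"),
    ("Dr Delta", "SPECIALISES_IN", "Oncology"),
    ("Dr Delta", "WORKS_AT", "Clinic North"),
}

# Hypothetical capacity figures: how many more trials each doctor could take on.
OPEN_TRIAL_SLOTS = {"Dr Alpha": 2, "Dr Beta": 1, "Dr Gamma": 3, "Dr Delta": 0}


def objects(subject, relation):
    """Everything the subject points to via the given relationship."""
    return {o for s, r, o in TRIPLES if s == subject and r == relation}


def subjects(relation, obj):
    """Everything that points to the object via the given relationship."""
    return {s for s, r, o in TRIPLES if r == relation and o == obj}


def candidate_investigators(expertise, equipment):
    """Doctors with the expertise, spare capacity, the equipment, and no competitor trial."""
    matches = []
    for doctor in subjects("SPECIALISES_IN", expertise):
        has_capacity = OPEN_TRIAL_SLOTS.get(doctor, 0) > 0
        sites = objects(doctor, "WORKS_AT")
        has_equipment = any(
            equipment in objects(site, "HAS_EQUIPMENT") for site in sites
        )
        works_with_competitor = bool(objects(doctor, "RUNS_TRIAL_FOR"))
        if has_capacity and has_equipment and not works_with_competitor:
            matches.append(doctor)
    return sorted(matches)


shortlist = candidate_investigators("Diabetes", "MRI scanner")
print(shortlist)
```

Each condition in the question maps to one hop along a relationship, which is why this kind of query tends to stay readable as more criteria are added.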

Overcoming gaps in clinical data

At a clinical trial level, knowledge graphs are resonating especially because of the growing challenge of reaching statistical significance in small populations with a rare condition. As some of the growing body of work in diabetes research shows, knowledge graphs can help with phenotype mapping between humans and animals, by extrapolating and connecting data points that are phenotypically equivalent across studies of mice and of humans, even where clinical parameters and observations aren’t immediately comparable.
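One way to picture this kind of phenotype mapping is to link species-specific observations to a shared phenotype concept, then keep only the concepts observed in both species. The sketch below does this in plain Python. The observations, concept names and the cross_species_equivalents function are hypothetical placeholders, not a curated mapping or the method used in the diabetes research mentioned above.

```python
from collections import defaultdict

# Hypothetical links from (species, observation) to a shared phenotype concept.
MAPS_TO = [
    (("human", "Raised HbA1c"), "Impaired glucose control"),
    (("mouse", "Raised fasting blood glucose"), "Impaired glucose control"),
    (("human", "High body mass index"), "Increased body weight"),
    (("mouse", "Increased body mass"), "Increased body weight"),
    (("mouse", "Reduced wheel running"), "Reduced physical activity"),
]


def cross_species_equivalents(mappings):
    """Group observations by concept and keep concepts seen in both species."""
    by_concept = defaultdict(lambda: defaultdict(list))
    for (species, observation), concept in mappings:
        by_concept[concept][species].append(observation)
    return {
        concept: {species: sorted(obs) for species, obs in per_species.items()}
        for concept, per_species in by_concept.items()
        if {"human", "mouse"}.issubset(per_species)
    }


equivalents = cross_species_equivalents(MAPS_TO)
for concept, per_species in sorted(equivalents.items()):
    print(concept, "->", per_species)
```

Observations that are not directly comparable become comparable through the shared concept node, while concepts seen in only one species (here, reduced physical activity) are left out of the cross-species view.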

There can be no doubt that the future of life sciences is data-driven. Just as vehicle manufacturers are investing heavily in intelligent onboard systems and teams of data scientists as their next-generation differentiators, so must pharma now look to a bolder use of data to determine its product road maps more strategically and navigate regulatory approvals more swiftly.

All roads point to a data-based future

For now, the challenge for all life sciences companies is to be open to new data technologies and techniques, and the potential they represent in overcoming critical business challenges in fresh ways. Instead of pursuing projects involving traditional relational database systems, the smarter movers are thinking laterally about how to deliver new results.

None of this takes away from the call to keep enriching and enhancing data to drive up its quality. However, the more data that is fed into a knowledge graph, the more accurate the overall picture it is able to build, using AI/machine learning to understand the insights that matter. A good analogy would be Google Maps, which builds reliable representations of the physical world from huge volumes of diverse data, a picture that won’t alter if the odd rogue data point creeps in.

As the clinical opportunity for pharma grows ever more rewarding, yet more demanding and complex in scope, knowledge graphs are a game-changer. Understanding the value of relationships between data is every bit as important as what those individual data points tell us in their own right. Without the ability to mine those correlations for new insights, companies will lack vital context and find themselves compromised in their ability to make accurate advanced predictions.

Just as hiring data scientists and data administrators will be high on the agenda for life sciences this year, so will be the exploration of knowledge graphs and all that they represent.


Alexander Jarasch is Technical Consultant for Pharma and Life Sciences at Neo4j