Neo4j’s Alicia Frame explains how life science researchers can exploit graph databases to get truly granular insight into big data to make major leaps forward in medical research.
Complex data sets hold the key to advancing medical breakthroughs. These data sets tend to be voluminous and heterogeneous by nature, presenting a formidable challenge for traditional data analysis methods, which struggle to link patterns and outcomes. The unfortunate consequence is a slowdown in the progress of research.
Anyone who works in life sciences is aware that they are working with highly connected information; the challenge is making sense of these connections. Unfortunately, many scientists are still using relational databases and spreadsheets, which makes mapping important patterns and connections unintuitive and difficult, if not impossible.
Graph technology is emerging as an enabler for researchers to trawl gargantuan amounts of unstructured data, turning it into valuable facts which can be scaled and connected. The value of graph technology was first recognised by social web giants Google, LinkedIn and Facebook. Whether researchers are analysing genome data, investigating drug interactions or disease factors, or processing clinical data, graphs can reveal incredible insights from connected data.
Graph database technology was used in the Panama and Paradise Papers investigations to find hidden relationships between powerful people and offshore bank accounts. The Panama Papers alone comprised 2.6 terabytes of data across 11.5 million files. It would have taken humans years to parse this information manually, and detecting these important connections would have been extremely difficult with tabular technologies such as SQL-based relational databases and other approaches that depend on storing data in rows and columns. Similarly, academic articles and patents can be parsed and used to build knowledge graphs that reveal new connections and insights for the life sciences.
Powerful framework for data
The beauty of graph databases is that they can not only handle vast data sets but also join the dots within them, uncovering patterns that are useful to researchers. Essentially, graph technology provides a powerful framework for storing, managing and querying highly connected data, which is why it lends itself so well to medical research across all links in the life science chain. A graph database adds a critical perspective and turns the data researchers are working on into actionable intelligence.
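To make the idea of joining the dots concrete, here is a minimal sketch in plain Python rather than in any particular graph database. It stores a handful of relationships and searches for the shortest chain linking two entities. The entity names, relationship labels and helper functions are all hypothetical placeholders chosen for illustration; they are not real biomedical findings or any product's API.

```python
from collections import deque

# Hypothetical toy data: (source, relationship, target) triples.
# The names are placeholders, not real biomedical findings.
TRIPLES = [
    ("drug_X", "TARGETS", "protein_P"),
    ("protein_P", "ENCODED_BY", "gene_G"),
    ("gene_G", "ASSOCIATED_WITH", "disease_D"),
    ("drug_Y", "TARGETS", "protein_Q"),
    ("paper_1", "MENTIONS", "gene_G"),
]


def build_graph(triples):
    """Build an undirected adjacency map: node -> list of (relationship, neighbour)."""
    graph = {}
    for source, rel, target in triples:
        graph.setdefault(source, []).append((rel, target))
        graph.setdefault(target, []).append((rel, source))
    return graph


def shortest_connection(graph, start, goal):
    """Breadth-first search for the shortest chain of nodes from start to goal.

    Returns the chain as a list of node names, or None if no chain exists.
    """
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for _rel, neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


graph = build_graph(TRIPLES)
print(shortest_connection(graph, "drug_X", "disease_D"))
print(shortest_connection(graph, "drug_Y", "disease_D"))
```

In a relational model, the same question would typically require a chain of joins whose length has to be known in advance. Here the traversal simply follows relationships until it reaches the target or runs out of connections.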
Graph technology also has the inbuilt power to collaboratively filter data that has been gathered by a number of users. As an example, collaborative filtering is a core technique used by recommendation engines on retail websites. Data or patterns can be filtered via various criteria such as viewpoints and data sources. Similarly, medical researchers can use collaborative filtering to work on promising datasets in parallel, saving on time and research budgets.
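As a rough illustration of collaborative filtering over connected data, the sketch below treats researchers and datasets as two kinds of node and suggests datasets to a researcher based on what colleagues with overlapping interests have used. The researcher names, dataset names and the scoring rule are assumptions made for this example; production recommendation engines use considerably more sophisticated scoring.

```python
from collections import Counter

# Hypothetical toy data: which datasets each researcher has worked with.
USES = {
    "alice": {"genome_set", "clinical_set"},
    "bob": {"genome_set", "clinical_set", "imaging_set"},
    "carol": {"genome_set", "proteome_set"},
    "dave": {"survey_set"},
}


def recommend(uses, person, top_n=3):
    """Suggest datasets the person has not used yet.

    Each candidate dataset is scored by summing, over every other researcher
    who uses it, the number of datasets that researcher shares with `person`.
    """
    mine = uses[person]
    scores = Counter()
    for other, theirs in uses.items():
        if other == person:
            continue
        overlap = len(mine & theirs)
        if overlap == 0:
            continue  # no shared interests, so this researcher's choices are ignored
        for dataset in theirs - mine:
            scores[dataset] += overlap
    # Highest score first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [dataset for dataset, _score in ranked[:top_n]]


print(recommend(USES, "alice"))
```

The same two-hop pattern (person to shared item to other person to new item) is what a graph query would express as a short traversal, which is why the technique maps so naturally onto graph data.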
Graph technology is already being leveraged in the fight against diabetes. The German Centre for Diabetes Research (DZD), the Federal Republic’s national centre for studying diabetes, has been an early adopter of graph technology in the life sciences arena.
Joining the dots
The DZD is exploiting graph technology combined with other powerful techniques such as artificial intelligence (AI) to spot patterns in its research data. Graph databases enable researchers to easily connect different types of data, making the data much easier to query, which dramatically accelerates analysis and shortens research time.
Graph database software enables the DZD to dive deeper into its diabetes ‘map’ to find hidden connections or relationships that open up new, largely unexplored pathways of research. The centre is also looking at creating new data models to better represent existing human and animal data. In the future, it may be possible to integrate data from diabetes research with cancer research, for example, to see if there are any hidden connections between these two complex disease areas.
The DZD is also exploring combining machine learning (ML) with graphs to identify new subtypes of diabetes. It hopes to build predictive models that will estimate the probability of disease progression for specific patients.
The potential of graph technology is also being harnessed by the Institute for Cancer Research and Treatment (IRCC) of Candiolo, in Italy. The IRCC research team performs molecular and biological tests on cancer samples that have been collected from hospitals across Europe.
The data is complex and hierarchical, with frequently changing relationships. Initially, the IRCC tried to model its data using a relational database, but this was slow and there were issues with data integration. The IRCC now uses graph databases to continually import data from its data sources and find intricate relationships in the data sets, analyse experimental procedures, build genomic domains and so forth. Graph technology is enabling the IRCC to harvest valuable insight from its data that was not previously possible.
Advances in research
Medical research is about diving into the unknown. Graph technology’s ability to focus on relationships between entities makes it an extremely powerful tool for life sciences research. Being able to mine large amounts of data quickly and accurately is imperative in speeding up medical research to improve the health and wellbeing of our society.
Alicia Frame is a Senior Data Scientist at Neo4j, the world’s leading graph database company.