Dr. Anton Yuryev, Professional Services Director, Elsevier, explores the increasing importance of artificial intelligence (AI) within drug discovery and precision medicine and how it is helping to discover new treatments for children with aggressive brain tumours.
COVID-19 has highlighted the important role of artificial intelligence (AI) and natural language processing (NLP) for many life sciences companies. Up to now, AI has typically been used to automate existing processes, but today we’re seeing it applied to make tangible improvements to research and development (R&D) outcomes, for instance to find approved drugs that may have therapeutic value for COVID-19.
The role of AI/NLP is going to expand in the next few years, particularly in drug discovery and other areas of the R&D value chain, to help organisations gain a competitive edge. AI is also enabling the evolution of precision medicine and genomic sequencing, and helping to draw inferences from the vast amount of data and scientific literature available today.
The use of AI in precision medicine has been especially beneficial in the search for new treatments for rare diseases. A recent collaborative project between Elsevier and the Sinergia Consortium (DMG/DIPG Center at University Children’s Hospital Zurich; ETH Zurich; the Centre for Molecular Medicine Norway (NCMM)), and the Open Pediatric Brain Tumor Atlas project, has demonstrated this.
Building an AI disease model
The project seeks to develop an AI disease model for diffuse intrinsic pontine glioma (DIPG), a highly aggressive and difficult-to-treat form of brain tumor that affects children. Currently, there are no effective treatments for DIPG. Using NLP and machine learning (ML), the proposed disease model will assist with drug repurposing and support precision medicine. The initial models were built from knowledge graphs created by text mining peer-reviewed literature.
Getting the full picture of DIPG
DIPG occurs in an area of the brainstem called the pons, which controls many of the body’s most vital functions, such as breathing, blood pressure, and heart rate. The pons is estimated to be the site of 10-15% of all brain tumors in children. To understand more about the disease with a view to finding potential treatments, OMICs data for patients with DIPG was analyzed using Elsevier’s Biology Knowledge Graph and complementary software to develop a molecular disease model.
Developing the initial disease model
The project team then analyzed the patient OMICs data, using an AI algorithm to understand protein activity in the tumors. The results were projected onto biological processes to examine their activity and to see which genes are most active in each patient.
The next stage was to map this against the cancer hallmarks model to understand which genes were mutated. The project increased the number of cancer hallmarks (biological capabilities acquired during the multistep development of human tumors) from 10 to 14 to improve understanding of the disease. Each individual hallmark has several fundamental mechanisms, depending on the tissue and biological processes involved. This enabled genes that were frequently mutated in DIPG patients to be identified from the available scientific literature.
Adding real world patient data
To extend and improve the AI disease model, the team drew on Cavatica, the cloud-based platform developed by the Children’s Hospital of Philadelphia, which includes DIPG data for more than 30 patients, indicating their genetic mutations and gene expression. The mutations were found by sequencing tumor genomes and then comparing each tumor genome with the genome from the same patient’s blood.
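The tumor-versus-blood comparison can be illustrated with a minimal sketch. It assumes variants are represented as simple (chromosome, position, reference, alternate) records; the `Variant` type, the `somatic_variants` function, and the toy data are hypothetical and are not taken from the project or from Cavatica. Real analyses rely on dedicated somatic variant callers rather than a plain set difference.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """A hypothetical, simplified variant record."""
    chrom: str
    pos: int
    ref: str
    alt: str


def somatic_variants(tumor: Set[Variant], blood: Set[Variant]) -> Set[Variant]:
    """Return variants seen in the tumor but not in the matched blood sample."""
    return tumor - blood


# Toy, made-up variants for illustration only.
blood = {Variant("chr1", 100, "A", "G"), Variant("chr2", 200, "C", "T")}
tumor = blood | {Variant("chr17", 300, "G", "A")}

somatic = somatic_variants(tumor, blood)
print(sorted(somatic))
```

Variants shared by both samples are treated as inherited and dropped, which leaves only the changes specific to the tumor.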
The scale of the data for each patient is immense, containing between 3,000 and 5,000 mutations and between 2,000 and 3,000 additional genetic mutations in the cancer patients. The data from the Children’s Hospital of Philadelphia confirmed the literature data and revealed additional novel mutations (including PTPRD). Overall, the Elsevier analysis found two major cancer hallmarks, TGF-beta signaling and VEGF production, to be active in 30 DIPG patients, which could help in finding new effective treatments.
Using the disease model to find potential drug candidates
To build further and get closer to finding effective treatments for DIPG, the algorithm was used to uncover drugs that repress the activity of the proteins found to be active in the cancer; 637 drugs were identified that inhibit major expression regulators in DIPG patients. The drugs were ranked by their ability to reverse the change in protein activity in DIPG tumors compared with normal pons.
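One way to picture this ranking is the sketch below. It is an illustration under stated assumptions, not the project's algorithm: it assumes each protein has a single number for its activity change in tumor versus normal pons, and that a drug's score is the summed increase of the proteins it inhibits. The function names, protein labels, drug labels, and values are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def reversal_score(targets: Set[str], activity_change: Dict[str, float]) -> float:
    """Sum the tumor-vs-normal activity increase of the proteins a drug inhibits.

    Only proteins that are more active in the tumor (positive change) count,
    because inhibiting them moves activity back toward the normal level.
    """
    return sum(max(activity_change.get(protein, 0.0), 0.0) for protein in targets)


def rank_drugs(
    drug_targets: Dict[str, Set[str]], activity_change: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (drug, score) pairs sorted from highest to lowest score."""
    scored = [
        (drug, reversal_score(targets, activity_change))
        for drug, targets in drug_targets.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical activity changes (tumor minus normal pons) and drug targets.
activity_change = {"PROT_A": 2.0, "PROT_B": 1.5, "PROT_C": -0.5}
drug_targets = {
    "drug_x": {"PROT_A"},
    "drug_y": {"PROT_A", "PROT_B"},
    "drug_z": {"PROT_C"},
}

ranking = rank_drugs(drug_targets, activity_change)
for drug, score in ranking:
    print(drug, score)
```

In this toy scoring, a drug that inhibits a protein already less active in the tumor gains nothing, since that inhibition would not reverse the observed change.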
Modified NLP (using Elsevier AI deep reading and text mining) was then implemented to extract additional data from literature about drugs that can inhibit mutant TP53. This found 144 drugs described in the scientific literature as inhibiting the mutated protein; more than 50% of them are also among the drugs predicted to reverse DIPG gene expression.
This model was then leveraged to find FDA-approved drugs which inhibit the disease mechanism, narrowing the drugs to 212. To find the 10 drugs with the most potential for experimental validation, further drug ranking scores were developed. One of these assessed whether a drug penetrates the blood-brain barrier, as predicted by another AI model trained using data from Reaxys. This eliminates drugs that cannot traverse the barrier and therefore would not be effective. Toxicity was another key concern when reviewing drugs, especially as DIPG patients are commonly children between 5 and 10 years old. Data from Pharmapendium was used to obtain toxicity profiles of the drugs identified.
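The narrowing steps described above can be sketched as a simple filter-and-rank pipeline. This is a hedged illustration only: the `Candidate` fields, the `shortlist` function, and the sample values are hypothetical, and the way the scores are combined (rank by reversal score, break ties by fewer toxicity concerns) is an assumption, since the article does not describe how the project weighted its ranking scores.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """A hypothetical drug record; all fields are illustrative assumptions."""
    name: str
    fda_approved: bool
    crosses_bbb: bool       # assumed output of a separate predictive model
    reversal_score: float   # higher means stronger predicted reversal
    toxicity_flags: int     # assumed count of reported toxicity concerns


def shortlist(candidates: List[Candidate], top_n: int = 10) -> List[Candidate]:
    """Keep approved, barrier-penetrating drugs, then rank and truncate."""
    eligible = [c for c in candidates if c.fda_approved and c.crosses_bbb]
    # Assumed ordering: best reversal score first, fewer toxicity flags on ties.
    eligible.sort(key=lambda c: (-c.reversal_score, c.toxicity_flags))
    return eligible[:top_n]


# Made-up candidates for illustration only.
candidates = [
    Candidate("drug_a", True, True, 3.0, 2),
    Candidate("drug_b", True, False, 5.0, 0),   # dropped: does not cross the barrier
    Candidate("drug_c", False, True, 4.0, 1),   # dropped: not FDA-approved
    Candidate("drug_d", True, True, 3.0, 1),
    Candidate("drug_e", True, True, 1.0, 0),
]

picked = shortlist(candidates, top_n=2)
print([c.name for c in picked])
```

Filtering before ranking means a drug with a high score but no predicted barrier penetration never reaches the shortlist, which mirrors the elimination step in the text.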
The project demonstrates that AI, NLP, and text mining are effective ways of improving our understanding of rare diseases and of finding potential drug candidates for further experimental testing, which may lead to improved treatment of DIPG. This work can also provide a model to apply to other disease areas. Once the model can be scaled, AI will allow further precision medicine approaches for a variety of rare diseases.