Ted Slater shares his thoughts on how to overcome the challenges to making data FAIR in pharma.
Data have long been the strong foundation on which scientific discovery is built, providing the means to validate discoveries, support peer review and predict the unknown. As we see an increase in the uptake of technologies like artificial intelligence (AI) and machine learning (ML) across the sciences, data will remain just as critical.
AI and ML are reliant on quality data – and researchers agree, with two-thirds of life science professionals citing data quality as the biggest barrier to using AI in drug design. Without high-fidelity data, researchers risk inaccurate or incorrect outcomes, rendering the investment in AI and ML technology significantly less valuable.
This is where data standards, like the FAIR data principles, can play a significant role in optimising scientific discovery. FAIR standards ensure data are Findable, Accessible, Interoperable and Reusable. In broad terms, this means data should be catalogued so they can be easily discovered by researchers, expressed in a standard and sharable format, able to be used by a range of applications and use cases, and available to be recycled and reprocessed.
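The four qualities above are usually expressed through machine-readable metadata attached to each dataset. As a purely illustrative sketch – the field names, identifiers and URL below are hypothetical examples, not a schema prescribed by the FAIR principles – a minimal metadata record might look like this:

```python
# An illustrative metadata record sketching how the four FAIR qualities
# can be expressed in machine-readable form. All field names and values
# are hypothetical examples, not a prescribed standard.
dataset_metadata = {
    # Findable: a persistent identifier plus searchable descriptors
    "identifier": "doi:10.0000/example-assay-2024",  # hypothetical DOI
    "title": "Example kinase inhibition assay results",
    "keywords": ["kinase", "IC50", "assay"],
    # Accessible: a retrieval route and access terms - note that
    # FAIR does not require the data themselves to be open
    "access_url": "https://data.example.org/assay-2024",  # placeholder URL
    "access_rights": "restricted",
    # Interoperable: standard formats and shared vocabularies
    "format": "text/csv",
    "vocabulary": "schema.org/Dataset",
    # Reusable: provenance and a clear licence for reuse
    "licence": "CC-BY-4.0",
    "provenance": "Generated by automated plate reader, 2024-03-01",
}

def is_fair_annotated(record: dict) -> bool:
    """Check a record carries at least one field for each FAIR quality."""
    required = ["identifier", "access_rights", "format", "licence"]
    return all(field in record for field in required)

print(is_fair_annotated(dataset_metadata))  # True
```

The point of such a record is that catalogues, search tools and downstream applications can act on the data without human intervention – which is what makes the data findable and interoperable in practice.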
While the FAIR data principles are currently optional for many (but by no means all) researchers and their organisations, implementing them can add real value to any industry engaged in R&D – from drug discovery to materials research. These principles help develop good data management practices that are necessary for integrating and reusing data and knowledge, and are also the basis of knowledge discovery and innovation.
Compliance with FAIR makes it easier to search data to find relevant information more quickly, and can cut down on work being duplicated because researchers cannot find the data they need. Moreover, FAIR data are interoperable, making them transferable between different internal and external systems. Not only does this reduce the costs associated with research, but given that data scientists spend only an estimated 20% of their time on data analysis, FAIR allows researchers to spend more time focusing on their research rather than sifting through endless data.
The scientific industry also needs to ensure its data are ‘clean’ and formatted for tools like AI and ML – as well as future technologies – to use. Without data standards, we risk falling into the trap of ‘garbage in, garbage out’ (GIGO) where bad data inputs result in poor quality outcomes.
Challenges to adopting FAIR
Despite the increasing need for data standards, there are three main challenges to adopting FAIR: cost, culture and compliance.
- The cost factor: This encompasses the initial cost of revising existing data to comply with data standards, the historic investment in legacy systems, and the cost if data are lost during transfer. But while there will be some initial costs to becoming compliant with FAIR, research from the European Commission has estimated the minimum annual cost of scientific disciplines not having FAIR data at €10.2bn across the European Union. Not only is FAIR cost effective to implement, but it also has long-term financial benefits.
- The culture shock: Research has shown culture is the second biggest challenge for life science companies looking to develop data-driven and digital competencies. These cultural challenges range from a lack of awareness of the data principles to perceptions about the complexity of implementing FAIR and researchers’ loyalty to existing systems. They are easily countered with greater education – both within industry and at an academic level – about the need for data standards.
- The compliance hurdle: Given the heavy regulation across scientific industries – from health and safety guidelines in chemicals to approval cycles in life sciences – changing processes can be a complicated task. However, FAIR data standards are beneficial from a regulatory perspective, ensuring chemical data, for example, are sharable across an organisation to avoid repeated safety incidents. Some assume that implementing FAIR requires all data to be open, which may cause compliance worries, but this is not the case: data can remain closed where patient confidentiality, legal sensitivity or commercial interests demand it. Equally, ‘closed’ data should not be exempt from being FAIR just because they aren’t universally accessible.
FAIR also comes with significant advantages, such as the initial opportunity to collaborate on shaping the implementation of FAIR, as well as potential future partnerships. The benefits of FAIR will be felt by entire industries and are a shared responsibility across the scientific community. Not only is collaboration necessary for ensuring data are interoperable, but the involvement of multiple stakeholders can also make the process of becoming FAIR quicker, cheaper and more successful overall.
A FAIR future for all?
While it’s not compulsory for organisations or individuals to introduce any principles for managing their data, it’s highly recommended in today’s changing scientific landscape. We’re at the outset of AI-driven science with its potential to solve some of the biggest scientific challenges we’re facing today, and data will be at the heart of this.
We have the opportunity now to work together and develop guidelines for our data to advance research. The alternative is to wait for the stage when technology compels organisations to format and standardise all their data. Then, the companies without FAIR standards will be one step behind, racing to meet the basic requirements needed to derive value from their data.