The National Data Library (the NDL) is a government initiative that aims to make public data more accessible for research and development. It seeks to support data-driven public services with strong safeguards, as well as the development and deployment of AI for public benefit. The initiative is currently in its scoping phase in the Department for Science, Innovation and Technology. It is gaining significant attention in Parliament, in government policy papers, and among a wider audience of stakeholders who believe there is much to gain from such an initiative.
Here, we outline ten key takeaways on the practical implementation of this crucial initiative, sharing insights that panelists and invited speakers offered during the sessions. In particular, we highlight the importance of shaping the initiative around deep existing expertise and community needs.
1. It makes sense to call it a ‘Library’
Speakers drew parallels with traditional libraries, arguing that the NDL should be recognised as a crucial digital research infrastructure. Such data infrastructure is sorely needed. We heard about existing pain points that this initiative can address, such as the cost of preparing data, the focus of current initiatives on the academic lifecycle, which makes them unsuitable for 24-hour operational needs, and the lack of capacity in local government to handle very large datasets.
2. The initiative should be federated, not centralised
While the term ‘library’ perhaps conjures images of a single repository, this should not be a new centralised platform to solve all problems. Instead, speakers suggested that the initiative should leverage existing datasets and platforms to facilitate new use cases. We heard proposals for a federated architecture where the NDL does not hold all data centrally, but rather facilitates access to existing, curated collections.
3. The initiative should not reinvent the wheel
There was broad agreement that the initiative must build on extensive existing expertise and best practice in responsible data stewardship and governance, data quality, and data access. Speakers highlighted lessons from earlier data initiatives, for example trusted data environments like OpenSAFELY, which could inform the initial design of the initiative and shorten the timeline to its launch. We also heard about the importance of making data FAIR (Findable, Accessible, Interoperable, Reusable) from day one.
4. Going beyond health data
There are clear and highly valuable use cases for health data, and much to gain from better infrastructure in this field given the current difficulties in data access, as demonstrated by King’s PharosAI initiative, among many others mentioned. However, use cases go far beyond health applications.
The afternoon’s keynote featured the "Data Wishlist" project, which identified around 130 requests for valuable datasets across sectors such as health, education, the economy, and the environment that the National Data Library could address. Across these use cases, bottlenecks to data access and use included slow turnaround times, poor metadata, complex application processes, and insufficient compute power. The project also outlined service needs, such as remote access, training, and data preparation tools, which the NDL can seek to address. We also heard compelling examples of local authority needs, such as data for net-zero initiatives, climate resilience, and social justice issues like free school meals and tracking young people's outcomes.
5. AI readiness should be a core design principle
One talk outlined the Open Data Institute’s vision for an AI-ready NDL, emphasising high-quality public data, user-centric design, and the use of open standards, federation, and interoperability. We heard about significant challenges to government organisations' readiness to provide AI-ready data, including limited standardisation, project-based funding, and a lack of public trust. The ODI's AI-ready data framework was also shared; it describes four facets of a dataset: technical optimisation, adherence to standards, legal compliance, and responsible collection.
AI can also support the initiative, for instance by improving metadata descriptions of datasets or by helping people with varying levels of data literacy to make sense of data.
6. Address the missing voices in the debate
Some voices in the data ecosystem are not well integrated into the conversations about the NDL. Speakers emphasised the NDL's potential to bring user communities such as local authorities, civil society, and individuals into the data ecosystem, ensuring data diversity and mitigating bias. In short, without diverse voices the NDL will not deliver for the UK.
Speakers advocated building trust through active engagement, such as London Data Week, rather than merely avoiding mistrust. They drew on international examples like India Stack to illustrate how decentralised data environments can empower citizens and drive economic growth by enabling direct data sharing and control at the individual level. Speakers also suggested initial metrics that could form the basis for a thriving NDL, including representation across diverse sectors (academia, industry, audit, regulatory), alongside data ingress and consumption.
7. NDL delivery needs to move faster than similar projects have in the past
The initiative should deliver a strong minimum viable product in the near future and define the requirements that new data sources and platforms must meet to join the NDL.
Speakers suggested aligning funding models with rapid impact, taking inspiration from PharosAI's venture catalyst model, which requires a commercial spin-out within two years, and focusing the NDL on urgent public sector problems, such as those facing the prison service. Speakers were also clear that building on existing infrastructure, such as standards for responsible data practices, is important to support this rapid deployment.
8. Design the NDL iteratively to allow flexibility
There is no single way to build the NDL, and the initiative can draw on extensive experience in deploying data and digital infrastructure to identify which approaches are likely to work and which are not. Some level of risk will be required to start delivery sooner rather than later and avoid losing momentum. To achieve this, the NDL should be designed with experimentation in mind, so that it can be improved iteratively over decades. One example is the Content Store initiative for the UK government, which used hackathons to identify solutions for its rollout and was built on a foundation of curated, high-quality content and careful consideration of permissions and access.
9. Greater success could come from developing critical data infrastructure
While the NDL is a unique opportunity to improve access to and use of existing data assets, critical data infrastructure is still missing, with substantial consequences for economic growth. This includes open company data and postcode data, among others.
Infrastructure can also include investment in skills. Speakers identified skills gaps in data management, governance, and risk management, suggesting that strengths in these areas could help the NDL become a control point against bad actors in emerging technologies like synthetic biology. They further noted that high salaries in big tech draw skilled people away, emphasising the importance of career trajectories and government-funded skills training, as seen in Singapore.
10. The NDL should rely on a sustainable financing model
More work is needed to understand how to support different business models for the NDL, and to identify equitable ways for the commercial sector to engage with and contribute to the NDL. Speakers noted that data is not free and needs sustainable commercial models, and suggested contextual pricing as well as exploring models in which individuals are paid for their data.
At the same time, speakers stressed the importance of long-lived initiatives for building experience in data stewardship, citing the Met Office weather model as an example of deep, long-term memory that adds subtle value.
Conclusions
The takeaways from this event will help both stakeholders interested in the National Data Library and those involved in decision-making about the initiative to better understand its potential. The initiative could have an important impact on the role of emerging technologies, particularly in public services, by creating infrastructure that helps ensure these technologies better serve the public benefit.
Please contact the Centre for Data Futures team (cdf@kcl.ac.uk) if you are interested in future collaborations on this topic.