WHO Gates Foundation public health data interoperability

21 March 2016

Geneva | WHO and the Bill & Melinda Gates Foundation

Following on from our work for the [Public Health Research Data Forum](/portfolio/2013/11/insight-into-public-health-research-data-management/) on research data management, Whyt*hawk* collaborated with the WHO and the Gates Foundation to research the challenges of cross-border health data sharing, with a specific focus on pandemics and pandemic preparedness.


Our solution

We had several different challenges. The most important was choosing a good case study. We had two options:

  • Pandemic response in East Africa where a range of emerging and established diseases were being monitored,
  • Extremely multi-drug resistant tuberculosis pandemic response in Eastern Europe.

We chose the TB project in Eastern Europe as the more straightforward option: one disease vector and one process. Even so, we ran into the additional challenge that many of the participating countries were either threatened or invaded by Russia. However, conflict was also a contributor to the pandemic, as well as to the low-trust environment for data sharing.

We arranged a series of workshops with public health data managers and directors, including in Spain and Turkey, and compiled a list of the challenges they experienced. Much of what we discovered related to data interoperability, as well as limited trust and significant skills shortages for using data gathered during the course of public health interventions to generate research insight.

Whythawk itself could play no role in the legislative or political challenges, but we were invited by the Gates Foundation to develop a graduate program in Data Science for public health that could be delivered as a one-year taught Master's program.

Outcomes

Whythawk developed the Data as a Science program, including developing four of the 20 taught modules.

The course is based on the Sloyd model of technical training. Each lesson is discrete, builds on the previous lesson, and provides a functional and holistic understanding of the scientific method as it applies to data. It is not about learning an algorithm and applying it to abstract, arbitrary data. The course aims to train complete data scientists: you will learn how research works and apply tools to a specific case study.

Each lesson starts with a research question and progresses by teaching a complete and practical set of skills, allowing students to learn at their own pace and in an order which suits their current understanding. Case studies and tutorials are drawn from public health, economics and social issues, and the course is accessible to anyone with an interest in data. Course materials, case studies and guided tutorials are presented in Jupyter Notebooks, permitting learners to test running code and gain hands-on understanding of the techniques discussed.

Each lesson is guided by the following four topics:

  • Ethics: determine the social and behavioural challenges posed by a research question,
  • Curation: establish the research requirements for data collection and management,
  • Analysis: investigate, explore and analyse research data,
  • Presentation: prepare and present the results of analysis to promote a response.

Unfortunately, following the US defunding of the WHO, priorities had to rapidly shift, and this course development remains incomplete.

Photo courtesy of the Bill & Melinda Gates Foundation

