
New project to investigate societal consequences of using synthetic data to train algorithms


Posted on Thursday 4 September 2025

Researchers in the University of York’s Department of Sociology will lead one of the first large-scale, systematic social science studies of synthetic data.

Synthetic data are data generated by machine learning algorithms and AI models such as Gemini and GPT-4. They are increasingly used to fill gaps in real-world data and are often seen as a solution to some of the challenges of AI development, such as a lack of diversity and representation, and concerns about privacy and confidentiality.

Although synthetic data are already used in sectors such as healthcare, finance, biometrics and surveillance, their societal effects are not yet well understood, and important ethical and political questions have yet to be addressed.

Consequences

To shed new light on the subject, the European Research Council-funded SYNDATA project, led by Dr Benjamin Jacobsen, will look at the practical and political consequences of using synthetic data. As one of the first large-scale social science studies of this kind, the project aims to generate new knowledge on how synthetic data are transforming AI and society. 

Dr Jacobsen said: “A crucial part of the allure of synthetic data is how they promise to address some of the ethical issues associated with extracting real-world data, as well as challenges associated with large training datasets, such as class imbalance and a lack of racial and gendered representation. If something cannot be found or collected in the real world, it can be generated via algorithms.

“However, this has significant and disruptive ethical implications, because synthetic data intervene in our understanding of long-standing issues such as bias, fairness and algorithmic injustice.”

Under-researched

While there is a substantial literature on the effects of algorithms on society, synthetic data remain both under-researched and under-theorised. For example, what happens when you can generate synthetic data about minority populations to make your algorithm less biased? And what happens to data privacy when algorithms can be trained on data about people who are not real?

By examining how synthetic data are produced, what kinds of people or groups they depict, and how they challenge or reinforce existing power structures, the SYNDATA project will investigate how synthetic data, algorithms and AI models shape society.

With recent developments in generative AI models, it has never been easier to generate realistic synthetic data at scale. Synthetic data are therefore likely to matter not only to regulators but also to how we think about the ethics of data and algorithms on a global scale.

Pressing questions

To answer these pressing questions, the project will combine archival research, fieldwork and case studies, examining both the historical predecessors of synthetic data and the different areas in which synthetic data are currently being generated.

Dr Jacobsen said: “By developing new ways of thinking about the ethics of data in an age where the line between the ‘real’ and the ‘synthetic’ is increasingly blurred, the SYNDATA project will shed light on both the contemporary and future use of AI data.”

Further information

The Ethics of Synthetic Data in the Age of Machine Learning and AI is funded by the European Research Council and will start in January 2026.  
