by Marianna Capasso
Capasso, M. (2025). Synthetic data as meaningful data. On Responsibility in data ecosystems. Big Data & Society, 12(4). https://doi.org/10.1177/20539517251386053
In the age of large-scale AI, data is more than just fuel: it is value. Yet as AI systems grow more powerful, they are also becoming increasingly data-hungry. In this context, synthetic data emerges as a promising solution: artificially generated data that may offer an infinite supply, privacy by design, and new ways to overcome bias in training data. But while synthetic data holds technical promise, the challenges run much deeper than simply generating convincing data points.
This study argues for viewing synthetic data not as a technical fix, but through an analogical lens. Rather than assessing synthetic data solely by how well it mimics or replaces the real, it should be understood as a relational and regulative concept — one that reflects the complex systems of science and innovation in which data is embedded, including the tensions, power structures, and governance models that surround it.
Key takeaways from this study include:
- Synthetic data introduces an analogical perspective on data. This shift invites more critical thinking about the trade-offs involved in synthetic data generation and use, and how it might be shaped to better serve data justice. Questions like “What is high-quality synthetic data?” must be asked alongside: “For whom? In what context? At what cost?”
- Current quality metrics fall short, especially when they ignore broader concerns such as fairness, representational harm, or model collapse. Cross-disciplinary collaboration is therefore essential for embedding responsibility in complex data innovation ecosystems.
- Synthetic data should not only aim to mitigate bias, but also support algorithmic reparation and responsiveness, proactively addressing historical and systemic bias in data-driven technologies by continuously adapting to emerging societal values, needs, and contextual changes.