Conference papers

Privacy-Preserving Synthetic Educational Data Generation

Jill-Jênn Vie 1, Tomas Rigaux 1, Sein Minn 2
2 CEDAR - Rich Data Analytics at Cloud Scale
LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau], Inria Saclay - Ile de France
Abstract : Institutions collect massive amounts of learning traces, but they may not disclose them because of privacy concerns. Synthetic data generation opens new opportunities for research in education. In this paper, we present a generative model for educational data that can preserve the privacy of participants, and an evaluation framework for comparing synthetic data generators. We show how naive pseudonymization can lead to re-identification threats and suggest techniques to guarantee privacy. We evaluate our method on existing massive open educational datasets.
Contributor : Jill-Jênn Vie
Submitted on : Wednesday, July 6, 2022 - 1:41:10 PM
Last modification on : Saturday, July 9, 2022 - 3:32:56 AM




  • HAL Id : hal-03715416, version 1
  • ARXIV : 2207.03202


Jill-Jênn Vie, Tomas Rigaux, Sein Minn. Privacy-Preserving Synthetic Educational Data Generation. EC-TEL 2022, Sep 2022, Toulouse, France. ⟨hal-03715416⟩


