Name | Description | Size | Format
---|---|---|---
 | | 672.23 KB | Adobe PDF
Abstract(s)
The utility of an anonymized data set can be operationalized by determining the amount of information loss. To investigate the possible degradation of the relationship between variables after anonymization, and hence measure the loss, we perform an a posteriori analysis of variance. Several anonymized scenarios are compared with the original data. Differential privacy is applied as the data anonymization process. We assess data utility based on the agreement between the original data structure and the anonymized structures. Data quality and utility are quantified by standard metrics and by characteristics of the groups obtained. In addition, we use analysis of variance to show how estimates change. For illustration, we apply this approach to Brazilian Higher Education data with focus on the main effects of interaction terms involving gender differentiation. The findings indicate that blindly using anonymized data for scientific purposes could potentially undermine the validity of the conclusions.
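The idea of comparing an analysis of variance before and after anonymization can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a per-record Laplace perturbation as a stand-in for the differential privacy process, uses synthetic scores rather than ENADE data, and all names (`laplace_noise`, `one_way_anova_f`, `anonymize`, the score range of 0 to 100, the chosen epsilon values) are hypothetical.

```python
import random
import statistics


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    k = len(groups)          # number of groups
    n = len(all_values)      # total number of observations
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = sum(
        (x - statistics.fmean(g)) ** 2 for g in groups for x in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def anonymize(groups, epsilon, sensitivity, rng):
    # Assumption: each value is perturbed independently with Laplace noise
    # of scale sensitivity / epsilon (smaller epsilon means more noise).
    scale = sensitivity / epsilon
    return [[x + laplace_noise(scale, rng) for x in g] for g in groups]


def clamp(x, low=0.0, high=100.0):
    return max(low, min(high, x))


rng = random.Random(0)

# Synthetic scores in [0, 100] for two hypothetical groups (not ENADE data).
original = [
    [clamp(rng.gauss(50.0, 10.0)) for _ in range(200)],
    [clamp(rng.gauss(55.0, 10.0)) for _ in range(200)],
]

f_original = one_way_anova_f(original)
print("original F:", f_original)

# Compare the F statistic across several anonymized scenarios.
for epsilon in (10.0, 1.0, 0.1):
    noisy = anonymize(original, epsilon, sensitivity=100.0, rng=rng)
    print("epsilon", epsilon, "F:", one_way_anova_f(noisy))
```

Under these assumptions, the F statistic of each anonymized scenario can be set against the original one to see how far the estimated group effect drifts as the privacy budget shrinks.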
Keywords
Data anonymization; Differential privacy; Data utility; Data quality; ENADE
Citation
Ferrão, M.E., Prata, P., Fazendeiro, P. (2023). Anonymized Data Assessment via Analysis of Variance: An Application to Higher Education Evaluation. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2023 Workshops. ICCSA 2023. Lecture Notes in Computer Science, vol 14105. Springer, Cham. https://doi.org/10.1007/978-3-031-37108-0_9
Publisher
Springer