Publication

Anonymized Data Assessment via Analysis of Variance: An Application to Higher Education Evaluation

Name: ICCSA-CAS2023-cameraReady.pdf
Size: 672.23 KB
Format: Adobe PDF

Abstract(s)

The assessment of the utility of an anonymized data set can be operationalized by determining the amount of information loss. To investigate the possible degradation of the relationship between variables after anonymization, and hence to measure the loss, we perform an a posteriori analysis of variance. Several anonymized scenarios are compared with the original data. Differential privacy is applied as the data anonymization process. We assess data utility based on the agreement between the original data structure and the anonymized structures. Data quality and utility are quantified by standard metrics and by characteristics of the groups obtained. In addition, we use analysis of variance to show how estimates change. For illustration, we apply this approach to Brazilian Higher Education data, with a focus on the main effects of interaction terms involving gender differentiation. The findings indicate that blindly using anonymized data for scientific purposes could potentially undermine the validity of the conclusions.
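
The comparison described above can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes synthetic scores for two hypothetical groups, swaps in simple per-record Laplace noise as a stand-in for the differential privacy mechanism (which the abstract does not specify), and uses a one-way F statistic rather than the model with interaction terms. The names `laplace_noise`, `one_way_f`, and `perturb`, the sensitivity of 100, and the epsilon values are all assumptions made for illustration.

```python
import random
from statistics import mean


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def one_way_f(groups):
    """One-way ANOVA F statistic for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def perturb(groups, epsilon, sensitivity, rng):
    # Assumed stand-in for anonymization: add Laplace noise with
    # scale = sensitivity / epsilon to every record.
    scale = sensitivity / epsilon
    return [[x + laplace_noise(scale, rng) for x in g] for g in groups]


rng = random.Random(0)
# Synthetic scores for two hypothetical groups (not ENADE data).
original = [
    [rng.gauss(50, 10) for _ in range(200)],
    [rng.gauss(55, 10) for _ in range(200)],
]

f_original = one_way_f(original)
# One anonymized scenario per (assumed) privacy budget.
f_by_epsilon = {
    eps: one_way_f(perturb(original, eps, 100.0, rng))
    for eps in (0.1, 1.0, 10.0)
}

print(f"original F = {f_original:.2f}")
for eps, f_value in f_by_epsilon.items():
    print(f"epsilon = {eps}: F = {f_value:.2f}")
```

Comparing `f_original` with the values in `f_by_epsilon` gives a rough sense of how much of the group effect survives each anonymized scenario; smaller epsilon means more noise, so the between-group signal would be expected to weaken.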

Keywords

Data anonymization; Differential privacy; Data utility; Data quality; ENADE

Citation

Ferrão, M.E., Prata, P., Fazendeiro, P. (2023). Anonymized Data Assessment via Analysis of Variance: An Application to Higher Education Evaluation. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2023 Workshops. ICCSA 2023. Lecture Notes in Computer Science, vol 14105. Springer, Cham. https://doi.org/10.1007/978-3-031-37108-0_9
