Publication

Utility-driven assessment of anonymized data via clustering

dc.contributor.author: Ferrão, Maria Eugénia
dc.contributor.author: Prata, Paula
dc.contributor.author: Fazendeiro, Paulo
dc.date.accessioned: 2022-08-26T08:33:17Z
dc.date.available: 2022-08-26T08:33:17Z
dc.date.issued: 2022-07-30
dc.description.abstract: In this study, clustering is conceived as an auxiliary tool to identify groups of special interest. This approach was applied to a real dataset concerning an entire Portuguese cohort of higher education Law students. Several anonymized clustering scenarios were compared against the original cluster solution. The clustering techniques were explored as data utility models in the context of data anonymization, using k-anonymity and (ε, δ)-differential privacy as privacy models. The purpose was to assess anonymized data utility by standard metrics, by the characteristics of the groups obtained, and by the relative risk (a relevant metric in social sciences research). For the sake of self-containment, we present an overview of anonymization and clustering methods. We used a partitional clustering algorithm and analyzed several clustering validity indices to understand to what extent the data structure is preserved, or not, after data anonymization. The results suggest that for low-dimensionality/low-cardinality datasets the anonymization procedure easily jeopardizes the clustering endeavor. In addition, there is evidence that relevant field-of-study estimates obtained from anonymized data are biased. [pt_PT]
dc.description.version: info:eu-repo/semantics/publishedVersion [pt_PT]
dc.identifier.citation: Ferrão, M.E., Prata, P. & Fazendeiro, P. Utility-driven assessment of anonymized data via clustering. Sci Data 9, 456 (2022). https://doi.org/10.1038/s41597-022-01561-6. [pt_PT]
dc.identifier.doi: 10.1038/s41597-022-01561-6 [pt_PT]
dc.identifier.issn: 2052-4463
dc.identifier.uri: http://hdl.handle.net/10400.6/12328
dc.language.iso: eng [pt_PT]
dc.peerreviewed: yes [pt_PT]
dc.publisher: Springer Nature [pt_PT]
dc.relation: Research in Economics and Mathematics
dc.relation: Instituto de Telecomunicações
dc.relation.publisherversion: https://doi.org/10.1038/s41597-022-01561-6 [pt_PT]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [pt_PT]
dc.subject: Data privacy [pt_PT]
dc.subject: Data utility [pt_PT]
dc.subject: Clustering [pt_PT]
dc.subject: Education [pt_PT]
dc.title: Utility-driven assessment of anonymized data via clustering [pt_PT]
dc.type: journal article
dspace.entity.type: Publication
oaire.awardTitle: Research in Economics and Mathematics
oaire.awardTitle: Instituto de Telecomunicações
oaire.awardURI: info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDB%2F05069%2F2020/PT
oaire.awardURI: info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDB%2F50008%2F2020/PT
oaire.citation.conferencePlace: Online [pt_PT]
oaire.citation.issue: 1 [pt_PT]
oaire.citation.title: Scientific Data [pt_PT]
oaire.citation.volume: 9 [pt_PT]
oaire.fundingStream: 6817 - DCRRNI ID
oaire.fundingStream: 6817 - DCRRNI ID
person.familyName: Ferrão
person.familyName: Prata
person.familyName: Fazendeiro
person.givenName: Maria Eugénia
person.givenName: Paula
person.givenName: Paulo
person.identifier.ciencia-id: 651F-C1C8-44AD
person.identifier.ciencia-id: 911F-3584-721F
person.identifier.orcid: 0000-0002-1317-0629
person.identifier.orcid: 0000-0002-3072-0186
person.identifier.orcid: 0000-0001-6054-7188
person.identifier.rid: A-2665-2011
person.identifier.rid: B-7713-2008
person.identifier.scopus-author-id: 24075949800
person.identifier.scopus-author-id: 6506143567
person.identifier.scopus-author-id: 19640174600
project.funder.identifier: http://doi.org/10.13039/501100001871
project.funder.identifier: http://doi.org/10.13039/501100001871
project.funder.name: Fundação para a Ciência e a Tecnologia
project.funder.name: Fundação para a Ciência e a Tecnologia
rcaap.rights: openAccess [pt_PT]
rcaap.type: article [pt_PT]
relation.isAuthorOfPublication: f32b6cd9-ea61-4de5-898c-d4e0d40a057f
relation.isAuthorOfPublication: 138a0dac-5e5d-466c-901d-4ed34f860403
relation.isAuthorOfPublication: 47442970-f246-4908-b873-0b58e684a9e9
relation.isAuthorOfPublication.latestForDiscovery: f32b6cd9-ea61-4de5-898c-d4e0d40a057f
relation.isProjectOfPublication: 6d4350a2-a786-44ff-9ab8-6350f3b4ae97
relation.isProjectOfPublication: 5a9bd4c8-57a9-46c4-95dc-a5e5c220c117
relation.isProjectOfPublication.latestForDiscovery: 6d4350a2-a786-44ff-9ab8-6350f3b4ae97

Files

Original bundle
Name: SdataOnline.pdf
Size: 1.48 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission