Search Results
- Imputação de Valores Omissos em Análise Descritiva de Dados, em R (Imputation of Missing Values in Descriptive Data Analysis, in R)
  Publication . Salambiaku, Luzizila; Prata, Paula; Ferrão, Maria Eugénia
  Missing values are a frequent problem in data analysis. In this article, six distinct imputation methods available in the R software were compared and their performance was evaluated on data sets from the field of education. A sample of 20408 students was studied to test the six algorithms on four simulated data sets with different percentages of missing values (5%, 10%, 15% and 20%) in the variables of interest. Simple imputation methods (mean, median and mode), machine-learning-based methods (kNN and bPCA) and a multiple imputation method (MICE) were explored. The performance of each method was evaluated by computing its imputation errors with the RMSE and MAE metrics. The results show that mode imputation almost consistently yielded the lowest error values.
- Anonymized Data Assessment via Analysis of Variance: An Application to Higher Education Evaluation
  Publication . Ferrão, Maria Eugénia; Prata, Paula; Fazendeiro, Paulo
  The assessment of the utility of an anonymized data set can be operationalized by the determination of the amount of information loss. To investigate the possible degradation of the relationship between variables after anonymization, hence measuring the loss, we perform an a posteriori analysis of variance. Several anonymized scenarios are compared with the original data. Differential privacy is applied as data anonymization process. We assess data utility based on the agreement between the original data structure and the anonymized structures. Data quality and utility are quantified by standard metrics, characteristics of the groups obtained. In addition, we use analysis of variance to show how estimates change. For illustration, we apply this approach to Brazilian Higher Education data with focus on the main effects of interaction terms involving gender differentiation. The findings indicate that blindly using anonymized data for scientific purposes could potentially undermine the validity of the conclusions.
- Data Anonymization: K-anonymity Sensitivity Analysis
  Publication . Santos, Wilson; Sousa, Gonçalo; Prata, Paula; Ferrão, Maria Eugénia
  These days the digitization process is everywhere, spreading also across central governments and local authorities. It is hoped that, by using open government data for scientific research purposes, the public good and social justice might be enhanced. Taking into account the recently adopted European General Data Protection Regulation, the big challenge in Portugal and other European countries is how to strike the right balance between personal data privacy and data value for research. This work presents a sensitivity study of a data anonymization procedure applied to real open government data available from the Brazilian higher education evaluation system. The ARX k-anonymization algorithm was run with and without generalization of some variables of research value. The analysis of the amount of data and information lost and of the risk of re-identification suggests that the anonymization process may lead to the under-representation of minorities and sociodemographically disadvantaged groups. It will enable scientists to improve the balance among risk, data usability, and contributions to public good policies and practices.
- Multi-device Notifications: a comparison between MQTT and CoAP
  Publication . Silva, Luis; Mello, Gabriel de; Silva, Bruno Alves da; Villarrubia Gonzalez, Gabriel; Santana, Juan da Paz; Prata, Paula; Leithardt, Valderi
  New devices generate, send, and display messages about their status, data retrieval, and device information. An increase in the number of notifications received tends to reduce users' tolerance for them. This article sets out a notification management system focused on user profiles and environments. The solution involves transferring notifications in a multi-device scenario using MQTT and CoAP technologies, while also adopting privacy criteria. It consists of three modules, the first of which was prototyped and evaluated using real devices.
- Computing Topics on Multiple Imputation in Big Identifiable Data Using R: An Application to Educational Research
  Publication . Ferrão, Maria Eugénia; Prata, Paula
  This article shows how to conduct multiple imputation in big identifiable data for educational research purposes. The R packages used to handle missing data in this study were “BaylorEdPsych” and “mi”. Firstly, we checked that every dataset rejected the null hypothesis of Missing Completely At Random (MCAR), using the function “LittleMCAR”. Simulated and real data analyses were conducted. Results suggest that the improvement of the quality of imputation requires alternative methods to be developed.
- Web application for the analysis of assessment tests
  Publication . Prata, Paula; Duarte, Luís; Ferrão, Maria Eugénia
  Introduction: An assessment test enables the evaluation of an individual’s competence or ability. Such tests are important for both teaching and professional training institutions, as well as for the recruiting of human resources in the enterprise. Objectives: The present paper introduces the “Evaluate” web application, for the analysis of assessment tests. Methods: The design and implementation of the application are described. It allows the management of assessment items, which are used to build evaluation tests; from the test results, it calculates the main descriptive statistics used under classical test theory in the analysis of assessment tests. The application was developed in Python, within the Django framework, and tested with real assessment tests. Results: Scores are assigned to each assessment item, and various statistics (such as difficulty and discrimination index, point-biserial correlation, and test internal consistency coefficient) can be obtained from the answers of the subjects, together with a graphic analysis of the performance of each subject on each assessment item and on the test as a whole. Conclusion: The “Evaluate” application makes a meaningful contribution to a better knowledge of assessment tools used in competence evaluation, by allowing the detection of inconsistencies and the consequent improvement in the process.
- Multiple imputation in big identifiable data for educational research: An example from the Brazilian education assessment system
  Publication . Ferrão, Maria Eugénia; Prata, Paula; Alves, Maria Teresa G.
  Almost all quantitative studies in educational assessment, evaluation and educational research are based on incomplete data sets, which have been a problem for years without a single solution. The use of big identifiable data poses new challenges in dealing with missing values. In the first part of this paper, we present the state of the art of the topic in the Brazilian education scientific literature, and how researchers have dealt with missing data since the turn of the century. Next, we use open access software to analyze real-world data, the 2017 Prova Brasil, for several federation units to document how the naïve assumption of missing completely at random may substantially affect statistical conclusions, researcher interpretations, and subsequent implications for policy and practice. We conclude with straightforward suggestions for any education researcher on applying R routines to conduct the hypothesis test of missing completely at random and, if the null hypothesis is rejected, how to implement multiple imputation, which appears to be one of the most appropriate methods for handling missing data.
- Estratégias de Tolerância a Falhas em Computação Móvel na Nuvem (Fault Tolerance Strategies in Mobile Cloud Computing)
  Publication . Catumbela, Euclides; Prata, Paula
  Although mobile devices have ever more computing and storage capacity, the link between mobile computing and cloud computing is also growing ever stronger. Mobile applications that process or share large amounts of data use the cloud to overcome the resource limitations imposed by smartphones and tablets. These systems bring new challenges in terms of fault tolerance. On the one hand, they run on batteries whose charge has a limited duration; on the other hand, user mobility can make it difficult to obtain the continuous connectivity and stable bandwidth that would be desirable. In this work we propose and evaluate fault tolerance mechanisms for two types of faults common in mobile cloud computing: battery depletion and network connection failures.
- Mobile Cloud Computing - Building High Availability Applications
  Publication . Prata, Paula; Catumbela, Euclides
  Mobile Computing seems to spread to all aspects of our life, from light entertainment to health or finance apps. Cloud services appear as the common solution to be used as backend of mobile applications. In complex applications the cloud can even be used as an additional computational resource. Mobile cloud computing applications raise new reliability and availability challenges that result notably from device mobility and from limited battery charge. In this work, fault tolerant mechanisms for connection problems and for low battery charge are proposed and studied. The execution time overhead of those mechanisms is evaluated and compared with the offline support existing in two common cloud platforms: Firebase and Azure.
- Performance Assessment of the Canonical Genetic Algorithm: a Study on Parallel Processing Via GPU Architecture
  Publication . Fazendeiro, Paulo; Prata, Paula
  Genetic Algorithms (GAs) exhibit a well-balanced operation, combining exploration with exploitation. This balance, which has a strong impact on the quality of the solutions, depends on the right choice of the genetic operators and on the size of the population. The results reported in the present work show that the GPU architecture is an efficient alternative to implement population-based search methods. In the case of heavy workloads the speedup gains are quite impressive. The reported experiments also show that the two-dimensional granularity offered by the GPU architecture is advantageous for the operators presenting functional and data independence at the population+genotype level.
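The first entry in this list evaluates imputation methods by hiding known values and scoring the imputed values with RMSE and MAE. The following is a minimal Python sketch of that evaluation idea for the three simple methods only (mean, median, mode). It is not the authors' R code: the function names, the random masking scheme and the toy score data are all assumptions made for illustration, and it requires Python 3.8 or later for the behaviour of `statistics.mode` on multimodal data.

```python
import math
import random
import statistics


def rmse(true_vals, imputed_vals):
    """Root mean squared error between true and imputed values."""
    return math.sqrt(
        sum((t - i) ** 2 for t, i in zip(true_vals, imputed_vals)) / len(true_vals)
    )


def mae(true_vals, imputed_vals):
    """Mean absolute error between true and imputed values."""
    return sum(abs(t - i) for t, i in zip(true_vals, imputed_vals)) / len(true_vals)


def evaluate_simple_imputation(values, missing_rate, seed=0):
    """Hide a random fraction of `values`, impute with mean/median/mode
    of the remaining values, and score each method against the hidden truth.

    Hypothetical helper: the masking is completely at random, which is an
    assumption of this sketch rather than something stated in the abstract.
    """
    rng = random.Random(seed)
    n_missing = max(1, round(missing_rate * len(values)))
    missing_idx = set(rng.sample(range(len(values)), n_missing))

    observed = [v for i, v in enumerate(values) if i not in missing_idx]
    hidden = [values[i] for i in sorted(missing_idx)]

    # Each simple method fills every missing cell with one constant.
    fills = {
        "mean": statistics.mean(observed),
        "median": statistics.median(observed),
        "mode": statistics.mode(observed),
    }
    return {
        name: {
            "rmse": rmse(hidden, [fill] * len(hidden)),
            "mae": mae(hidden, [fill] * len(hidden)),
        }
        for name, fill in fills.items()
    }


if __name__ == "__main__":
    # Toy data: 200 made-up integer scores between 0 and 20.
    data_rng = random.Random(42)
    scores = [data_rng.randint(0, 20) for _ in range(200)]
    for rate in (0.05, 0.10, 0.15, 0.20):
        print(rate, evaluate_simple_imputation(scores, rate))
```

Which method scores best depends on the data; this sketch only shows how the error metrics are computed and makes no claim about reproducing the reported results.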