Search Results

Now showing 1 - 10 of 52
  • Identifying top relevant dates for implicit time sensitive queries
    Publication . Campos, Ricardo; Dias, Gaël; Jorge, Alípio; Nunes, Célia
    Despite a clear improvement of temporal search and retrieval applications, current search engines are still mostly unaware of the temporal dimension. Indeed, in most cases, systems are limited to offering the user the chance to restrict the search to a particular time period or to simply rely on an explicitly specified time span. If the user is not explicit in their search intent (e.g., “philip seymour hoffman”), search engines are likely to fail to present an overall historic perspective of the topic. In most such cases, they are limited to retrieving the most recent results. One possible solution to this shortcoming is to understand the different time periods of the query. In this context, most state-of-the-art methodologies consider any occurrence of temporal expressions in web documents and other web data as equally relevant to an implicit time sensitive query. To approach this problem in a more adequate manner, we propose in this paper the detection of temporal expressions that are relevant to the query. Unlike previous metadata and query log-based approaches, we show how to achieve this goal based on information extracted from document content. However, instead of simply focusing on the detection of the most obvious date, we are also interested in retrieving the set of dates that are relevant to the query. Towards this goal, we define a general similarity measure that makes use of co-occurrences of words and years based on corpus statistics, and a classification methodology that is able to identify the set of top relevant dates for a given implicit time sensitive query while filtering out the non-relevant ones. Extensive experiments against several baselines on web corpora collections show that our approach offers promising results in the field of temporal information retrieval (T-IR).
  • Chisquared and related inducing pivot variables: an application to orthogonal mixed models
    Publication . Ferreira, Dário; Ferreira, Sandra S.; Nunes, Célia; Fonseca, Miguel; Mexia, João T.
    We use chi-squared and related pivot variables to induce probability measures for model parameters, obtaining some useful results on the induced densities. As an illustration, we consider mixed models with balanced cross nesting and use the algebraic structure to derive confidence intervals for the variance components. A numerical application is presented.
  • Discriminant analysis and decision theory
    Publication . Ferreira, Sandra Saraiva; Ferreira, Dário; Nunes, Célia; Mexia, João T.
    A unified approach to Discriminant Analysis, based on Statistical Decision Theory, is presented. Optimum allocation rules minimizing the expected costs are derived for the continuous case and for the mixed case. In the continuous case, the observed variables are continuous, while in the mixed case there are also discrete variables, such as qualitative ones. The mixed case has often been treated using logistic regression. The unified approach removes the need to break the allocation problem into distinct cases.
  • Estimation of variance components in normal linear mixed models with additivity
    Publication . Ferreira, Dário; Ferreira, Sandra S.; Nunes, Célia; Mexia, João T.
    In this paper we use commutative Jordan Algebras to estimate variance components in linear mixed models. We apply the theory to a model in which three factors cross and one of the factors is additive to the other two.
  • Confidence intervals for variance components in gauge capability studies
    Publication . Ferreira, Dário; Ferreira, Sandra S.; Nunes, Célia; Oliveira, Teresa A.; Mexia, João T.
    We present a method for constructing confidence intervals for variance components in gauge capability studies using pivot variables, which are functions of statistics and parameters. As an illustration, we consider a study on repeatability and reproducibility measures. The paper also includes a simulation study demonstrating that in approximately 9500 out of 10000 simulations the 95% confidence interval covers the true value of the parameter.
  • Parents’ educational level and second-hand tobacco smoke exposure at home in a sample of Portuguese children
    Publication . Vitória, Paulo; Nunes, Célia; Precioso, J.
    Second-hand tobacco smoke (SHS) exposure is a major and entirely avoidable risk to children's health, well-being and development. The main objective of the current study was to investigate the association between parents' educational level and children's SHS exposure at home. A self-administered questionnaire was completed by a sample of 949 students in 4th grade (mean age 9.56±0.75, 53.4% male). The sample was randomly selected from all schools located in Lisbon District, Portugal. The current study confirmed that Portuguese children are exposed to unacceptably high levels of SHS at home, mainly from their parents' smoking. Prevalence of smokers was higher amongst parents with a low educational level. Children of parents with a low educational level were more likely to suffer SHS exposure at home. These results confirm the social inequalities associated with smoking, support the relevance of more research on this subject and stress the need for more interventions to control this problem. Some interventions should be specifically aimed at less educated parents, particularly at less educated mothers.
  • Estimation of Variance Components in Linear Mixed Models with Commutative Orthogonal Block Structure
    Publication . Ferreira, Sandra S.; Ferreira, Dário; Nunes, Célia; Mexia, João T.
    Segregation and matching are techniques to estimate variance components in mixed models. A question that arises is whether segregation can be applied in situations where matching does not apply. Our motivation for this research is to answer that question and to explore this important class of models, which can contribute to the development of mixed models. This is possible using the algebraic structure of mixed models. We present two examples showing that segregation can be applied in situations where matching does not apply.
  • Maximum Likelihood Estimation Methods for Variance Components in Linear Non-Orthogonal Small Size Design Models
    Publication . Ferreira, Dário; Ferreira, Sandra S.; Nunes, Célia; Mexia, João T.
    We compare four Maximum Likelihood Estimation methods for estimating variance components in normal linear mixed models, in the case of unbalanced small size design models: the Newton-Raphson method, the Triple Minimization method, the Gradient method, and a method in which the starting points for Newton-Raphson are the estimates obtained with the Triple Minimization method.
  • Random sample sizes in one-way fixed effects models
    Publication . Nunes, Célia; Capistrano, Gilberto; Ferreira, Dário; Ferreira, Sandra S.; Mexia, João T.
    Analysis of variance (ANOVA) is one of the most frequently used statistical analyses in several research areas, notably in medical research. Despite its wide use, it has been applied assuming that sample dimensions are known. In this work we aim to extend ANOVA-like analysis of one-way fixed effects models to situations where the sample sizes may not be known in advance. Assuming that the samples were generated by Poisson counting processes, we obtain the unconditional distribution of the test statistic under the assumption that we have random sample sizes. The applicability of the proposed approach is illustrated with a real data example on cancer registries. The results obtained suggest that false rejections may be avoided by applying our approach.
  • Exact critical values for one-way fixed effects models with random sample sizes
    Publication . Nunes, Célia; Capistrano, Gilberto; Ferreira, Dário; Ferreira, Sandra S.; Mexia, João T.
    Analysis of variance (ANOVA) is one of the most frequently used statistical analyses in several research areas, notably in medical research. Despite its wide use, it has been applied assuming that sample dimensions are known. In this work we aim to extend ANOVA-like analysis of one-way fixed effects models to situations where the sample sizes may not be known in advance. In these situations it is more appropriate to consider the sample sizes as realizations of independent random variables. This approach must be based on an adequate choice of the distributions of the sample sizes. We assume the Poisson distribution when the occurrence of observations corresponds to a counting process. The Binomial distribution is the proper choice if observations may fail and there exists an upper bound for the sample sizes. We also show how to achieve our main goal by computing correct critical values. The applicability of the proposed approach is illustrated with a real data example on cancer registries. The results obtained suggest that false rejections may be avoided by applying our approach.
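
    The idea in the last two abstracts, treating the sample sizes of a one-way fixed effects model as Poisson random variables and using a critical value from the unconditional distribution of the test statistic, can be illustrated with a rough Monte Carlo sketch. This is not the authors' exact computation: it only approximates the critical value by simulation. All function names, the Poisson means, and the rule of redrawing any sample size below 2 (so that the F statistic is defined) are assumptions made for illustration.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def f_statistic(groups):
    """Classical one-way ANOVA F statistic for a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def simulate_null_f(rng, lambdas, min_size=2):
    """One F statistic under the null hypothesis with Poisson sample sizes.

    Assumption: sizes below min_size are redrawn so every group has a variance.
    """
    groups = []
    for lam in lambdas:
        n_i = poisson(rng, lam)
        while n_i < min_size:
            n_i = poisson(rng, lam)
        # Equal means and unit variance in every group: the null hypothesis holds.
        groups.append([rng.gauss(0.0, 1.0) for _ in range(n_i)])
    return f_statistic(groups)


def monte_carlo_critical_value(lambdas, alpha=0.05, reps=20000, seed=1):
    """Empirical (1 - alpha) quantile of the unconditional null F distribution."""
    rng = random.Random(seed)
    draws = sorted(simulate_null_f(rng, lambdas) for _ in range(reps))
    return draws[math.ceil((1 - alpha) * reps) - 1]
```

    Calling `monte_carlo_critical_value([10, 10, 10])` gives a simulated 5% critical value for three groups with hypothetical Poisson means of 10. Comparing it with the fixed-size critical value for the sample sizes actually observed shows how ignoring the randomness of the sample sizes can change the rejection threshold.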