Ciclo de Palestras 2021 - 2

08/12

We propose data-driven reversible jump (DDRJ) algorithms for selecting variables in a regression model. Our main motivation is to identify relevant SNPs for a specific phenotype and to estimate their additive, dominance and epistasis (interaction) effects in genetic data sets, a problem that usually involves high-dimensional data. We compare the performance of DDRJ with that of other traditional variable selection methods, using real and simulated data sets, and it shows good performance. Joint work with Prof. Dr. Luis Milan.
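
As context for the kind of regression being searched over, a standard quantitative-genetics specification (a sketch only; the exact parameterization used in the talk may differ) writes the phenotype of individual $i$ as

```latex
y_i = \mu + \sum_{j} a_j x_{ij} + \sum_{j} d_j z_{ij}
    + \sum_{j<k} g_{jk}\, x_{ij} x_{ik} + \varepsilon_i ,
\qquad \varepsilon_i \sim N(0, \sigma^2),
```

where $x_{ij}$ is an additive coding of SNP $j$ (e.g., the number of copies of the minor allele), $z_{ij}$ its dominance coding, and $g_{jk}$ the epistatic (interaction) effects. The reversible jump sampler then moves between models defined by the subset of SNPs and interactions included.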

29/11 to 03/12
24/11

Autoregressive conditional duration (ACD) models are used mainly to handle duration data from financial transactions. Such data carry useful information about market activity. In this presentation, the original ACD model and some of its variants are presented.
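
For reference, the original ACD(1,1) model of Engle and Russell specifies the duration $x_i$ between consecutive transactions as

```latex
x_i = \psi_i \varepsilon_i, \qquad
\psi_i = \mathbb{E}[x_i \mid \mathcal{F}_{i-1}] = \omega + \alpha x_{i-1} + \beta \psi_{i-1},
```

where the $\varepsilon_i$ are i.i.d. positive innovations with unit mean (exponential in the original formulation). Variants in the literature typically modify the innovation distribution or the dynamics of $\psi_i$.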

17/11

Beatriz Rodrigues Pinna
Title: Spatio-Temporal Factor Models for Multivariate Count Data: an application to crime data in the State of Rio de Janeiro
Advisor: João Batista de Morais Pereira

Cindy Carriazo Alvarez
Title: A stochastic volatility model with level shifts: an empirical study using Hamiltonian dynamics
Advisor: Carlos Abanto Valle

Maiara Gripp de Souza
Title: Inference in mixture transition distribution models
Advisors: Guilherme Ost and Giulio Iacobelli

Natan Freitas Leite
Title: Training and Performance: The Relationship in Cycling
Advisors: Carlos Tadeu Pagani Zanini and Hugo Tremonte de Carvalho

Rafael Cabral Fernandez
Título: A flexible hierarchical quantile spatial autoregressive model
Advisors: Kelly Cristina Mota Gonçalves and João Batista de Morais Pereira

03/11

A key hypothesis in epidemiological studies is that time to disease exposure provides relevant information to be considered in statistical models. However, the initiation time of a particular condition is usually unknown. Therefore, we developed a multiple imputation methodology for the age at onset of a particular condition, which is supported by incidence data from different sources of information. We introduced and illustrated such a methodology using simulated data in order to examine the performance of our proposal. Then, we analyzed the association between gallstones and fatty liver disease in the Maule Cohort, a Chilean study of chronic diseases, using participants’ risk factors and six sources of information for the imputation of the age at occurrence of gallstones. Simulation studies showed that an increase in the proportion of imputed data does not affect the quality of the estimated coefficients associated with fully observed variables, while the imputed variable slowly reduces its effect. For the Chilean study, the categorized exposure time to gallstones is a significant variable: participants with short and long exposure have, respectively, a 26.2% and 29.1% higher chance of developing fatty liver disease than non-exposed participants. In conclusion, our multiple imputation approach proved to be quite robust both in the linear/logistic regression simulation studies and in the real application, showing the great potential of this methodology. (Paper – https://journals.sagepub.com/doi/10.1177/09622802211013830)
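
Although the imputation model itself is specific to the age at onset, the pooling of estimates across the $M$ completed data sets follows the usual multiple-imputation logic; as a reminder (the paper's exact procedure may differ in details), Rubin's combining rules give

```latex
\bar{Q} = \frac{1}{M}\sum_{m=1}^{M}\hat{Q}_m, \qquad
T = \bar{U} + \Bigl(1 + \tfrac{1}{M}\Bigr)B, \qquad
\bar{U} = \frac{1}{M}\sum_{m=1}^{M}\hat{U}_m, \qquad
B = \frac{1}{M-1}\sum_{m=1}^{M}\bigl(\hat{Q}_m - \bar{Q}\bigr)^2,
```

where $\hat{Q}_m$ and $\hat{U}_m$ are the point estimate and its variance from the $m$-th imputed data set, $\bar{Q}$ is the pooled estimate and $T$ its total variance.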

27/10

Regression models are typically constructed to model the mean of a distribution. However, the density of several distributions is not indexed by the mean. In this context, this work provides a collection of regression models based on new parameterizations in terms of mean and precision parameters. The main advantage of these new parameterizations is the straightforward interpretation of the regression coefficients in terms of the expectation, as usual in the context of generalized linear models.
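
A well-known instance of such a parameterization, given here only as an illustration (the talk covers a broader collection of distributions), is the beta density written in terms of a mean $\mu$ and a precision $\phi$:

```latex
f(y \mid \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma\bigl((1-\mu)\phi\bigr)}\,
y^{\mu\phi - 1}(1 - y)^{(1-\mu)\phi - 1}, \qquad 0 < y < 1,
```

with $E(y) = \mu$ and $\mathrm{Var}(y) = \mu(1-\mu)/(1+\phi)$, so that a regression structure such as $g(\mu_i) = \mathbf{x}_i^\top\boldsymbol{\beta}$ is interpreted directly on the mean, exactly as in generalized linear models.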

20/10

Rare populations, such as endangered animals and plants, drug users and individuals with rare diseases, tend to cluster in regions. Adaptive cluster sampling is generally applied to obtain information from clustered and sparse populations, since it increases survey effort in areas where individuals of interest are observed. This work proposes a unit-level model which assumes that counts are related to auxiliary variables, improving the sampling process by assigning different weights to the cells and referencing them spatially. The proposed model fits rare and clustered populations, distributed over a regular grid, in a Bayesian framework. The approach is compared to alternative methods using simulated data and a real experiment in which adaptive samples were drawn from an African buffalo population in a 24,108-square-kilometer area of East Africa. Simulation studies show that the model is efficient under several settings, validating the methodology proposed in this paper for practical situations.
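
A minimal sketch of the kind of unit-level count model involved (an assumption about its general form; the paper's full specification may differ) relates the count $y_i$ in grid cell $i$ to auxiliary variables $\mathbf{x}_i$ and a spatially structured effect:

```latex
y_i \mid \lambda_i \sim \mathrm{Poisson}(\lambda_i), \qquad
\log \lambda_i = \mathbf{x}_i^\top \boldsymbol{\beta} + \epsilon_i,
```

where the $\epsilon_i$ are spatially correlated random effects. Cells with larger predicted intensity $\lambda_i$ can then receive greater sampling effort in the adaptive design.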

06/10

Factor analysis is a popular method for modeling dependence in multivariate data. However, determining the number of factors and obtaining a sparse orientation of the loadings are still major challenges. In this paper, we propose a decision-theoretic approach that brings to light the relation between a sparse representation of the loadings and the factor dimension. This relation is made explicit through a summary of the information contained in the multivariate posterior. To construct such a summary, we introduce a three-step approach. In the first step, the model is fitted with a conservative factor dimension. In the second step, a series of point estimates with a decreasing number of factors is obtained by minimizing an expected predictive loss function. In the third step, the degradation in utility as a function of the sparsity of the loadings and the factor dimension is displayed in the posterior summary. The findings are illustrated with a simulation study and an application to personality data. We use different prior choices and factor dimensions to demonstrate the flexibility of the proposed method. This is joint work with Henrique Bolfarine (USP), Carlos Carvalho (UT Austin) and Jared Murray (UT Austin).
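
As background for the terms used above, the underlying factor model for a $p$-dimensional observation $\mathbf{x}_i$ with $k$ factors is

```latex
\mathbf{x}_i = \boldsymbol{\Lambda}\,\mathbf{f}_i + \boldsymbol{\epsilon}_i, \qquad
\mathbf{f}_i \sim N_k(\mathbf{0}, \mathbf{I}_k), \qquad
\boldsymbol{\epsilon}_i \sim N_p(\mathbf{0}, \boldsymbol{\Sigma}),
```

where $\boldsymbol{\Lambda}$ is the $p \times k$ loadings matrix and $\boldsymbol{\Sigma}$ is typically diagonal. The posterior summary described in the talk reports sparse point estimates of $\boldsymbol{\Lambda}$ with progressively fewer columns (factors), together with the associated loss in expected predictive utility.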

25/08

Improving statistical learning models to increase efficiency in solving classification or regression problems is a goal pursued by the scientific community. In particular, the support vector machine has become one of the most successful algorithms for this task. Despite the strong predictive capacity of the support vector approach, its performance relies on the selection of the model's hyperparameters, such as the kernel function to be used. The traditional procedures for deciding which kernel function to use are computationally expensive. In this presentation, we propose a novel framework to deal with kernel function selection, called Random Machines. The results display an improvement in predictive capacity as well as reduced computational time.
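
As a rough illustration of the general idea (a bagged ensemble of support vector machines whose kernel is drawn at random for each replicate; this is a simplified sketch, not the authors' exact Random Machines algorithm), one could write:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
kernels = ["linear", "rbf", "poly", "sigmoid"]    # candidate kernel functions

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = []
for b in range(50):                                # 50 bootstrap replicates
    idx = rng.integers(0, len(X_tr), len(X_tr))    # bootstrap sample of the training set
    kernel = str(rng.choice(kernels))              # kernel drawn at random for this replicate
    models.append(SVC(kernel=kernel).fit(X_tr[idx], y_tr[idx]))

votes = np.array([m.predict(X_te) for m in models])
y_hat = (votes.mean(axis=0) > 0.5).astype(int)     # majority vote over the ensemble
print("test accuracy:", (y_hat == y_te).mean())
```

In the proposed framework, kernel choice and ensemble weighting are guided by validation performance rather than being uniform as in this sketch.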

Watch the talk on YouTube
04/08

In a partnership between the Observatório de Síndromes Respiratórias da UFPB and the Government of the State of Paraíba, the epidemiological survey "Continuar Cuidando PB" was carried out between November 3 and December 22, 2020. It was a household sample survey that collected sociodemographic and symptom data and administered rapid tests and RT-PCR tests for COVID-19 diagnosis. The total sample of 394 census tracts was stratified into four macro-regions of the state and subdivided in a balanced way within these strata for data collection over 8 weeks. In each selected census tract, a simplified collection protocol was applied, based on household enumeration and inverse sampling with Bernoulli draws, with a stopping rule defined by the number of residents tested with RT-PCR in the selected households reaching 25 or more. This protocol allowed timely data collection, the release of partial results every 2 weeks, and full synchrony between the data obtained from the questionnaires and the COVID-19 tests performed. With the conclusion of the survey at the end of December 2020, the public health administration of Paraíba had access to indicators that supported better decisions on public policies related to the management of the pandemic and informed society about the evolution of the disease in that state. Data collection was carried out by SCIENCE in partnership with teams from the municipal and state health departments, which were responsible for administering the rapid and RT-PCR tests, returning results to the tested participants, and offering care services to everyone who tested positive for the disease. All sanitary precautions were taken to protect the team of interviewers and health professionals involved in the collection, as well as the residents of the selected households who participated in the survey.
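
A toy simulation of the within-tract protocol described above (illustrative only; the household sizes and the Bernoulli inclusion probability are made-up values, not survey parameters) draws households sequentially and stops once the selected households accumulate at least 25 RT-PCR-tested residents:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_tract(n_households=300, p_include=0.2, target_tested=25):
    """Inverse (sequential) Bernoulli sampling of households within a census tract.

    Households are visited in enumeration order; each is included with
    probability p_include, and sampling stops once the cumulative number of
    RT-PCR-tested residents in the included households reaches target_tested.
    """
    residents = rng.integers(1, 6, n_households)   # household sizes (toy values)
    selected, tested = [], 0
    for h in range(n_households):
        if rng.random() < p_include:               # Bernoulli draw for household h
            selected.append(h)
            tested += residents[h]
            if tested >= target_tested:            # stopping rule
                break
    return selected, tested

households, n_tested = sample_tract()
print(f"{len(households)} households selected, {n_tested} residents tested")
```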

Watch the talk on YouTube
28/07

Numerical weather predictions (NWPs) are systematically subject to errors due to the deterministic solutions used by numerical models to simulate the atmosphere. Statistical postprocessing techniques are widely used nowadays for NWP calibration. However, time-varying bias is usually not accommodated by such models. The calibration performance is also sensitive to the temporal window used for training. This paper proposes space-time models that extend the main statistical postprocessing approaches to calibrate NWP model outputs. Trans-Gaussian random fields are considered to account for meteorological variables with asymmetric behavior. Data augmentation is used to account for censoring of the response variable. The benefits of the proposed extensions are illustrated through the calibration of hourly 10-meter height wind speed forecasts in Southeastern Brazil coming from the Eta model.
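
A minimal sketch of the type of space-time calibration model involved (an assumed simplified form; the paper's specification, including the trans-Gaussian transformation and the treatment of censoring, is more elaborate) regresses a transformed observed wind speed on the NWP forecast:

```latex
g\bigl(Y(\mathbf{s}, t)\bigr) = \beta_0(\mathbf{s}, t) + \beta_1(\mathbf{s}, t)\, f(\mathbf{s}, t) + \epsilon(\mathbf{s}, t),
```

where $f(\mathbf{s}, t)$ is the Eta model forecast at location $\mathbf{s}$ and time $t$, $g(\cdot)$ is a transformation (e.g., Box-Cox) accounting for the asymmetry of wind speed, the coefficients are space-time varying to absorb time-varying bias, and zero observations are handled as censored values via data augmentation.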

Watch the talk on YouTube
21/07

Mixed-effects state space models (MESSM) can be considered an alternative for studying HIV dynamics in a longitudinal data setting, by defining the mixed-effects component within the state-space model framework. As in Liu et al. (2011), we consider a hierarchical structure to capture possible differences between the immune systems of different patients. We extend the MESSM by allowing observational errors to follow a more flexible distribution that accounts for heavy tails. Our proposal consists of defining the error distribution of the observations using the hierarchical structure of the scale mixture of normal distributions. Moreover, the mixing parameters obtained as a by-product of the scale mixture representation can be used to identify outliers. Under the Bayesian paradigm, an efficient Markov chain Monte Carlo (MCMC) algorithm is implemented. To evaluate the properties of the proposed models, we carried out simulation studies. Finally, we illustrate our approach with an application to real longitudinal HIV data.
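
As a sketch of the error specification (notation simplified; the talk's full hierarchical model has additional structure), the observation equation of the state-space model can be written with a scale mixture of normals:

```latex
y_{it} = F_{it}\,\theta_{it} + \varepsilon_{it}, \qquad
\varepsilon_{it} \mid \lambda_{it} \sim N\!\bigl(0, \sigma^2 / \lambda_{it}\bigr), \qquad
\lambda_{it} \sim \pi(\lambda),
```

where choosing $\pi$ as a $\mathrm{Gamma}(\nu/2, \nu/2)$ distribution yields Student-$t$ observational errors; small posterior values of the mixing parameter $\lambda_{it}$ flag potential outliers.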

Watch the talk on YouTube