Seminar Series 2011 – 2nd Semester

Seminars of the Departamento de Métodos Estatísticos (Department of Statistical Methods) - Instituto de Matemática - UFRJ

2nd semester of 2011
The talks took place in the Auditorium of the Laboratório de Sistemas Estocásticos (LSE), room I-044b, at 15:30, apart from a few exceptions that are duly indicated.


Latent factor models are a general class of graphical models with wide applications in, to name a few examples, topic modelling, recommender systems and signal processing.
In this talk I will describe this class of models and discuss how fully Bayesian and semi-Bayesian approaches benefit from the exponential family representation. Finally, I will present my recent work applying the online EM algorithm to this class of problems.

It is well known that in subcritical systems the correlation between distant points decays exponentially with distance. In this talk we consider the Bernoulli percolation process, in which the bonds of the cubic lattice are independently open with probability p and closed with probability 1-p. We introduce a line of defects on which bonds are open with probability p' < p, and study the effect of p' and of the dimension on the exponential decay in the subcritical phase. In particular, we intend to present (in a non-technical way) the origin of the influence of p' and its connection with the recurrence/transience properties of a random walk with independent increments.
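As a rough illustration of the model described above, the following Python sketch samples bond percolation on a finite two-dimensional box with a defect line along the horizontal axis, and estimates by simulation the probability that the origin is connected to another point on that line. The restriction to two dimensions and to a finite box, as well as the names `sample_open_bonds`, `connected` and `connection_probability`, are assumptions made for illustration; they are not taken from the talk.

```python
import random
from collections import deque


def sample_open_bonds(n, p, p_defect, rng):
    """Sample the open bonds of the box {-n..n}^2.

    Bonds lying on the horizontal axis (the assumed defect line) are open
    with probability p_defect; all other bonds are open with probability p.
    """
    open_bonds = set()
    for x in range(-n, n + 1):
        for y in range(-n, n + 1):
            # Horizontal bond from (x, y) to (x + 1, y).
            if x < n:
                prob = p_defect if y == 0 else p
                if rng.random() < prob:
                    open_bonds.add(((x, y), (x + 1, y)))
            # Vertical bond from (x, y) to (x, y + 1).
            if y < n and rng.random() < p:
                open_bonds.add(((x, y), (x, y + 1)))
    return open_bonds


def connected(open_bonds, a, b):
    """Breadth-first search: is there a path of open bonds from a to b?"""
    neighbours = {}
    for u, v in open_bonds:
        neighbours.setdefault(u, []).append(v)
        neighbours.setdefault(v, []).append(u)
    seen = {a}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        if u == b:
            return True
        for w in neighbours.get(u, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False


def connection_probability(n, p, p_defect, k, trials, seed=0):
    """Monte Carlo estimate of P((0, 0) is connected to (k, 0)) inside the box."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        bonds = sample_open_bonds(n, p, p_defect, rng)
        if connected(bonds, (0, 0), (k, 0)):
            hits += 1
    return hits / trials
```

Evaluating `connection_probability` for increasing `k`, with `p` and `p_defect` held fixed, gives a crude picture of how connectivity along the line decays with distance; boundary effects of the finite box make this indicative only.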

Motor systems are exquisitely adapted to transform an action goal into the production of a movement of greatest fit in a given context. This transformation, called motor planning, is thought to be performed through internal models of actions. These models operate by continuously monitoring the motor output and by predicting future changes in body states and in the immediate environment. Because of the delays inherent to sensorimotor processing, the ability to predict the future state of the motor system in a variable environment (context) is considered crucial to create efficient movements and appropriate behaviors. In this colloquium I intend to explore how the brain activity associated with motor planning changes as a function of the context in which the movement is to be performed. Furthermore, I will present results showing that lesions in specific brain regions can affect the capacity to predict one’s own and/or others’ upcoming actions.


We consider the first-passage percolation model on Z^d given by i.i.d. random variables with distribution F. Let t_{pi}(u,v) be the time to go from u to v along the path pi, and t(u,v) the minimum of these times over all paths from u to v. We ask whether there exist points x and y and a semi-infinite path pi=(y_0=y,y_1,…) such that t_{pi}(y,y_{n+1})
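The two quantities defined above, the time along a given path and the minimum over all paths, can be sketched in Python on a finite box of Z^2. The finite box, the choice of exponential passage times as an example of F, and the names `sample_edge_times`, `path_time` and `passage_time` are assumptions made for illustration only; the minimum is computed with Dijkstra's algorithm.

```python
import heapq
import random


def sample_edge_times(n, rng, sampler):
    """Attach an i.i.d. passage time to each nearest-neighbour edge of {0..n}^2."""
    times = {}
    for x in range(n + 1):
        for y in range(n + 1):
            if x < n:
                times[frozenset({(x, y), (x + 1, y)})] = sampler(rng)
            if y < n:
                times[frozenset({(x, y), (x, y + 1)})] = sampler(rng)
    return times


def path_time(times, path):
    """Time to travel along a given path: the sum of its edge times."""
    return sum(times[frozenset({a, b})] for a, b in zip(path, path[1:]))


def passage_time(times, n, u, v):
    """Minimum travel time from u to v over all paths inside the box (Dijkstra)."""
    best = {u: 0.0}
    heap = [(0.0, u)]
    while heap:
        d, site = heapq.heappop(heap)
        if d > best.get(site, float("inf")):
            continue  # stale heap entry
        if site == v:
            return d
        x, y = site
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nb[0] <= n and 0 <= nb[1] <= n:
                candidate = d + times[frozenset({site, nb})]
                if candidate < best.get(nb, float("inf")):
                    best[nb] = candidate
                    heapq.heappush(heap, (candidate, nb))
    return float("inf")


# Example: exponential edge times (an assumed choice of F) on a 5 x 5 box.
rng = random.Random(0)
times = sample_edge_times(5, rng, lambda r: r.expovariate(1.0))
straight = [(i, 0) for i in range(6)]
print(path_time(times, straight), passage_time(times, 5, (0, 0), (5, 0)))
```

By construction the second printed value is never larger than the first, since the straight path is one of the paths over which the minimum is taken.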


Consider a branching random walk on the real numbers, with offspring distribution Z and non-negative displacement distribution W. We say that explosion occurs if an infinite number of particles may be found within a finite distance of the origin. In this talk, we discuss the problem of characterising pairs (Z, W) for which explosion occurs a.s. In particular, in the case that the offspring distribution Z has a sufficiently heavy tail we give a necessary and sufficient condition on W for explosion to occur. Furthermore, we demonstrate that our condition on the tail is best possible for this equivalence to occur. Joint with Omid Amini, Luc Devroye and Neil Olver.

11/08 Institutional colloquium (exceptionally at 14:30)

Modern Statistics is built on the sensible combination of direct evidence (the directly relevant data, or “individual data”) and indirect evidence (the indirectly relevant data and knowledge, or “group data”). Admissible procedures combine the two sources of information, and the advance of technology is making indirect evidence more substantial and ubiquitous. It has been pointed out, however, that in “borrowing strength” a fundamental problem of Statistics is to treat exceptional cases, those that do not conform to the central “aurea mediocritas”, in a fundamentally different way. This has recently been termed “the Clemente problem” (Efron, 2010). In this article we argue that the problem is caused by the simultaneous use of a squared loss function and conjugate (light-tailed) priors, which is the usual procedure. We propose instead to use robust penalties, in the form of losses that penalize huge errors more severely or, equivalently, heavy-tailed priors that make the exceptional more probable. Using heavy-tailed priors we can reproduce, in a Bayesian way, Efron and Morris’ “limited translation estimators” (with Double Exponential priors) and “discarding priors estimators” (with Cauchy-like priors), which discard the prior in the presence of conflict. Both Empirical Bayes and Full Bayes approaches are able to alleviate the Clemente problem and, furthermore, beat the James-Stein estimator in terms of smaller squared errors, for sensible Robust Bayes priors. We develop Empirical Bayes and Fully Bayesian hierarchical models in parallel, illustrating that the differences among sensible versions of both are minute compared with the effect of the robust assumptions. We also propose a heavy-tailed Beta2 distribution for variances, which arises naturally as an alternative to the usual Inverted-Gamma distribution.
This adds stability and robustness, and strikingly produces a marginal for the location which is the first known “Horseshoe” (optimal) prior in closed analytical form. This has recently been put to the test in Fuquene, Perez and Pericchi (2011), in a Dynamical Bayesian model for the detection of structural changes and outliers.
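To make the baseline in this abstract concrete, here is a minimal Python sketch of the plain James-Stein estimator for normal means with known unit variance, together with a small simulation of its squared-error loss. It illustrates only the classical shrinkage rule, not the robust heavy-tailed estimators proposed in the talk; the function names `james_stein` and `average_losses` and the simulation setup are assumptions made for illustration.

```python
import random


def james_stein(x, sigma2=1.0):
    """Plain James-Stein estimate, shrinking the observations x toward zero."""
    k = len(x)
    norm2 = sum(v * v for v in x)
    factor = 1.0 - (k - 2) * sigma2 / norm2
    return [factor * v for v in x]


def average_losses(theta, trials, seed=0):
    """Average squared-error losses when x_i ~ N(theta_i, 1) independently.

    Returns (total_raw, total_js, last_raw, last_js): the total loss of the
    raw observations and of James-Stein, and the same two losses restricted
    to the last coordinate of theta.
    """
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0, 0.0]
    for _ in range(trials):
        x = [t + rng.gauss(0.0, 1.0) for t in theta]
        js = james_stein(x)
        totals[0] += sum((a - t) ** 2 for a, t in zip(x, theta))
        totals[1] += sum((a - t) ** 2 for a, t in zip(js, theta))
        totals[2] += (x[-1] - theta[-1]) ** 2
        totals[3] += (js[-1] - theta[-1]) ** 2
    return tuple(v / trials for v in totals)
```

Calling `average_losses([0.0] * 9 + [10.0], 5000)` compares the total loss with the loss on a single exceptional coordinate. In theory the James-Stein rule lowers the total risk while pulling the exceptional mean toward the group, which is the kind of behaviour the abstract refers to as the Clemente problem.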

We consider estimation of scalar functions which determine the dynamics of diffusion processes. It has been recently shown that nonparametric maximum likelihood is ill-posed in this context. We adopt a probabilistic approach to regularize the problem by the adoption of a prior distribution for the unknown functional. A Gaussian prior measure is specified in the function space by means of its precision operator, which is defined as an appropriate differential operator. We establish that a Bayesian Gaussian conjugate analysis for the drift of one-dimensional non-linear diffusions is feasible given high-frequency data. This is achieved by expressing the log-likelihood as a quadratic function of the drift, with sufficient statistics given by the so-called local time process and the end points of the observed path. Computationally efficient posterior inference is carried out using a finite element method.
We embed this technology in partially observed situations and adopt a data augmentation approach whereby we iteratively generate missing data paths and draws from the unknown functional. Our methodology is applied to estimate the drift of models used in molecular dynamics and financial econometrics using high and low frequency observations. We discuss extensions to other partially observed schemes and connections to other types of non-parametric inference.
Joint work with Yvo Pokern (UCL), Gareth O. Roberts (Warwick) and Andrew M. Stuart (Warwick)


A general methodology is presented for the construction and effective use of control variates for reversible MCMC samplers. The values of the coefficients of the optimal linear combination of the control variates are computed, and adaptive, consistent MCMC estimators are derived for these optimal coefficients. All methodological and asymptotic arguments are rigorously justified. Numerous MCMC simulation examples from Bayesian inference applications demonstrate that the resulting variance reduction can be quite dramatic.
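The basic idea of a control variate with an estimated optimal coefficient can be sketched in Python in a much simpler setting than the one treated in the talk: independent samples rather than the output of a reversible MCMC sampler. The function name `control_variate_estimate` and the toy target E[exp(U)] with U uniform on (0, 1) are assumptions made for illustration. The coefficient used is the standard sample version of Cov(f, g) / Var(g).

```python
import math
import random


def control_variate_estimate(f, g, g_mean, samples):
    """Estimate E[f(X)] using g(X) - g_mean as a zero-mean control variate."""
    n = len(samples)
    fs = [f(x) for x in samples]
    gs = [g(x) - g_mean for x in samples]
    f_bar = sum(fs) / n
    g_bar = sum(gs) / n
    cov_fg = sum((a - f_bar) * (b - g_bar) for a, b in zip(fs, gs)) / (n - 1)
    var_f = sum((a - f_bar) ** 2 for a in fs) / (n - 1)
    var_g = sum((b - g_bar) ** 2 for b in gs) / (n - 1)
    theta = cov_fg / var_g  # estimated optimal coefficient
    adjusted = [a - theta * b for a, b in zip(fs, gs)]
    adj_bar = sum(adjusted) / n
    var_adj = sum((a - adj_bar) ** 2 for a in adjusted) / (n - 1)
    return {
        "plain": f_bar,
        "adjusted": adj_bar,
        "theta": theta,
        "plain_variance": var_f,
        "adjusted_variance": var_adj,
    }


# Toy example: E[exp(U)] = e - 1 for U uniform on (0, 1); control variate U - 1/2.
rng = random.Random(0)
us = [rng.random() for _ in range(10000)]
result = control_variate_estimate(math.exp, lambda u: u, 0.5, us)
print(result["plain"], result["adjusted"], math.e - 1)
```

With MCMC output the samples are correlated, so the optimal coefficient is no longer given by this simple covariance ratio; handling that case is the subject of the talk and is not attempted here.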

06/07 (exceptionally in the LSE Auditorium, room I-044b)

In this work we develop an MCMC algorithm to estimate the parameters of an IRT model with a centered skew-normal distribution for the latent traits, proposed by Azevedo, Bolfarine and Andrade (2011). We use a widely adopted stochastic representation of the skew-normal distribution in order to facilitate the development and implementation of the algorithm. Model checking and validation mechanisms are also developed in the Bayesian context. Simulation studies indicate that, in several situations, the proposed algorithm estimates the parameters as well as or better than the algorithm previously developed by Azevedo, Bolfarine and Andrade (2011). In addition, the algorithm proposed in this work is faster than its predecessor. A data set from the educational field is analyzed to illustrate the methodology, the estimation algorithm and the model validation tools developed.

12/12 (exceptionally on a Monday)

It is argued by prominent epidemiologists that assessment of interaction should be based on departures from additive rates or risks. Unfortunately, in case-control studies the corresponding “fundamental interaction parameter” can usually not be estimated. To overcome this problem, epidemiologists have proposed surrogate measures of interaction based on relative risks from logistic models. In this talk we investigate the performance of these measures in practice, where covariates must be included to control for confounding. We uncover two fundamental problems with the advocated approach and suggest an approach that rectifies the problems.


Variation in gene expression is thought to make a significant contribution to phenotypic diversity among individuals within populations. We measured allele-specific gene expression (ASE) in a diploid hybrid of two diverse Saccharomyces cerevisiae strains using RNA-Seq. To capitalize on the wealth of information contained in RNA-Seq data sets, we developed a powerful and flexible hierarchical Bayesian model that combines information across loci to allow both global and locus-specific inferences about ASE. We show that we are able to accurately quantify levels of ASE with specified false discovery rates, achieving high reproducibility between independent sequencing platforms. A key feature of the model is the use of additional genomic DNA data to calibrate the null model. We pinpoint loci that show unusual and biologically interesting patterns, including allele-specific alternative splicing and allele-specific transcription start sites. Joint work with Dan Skelly and Josh Akey.

11/11 (exceptionally on a Friday at 13:30)

A subpopulation or domain is called a small area if the area-specific sample size is small or even zero. Traditional area-specific direct estimators of means are not suited to small areas, and it is necessary to use indirect estimators that borrow strength across related areas. Small area estimation has been extensively studied under linking models based on linear mixed models. Empirical best linear unbiased prediction (EBLUP) estimators of small area means have been developed along with nearly unbiased estimators of mean squared errors. However, EBLUP estimators can be sensitive to outliers. In this talk, I will first present a robust EBLUP-type method for small area estimation and demonstrate its advantage over the customary EBLUP under unit-level nested error linear regression models in the presence of outliers in the random small area effects and/or unit-level errors. I will also study a bootstrap method of estimating the mean squared error of the robust EBLUP-type estimator. Secondly, I will relax the assumption of a linear regression model for the fixed part of the linear mixed model, replace it by the weaker assumption of a penalized-spline regression model, and develop robust EBLUP-type estimators of small area means in the presence of outliers in the random small area effects and/or unit-level errors. I will also discuss bootstrap estimators of mean squared error. Simulation results and applications to real data will also be presented.

21/09 Inter-institutional Colloquium "Modelos Estocásticos e Aplicações" (Stochastic Models and Applications) (exceptionally at 14:00 at UFF)

In this talk the following research topics will be discussed.
Robust estimation: It is well known that the sample autocovariance is not robust to the presence of additive outliers. Hence, an autocovariance estimator that is robust to additive outliers can be very useful for time-series modeling. The robust autocovariance estimator proposed by Ma and Genton (2000) is studied and applied to time series with different correlation structures, such as short and long memory. Based on the robust autocorrelation function, a robust estimator of the parameter d in ARFIMA(p, d, q) is proposed. Some simulations are used to support the use of this method when a time series has additive outliers.
DF unit root test based on ranks: In this subject, the classical Dickey-Fuller (DF) test will be studied in the context of unit root time series with outliers. Based on the ranks of the observations, a robust DF test is proposed. The test is robust against outlying observations. The asymptotic distribution of the test is obtained.
Counting process: The Integer-valued Autoregressive Moving Average (INARMA) models have been suggested for modeling observed count time series. This research is concerned with the problem of modeling INAR processes under seasonal, unit root and long memory properties.

The subadditive ergodic theorem was first proved by Kingman. It gives sufficient conditions for the almost sure convergence of X_n/n, where {X_n} is a subadditive sequence of random variables. We will see how a slightly more general version of this theorem yields results of interest for first-passage percolation and for some one-dimensional particle systems, such as the contact process or the exclusion process.
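For orientation, the deterministic counterpart of this result is Fekete's lemma, and the link with first-passage percolation comes from the triangle inequality for passage times. The LaTeX block below states both. The notation t(u,v) for passage times, e_1 for the first unit vector and X_n = t(0, n e_1) are assumptions introduced here for illustration, and the precise hypotheses of the ergodic theorem (stationarity and integrability) are not spelled out.

```latex
% Fekete's lemma: a deterministic subadditive sequence has a limiting slope.
a_{m+n} \le a_m + a_n \ \text{ for all } m, n \ge 1
\quad\Longrightarrow\quad
\lim_{n\to\infty} \frac{a_n}{n} = \inf_{n \ge 1} \frac{a_n}{n} \in [-\infty, \infty).

% Subadditivity of passage times along the first coordinate axis,
% with X_n = t(0, n e_1):
t\bigl(0, (m+n) e_1\bigr) \le t\bigl(0, m e_1\bigr) + t\bigl(m e_1, (m+n) e_1\bigr).
```

The second inequality holds because concatenating a path from 0 to m e_1 with a path from m e_1 to (m+n) e_1 gives a path from 0 to (m+n) e_1.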