Workshop on Theory for Scalable, Modern Statistical Methods

The workshop aims to bring together researchers working on new directions in modern statistical problems, including scalable computation (both Bayesian and frequentist), uncertainty quantification, high-dimensional structured models, manifold estimation, non-linear inverse problems, causality, deep learning, etc.

Location: Bocconi University, Milano, Room AS01 (Via Röntgen 1, -2nd floor)
Time: 5th-7th of April, 2023
Sponsor: The workshop is sponsored by the ERC Starting Grant (nr: 101041064) on the project: “BigBayesUQ: The missing story of Bayesian uncertainty quantification for big data”.

Schedule:

Wednesday (5th of April)
9:00-9:25 arrival & registration
9:25-9:30 opening
9:30-10:15 Ismaël Castillo (Sorbonne): Trade-offs for multiple testing and classification of sparse vectors
10:15-11:00 Aad van der Vaart (Delft): Bayesian sensitivity analysis in causal analysis
11:00-11:30 Coffee break
11:30-12:15 Bernhard Stankewitz (Bocconi): Early stopping for $ L^{2} $-boosting in sparse high-dimensional linear models
12:15-13:00 Lasse Vuursteen (Delft): The cost of privacy and bandwidth constraints in nonparametric distributed hypothesis testing
13:00-15:30 lunch break
15:30-16:15 Paul Rosa (Oxford): Manifold-adaptive regression: insights from the behaviour of Matérn processes
16:15-17:00 Badr-Eddine Chérief-Abdellatif (Sorbonne): Bayes meets Bernstein in Meta Learning
17:00-17:30 Coffee break
17:30-18:15 Omiros Papaspiliopoulos (Bocconi): Nonparametric estimation of the marginal likelihood
18:30-19:30 Aperitivo (GT Bistrot Cafe, Via Guglielmo Röntgen, 6)

Thursday (6th of April)
9:30-10:15 Richard Nickl (Cambridge): On Posterior consistency in non-linear data assimilation problems with Gaussian process priors
10:15-11:00 Jan Bohr (Bonn): A Bernstein-von-Mises theorem for the Calderón problem with piecewise constant conductivities
11:00-11:30 Coffee break
11:30-12:15 Marc Hoffmann (Dauphine): On estimating a multidimensional diffusion from discrete data
12:15-13:00 Ieva Kazlauskaite (Cambridge): Variational Bayesian inference for PDE based inverse problems
13:00-15:30 lunch break
15:30-16:15 Judith Rousseau (Oxford): Bayesian targeted inference in semiparametric models
16:15-17:00 Alice L’Huillier (Sorbonne): Semiparametric inference using fractional posteriors
17:00-17:30 Coffee break
17:30-18:15 Chao Gao (Chicago): Detection and Recovery of Sparse Signal Under Correlation
20:00-22:30 Conference dinner (restaurant El Brellin)

Friday (7th of April)
9:30-10:15 Giacomo Zanella (Bocconi): Complexity of coordinate-wise MCMC through Bayesian Asymptotics
10:15-11:00 Sergios Agapiou (Cyprus): Adaptive rates of contraction with heavy-tailed priors
11:00-11:30 Coffee break
11:30-12:15 Thibault Randrianarisoa (Bocconi): Deep Horseshoe Gaussian processes
12:15-13:00 Matteo Giordano (Oxford): Bayesian nonparametric intensity estimation for inhomogeneous point processes with covariates
13:00-14:00 lunch

Abstracts

Sergios Agapiou (Cyprus): Adaptive rates of contraction with heavy-tailed priors

Abstract: We propose a new strategy for adaptation to smoothness based on heavy-tailed priors. We illustrate it in a variety of settings, showing in particular that adaptation in the minimax sense (up to logarithmic factors) is achieved without tuning of hyperparameters. We present numerical simulations corroborating the theory. This is ongoing joint work with Ismaël Castillo.
Jan Bohr (Bonn): A Bernstein-von-Mises theorem for the Calderón problem with piecewise constant conductivities

Abstract: The talk presents a finite-dimensional statistical model for the Calderón problem with piecewise constant conductivities. In this setting, one can consider a classical i.i.d. noise model, and the injectivity of the forward map and its linearisation suffice to prove the invertibility of the information operator. This results in a BvM theorem and optimality guarantees for estimation by Bayesian posterior means.
Ismaël Castillo (Sorbonne): Trade-offs for multiple testing and classification of sparse vectors

Abstract: We study multiple testing when the loss is the sum of False Discovery Rate (FDR) and False Negative Rate (FNR). We derive a necessary and sufficient condition for arbitrary sparse signals to be testable in the above sense at a given level, as well as the sharp minimax testing constant as a function of the arbitrary signal intensities. The results hold for a variety of models (location and scale) and noise distributions, as well as for the (normalised) classification loss. We also discuss two procedures to achieve adaptation: an empirical Bayes $\ell$-value procedure, and the BH procedure with properly tuned target level.
When all signals are above the previously determined testing boundary, we also study optimal rates of decrease to zero of FDR+FNR, and of the probability of misclassification, in Gaussian and Subbotin noise.
This is joint work with Kweku Abraham and Étienne Roquain.
Badr-Eddine Chérief-Abdellatif (Sorbonne): Bayes meets Bernstein in Meta Learning

Abstract: The Bernstein assumption is a crucial condition under which PAC-Bayes methods can learn from $n$ observations at the fast rate of convergence $O\left(d_{\pi}/n\right)$, as opposed to the slow rate $O\left(\sqrt{d_{\pi}/n}\right)$ without it, where $d_{\pi}$ is a parameter which depends on the prior $\pi$. When learning $T$ tasks, each composed of $n$ observations, meta learning takes advantage of the commonality of the $T$ tasks to learn more efficiently. In this talk, we will see that the Bernstein assumption is always satisfied at the meta level (between the $T$ tasks) when learning the prior and, therefore, that PAC-Bayes techniques achieve the fast rate of convergence $O\left(\inf_{\pi} d_{\pi}/n + 1/T\right)$ if the Bernstein assumption is satisfied at the observation level (between the $n$ observations), and the rate $O\left(\inf_{\pi} \sqrt{d_{\pi}/n} + 1/T\right)$ otherwise, improving upon the existing $O\left(\inf_{\pi} \sqrt{d_{\pi}/n} + \sqrt{1/T}\right)$ rate.
Chao Gao (Chicago): Detection and Recovery of Sparse Signal Under Correlation

Abstract: We study a $p$-dimensional Gaussian sequence model with equicorrelated noise. In the first part of the talk, we consider detection of a signal that has at most $s$ nonzero coordinates. Our result fully characterizes the nonasymptotic minimax separation rate as a function of the dimension $p$, the sparsity $s$ and the correlation level. Surprisingly, not only does the order of the minimax separation rate depend on $s$, it also varies with $p-s$. This new phenomenon only occurs when correlation is present. In the second part of the talk, we consider the problem of signal recovery. Unlike the detection rate, the order of the minimax estimation rate has a dependence on $p-2s$, which is also a new phenomenon that only occurs with correlation. We also consider detection and recovery procedures that are adaptive to the sparsity level. While the optimal detection rate can be achieved adaptively without any cost, the optimal recovery rate can only be achieved in expectation with some additional cost.
Matteo Giordano (Oxford): Bayesian nonparametric intensity estimation for inhomogeneous point processes with covariates

Abstract: In this work we study Bayesian nonparametric estimation of the intensity function of a spatial Poisson point process, in the case where the intensity depends on covariates and a single observation of the process is available. The presence of covariates allows one to borrow information from faraway locations in the domain, enabling consistent estimation in the growing domain asymptotics. In particular, we derive posterior concentration rates under both global and local losses. The global rates are obtained under conditions on the prior distribution resembling those in the well-established theory of Bayesian nonparametrics, here combined with suitable concentration inequalities for stationary processes to control certain random covariate-dependent losses. The local rates are instead derived with an ad hoc analysis, exploiting recent advances in the theory of Pólya-tree-like priors.
Marc Hoffmann (Dauphine): On estimating a multidimensional diffusion from discrete data

Abstract: We revisit an old problem: estimating nonparametrically the drift and diffusion coefficient of a diffusion process from discrete data $(X_0, X_D, X_{2D}, \ldots, X_{ND})$. The novelties are: (i) we work in a multivariate setting: only a few results have been obtained in this case of discrete data (and, to the best of our knowledge, no results for the diffusion coefficient from high-frequency data); (ii) the sampling scheme is high frequency but arbitrarily slow: $D=D_N \rightarrow 0$ and $N D_N^q$ bounded for some possibly arbitrarily large $q$ (à la Kessler); and (iii) the process lies in a (not necessarily convex, not necessarily bounded) domain in $\mathbb R^d$ with reflection at the boundary. (In particular we recover the case of a bounded domain or the whole Euclidean space $\mathbb R^d$.) We conduct a (relatively) standard minimax (and adaptive) program for integrated squared error loss over bounded domains (and more losses in the simpler case of the drift) over standard smoothness classes, plus some other miscellanies. When $ND^2 \rightarrow 0$ and in the special case of the conductivity equation over a bounded domain, we actually obtain contraction rates in squared error loss in a nonparametric Bayes setting. The main difficulty here lies in controlling small ball probabilities for the likelihood ratios; we develop small time expansions of the heat kernel with a bit of Riemannian geometry to control adequate perturbations in KL divergence, using classical ideas of Azencott and others. Joint ongoing work with C. Amorino and C. Strauch. The last part is joint with K. Ray.
Alice L’Huillier (Sorbonne): Semiparametric inference using fractional posteriors

Abstract: We establish a general Bernstein–von Mises theorem for approximately linear semiparametric functionals of fractional posterior distributions based on nonparametric priors. This is illustrated in a number of nonparametric settings and for different classes of prior distributions, including Gaussian process priors. We show that fractional posterior credible sets can provide reliable semiparametric uncertainty quantification, but have inflated size. To remedy this, we further propose a shifted-and-rescaled fractional posterior set that is an efficient confidence set having optimal size under regularity conditions. As part of our proofs, we also refine existing contraction rate results for fractional posteriors by sharpening the dependence of the rate on the fractional exponent. This is joint work with Luke Travis, Ismaël Castillo and Kolyan Ray.
Ieva Kazlauskaite (Cambridge): Variational Bayesian inference for PDE based inverse problems

Abstract: In this talk I will discuss inference in PDE-based Bayesian inverse problems and present our recent work on variational inference as an alternative to MCMC for this class of problems. In this work, we propose a family of Gaussian trial distributions parametrised by precision matrices, taking advantage of the inherent sparsity of the inverse problem encoded in its finite element discretisation. We utilise stochastic optimisation to efficiently estimate the variational objective and provide an empirical assessment of the performance. Furthermore, I will mention some recent work that utilises physics-informed neural networks as an alternative to classical finite element solvers and illustrate how these can be used in PDE-based forward and inverse problems.
Richard Nickl (Cambridge): On Posterior consistency in non-linear data assimilation problems with Gaussian process priors

Abstract: We discuss recent results about consistency of Gaussian process methods in two non-linear data assimilation problems. The first concerns inference on the diffusivity from multi-dimensional low-frequency diffusion measurements, and the second inference on the initial condition in the two-dimensional Navier-Stokes equations.
Omiros Papaspiliopoulos (Bocconi): Nonparametric estimation of the marginal likelihood

Abstract: We consider the problem of exploring the marginal likelihood of low-dimensional hyperparameters from high-dimensional Bayesian hierarchical models. First, we provide a unified framework that connects seemingly unconnected approaches to estimating normalizing constants, including the Vardi estimator, umbrella sampling and the Gibbs sampler. The framework requires Monte Carlo sampling from the posteriors of the high-dimensional parameters for different values of the hyperparameters on a lattice. We prove that the resultant estimators of the marginal likelihood are consistent both as the Monte Carlo sample size gets large (for a given lattice) and as the lattice gets dense (for a given Monte Carlo effort per lattice point). We then introduce a novel kernel method that allows an optimal bias-variance tradeoff.
Thibault Randrianarisoa (Bocconi): Deep Horseshoe Gaussian processes

Abstract: This work is concerned with the study of theoretical properties of deep Gaussian processes, which have recently been proposed as natural objects to fit, similarly to deep neural networks, possibly complex features present in modern data samples, such as compositional structures. Adopting a Bayesian nonparametric approach, it is natural to use deep Gaussian processes as prior distributions, and to use the corresponding posterior distributions for statistical inference. We introduce the deep Horseshoe Gaussian process (Deep–HGP), a new prior based on deep Gaussian processes with a squared-exponential kernel, that in particular enables data-driven choices of the key lengthscale parameters. For nonparametric regression with random design, we show that the associated tempered posterior distributions recover the unknown true regression curve optimally in terms of quadratic loss, up to a logarithmic factor. At the same time, Deep–HGP priors are conceptually quite simple to construct. One main idea is that the horseshoe prior enables simultaneous adaptation to both smoothness and structure.
Paul Rosa (Oxford): Manifold-adaptive regression: insights from the behaviour of Matérn processes

Abstract: We study the asymptotic behaviour of Matérn process regression on manifolds under frequentist assumptions. In the first part we introduce the Laplace-Beltrami operator, which allows us to define an analog of the Euclidean Matérn processes and Besov spaces on manifolds. In the second part we investigate the behaviour of the restriction of a Euclidean Matérn process to a (possibly unknown) submanifold, on which the covariates are assumed to be located, and we use trace and extension operators between ambient and manifold Sobolev spaces to account for the non-Euclidean geometry. Interestingly, we find that the asymptotic posterior contraction rate is actually the same for a large class of functional spaces, indicating that an ambient Matérn process can automatically adapt to submanifolds in terms of asymptotic rate of contraction, and suggesting the need for further investigation in order to quantify the differences between the two methods. This is joint work with Viacheslav Borovitskiy and Alexander Terenin.
Judith Rousseau (Oxford): Bayesian targeted inference in semiparametric models

Abstract: TBD
Bernhard Stankewitz (Bocconi): Early stopping for $ L^{2} $-boosting in sparse high-dimensional linear models

Abstract: We consider $ L^{2} $-boosting in a sparse high-dimensional linear model via orthogonal matching pursuit (OMP). For this greedy, nonlinear subspace selection procedure, we analyze a data-driven early stopping time $ \tau $, which is sequential in the sense that its computation is based on the first $ \tau $ iterations only. Our approach is substantially less costly than established model selection criteria, which require the computation of the full boosting path.
We prove that sequential early stopping preserves statistical optimality in this setting in terms of a general oracle inequality for the empirical risk and recently established optimal convergence rates for the population risk. The proofs include a subtle $ \omega $-pointwise analysis of a stochastic bias-variance trade-off, which is induced by the greedy optimization procedure at the core of OMP. Simulation studies show that, at a significantly reduced computational cost, these types of methods match or even exceed the performance of other state-of-the-art algorithms such as the cross-validated Lasso or model selection via a high-dimensional Akaike criterion based on the full boosting path.
Aad van der Vaart (Delft): Bayesian sensitivity analysis in causal analysis

Abstract: Causal inference is based on the assumption of “conditional exchangeability”. This is not verifiable based on the data when using nonparametric modelling. A “sensitivity analysis” considers the effect of deviations from the assumption. In a Bayesian framework we could put a prior on the size of the deviation and obtain an ordinary posterior. We review possible approaches and present some results comparing different ways of nonparametric modelling. (Based on joint work with Stéphanie van der Pas and Bart Eggen.)
Lasse Vuursteen (Delft): The cost of privacy and bandwidth constraints in nonparametric distributed hypothesis testing

Abstract: Distributed testing is concerned with settings where data concerning the same hypothesis is distributed over multiple locations or servers. We consider settings in which the data is processed locally, before it is sent to a central location where it is aggregated and the final inference (test outcome) is obtained. The communication of the local servers to the central one is restricted: they cannot transmit the full data due to privacy and/or bandwidth constraints.
Distributed testing turns out to be subject to fundamentally different phenomena than distributed estimation. This talk covers minimax optimal testing in a nonparametric setting under differential privacy and bandwidth constraints. Specifically, the focus will be on the questions: What is the cost of privacy and/or bandwidth restrictions? Which constraint is more stringent? And is adaptation possible? The talk is based on joint work with T. Tony Cai, Abhinav Chakraborty, Botond Szabo and Harry van Zanten.
Giacomo Zanella (Bocconi): Complexity of coordinate-wise MCMC through Bayesian Asymptotics

Abstract: Coordinate-wise MCMC algorithms (e.g. Gibbs samplers, Metropolis-within-Gibbs and variants) are popular algorithms to approximate posterior distributions arising from Bayesian hierarchical models. Despite their popularity and good empirical performance, however, there are still relatively few quantitative theoretical results on their scalability or lack thereof, e.g. far fewer than for gradient-based sampling methods. We introduce a novel technique to analyse the asymptotic behaviour of mixing times of coordinate-wise MCMC, based on tools from Bayesian asymptotics. We apply our methodology to high-dimensional hierarchical models, obtaining dimension-free convergence results for Gibbs-type schemes under random data-generating assumptions, for a broad class of two-level models with a generic likelihood function.

If you have questions, please contact Botond Szabo (botond[dot]szabo[at]unibocconi[dot]it).