# Workshop on Theory for Scalable, Modern Statistical Methods

The workshop aims to bring together researchers working on new directions in modern statistical problems, including scalable computation (both Bayesian and frequentist), uncertainty quantification, high-dimensional structured models, manifold estimation, non-linear inverse problems, causality, and deep learning.

Location: Bocconi University, Milano
Time: 5th-7th of April, 2023
Sponsor: The workshop is sponsored by the ERC Starting Grant (nr: 101041064) on the project: “BigBayesUQ: The missing story of Bayesian uncertainty quantification for big data”.

Speakers:
Sergios Agapiou (Cyprus)
Jan Bohr (Bonn)
Ismaël Castillo (Sorbonne)
Chao Gao (Chicago)
Matteo Giordano (Oxford)
Marc Hoffmann (Paris Dauphine)
Richard Nickl (Cambridge)
Dennis Nieman (VU Amsterdam)
Ieva Kazlauskaite (Cambridge)
Alice L’Huillier (Sorbonne)
Omiros Papaspiliopoulos (Bocconi)
Thibault Randrianarisoa (Bocconi)
Paul Rosa (Oxford)
Judith Rousseau (Oxford)
Bernhard Stankewitz (Bocconi)
Lasse Vuursteen (Delft)

Schedule:

Wednesday (5th of April)
9:30-10:15 TBA
10:15-11:00 TBA
11:00-11:30 Coffee break
11:30-12:15 TBA
12:15-13:00 TBA
13:00-15:30 Lunch break
15:30-16:15 TBA
16:15-17:00 TBA
17:00-17:30 Coffee break
17:30-18:15 TBA
18:30-19:30 Aperitivo

Thursday (6th of April)
9:30-10:15 TBA
10:15-11:00 TBA
11:00-11:30 Coffee break
11:30-12:15 TBA
12:15-13:00 TBA
13:00-15:30 Lunch break
15:30-16:15 TBA
16:15-17:00 TBA
17:00-17:30 Coffee break
17:30-18:15 TBA
19:30-22:30 Conference dinner

Friday (7th of April)
9:30-10:15 TBA
10:15-11:00 TBA
11:00-11:30 Coffee break
11:30-12:15 TBA
12:15-13:00 TBA
13:00-14:00 Lunch

Abstracts

• Sergios Agapiou (Cyprus): Adaptive rates of contraction with heavy-tailed priors

Abstract: We propose a new strategy for adaptation to smoothness based on heavy-tailed priors. We illustrate it in a variety of settings, showing in particular that adaptation in the minimax sense (up to logarithmic factors) is achieved without tuning of hyperparameters. We present numerical simulations corroborating the theory. This is ongoing joint work with Ismaël Castillo.

• Jan Bohr (Bonn): A Bernstein-von-Mises theorem for the Calderón problem with piecewise constant conductivities

Abstract: The talk presents a finite-dimensional statistical model for the Calderón problem with piecewise constant conductivities. In this setting, one can consider a classical i.i.d. noise model, and the injectivity of the forward map and its linearisation suffice to prove the invertibility of the information operator. This results in a BvM theorem and optimality guarantees for estimation by Bayesian posterior means.

• Badr-Eddine Chérief-Abdellatif (Sorbonne): Bayes meets Bernstein in Meta Learning

Abstract: The Bernstein assumption is a crucial condition under which PAC-Bayes methods can learn from $n$ observations at the fast rate of convergence $O\left(d_{\pi}/n\right)$, as opposed to the slow rate $O\left(\sqrt{d_{\pi}/n}\right)$ without it, where $d_{\pi}$ is a parameter which depends on the prior $\pi$. When learning $T$ tasks, each composed of $n$ observations, meta learning takes advantage of the commonality of the $T$ tasks to learn more efficiently. In this talk, we will see that the Bernstein assumption is always satisfied at the meta level (between the $T$ tasks) when learning the prior, and therefore that PAC-Bayes techniques achieve the fast rate of convergence $O\left(\inf_{\pi} d_{\pi}/n + 1/T\right)$ if the Bernstein assumption is satisfied at the observation level (between the $n$ observations), and the rate $O\left(\inf_{\pi} \sqrt{d_{\pi}/n} + 1/T\right)$ otherwise, improving upon the existing $O\left(\inf_{\pi} \sqrt{d_{\pi}/n} + \sqrt{1/T}\right)$ rate.

• Chao Gao (Chicago): Detection and Recovery of Sparse Signal Under Correlation

Abstract: We study a p-dimensional Gaussian sequence model with equicorrelated noise. In the first part of the talk, we consider detection of a signal that has at most s nonzero coordinates. Our result fully characterizes the nonasymptotic minimax separation rate as a function of the dimension p, the sparsity s and the correlation level. Surprisingly, not only does the order of the minimax separation rate depend on s, it also varies with p-s. This new phenomenon only occurs when correlation is present. In the second part of the talk, we consider the problem of signal recovery. Unlike the detection rate, the order of the minimax estimation rate has a dependence on p-2s, which is also a new phenomenon that only occurs with correlation. We also consider detection and recovery procedures that are adaptive to the sparsity level. While the optimal detection rate can be achieved adaptively without any cost, the optimal recovery rate can only be achieved in expectation with some additional cost.

• Matteo Giordano (Oxford): Bayesian nonparametric intensity estimation for inhomogeneous point processes with covariates

Abstract: In this work we study Bayesian nonparametric estimation of the intensity function of a spatial Poisson point process, in the case where the intensity depends on covariates and a single observation of the process is available. The presence of covariates allows one to borrow information from faraway locations in the domain, enabling consistent estimation in the growing domain asymptotics. In particular, we derive posterior concentration rates under both global and local losses. The global rates are obtained under conditions on the prior distribution resembling those in the well-established theory of Bayesian nonparametrics, here combined with suitable concentration inequalities for stationary processes to control certain random covariate-dependent losses. The local rates are instead derived with an ad hoc analysis, exploiting recent advances in the theory of Pólya-tree-like priors.

• Marc Hoffmann (Paris Dauphine): On estimating a multidimensional diffusion from discrete data

Abstract: We revisit an old problem: estimating nonparametrically the drift and diffusion coefficient of a diffusion process from discrete data $(X_0, X_D, X_{2D}, \ldots, X_{ND})$. The novelties are: (i) we work in a multivariate setting, where only few results have been obtained in this case of discrete data (and, to the best of our knowledge, no results for the diffusion coefficient from high-frequency data); (ii) the sampling scheme is high-frequency but arbitrarily slow: $D = D_N \rightarrow 0$ and $N D_N^q$ bounded for some possibly arbitrarily large $q$ (à la Kessler); and (iii) the process lies in a (not necessarily convex, not necessarily bounded) domain in $\mathbb R^d$ with reflection at the boundary. (In particular, we recover the case of a bounded domain or the whole Euclidean space $\mathbb R^d$.) We conduct a (relatively) standard minimax-adaptive program for integrated squared error loss over bounded domains (and more losses in the simpler case of the drift) over standard smoothness classes, plus some other miscellanies. When $ND^2 \rightarrow 0$ and in the special case of the conductivity equation over a bounded domain, we actually obtain contraction rates in squared error loss in a nonparametric Bayes setting. The main difficulty here lies in controlling small ball probabilities for the likelihood ratios; we develop small time expansions of the heat kernel with a bit of Riemannian geometry to control adequate perturbations in KL divergence, using classical ideas of Azencott and others. Joint ongoing work with C. Amorino and C. Strauch. The last part is joint with K. Ray.

• Alice L’Huillier (Sorbonne): Semiparametric inference using fractional posteriors

Abstract: We establish a general Bernstein–von Mises theorem for approximately linear semiparametric functionals of fractional posterior distributions based on nonparametric priors. This is illustrated in a number of nonparametric settings and for different classes of prior distributions, including Gaussian process priors. We show that fractional posterior credible sets can provide reliable semiparametric uncertainty quantification, but have inflated size. To remedy this, we further propose a shifted-and-rescaled fractional posterior set that is an efficient confidence set having optimal size under regularity conditions. As part of our proofs, we also refine existing contraction rate results for fractional posteriors by sharpening the dependence of the rate on the fractional exponent. This is joint work with Luke Travis, Ismaël Castillo and Kolyan Ray.

• Richard Nickl (Cambridge): On posterior consistency in non-linear data assimilation problems with Gaussian process priors

Abstract: We discuss recent results about consistency of Gaussian process methods in two non-linear data assimilation problems. The first concerns inference on the diffusivity from multi-dimensional low-frequency diffusion measurements, and the second concerns inference on the initial condition in the two-dimensional Navier–Stokes equations.

• Dennis Nieman (VU Amsterdam): A frequentist analysis of variational Gaussian process regression with inducing points

Abstract: TBD

• Omiros Papaspiliopoulos (Bocconi): Nonparametric estimation of the marginal likelihood

Abstract: We consider the problem of exploring the marginal likelihood of low-dimensional hyperparameters from high-dimensional Bayesian hierarchical models. First, we provide a unified framework that connects seemingly unconnected approaches to estimating normalizing constants, including the Vardi estimator, umbrella sampling and the Gibbs sampler. The framework requires Monte Carlo sampling from the posteriors of the high-dimensional parameters for different values of the hyperparameters on a lattice. We prove that the resultant estimators of the marginal likelihood are consistent both as the number of samples gets large (for a given lattice) and as the lattice gets dense (for a given Monte Carlo effort per lattice point). We then introduce a novel kernel method that allows an optimal bias-variance tradeoff.

• Paul Rosa (Oxford): Posterior contraction rates for stationary Gaussian process priors on compact Lie groups and their homogeneous spaces

Abstract: TBD

• Bernhard Stankewitz (Bocconi): Early stopping for $L^{2}$-boosting in sparse high-dimensional linear models

Abstract: We consider $L^{2}$-boosting in a sparse high-dimensional linear model via orthogonal matching pursuit (OMP). For this greedy, nonlinear subspace selection procedure, we analyze a data-driven early stopping time $\tau$, which is sequential in the sense that its computation is based on the first $\tau$ iterations only. Our approach is substantially less costly than established model selection criteria, which require the computation of the full boosting path.
We prove that sequential early stopping preserves statistical optimality in this setting in terms of a general oracle inequality for the empirical risk and recently established optimal convergence rates for the population risk. The proofs include a subtle $\omega$-pointwise analysis of a stochastic bias-variance trade-off, which is induced by the greedy optimization procedure at the core of OMP. Simulation studies show that, at a significantly reduced computational cost, these types of methods match or even exceed the performance of other state-of-the-art algorithms such as the cross-validated Lasso or model selection via a high-dimensional Akaike criterion based on the full boosting path.
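
For readers unfamiliar with the procedure, the idea of stopping OMP sequentially can be illustrated with a minimal Python sketch. It is not the speaker's implementation: the stopping rule used here (halt as soon as the mean squared residual drops below a threshold) and the names `omp_early_stopping`, `kappa` and `max_iter` are assumptions chosen for illustration, and the abstract does not specify how the threshold is calibrated.

```python
import numpy as np


def omp_early_stopping(X, y, kappa, max_iter=None):
    """Orthogonal matching pursuit with a residual-based stopping rule.

    Assumption (not from the abstract): iteration stops the first time the
    mean squared residual is at most `kappa`, a user-supplied threshold.
    The rule only looks at iterations computed so far, so the full
    boosting path is never needed.
    """
    n, p = X.shape
    if max_iter is None:
        max_iter = min(n, p)

    col_norms = np.linalg.norm(X, axis=0)
    col_norms[col_norms == 0] = 1.0  # guard against zero columns

    selected = []                     # indices chosen so far
    coef = np.zeros(p)
    residual = np.asarray(y, dtype=float).copy()

    for _ in range(max_iter):
        # Sequential check: uses only the current residual.
        if residual @ residual / n <= kappa:
            break

        # Greedy step: column most correlated with the residual.
        scores = np.abs(X.T @ residual) / col_norms
        if selected:
            scores[selected] = -np.inf
        j = int(np.argmax(scores))
        selected.append(j)

        # Orthogonal step: refit least squares on all selected columns.
        beta_s, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        coef = np.zeros(p)
        coef[selected] = beta_s
        residual = y - X[:, selected] @ beta_s

    return coef, selected
```

Calling `omp_early_stopping(X, y, kappa)` returns the fitted coefficient vector and the selected column indices in order of selection. In this sketch `kappa` would typically be set from an estimate of the noise variance, which is again an assumption rather than something stated in the abstract.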

• Aad van der Vaart (Delft): Bayesian sensitivity analysis in causal analysis

Abstract: Causal inference is based on the assumption of “conditional exchangeability”. This is not verifiable based on the data when using nonparametric modelling. A “sensitivity analysis” considers the effect of deviations from the assumption. In a Bayesian framework we could put a prior on the size of the deviation and obtain an ordinary posterior. We review possible approaches and present some results comparing different ways of nonparametric modelling. (Based on joint work with Stéphanie van der Pas and Bart Eggen.)