# Séminaire de Statistique

## M. Rosenbaum

Lundi / Monday 14h - 15h15 – SALLE 3001 ENSAE

 Septembre 16 2019 Yury Poliansky MIT Smoothed Empirical Measures and Entropy Estimation Abstract: In this talk we discuss behavior of the empirical measure P_n corresponding to iid samples from a distribution P on a d-dimensional space. Let Q_n and Q denote the result of convolving P_n and P, respectively, with an isotropic standard Guassian kernel. We discuss convergence of the p-Wasserstein, KL and other distances between Q_n and Q. Curiously, for some distances (like 1-Wasserstein) we get parametric 1/sqrt(n) speed of convergence regardless of dimension, whereas for some other distances (like 2-Wasserstein) the 1/sqrt(n) rate can change to \omega(1/sqrt(n)). We give an if and only if characterization in the class of subgaussian P for the parametric rate. As an application, we show that differential entropy of Q_n converges to that of Q at parametric rate 1/sqrt(n) regardless of dimension. An estimator of differential entropy of Q, in turn, allows us to estimate the input-output mutual information in noisy neural networks. Joint work with Ziv Goldfeld, Kristjan Greenewald and Jonathan Weed. Sept 23 2019 Matthieu Lerasle ENSAE Pair Matching as a tricky bandit problem Abstract: The pair-matching problem appears in many applications where one wants to discover good matches between pairs of individuals. The set of individuals is represented by the nodes of a graph where the edges, unobserved at first, represents the good matches. A pair matching algorithm queries sequentially pairs of nodes and observes the presence/absence of edges. Its goal is to discover as many edges as possible with a fixed budget of queries.  Pair-matching is a particular instance of multi-armed bandit problem in which the arms are pairs of individuals and the rewards are edges linking these pairs. This bandit problem is non- standard since each arm can only be played once.   Given this last constraint, sublinear regret can be expected only if the graph presents some underlying structure. In the talk, I will show that sublinear regret is achievable in the case where the graph is generated according to a stochastic block model with two communities. Optimal regret bounds are computed for this pair-matching problem. They exhibit a phase transition related to the Kesten-Stigund threshold for community detection in the stochastic block model.  To limit the concentration of queried pairs, the pair-matching problem is also considered in the case where each node is constrained to be sampled less than a given amount of times. I will show how this constraint deteriorates optimal regret rates. This talk is based on the submitted paper arXiv:1905.07342. Sept 30 2019 Alexandra Carpentier University of Magdeburg Oct 7 2019 Vianney Perchet ENSAE Oct 14 2019 Séminaire IHP Oct 21 2019 Pas de séminaire Oct 28 2019 Pas de séminaire Nov 4 2019 Richard Samworth University of Cambridge High-dimensional principal component analysis with heterogeneous missingness Abstract: We study the problem of high-dimensional Principal Component Analysis (PCA) with missing observations. In simple, homogeneous missingness settings with a noise level of constant order, we show that an existing inverse-probability weighted (IPW) estimator of the leading principal components can (nearly) attain the minimax optimal rate of convergence. However, deeper investigation reveals both that, particularly in more realistic settings where the missingness mechanism is heterogeneous, the empirical performance of the IPW estimator can be unsatisfactory, and moreover that, in the noiseless case, it fails to provide exact recovery of the principal components. We therefore introduce a new method for high-dimensional PCA, called primePCA', that is designed to cope with situations where observations may be missing in a heterogeneous manner. Starting from the IPW estimator, primePCA iteratively projects the observed entries of the data matrix onto the column space of our current estimate to impute the missing entries, and then updates our estimate by computing the leading right singular space of the imputed data matrix. It turns out that the interaction between the heterogeneity of missingness and the low-dimensional structure is crucial in determining the feasibility of the problem. This leads us to impose an incoherence condition on the principal components and we prove that in the noiseless case, the error of primePCA converges to zero at a geometric rate when the signal strength is not too small. An important feature of our theoretical guarantees is that they depend on average, as opposed to worst-case, properties of the missingness mechanism. Joint work with Ziwei Zhu and Tengyao Wang Nov 11 2019 Férié Nov 18 2019 Nov 25 2019 Séminaire IHP Dec 2 2019 Henry Reeve University of Birmingham Dec 9 2019 Gilles Blanchard Dec 16 2019 Pas de séminaire Jan 6 2020 Jan 13 2020 Jan 20 2020 Séminaire IHP Jan 27 2020 Feb 3 2020 Feb 10 2020 Feb 17 2020 Séminaire IHP Feb 24 2020 March 2 2020 Richard Nickl University of Cambridge March 9 2020 March 16 2020 Séminaire IHP March 23 2020 March 30 2020 April 6 2020 April 13 2020 April 20 2020 April 27 2020 Séminaire IHP May 2020 June  29 2020 Séminaire IHP

Lundi / Monday 14h - 15h15 – SALLE 3001 ENSAE

 Sept 10 2018 Alexander Meister University of Rostock Title: Nonparametric density estimation for intentionally corrupted functional data Abstract: We consider statistical models where functional data are artificially contaminated by independent Wiener processes in order to satisfy privacy constraints. We show that the corrupted observations have a Wiener density which determines the distribution of the original functional random variables uniquely, and we construct a nonparametric estimator of that density. We derive an upper bound for its mean integrated squared error which  has a polynomial convergence rate, and we establish an asymptotic lower bound on the minimax convergence rates which is close to the rate attained by our estimator. Our estimator requires the choice of a basis and of two smoothing parameters. We propose data-driven ways of choosing them and prove that the asymptotic quality of our estimator is not significantly affected by the empirical parameter selection. We examine the numerical performance of our method via simulated examples. This talk is based on a joint work with Aurore Delaigle (University of Melbourne, Australia). Sept 17 2018 Christophe Giraud University Paris Sud Title: Partial recovery bounds for clustering with (corrected) relaxed Kmeans (1/2) Abstract: We will explain why, in a clustering context, Kmeans must and can be debiased. We will then discuss how a convex relaxation of the corrected Kmeans can be applied in various setting (including mixture of sub-Gaussian distribution, SBM, Ising-Block model, etc) and we will provide some optimal exponential bounds in terms of partial recovery of the clusters. Hence, the relaxed (corrected) Kmeans appears to be a versatile clustering tool, with the nice feature to have a single tuning parameter (the number K of clusters). Based on a joint work with Nicolas Verzelen. Sept 24 2018 Séminaire Parisien de Statistique IHP Oct 1 2018 Christophe Giraud University Paris Sud Title: Partial recovery bounds for clustering with (corrected) relaxed Kmeans (2/2) Oct 8 2018 Stefan Wager Stanford University Title: Machine Learning for Causal Inference   Abstract : Flexible estimation of heterogeneous treatment effects lies at the heart of many statistical challenges, such as personalized medicine and optimal resource allocation. In this talk, I will discuss general principles for estimating heterogeneous treatment effects in observational studies via loss minimization, and then present a random forest algorithm that builds on these principles. As established both formally and empirically, the proposed approach is an order of magnitude more robust to confounding that direct regression-based baselines. Oct 15 2018 Mathias Drtn Université de Copenhague Title : Causal discovery in linear non-Gaussian models   Abstract: We consider the problem of inferring the causal graph underlying a structural equation model from an i.i.d. sample.  It is well known that this graph is identifiable only under special assumptions. We consider one such set of assumptions, namely, linear structural equations with non-Gaussian errors, and discuss inference of the causal graph in high-dimensional settings as well as in the presence of latent confounders. Joint work with Y. Samuel Wang. Oct 18 et 19, 2018   14 – 16 :30   Salle 2043 Anatoly Judtsky Université Grenoble-Alpes   Cours OFPR –1 et 2 Title: Statistical Estimation via Convex Optimization Abstract: When speaking about links between Statistics and Optimization, what comes to mind first is the indispensable role played by optimization algorithms in the “numerical toolbox” of Statistics. The goal of this course is to present another type of links between Optimization and Statistics. We are speaking of situations where Optimization theory (theory, not algorithms!) is of methodological value in Statistics, acting as the source of statistical inferences. We focus on utilizing Convex Programming theory, mainly due to its power, but also due to the desire to end up with inference routines reducing to solving convex optimization problems and thus implementable in a computationally efficient fashion. The topics we consider are: ·        As a starter, we consider estimation of a linear functional of unknown “signal” (a signal in the “usual sense,” a distribution, or an intensity of a Poisson process, etc). We also discuss the problem of estimating quadratic functional by “lifting” linear functional estimates. As an application, we consider a signal recovery procedure – “polyhedral estimate” – which relies upon efficient estimation of linear functionals. ·        Next, we turn to general problem of linear estimation of signals from noisy observations of their linear images. Here application of Convex Optimization allows to propose provably optimal (or nearly so) estimation procedures. The exposition does not require prior knowledge of Statistics and Optimization; as far as these disciplines are concerned, all necessary for us facts and concepts are introduced before being used. The actual prerequisites are elementary Calculus, Probability, Linear Algebra and (last but by far not least) general mathematical culture. Oct 22 2018 Vacances Oct 29 2018 Séminaire Parisien de Statistique IHP Nov 5 2018 Pas de séminaire Nov 7 et 9 2018 14 – 16 :30 Salle 2043 Anatoly Judtsky Université Grenoble-Alpes   Cours OFPR –3 et 4 Title: Statistical Estimation via Convex Optimization Nov 12 2018 Karim Lounici Ecole Polytechnique Title: On-line PCA: non-asymptotics statistical guarantees for the Krasulina scheme Principal Component Analysis is a popular method used to analyse the covariance structure $\Sigma$ of a random vector. Recent results on the statistical properties of standard PCA have highlighted the importance of the effective rank as a measure of the intrinsic statistical complexity in the PCA problem. In particular, optimal rates of estimation of the spectral projectors have been established in the offline setting where all the observations are available at once and a batch estimation method is implemented. In the online setting, observations arrive in a stream and our estimate of eigenvalues and spectral projectors are updated every time a new observation is available. This problem has attracted a lot a attention recently but little is known on the statistical properties of the existing methods. In this work, we consider the Krasulina scheme (stochastic gradient ascent scheme) and establish non-asymptotic estimation bounds in probability for the spectral projectors. For this method, the effective rank also plays a central role in the performance of the method, however the obtained rate is slower than that obtained in the offline setting. Nov 19 2018 Séminaire Parisien de Statistique IHP Nov 20 2018 Mardi/Tuesday Guenther Walther Stanford University Title : Constructing an optimal confidence set for a distribution function with applications to visualizing data Abstract: We show how to construct a confidence set of distribution functions that have the same optimal estimation performance as the empirical distribution function, but which may be much simpler, that is the distribution functions may have many fewer jumps. We show how an accelerated dynamic program can be applied to a relaxed optimization problem in order to find the distribution function in the confidence set that has the fewest jumps. We propose to use this distribution function for summarizing and visualizing the data via a histogram, as this distribution function is the simplest function that optimally addresses the two main tasks of the histogram: estimating probabilities and detecting features such as increases and (anti)modes in the distribution. We will illustrate our methodology with examples. This is joint work with Housen Li, Axel Munk, and Hannes Sieling. Nov 26 2018 Stanislas Minsker University of Southern California Median-of-means estimator and its generalizations New results are presented for the class of estimators obtained via the generalization of the median-of-means (MOM) technique. These results stem from connections between performance of MOM-type estimators and the rates of convergence in normal approximation. We provide tight non-asymptotic deviations guarantees in the form of exponential concentration inequalities, as well as asymptotic results in the form of limit theorems. Finally, we will discuss multivariate extensions. Dec 3 2018 Nabil Mustafa ESIEE Paris Title : Sampling in Geometric Configurations In this talk I will present recent progress on the problem of sampling in geometric configurations. A typical problem: given a set P of n points in d-dimensions, and a parameter eps>0, is it possible to pick a set Q of O(1/eps) points of P such that any half-space containing at least eps*n points of P must contain some point of Q. Based on joint works with Imre Barany, and Arijit Ghosh and Kunal Dutta. Dec 10 2018 Quentin Berthet University of Cambridge Title: Estimation of smooth densities in Wasserstein distance We describe the optimal rates of density estimation in Wasserstein distance, for different notions of smoothness. This is motivated by the growing use of these distances in various applications, and we also analyze the algorithmic aspects of this problem. joint work with J. Weed, MIT. Dec 17  2018 Pas de séminaire Jan 7 2019 Séminaire Parisien de Statistique IHP Jan 14 2019 14h – 16h30 Salle 2016 Johannes Schmidt-Hieber Leiden University Cours OFPR 1/4 Theoretical results for deep neural networks Large databases and increasing computational power have recently resulted in astonishing performances of deep neural networks for a broad range of extremely complex tasks, including image and text classification, speech recognition and game playing. These deep learning methods are build by imitating the action of the brain and there are few theoretical results as of yet. To formulate a sound mathematical framework explaining these successes is a major challenge for current research.    The course aims to give an overview about existing theory for deep neural networks with a strong emphasis on recent developments. Core topics are approximation theory and complexity bounds for the function spaces generated by deep networks. Beyond that we also discuss modelling aspects, theoretical results on the energy landscape and statistical risk bounds.  Literature: -Anthony, M., and Bartlett, P. L. Neural network learning: theoretical foundations. Cambridge University Press, 1999. -Bach, F. Breaking the curse of dimensionality with convex neural networks. JMLR. 2017 -Barron, A. Universal approximation bounds for superpositions of a sigmoidal function. IEEE . 1993. -Barron, A. Approximation and estimation bounds for artificial neural networks. Machine Learning. 1994. -Choromanska, A., Henaff, M., Mathieu, M., Arous, G. B., LeCun, Y. The loss surface of multilayer networks. Aistats. 2015. -Goodfellow, Bengio, Courville. Deep Learning. MIT Press, 2016. -Pinkus, A. Approximation theory of the MLP model in neural networks. Acta Numerica, 143-195, 1999. -Schmidt-Hieber, J. Nonparametric regression using deep neural networks with ReLU activation function. ArXiv 2017. -Telgarsky, M. Benefits of depth in neural networks. ArXiv. 2016. -Yarotsky, D. Error bounds for approximations with deep ReLU networks. Neural Networks. 2017. Jan 17 2019 14h – 16h30 Salle 2016 Johannes Schmidt-Hieber Leiden University Cours OFPR 2/4 Theoretical results for deep neural networks Jan 21 2019 14h – 16h30 Salle 2016 Johannes Schmidt-Hieber Leiden University Cours OFPR 3/4 Theoretical results for deep neural networks Jan 24 2019 14h – 16h30 Salle 2016 Johannes Schmidt-Hieber Leiden University Cours OFPR 4/4 Theoretical results for deep neural networks Jan 28 2019 Mohamed Ndaoud ENSAE Title: Interplay of minimax estimation and minimax support recovery under sparsity   We introduce the notion of scaled minimaxity for sparse estimation in high-dimensional linear regression model. Fixing the scale of the signal-to-noise ratio, we prove that the estimation error can be much smaller than the global minimax error. Taking advantage of the interplay between estimation and support recovery we achieve optimal performance for both problems simultaneously under orthogonal designs. We also construct a new optimal estimator for scaled minimax sparse estimation. Sharp results for the classical minimax risk are recovered as a consequence of our study. For general designs, we introduce a new framework based on algorithmic regularization where previous sharp results hold. Our analysis bridges the gap between optimization and statistical accuracy. The procedure we present achieves optimal statistical error faster than, for instance, classical algorithms for the Lasso. As a consequence, we present a new iterative algorithm for high-dimensional linear regression that is scaled minimax optimal, fast and adaptive. Feb. 4 2019 Tom Berrett University of Cambridge Title: Efficient multivariate functional  estimation and independence testing Abstract: Many statistical procedures, including goodness-of-fit tests and methods for independent component analysis, rely critically on the estimation of the entropy of a distribution. In this talk I will first describe new entropy estimators that are efficient and achieve the local asymptotic minimax lower bound with respect to squared error loss. These estimators are constructed as weighted averages of the estimators originally proposed by Kozachenko and Leonenko (1987), based on the k-nearest neighbour distances of a sample of n independent and identically distributed random vectors taking values in R^d . A careful choice of weights enables us to obtain an efficient estimator for arbitrary d, given sufficient smoothness, while the original unweighted estimator is typically only efficient for  d ≤ 3. I will also discuss newer results on the estimation of more general functionals, in settings where we have samples from two different distributions. The next part of the talk will be to use our entropy estimators to propose a test of independence of two multivariate random vectors, given a sample from the underlying population. Our approach, which we call MINT, is based on the estimation of mutual information, which we may decompose into joint and marginal entropies. The proposed critical values, which may be obtained from simulation in the case where an approximation to one marginal is available or resampling otherwise, facilitate size guarantees, and we provide local power analyses, uniformly over classes of densities whose mutual information satisfies a lower bound. Our ideas may be extended to provide a new goodness-of-fit tests of normal linear models based on assessing the independence of our vector of covariates and an appropriately-defined notion of an error vector. Feb 11 2019 CANCELLED Feb 18 2019 Séminaire Parisien de Statistique IHP Feb 25 2019 Pas de séminaire March 4 2019 Pas de séminaire March 11 2019 Ismael Castillo Sorbonne Université Title: On frequentist false discovery rate of Bayesian multiple testing procedure Abstract : In many high dimensional statistical  settings, a central task is to identify active variables among a large number of candidates. For the practitioner, a key concern is not to make too many false positives’, which correspond to declaring as active an inactive variable. Given a multiple testing procedure, a typical aim is then to control its false discovery rate (FDR), that is the average number of  false positives. In this talk, I will consider this question for a popular Bayesian procedure, namely the posterior distribution arising from a spike-and-slab prior, where the sparsity parameter is calibrated by an  empirical Bayes approach, in the canonical sparse sequence model. I will introduce some commonly used Bayesian Multiple Testing procedures (BMTs)  based on posterior distributions. We then ask whether such procedures control the FDR at a given target level, for any possible true sparse signal. That is, we consider the question of whether BMTs give a uniform control over any sparse vector of the pointwise (i.e. frequentist) false discovery rate. We find that the answer is positive for natural BMTs based on empirical spike-and-slab posterior distributions, although some of those can be  slightly `conservative’. We also demonstrate that certain BMTs are not too conservative, in the sense that they achieve a sharp FDR control at the desired target level when non-zero coordinates of the sparse signal are suitably large, while controlling the FDR up to a constant of the  target level uniformly over arbitrary sparse signals. This is joint work with Étienne Roquain (Sorbonne Université). March 18 2019 Séminaire Parisien de Statistique IHP March 25 2019 Tengyuan Liang University of Chicago Booth School of Business New Thoughts on Adaptivity, Generalization and Interpolation Motivated from Neural Networks Abstract: Consider the problem: given data pair (x, y) drawn from a population with f_*(x) = E[y|x], specify a neural network and run gradient flow on the weights over time until reaching any stationarity. How does f_t, the function computed by the neural network at time t, relate to f_*, in terms of approximation and representation? What are the provable benefits of the adaptive representation by neural networks compared to the pre-specified fixed basis representation in the classical nonparametric literature? We answer the above questions via a dynamic reproducing kernel Hilbert space (RKHS) approach indexed by the training process of neural networks. We show that when reaching any local stationarity, gradient flow learns an adaptive RKHS representation, and performs the global least squares projection onto the adaptive RKHS, simultaneously. In addition, we prove that as the RKHS is data-adaptive and task-specific, the residual for f˚ lies in a subspace that is smaller than the orthogonal complement of the RKHS, formalizing the representation and approximation benefits of neural networks. Then we will move to discuss generalization for interpolating methods in RKHS. In the absence of explicit regularization, Kernel “Ridgeless” Regression with nonlinear kernels has the potential to fit the training data perfectly. It has been observed empirically, however, that such interpolated solutions can still generalize well on test data. We isolate a phenomenon of implicit regularization for minimum-norm interpolated solutions which is due to a combination of high dimensionality of the input data, curvature of the kernel function, and favorable geometric properties of the data such as an eigenvalue decay of the empirical covariance and kernel matrices. In addition to deriving a data-dependent upper bound on the out-of-sample error, we present experimental evidence suggesting that the phenomenon occurs in the MNIST dataset. April 1 2019 Botond Szabo Leiden University On the fundamental understanding of distributed computation Abstract: In recent years, the amount of available information has become so vast in certain fields of applications that it is infeasible or undesirable to carry out the computations on a single server. This has motivated the design and study of distributed statistical or learning methods. In distributed methods, the data is split amongst different administrative units and computations are carried out locally, in parallel to each other. The outcome of the local computations are then aggregated into a final result on a central machine. We consider the limitations and guarantees of distributed methods under communication constraints (i.e. only limited, fixed amount of bits are allowed to be transmitted between the local and the central machines) in context of the random design regression model. We derive minimax lower bounds (which depending on the communication budget can be substantially higher than the standard non-distributed minimax rates), matching upper bounds and provide adaptive estimators reaching these limits. We also consider the case where the number of transmitted bits is taken to be data-driven and investigate whether one can achieve the minimax non-distributed estimation rate and at the same time transmit in some sense the optimal amount of information between the machines. This is a joint work with Harry van Zanten. April 8 2019 Asaf Weinstein Carnegie Mellon University A power analysis for knockoffs using Lasso and thresholded-Lasso statistics Practitioners using the Lasso are well aware of its limitations as a variable selector. That good prediction usually leads to many falsely selected variables implies that to capture most of the signal, one necessarily pays with a high number of false positives. It is also well known that this situation can be mitigated by thresholding the Lasso estimates, i.e., by discarding small estimates one can reduce considerably the number of false selections.  Recent works have studied both these phenomena from a theoretical perspective in the approximate message-passing (AMP) framework, providing exact asymptotic predictions for Type I and Type II error rates. These existing works are important because they allow to answer both quantitative and qualitative questions (for example, where to stop on the Lasso path to ensure FDR\leq \alpha, or at which value of \lambda should the thresholded Lasso estimates be computed for optimal power?) and compare Lasso to Thresholded Lasso. However, many of these answers depend on the distribution of the underlying signal. In this talk I’ll focus on our experience with using Knockoffs to mimic these oracle procedures in the absence of knowledge about the signal. It will be demonstrated that the sensitivity of power to the fraction of fake variables added, is very different for Lasso and for Thresholded Lasso. April 15 2019 Séminaire Parisien de Statistique IHP April 22 2019 Férié April 29 2019 Pas de séminaire May 6 2019 Pas de séminaire May 13 2019 Alexey Naumov Higher School of Economics, Moscow Gaussian approximation for maxima of large number of quadratic forms of high-dimensional random vectors Let (X_i)_{i=1, ... , n} be independent identically distributed random observations taking values in a high dimensional space and (Q_j)_{j=1, ..., p} be symmetric positive definite matrices. We consider joint distribution function of quadratic forms (Q_j S_n, S_n), where S_n is the normalized sum of observations, and prove the rate of Gaussian approximation with explicit dependence on n, p and dimension. We also compare this result with results of Bentkus (2003) and Chernozhukov, Chetverikov, Kato (2016). Statistical applications will be discussed as well. The talk is based on the joint project with Friedrich Goetze, Vladimir Spokoiny and Alexander Tikhomirov. May 20 2019 Journée de l’Ecole doctorale 9h00 – 11h00 May 20 2019 – 2 p.m. Yohann De Castro Ecole Ponts ParisTech Exact False Negative Control using Stopping Times on the LAR's Path In this talk we introduce a new exact testing procedure in high dimensional linear regression framework using the outcomes of the standard Least Angle Regression (LAR) algorithm. Under the Gaussian noise assumption, we give the exact law of the sequence of knots conditional on the sequence of variables entering the model (i.e., the ‘‘post-selection'' law of the knots of the LAR). Based on this result, we introduce a new exact testing procedure on the existence of false negatives. This testing procedure can be deployed after any support selection procedure that will produce an estimation of the support (i.e., the indexes of nonzero coefficients). The type I error of the test can be exactly controlled as long as the selection procedure follows some elementary hypotheses, referred to as ‘‘admissible selection procedures''. These support selection procedures are such that the estimation of the support is given by the k first variables entering the model where the random variable k is a stopping time. May 27 2019 Geoffrey Chinot CREST, ENSAE ERM and RERM are tractable and optimal estimators in the ε-contamination model   We study the ERM and RERM with Lipschitz and convex loss functions under a subgaussian assumption on the design. We consider a setting where |O| outliers may contaminate the labels. In that case, we show that the error rate is bounded by r(N) + |O|/N, where N is the total number of observations and r(N) is the optimal error rate in the non-contaminated setting. The main result can be used for both non-regularized and regularized procedures. For instance we present results for the Huber's M-estimators without penalization or regularized by the l1-norm. For these two applications we get minimax-optimal rates in the ε-contamination model. June 3 2019 Ilias Diakonikolas University of Southern California Algorithmic Questions in Robust High-Dimensional Statistics   Fitting a model to a collection of observations is one of the quintessential questions in statistics. The standard assumption is that the data was generated by a model of a given type (e.g., a mixture model). This simplifying assumption is at best only approximately valid, as real datasets are typically exposed to some source of contamination. Hence, any estimator designed for a particular model must also be robust in the presence of corrupted data. This is the prototypical goal in robust statistics, a field that took shape in the 1960s with the pioneering works of Tukey and Huber. Until recently, even for the basic problem of robustly estimating the mean of a high-dimensional dataset, all known robust estimators were hard to compute. Moreover, the quality of the common heuristics degrades badly as the dimension increases.    In this talk, we will survey the recent progress in algorithmic high-dimensional robust statistics. We will describe the first computationally efficient algorithms for robust mean and covariance estimation and the main insights behind them. We will also present practical applications of these estimators to exploratory data analysis and adversarial machine learning. Finally, we will discuss new directions and opportunities for future work.    The talk will be based on a number of joint works with (various subsets of) G. Kamath, D. Kane, J. Li, A. Moitra, and A. Stewart. June 10 2019 Férié June 17 2019 Pas de séminaire June 24 2019 10h30 Tracy Ke Harvard University Optimal Adaptivity of Signed-Polygon Statistics for Network Testing Given a symmetric social network, we are interested in testing whether it has only one community or multiple communities. The desired tests should (a) accommodate severe degree heterogeneity, (b) accommodate mixed-memberships, (c) have a tractable null distribution, and (d) adapt automatically to different levels of sparsity, and achieve the optimal phase diagram. How to find such a test is a challenging problem. We propose the Signed Polygon as a class of new tests. Fixing m ≥ 3, for each m-gon in the network, define a score using the centered adjacency matrix. The sum of such scores is then the m-th order Signed Polygon statistic. The Signed Triangle (SgnT) and the Signed Quadrilateral (SgnQ) are special examples of the Signed Polygon. We show that both the SgnT and SgnQ tests satisfy (a)-(d), and especially, they work well for both very sparse and less sparse networks. Our proposed tests compare favorably with the existing tests. For example, the EZ and GC tests behave unsatisfactorily in the less sparse case and do not achieve the optimal phase diagram. Also, many existing tests do not allow for severe heterogeneity or mixed-memberships, and they behave unsatisfactorily in our settings. The analysis of the SgnT and SgnQ tests is delicate and tedious, and the main reason is that we need a unified proof that covers a wide range of sparsity levels and a wide range of degree heterogeneity. For lower bound theory, we use a phase transition framework, which includes the standard minimax argument, but is more informative. The proof uses classical theorems on matrix scaling. (Joint work with Jiashun Jin and Shengming Luo. arXiv preprint: https://arxiv.org/abs/1904.09532) June 24 2019 14h00 Jonas Peters University of Copenhagen The Impossibility of Conditional Independence Testing and Causality in Dynamical Systems In this talk, we (1) discuss in which sense conditional independence testing for continuous random variables is an unsolvable statistical problem. (2) We also introduce CausalKinetiX, a framework that aims at learning the underlying structure in dynamical systems by trading off invariance and predictability. The talk contains joint work with Rajen Shah, Niklas Pfister, and Stefan Bauer. No prior knowledge about causality is required. July 8 2019 14h00 Vladimir Ulyanov Moscow University Asymptotic expansions for the distributions of statistics with random sample size In practice, we often encounter situations where a sample size is not de_ned in advance and can be random itself. In Gnedenko (1989) it is demonstrated that the asymptotic properties of the statistics can be radically changed when the non-random sample size is replaced by a random value. In the talk, the second order Chebyshev_Edgeworth and Cornish_Fisher type expansions (see Ulyanov, Aoshima, Fujikoshi, 2016) based on Student's t- and Laplace distributions and their quantiles are derived for sample mean and sample median with random sample size of a special kind. We use general transfer theorem (see Bening, Galieva, Korolev, 2013) which enables us to construct the asymptotic expansions for the distributions of the randomly normalized statistics applying the asymptotic expansions for the distributions of the considered non-randomly normalized statistics and for the distributions of the random size of the underlying sample.

 Sept 11 2017 Arnak Dalalyan ENSAE Title : User-friendly bounds for sampling from a log-concave density using Langevin Monte Carlo   Abstract :  We will present new bounds on the sampling error in the case where the target distribution has a smooth and log-concave density. These bounds are established for the Langevin Monte Carlo and its discretized versions involving the Hessian matrix of the log-density. We will also discuss the case where accurate evaluation of the gradient is impossible. Sept 18 2017 Séminaire Parisien de Statistique - IHP Sept 25 2017 Mathias Trabs Université de Hamburg Title : Volatility estimation for stochastic PDE’s using high-frequency observations   Abstract : We study the parameter estimation for parabolic, linear, second order, stochastic partial differential equations (SPDEs) observing a mild solution on a discrete grid in time and space. A high-frequency regime is considered where the mesh of the grid in the time variable goes to zero. Focusing on volatility estimation, we provide an explicit and easy to implement method of moments estimator based on the squared increments of the process. The estimator is consistent and admits a central limit theorem. Starting from a representation of the solution as an infinite factor model, the theory considerably differs from the statistics for semi-martingales literature. The performance of the method is illustrated in a simulation study. This is joint work with Markus Bibinger. Oct 2 2017 Nicolas Marie Modal’X (Paris 10)/ ESME Sudria Title : Estimation non-paramétrique dans les équations différentielles dirigées par le mouvement brownien fractionnaire.   Abstract : Après avoir introduit quelques notions de calcul stochastique trajectoriel, l’exposé présentera un estimateur type Nadaraya-Watson de la fonction de drift d’une équation différentielle dirigée par un bruit multiplicatif fractionnaire. Afin d’établir la consistance de l’estimateur, les résultats d’ergodicité de Hairer et Ohashi (2007) seront énoncés et expliqués. Une fois sa consistence établie, la question de la vitesse de convergence de l’estimateur sera abordée. Il s’agit d’un travail en collaboration avec F. Comte. Oct 9 2017 Zoltan Szabo Ecole Polytechnique Title : Characteristic Tensor Kernels   Abstract : Maximum mean discrepancy (MMD) and Hilbert-Schmidt independence criterion (HSIC) are popular techniques in data science to measure the difference and the independence of random variables, respectively.  Thanks to their kernel-based foundations, MMD and HSIC are applicable on a variety of domains including documents, images, trees, graphs, time series, mixture models, dynamical systems, sets, distributions, permutations. Despite their tremendous practical success, quite little is known about when HSIC characterizes independence and MMD with tensor kernel can discriminate probability distributions, in terms of the  contributing kernel components. In this talk, I am going to present a complete answer to this question, with conditions which are often easy to verify in practice. [Joint work with Bharath K. Sriperumbudur (PSU).  Preprint: https://arxiv.org/abs/1708.08157] Oct 16 2017 Séminaire Parisien de Statistique - IHP Oct 23 2017 Philip Thompson ENSAE Cancelled Oct 30 2017 Vacances Nov 6 2017 Martin Kroll ENSAE Title : On minimax optimal and adaptive estimation of linear functionals in inverse Gaussian sequence space models   Abstract : We consider an inverse problem in a Gaussian sequence space model where the multiplication operator is not known but only available via noisy observations. Our aim is not to reconstruct the solution itself but the value of a linear functional of the solution. In our setup the optimal rate depends on two different noise levels, the noise level concerning the observation of the transformed solution and the noise level concerning the noisy observation of the operator.  We consider this problem from a minimax point of view and obtain upper and lower bounds under smoothness assumptions on the multiplication operator and the unknown solution.  Finally, we sketch an approach to the adaptive estimation in the given model using a method combining both model selection and the Goldenshluger-Lepski method. This is joint work in progress with Cristina Butucea (ENSAE) and Jan Johannes (Heidelberg) Nov 13 2017 Séminaire Parisien de Statistique - IHP Nov 20 2017 Olivier Collier Université Paris Nanterre Title : Estimation robuste de la moyenne en temps polynômial   Abstract : Il s'agit de résultats obtenus en collaboration avec Arnak Dalalyan. Quand les observations sont polluées par la présence d'outliers, il n'est plus souhaitable d'estimer l'espérance par la moyenne empirique. Des méthodes optimales ont été trouvées, comme la profondeur de Tuckey dans le modèle dit de contamination. Cependant, ce dernier estimateur n'est pas calculable en temps polynômial. Dans un modèle gaussien, nous remarquerons que l'estimation de la moyenne revient à l'estimation d'une fonctionnelle linéaire sous contrainte de sparsité de groupe. Il est alors naturel d'utiliser group-lasso. Nous pourrons alors noter plusieurs phénomènes intéressants : dans ce contexte, la sparsité par groupe permet un gain polynômial par rapport à la seule sparsité, alors que les études précédentes montraient au mieux un gain logarithmique, et il semble que l'estimation en temps polynômial ne puisse pas atteindre la performance optimale des méthodes en temps exponentiel. Nov 27 2017 Elisabeth Gassiat Université Paris-Sud Title : Estimation of the proportion of explained variation in high dimensions.   Abstract : Estimation of heritability of a phenotypic trait based on genetic data may be set as estimation of the proportion of explained variation in high dimensional linear models. I will be interested in understanding the impact of: — not knowing the sparsity of the regression parameter, — not knowing the variance matrix of the covariates on minimax estimation of heritability. In the situation where the variance of the design is known, I will present an estimation procedure that adapts to unknown sparsity.  when the variance of the design is unknown and no prior estimator of it is available,  I will show that  consistent estimation of heritability is impossible. (Joint work with N. Verzelen, and PHD thesis of A. Bonnet). Dec 4 2017 Philip Thomson ENSAE Title : Stochastic approximation with heavier tails   Abstract : We consider the solution of convex optimization and variational inequality problems via the stochastic approximation methodology where the gradient or operator can only be accessed through an unbiased stochastic oracle. First, we show that (non-asymptotic) convergence is possible with unbounded constraints and a "multiplicative noise" model: the oracle is Lipschitz continuous with a finite pointwise variance which may not be uniformly bounded (as classically assumed). In this setting, our bounds depend on local variances at solutions and the method uses noise reduction in an efficient manner: given a precision, it respects a near-optimal sample and averaging complexities of Polyak-Ruppert's method but attains the order of the (faster) deterministic iteration complexity. Second, we discuss a more "robust" version where the Lipschitz constant L is unknown but, in terms of error precision, near-optimal complexities are maintained. A price to pay when L is unknown is that a large sample regime is assumed (still respecting the complexity of the SAA estimator) and "non-martingale-like" dependencies are introduced. These dependencies are coped with an "iterative localization" argument based on empirical process theory and self-normalization.  Joint work with A. Iusem (IMPA), A. Jofré (CMM-Chile) and R.I. Oliveira (IMPA). Dec 11 2017 Jamal Najim CNRS UPEM Title : Grandes matrices de covariance empiriques    Abstract : Les modèles de grandes matrices de covariance empirique ont été énormément étudiés depuis l’article fondateur de Marchenko et Pastur en 1967, qui ont décrit le comportement du spectre de telles matrices quand les deux dimensions (dimension des observations et taille de l’échantillon) croissent vers l’infini au même rythme.    L’objectif de cet exposé sera dans un premier temps de présenter les résultats standards (et moins standards!) liés à ces modèles et les outils mathématiques d’analyse (principalement la transformée de Stieltjes). On insistera ensuite sur le cas particulier des « spiked models », modèles de grandes matrices de covariance dans lesquels quelques valeurs propres sont éloignées de la masse des valeurs propres de la matrice considérée. Ces « spiked models »  sont très populaires en finance, économétrie, traitement du signal, etc.   Enfin, on s’intéressera à des grandes matrices de covariance dont les observations sont issues d’un processus à mémoire longue. Pour de telles observations, on décria le comportement asymptotique et des fluctuations de la plus grande valeur propre. Ce travail est issu d’une collaboration avec F. Merlevède et P. Tian. Dec 18 2017 Pas de séminaire Jan 8 2018 Eric Moulines Ecole Polytechnique Title : Algorithmes de simulation de Langevin   Abstract : Les algorithmes de Langevin ont connu récemment un vif regain d’intérêt dans la communauté de l’apprentissage statistique, suite aux travaux de M. Welling et Y.W. Teh (‘Bayesian learning via Stochastic gradient Langevin dynamics’, ICML, 2011). Cette méthode couplant approximation stochastique et méthode de simulation permet d’envisager la mise en œuvre de méthodes de simulation en grande dimension et pour des grands ensembles de données. Les applications sont très nombreuses à la fois dans les domaines «classiques » des statistiques bayésiennes (inférence bayésienne, choix de modèles) mais aussi en optimisation bayésienne. Dans cet exposé, nous présenterons quelques travaux récents sur l’analyse de convergence de cet algorithme. Nous montrerons comment obtenir des bornes explicites de convergence en distance de Wasserstein et en variation totale dans différents cadres (fortement convexe, convexe différentiable, super-exponentiel, etc.). Nous nous intéresserons tout particulièrement à la dépendance de ces bornes dans la dimension du paramètre. Nous montrerons aussi comment étendre ces méthodes pour des fonctions convexes mais non différentiables en nous inspirant des méthodes de gradient proximaux. Jan 15 2018 Séminaire Parisien de Statistique – IHP Jan 22 2018 Data Science Jan 29 2018 Anatoli Juditsky Université de Grenoble-Alpes Titre : Aggrégation des estimateurs à partir d’observations indirectes   Nous considérons le problème d’agrégation d'estimation adaptative dans le cas où des observations indirectes du signal sont disponibles. Nous proposons une approche au problème d’agrégation par tests quasi-optimaux d'hypothèses convexes basée sur la réduction du problème statistique d’agrégation à des problèmes d'optimisation convexe admettant une analyse et une mise en œuvre efficace. On montre que cette approche conduit aux algorithmes quasi-optimaux dans le problème classique de l’agrégation - L_2 pour différents schémas d'observation (par exemple, observations gaussiennes indirectes, modèle d'observations de Poisson et échantillonnage à partir d’une loi discrète). Nous discutons également le lien avec le problème lié d’estimation adaptative. Feb 5 2018 Frédéric Chazal INRIA Title: An introduction to persistent homology in Topological Data Analysis and the density of expected persistence diagrams.   Abstract: Persistence diagrams play a fundamental role in Topological Data Analysis (TDA) where they are used as topological descriptors of data represented as point cloud. They consist in discrete multisets of points in the plane $\R^2$ that can equivalently be seen as discrete measures in $\R^2$. In a first part of the talk, we will introduce the notions of persistent homology and persistence diagrams and show how they are built from point cloud data (so no knowledge in TDA is required to follow the talk). In the second part of the talk we will show a few properties of persistence diagrams when the data come as a random point cloud. In this case, persistence diagrams become random discrete measures and we will show that, in many cases, their expectation has a density with respect to Lebesgue measure in the plane and we will discuss its estimation. This is a joint work with Vincent Divol (ENS Paris / Inria DataShape team) Feb 12 2018 Séminaire Parisien de Statistique - IHP Feb 19 2018 Laëtitia Comminges Université Paris-Dauphine Some effects in adaptive robust estimation under sparsity   Abstract: Adaptive estimation in the sparse mean model and in sparse regression exhibits some interesting effects. This paper considers estimation of a sparse target vector, of its $\ell_2$-norm and of the noise variance in the sparse linear model. We establish the optimal rates of adaptive estimation when adaptation is considered  with respect to the triplet "noise level -- noise distribution -- sparsity". These rates turn out to be different from the minimax non-adaptive rates when the triplet is known. A crucial issue is the ignorance of the noise level. Moreover, knowing or not knowing the noise distribution can also influence the rate. For example, the rates of estimation of the noise level can differ depending on whether the noise is Gaussian or sub-Gaussian without a precise knowledge of the distribution.  Estimation of noise level in our setting can be viewed as an adaptive variant of robust estimation of scale in the contamination model, where instead of fixing the "nominal" distribution in advance we assume that it belongs to some class of distributions. We also show that in the problem of estimation of a sparse vector under the $\ell_2$-risk when the variance of the noise in unknown, the optimal rate depends dramatically on the design. In particular, for noise distributions with polynomial tails, the rate can range from sub-Gaussian to polynomial depending on the properties of the design. Feb 26 2018 Vacances Mars 5 2018 Rajarshi Mukherjee Berkeley Global Testing Against Sparse Alternatives under Ising Models Abstract: We study the effect of dependence on detecting sparse signals. In particular, we focus on global testing against sparse alternatives for the magnetizations of an Ising model and establish how the interplay between the strength and sparsity of a signal determines its detectability under various notions of dependence (i.e. the coupling constant of the Ising model). The impact of dependence is best illustrated under the Curie-Weiss model where we observe the effect of a "thermodynamic" phase transition. In particular, the critical state exhibits a subtle "blessing of dependence" phenomenon in that one can detect much weaker signals at criticality than otherwise. Furthermore, we develop a testing procedure that is broadly applicable to account for dependence and show that it is asymptotically minimax optimal under fairly general regularity conditions. This talk is based on joint work with Sumit Mukherjee and Ming Yuan. Mars 12 2018 Pas de séminaire Mars 19 2018 Yihong Wu 10h30-12h30 Yale Polynomial method in statistical estimation : from large domain to mixture models – 1 Mars 22 2018 Yihong Wu 14h-17h Yale 2 Mars 26 2018 Yihong Wu 10h30-12h30 Yale   14h00 – 17h00 3     Ecole doctorale – exposés des doctorants Mars 29 2018 Yihong Wu 14h-17h Yale 4 Avril 2 2018 Férié April 5 2018 Angelika Rohde 14h-15h15 Freiburg Universität Geometrizing rates of convergence under privacy constraints Abstract : We study estimation of a functional $\theta(\Pr)$ of an unknown probability distribution $\Pr \in\P$ in which the original iid sample $X_1,\dots, X_n$ is kept private even from the statistician via an $\alpha$-local differential privacy constraint. Let $\omega_1$ denote the modulus of continuity of the functional $\theta$ over $\P$, with respect to total variation distance. For a large class of loss functions $l$, we prove that the privatized minimax risk is equivalent to $l(\omega_1((n\alpha^2)^{-1/2}))$ to within constants, under regularity conditions that are satisfied, in particular, if $\theta$ is linear and $\P$ is convex. Our results extend the theory developed by Donoho and Liu (1991) to the nowadays highly relevant case of privatized data. Somewhat surprisingly, the difficulty of the estimation problem in the private case is characterized by $\omega_1$, whereas, it is characterized by the Hellinger modulus of continuity if the original data $X_1,\dots, X_n$ are available. We also provide a general recipe for constructing rate optimal privatization mechanisms and illustrate the general theory in numerous examples. Our theory allows to quantify the price to be paid for local differential privacy in a large class of estimation problems. April 6 2018 Cheng Mao – 14h-15h15 MIT Breaking the n^{-1/2} barrier for permutation-based ranking models  Abstract : The task of ranking from pairwise comparison data arises frequently in various applications, such as recommender systems, sports tournaments and social choice theory. There has been a recent surge of interest in studying permutation-based models, such as the noisy sorting model and the strong stochastic transitivity model, for ranking from pairwise comparisons. Although permutation-based ranking models are richer than traditional parametric models, a wide gap exists between the statistically optimal rate n^{-1} and the rate n^{-1/2} achieved by the state-of-the-art computationally efficient algorithms. In this talk, I will discuss new algorithms that achieve rates n^{-1} and n^{-3/4} for the noisy sorting model and the more general strong stochastic transitivity model respectively. The talk is based on joint works with Jonathan Weed, Philippe Rigollet, Ashwin Pananjady and Martin J. Wainwright. Avril 9 2018 Séminaire Parisien de Statistique - IHP Avril 13 2018 Eric Kolaczyk – 14h-15h15 Boston University Title:  Dynamic Networks with Multi-scale Temporal Structure   Abstract: We describe a novel method for modeling non-stationary multivariate time series, with time-varying conditional dependencies represented through dynamic networks. Our proposed approach combines traditional multi-scale modeling and network based neighborhood selection, aiming at capturing temporally local structure in the data while maintaining sparsity of the potential interactions. Our multi-scale framework is based on recursive dyadic partitioning, which recursively partitions the temporal axis into finer intervals and allows us to detect local network structural changes at varying temporal resolutions. The dynamic neighborhood selection is achieved through penalized likelihood estimation, where the penalty seeks to limit the number of neighbors used to model the data. We present theoretical and numerical results describing the performance of our method, which is motivated and illustrated using task-based magnetoencephalography (MEG) data in neuroscience.  This is joint work with Xinyu Kang and Apratim Ganguly. Avril 23 2017 Vacances Avril 30 2018 Pas de séminaire May 7 2018 Pas de séminaire May 14 2018 May 21 2018 Jour férié May 22 2018 Mikhail Belkin 10h-13h Ohia State University The Differential Geometry of Data - 1 May 23 2018 Mikhail Belkin 16h-18h Ohia State University 2 May 28 2018 Sivaraman Balakrishnan Carnegie Melon May 29 2018 Mikhail Belkin 10h-13h salle-2003 Ohia State University 3 May 30 2018 Mikhail Belkin 16h-18h sale-2040 Ohia State University 4 June 4 Séminaire Parisien de Statistique - IHP June 11 2018 Jeremy Heng Harvard University Title: Controlled sequential Monte Carlo   Sequential Monte Carlo methods, also known as particle methods, are a popular set of techniques to approximate high-dimensional probability distributions and their normalizing constants. They have found numerous applications in statistics and related fields as they can be applied to perform state estimation for non-linear non-Gaussian state space models and Bayesian inference for complex static models. Like many Monte Carlo sampling schemes, they rely on proposal distributions which have a crucial impact on their performance. We introduce here a class of controlled sequential Monte Carlo algorithms, where the proposal distributions are determined by approximating the solution to an associated optimal control problem using an iterative scheme. We provide theoretical analysis of our proposed methodology and demonstrate significant gains over state-of-the-art methods at a fixed computational complexity on a variety of applications. June 18 2018 Mihai Cucuringu Oxford University Title: Laplacian-based methods for ranking and constrained clustering We consider the classic problem of establishing a statistical ranking of a set of n items given a set of inconsistent and incomplete pairwise comparisons between such items. Instantiations of this problem occur in numerous applications in data analysis (e.g., ranking teams in sports data), computer vision, and machine learning. We formulate the above problem of ranking with incomplete noisy information as an instance of the group synchronization problem over the group SO(2) of planar rotations, whose usefulness has been demonstrated in numerous applications in recent years. Its least squares solution can be approximated by either a spectral or a semidefinite programming relaxation, followed by a rounding procedure. We perform extensive numerical simulations on both synthetic and real-world data sets, showing that our proposed method compares favorably to other algorithms from the recent literature. We also briefly discuss ongoing work on extensions and applications of the group synchronization framework to k-way synchronization, list synchronization, synchronization with heterogeneous information and partial rankings, and phase unwrapping. We also present a simple spectral approach to the well-studied constrained clustering problem. It captures constrained clustering as a generalized eigenvalue problem with graph Laplacians. The algorithm works in nearly-linear time and provides concrete guarantees for the quality of the clusters, at least for the case of 2-way partitioning, via a generalized Cheeger inequality. In practice this translates to a very fast implementation that consistently outperforms existing spectral approaches both in speed and quality. June 27 2018 Wednesday -14h Emtiyaz Khan Riken, Tokyo Title: Fast yet Simple Natural-Gradient Variational Inference in Complex Models   Approximate Bayesian inference is promising in improving generalization and reliability of deep learning, but is computationally challenging. Modern variational-inference (VI) methods circumvent the challenge by formulating Bayesian inference as an optimization problem and then solving it using gradient-based methods. In this talk, I will argue in favor of natural-gradient approaches which can improve convergence of VI by exploiting the information geometry of the solutions. I will discuss a fast yet simple natural-gradient method obtained by using a duality associated with exponential-family distributions. I will summarize some of our recent results on Bayesian deep learning, where natural-gradient methods lead to an approach which gives simpler updates than existing VI methods while performing comparably to them. Joint work with Wu Lin (UBC), Didrik Nielsen (RIKEN), Voot Tangkaratt (RIKEN), Yarin Gal (UOxford), Akash Srivastva (UEdinburgh), Zuozhu Liu (SUTD). Based on: