Séminaire de Statistique
Organisateurs:


Lundi / Monday 14h  15h15 – SALLE 3001 ENSAE 
Alexander
Meister University of Rostock 
Title: Nonparametric density estimation for intentionally corrupted
functional data Abstract: We consider statistical models where functional data are artificially contaminated by
independent Wiener processes in order to satisfy privacy constraints. We
show that the corrupted observations have a Wiener density which
determines the distribution of the original functional random variables
uniquely, and we construct a nonparametric estimator of that density. We
derive an upper bound for its mean integrated squared error which This talk is based on a joint work with
Aurore Delaigle (University of Melbourne, Australia). 

Sept 17 2018 
Christophe
Giraud University Paris Sud 
Title: Partial recovery bounds
for clustering with (corrected) relaxed Kmeans (1/2)

Sept 24 2018 
Séminaire Parisien de Statistique IHP 

Christophe
Giraud University Paris Sud 
Title: Partial recovery bounds
for clustering with (corrected) relaxed Kmeans (2/2) 

Oct 8 2018 
Stefan Wager Stanford University 
Title: Machine Learning for Causal Inference Abstract : Flexible estimation of
heterogeneous treatment effects lies at the heart of many statistical
challenges, such as personalized medicine and optimal resource allocation. In
this talk, I will discuss general principles for estimating heterogeneous
treatment effects in observational studies via loss minimization, and then
present a random forest algorithm that builds on these principles. As
established both formally and empirically, the proposed approach is an order
of magnitude more robust to confounding that direct regressionbased
baselines. 
Oct 15 2018 
Mathias Drtn Université de Copenhague 
Title : Causal discovery in linear nonGaussian
models Abstract:
We consider the problem of
inferring the causal graph underlying a structural equation model from an
i.i.d. sample. It is well known that this graph is identifiable only
under special assumptions. We consider
one such set of assumptions, namely, linear structural equations with
nonGaussian errors, and discuss inference of the causal graph in
highdimensional settings as well as in the presence of latent confounders. Joint work
with Y. Samuel Wang. 
Oct 18 et 19, 2018 14 – 16 :30 Salle 2043 
Anatoly Judtsky Université
GrenobleAlpes Cours OFPR –1 et 2 
Title:
Statistical Estimation via Convex Optimization Abstract: When speaking
about links between Statistics and Optimization, what comes to mind first is
the indispensable role played by optimization algorithms in the “numerical
toolbox” of Statistics. The goal of this course is to present another type of
links between Optimization and Statistics. We are speaking of situations
where Optimization theory (theory, not algorithms!) is of methodological
value in Statistics, acting as the source of statistical inferences. We focus
on utilizing Convex Programming theory, mainly due to its power, but also due
to the desire to end up with inference routines reducing to solving convex
optimization problems and thus implementable in a computationally efficient fashion. The topics we consider
are: ·
As a starter, we consider estimation of a
linear functional of unknown “signal” (a signal in the “usual sense,” a
distribution, or an intensity of a Poisson process, etc). We also discuss the
problem of estimating quadratic functional by “lifting” linear functional
estimates. As an application, we consider a signal recovery procedure –
“polyhedral estimate” – which relies upon efficient estimation of linear
functionals. ·
Next, we turn to general problem of linear
estimation of signals from noisy observations of their linear images. Here
application of Convex Optimization allows to propose provably optimal (or
nearly so) estimation procedures. The exposition does not
require prior knowledge of Statistics and Optimization; as far as these
disciplines are concerned, all necessary for us facts and concepts are
introduced before being used. The actual prerequisites are elementary
Calculus, Probability, Linear Algebra and (last but by far not least) general
mathematical culture. 
Oct 22 2018 
Vacances 

Oct 29 2018 
Séminaire Parisien de Statistique IHP 

Nov 5 2018 
Pas de séminaire 

Nov 7 et 9 2018 14 – 16 :30 Salle 2043 
Anatoly Judtsky Université
GrenobleAlpes Cours OFPR –3 et 4 
Title:
Statistical Estimation via Convex Optimization 
Nov 12 2018 
Karim Lounici Ecole Polytechnique 
Title:
Online PCA: nonasymptotics statistical guarantees for the Krasulina scheme Principal Component
Analysis is a popular method used to analyse the covariance structure
$\Sigma$ of a random vector. Recent results on the statistical properties of
standard PCA have highlighted the importance of the effective rank as a measure of the intrinsic
statistical complexity in the PCA problem. In particular, optimal rates of
estimation of the spectral projectors have been established in the offline
setting where all the observations are available at once and a batch
estimation method is implemented. In the online setting, observations arrive
in a stream and our estimate of eigenvalues and spectral projectors are
updated every time a new observation is available. This problem has attracted
a lot a attention recently but little is known on the statistical properties
of the existing methods. In this work, we consider the Krasulina scheme
(stochastic gradient ascent scheme) and establish nonasymptotic estimation
bounds in probability for the spectral projectors. For this method, the
effective rank also plays a central role in the performance of the method,
however the obtained rate is slower than that obtained in the offline
setting. 
Nov 19 2018 
Séminaire Parisien de Statistique IHP 

Nov 20 2018 Mardi/Tuesday 
Guenther Walther Stanford University 
Title :
Constructing an optimal confidence set for a distribution
function with applications to visualizing data Abstract: We
show how to construct a confidence set of distribution functions that have
the same optimal estimation performance as the empirical distribution
function, but which may be much simpler, that is the distribution functions
may have many fewer jumps. We show how an accelerated dynamic program can
be applied to a relaxed optimization problem in order to find the
distribution function in the confidence set that has the fewest jumps. We propose to use this distribution function
for summarizing and visualizing the data via a histogram, as this
distribution function is the simplest function that optimally addresses the
two main tasks of the histogram: estimating probabilities and detecting features
such as increases and (anti)modes in the distribution. We will illustrate our
methodology with examples. This is joint work with Housen Li, Axel Munk,
and Hannes Sieling. 
Nov 26 2018 
Stanislas
Minsker University of Southern California 
Medianofmeans estimator and its generalizations New results are presented for the class of estimators obtained via the generalization of the medianofmeans (MOM) technique. These results stem from connections between performance of MOMtype estimators and the rates of convergence in normal approximation. We provide tight nonasymptotic deviations guarantees in the form of exponential concentration inequalities, as well as asymptotic results in the form of limit theorems. Finally, we will discuss multivariate extensions. 
Dec 3 2018 
Nabil
Mustafa ESIEE Paris 
Title : Sampling in Geometric
Configurations In this talk I will present recent progress on
the problem of sampling in geometric configurations. A typical problem: given
a set P of n points in ddimensions, and a parameter eps>0, is it possible
to pick a set Q of O(1/eps) points of P such that any halfspace containing
at least eps*n points of P must contain some point of Q. Based on joint works
with Imre Barany, and Arijit Ghosh and Kunal Dutta. 
Dec 10 2018 
Quentin
Berthet University of Cambridge 
Title: Estimation of smooth densities in
Wasserstein distance We describe the optimal rates of density estimation in Wasserstein distance, for different notions of smoothness. This is motivated by the growing use of these distances in various applications, and we also analyze the algorithmic aspects of this problem. joint work with J.
Weed, MIT. 
Dec 17 2018 
Pas
de séminaire 

Jan 7 2019 
Séminaire Parisien de Statistique IHP 

Jan 14 2019 14h –
16h30 Salle
2016 
Johannes
SchmidtHieber Leiden University 
Cours
OFPR 1/4 Theoretical results for deep neural
networks Large databases and increasing
computational power have recently resulted in astonishing performances of
deep neural networks for a broad range of extremely complex tasks, including
image and text classification, speech recognition and game playing. These
deep learning methods are build by imitating the action of the brain and
there are few theoretical results as of yet. To formulate a sound mathematical
framework explaining these successes is a major challenge for current
research. The course aims to give an overview
about existing theory for deep neural networks with a strong emphasis on
recent developments. Core topics are approximation theory and complexity
bounds for the function spaces generated by deep networks. Beyond that we
also discuss modelling aspects, theoretical results on the energy landscape
and statistical risk bounds. Literature: Anthony, M., and Bartlett, P. L. Neural
network learning: theoretical foundations. Cambridge University Press, 1999. Bach, F. Breaking the curse of
dimensionality with convex neural networks. JMLR. 2017 Barron, A. Universal approximation
bounds for superpositions of a sigmoidal function. IEEE . 1993. Barron, A. Approximation and estimation
bounds for artificial neural networks. Machine Learning. 1994. Choromanska, A., Henaff, M., Mathieu,
M., Arous, G. B., LeCun, Y. The loss surface of multilayer networks. Aistats.
2015. Goodfellow, Bengio, Courville. Deep
Learning. MIT Press, 2016. Pinkus, A. Approximation theory of the
MLP model in neural networks. Acta Numerica, 143195, 1999. SchmidtHieber, J. Nonparametric
regression using deep neural networks with ReLU activation function. ArXiv
2017. Telgarsky, M. Benefits of depth in
neural networks. ArXiv. 2016. Yarotsky, D. Error bounds for
approximations with deep ReLU networks. Neural Networks. 2017. 
Jan 17 2019 14h –
16h30 Salle
2016 
Johannes
SchmidtHieber Leiden University 
Cours
OFPR 2/4 Theoretical results for deep neural
networks 
Jan 21 2019 14h –
16h30 Salle
2016 
Johannes
SchmidtHieber Leiden University 
Cours
OFPR 3/4 Theoretical results for deep neural
networks 
Jan 24 2019 14h –
16h30 Salle
2016 
Johannes
SchmidtHieber Leiden University 
Cours
OFPR 4/4 Theoretical results for deep neural
networks 
Jan 28 2019 
Mohamed
Ndaoud ENSAE 
Title: Interplay of minimax estimation and minimax
support recovery under sparsity We introduce the notion of
scaled minimaxity for sparse estimation in highdimensional linear regression
model. Fixing the scale of the signaltonoise ratio, we prove that the
estimation error can be much smaller than the global minimax
error. Taking advantage of the interplay between estimation and support
recovery we achieve optimal performance for both problems
simultaneously under orthogonal designs. We also construct a
new optimal estimator for scaled minimax sparse estimation. Sharp
results for the classical minimax risk are recovered as a consequence of our
study. For general designs, we introduce a new framework based on
algorithmic regularization where previous sharp results hold. Our analysis
bridges the gap between optimization and statistical accuracy. The procedure
we present achieves optimal statistical error faster than, for
instance, classical algorithms for the Lasso. As a consequence, we present a new iterative algorithm for
highdimensional linear regression that is scaled minimax optimal, fast
and adaptive. 
2019 
Tom
Berrett University of Cambridge 
Title: Efficient multivariate functional Abstract: Many statistical procedures, including goodnessoffit tests
and methods for independent component analysis, rely critically on the
estimation of the entropy of a distribution. In this talk I will first
describe new entropy estimators that are efficient and achieve the local
asymptotic minimax lower bound with respect to squared error loss. These
estimators are constructed as weighted averages of the estimators originally
proposed by Kozachenko and Leonenko (1987), based on the knearest neighbour
distances of a sample of n independent and identically distributed random
vectors taking values in R^d . A careful choice of weights enables us to
obtain an efficient estimator for arbitrary d, given sufficient smoothness,
while the original unweighted estimator is typically only efficient for d ≤ 3. I will also discuss newer
results on the estimation of more general functionals, in settings where we
have samples from two different distributions. The next part of the talk will
be to use our entropy estimators to propose a test of independence of two
multivariate random vectors, given a sample from the underlying population.
Our approach, which we call MINT, is based on the estimation of mutual
information, which we may decompose into joint and marginal entropies. The
proposed critical values, which may be obtained from simulation in the case
where an approximation to one marginal is available or resampling otherwise,
facilitate size guarantees, and we provide local power analyses, uniformly
over classes of densities whose mutual information satisfies a lower bound.
Our ideas may be extended to provide a new goodnessoffit tests of normal
linear models based on assessing the independence of our vector of covariates
and an appropriatelydefined notion of an error vector. 
Feb 11 2019 
CANCELLED 

Feb 18 2019 
Séminaire Parisien de Statistique IHP 

Feb 25 2019 
Pas de séminaire 

2019 
Pas de séminaire 

March 11 2019 
Ismael Castillo Sorbonne
Université 
Title: On frequentist false discovery rate of Bayesian multiple
testing procedure Abstract : In many high dimensional statistical settings, a central task is to identify active variables
among a large number of candidates. For the practitioner, a key concern
is not to make too many `false positives’, which correspond to declaring
as active an inactive variable. Given a multiple testing procedure, a
typical aim is then to control its false discovery rate (FDR), that is
the average number of

March 18 2019 
Séminaire Parisien de Statistique IHP 

March 25 2019 
Tengyuan
Liang University of Chicago Booth School of Business 
New Thoughts on
Adaptivity, Generalization and Interpolation Motivated from Neural Networks Abstract: Consider
the problem: given data pair (x, y) drawn from a population with f_*(x) =
E[yx], specify a neural network and run gradient flow on the weights over
time until reaching any stationarity. How does f_t, the function computed by
the neural network at time t, relate to f_*, in terms of approximation and
representation? What are the provable benefits of the adaptive representation
by neural networks compared to the prespecified fixed basis representation
in the classical nonparametric literature? We answer the above questions via
a dynamic reproducing kernel Hilbert space (RKHS) approach indexed by the
training process of neural networks. We show that when reaching any local
stationarity, gradient flow learns an adaptive RKHS representation, and
performs the global least squares projection onto the adaptive RKHS,
simultaneously. In addition, we prove that as the RKHS is dataadaptive and
taskspecific, the residual for f˚ lies in a subspace that is smaller
than the orthogonal complement of the RKHS, formalizing the representation
and approximation benefits of neural networks. Then we will move to discuss generalization
for interpolating methods in RKHS. In the absence of explicit regularization,
Kernel “Ridgeless” Regression with nonlinear kernels has the potential to fit
the training data perfectly. It has been observed empirically, however, that
such interpolated solutions can still generalize well on test data. We isolate
a phenomenon of implicit regularization for minimumnorm interpolated
solutions which is due to a combination of high dimensionality of the input
data, curvature of the kernel function, and favorable geometric properties of
the data such as an eigenvalue decay of the empirical
covariance and kernel matrices. In addition to deriving a datadependent
upper bound on the outofsample error, we present experimental evidence
suggesting that the phenomenon occurs in the MNIST dataset. 
2019 
Botond Szabo Leiden
University 
On the fundamental understanding of distributed computation Abstract: In
recent years, the amount of available information has become so vast in
certain fields of applications that it is infeasible or undesirable to carry
out the computations on a single server. This has motivated the design and
study of distributed statistical or learning methods. In distributed methods,
the data is split amongst different administrative units and computations are
carried out locally, in parallel to each other. The outcome of the local
computations are then aggregated into a final result on a central machine. We consider the limitations and guarantees of
distributed methods under communication constraints (i.e. only limited, fixed
amount of bits are allowed to be transmitted between the local and the
central machines) in context of the random design regression model. We derive
minimax lower bounds (which depending on the communication budget can be
substantially higher than the standard nondistributed minimax rates),
matching upper bounds and provide adaptive estimators reaching these limits.
We also consider the case where the number of transmitted bits is taken to be
datadriven and investigate whether one can achieve the minimax nondistributed
estimation rate and at the same time transmit in some sense the optimal
amount of information between the machines. This is a joint work with Harry van Zanten. 
April 8 2019 
Asaf Weinstein Carnegie
Mellon University 
A power analysis for knockoffs using Lasso and thresholdedLasso
statistics Practitioners
using the Lasso are well aware of its limitations as a variable selector.
That good prediction usually leads to many falsely selected variables implies
that to capture most of the signal, one necessarily pays with a high
number of false positives. It is also well known that this situation can be
mitigated by thresholding the Lasso estimates, i.e., by discarding small
estimates one can reduce considerably the number of false selections. Recent works have
studied both these phenomena from a theoretical perspective in the
approximate messagepassing (AMP) framework, providing exact asymptotic
predictions for Type I and Type II error rates. These existing works are
important because they allow to answer both quantitative and qualitative
questions (for example, where to stop on the Lasso path to ensure FDR\leq
\alpha, or at which value of \lambda should the thresholded Lasso estimates
be computed for optimal power?) and compare Lasso to Thresholded Lasso. However,
many of these answers depend on the distribution of the underlying signal. In
this talk I’ll focus on our experience with using Knockoffs to mimic these
oracle procedures in the absence of knowledge about the signal. It will be
demonstrated that the sensitivity of power to the fraction of fake variables
added, is very different for Lasso and for Thresholded Lasso. 
April 15 2019 
Séminaire Parisien de Statistique IHP 

April 22 2019 
Férié 

April 29 2019 
Pas de séminaire 

2019 
Pas de séminaire 

May 13 2019 
Alexey
Naumov Higher School of Economics, Moscow 
Gaussian approximation for
maxima of large number of quadratic forms of highdimensional
random vectors Let (X_i)_{i=1, ... , n} be independent identically
distributed random observations taking values in a high dimensional
space and (Q_j)_{j=1, ..., p} be symmetric positive definite matrices. We consider joint
distribution function of quadratic forms (Q_j S_n, S_n), where S_n is
the normalized sum of observations, and prove the rate of Gaussian
approximation with explicit dependence on n, p and dimension. We also
compare this result with results of Bentkus (2003) and Chernozhukov,
Chetverikov, Kato (2016). Statistical applications will be discussed as
well. The talk is based on the joint project with Friedrich Goetze,
Vladimir Spokoiny and Alexander Tikhomirov. 
May 20 2019 
Journée de l’Ecole doctorale 9h00 – 11h00 

May 20 2019 – 2 p.m. 
Yohann De Castro Ecole
Ponts ParisTech 
Exact False Negative Control using Stopping Times on the LAR's Path In this talk we introduce a new exact testing
procedure in high dimensional linear regression framework using the outcomes
of the standard Least Angle Regression (LAR) algorithm. Under the Gaussian noise
assumption, we give the exact law of the sequence of knots conditional on the
sequence of variables entering the model (i.e., the ‘‘postselection'' law of
the knots of the LAR). Based on this result, we introduce a new exact testing
procedure on the existence of false negatives. This testing procedure can be deployed after any
support selection procedure that will produce an estimation of the support
(i.e., the indexes of nonzero coefficients). The type I error of the test can
be exactly controlled as long as the selection procedure follows some
elementary hypotheses, referred to as ‘‘admissible selection procedures''.
These support selection procedures are such that the estimation of the
support is given by the k first variables entering the model where the random
variable k is a stopping time. 
May 27 2019 
Geoffrey Chinot CREST, ENSAE 
ERM and RERM are tractable
and optimal estimators in the εcontamination model We study the ERM and RERM with Lipschitz and convex
loss functions under a subgaussian assumption on the design. We consider a
setting where O outliers may contaminate the labels. In that case, we
show that the error rate is bounded by r(N) + O/N, where N is the
total number of observations and r(N) is the optimal error rate in the
noncontaminated setting. The main result can be used
for both nonregularized and regularized procedures. For instance
we present results for the Huber's Mestimators without penalization or regularized
by the l1norm. For these two applications we get minimaxoptimal rates in
the εcontamination model. 
2019 
Ilias
Diakonikolas University of Southern California 
Algorithmic Questions in Robust HighDimensional Statistics Fitting a model to a collection
of observations is one of the quintessential questions in statistics. The
standard assumption is that the data was generated by a model of a given type
(e.g., a mixture model). This simplifying assumption is at best only
approximately valid, as real datasets are typically exposed to some source of
contamination. Hence, any estimator designed for a particular model must also
be robust in the presence of corrupted data. This is the prototypical goal in
robust statistics, a field that took shape in the 1960s with the pioneering
works of Tukey and Huber. Until recently, even for the basic problem of
robustly estimating the mean of a highdimensional dataset, all known robust
estimators were hard to compute. Moreover, the quality of the common heuristics
degrades badly as the dimension increases. In this talk, we will survey
the recent progress in algorithmic highdimensional robust statistics. We
will describe the first computationally efficient algorithms for robust mean
and covariance estimation and the main insights behind them. We will also
present practical applications of these estimators to exploratory data
analysis and adversarial machine learning. Finally, we will discuss new
directions and opportunities for future work. The talk will be based on a
number of joint works with (various subsets of) G. Kamath, D. Kane, J. Li, A.
Moitra, and A. Stewart. 
June 10 2019 
Férié 

June 17 2019 
Pas de séminaire 

June 24 2019 10h30 
Tracy Ke Harvard University 

June 24 2019 14h00 
Jonas
Peters University of Copenhagen 




September 16 2019 
Yury Poliansky MIT 

Sept 23 2019 


Sept 30 2019 
Alexandra
Carpentier University of Magdeburg 

Oct 7 2019 
Richard
Nickl University of Cambridge 

Oct 14 2019 


Arnak Dalalyan ENSAE 
Title : Userfriendly bounds for sampling from a
logconcave density using Langevin Monte Carlo Abstract : We will present new bounds on the sampling
error in the case where the target distribution has a smooth and logconcave
density. These bounds are established for the Langevin Monte Carlo and its
discretized versions involving the Hessian matrix of the logdensity. We will
also discuss the case where accurate evaluation of the gradient is
impossible. 

Sept 18 2017 
Séminaire Parisien de
Statistique  IHP 

Sept 25 2017 
Mathias Trabs Université de Hamburg 
Title :
Volatility estimation for stochastic PDE’s using highfrequency observations Abstract : We study the parameter estimation for parabolic,
linear, second order, stochastic partial differential equations (SPDEs)
observing a mild solution on a discrete grid in time and space. A
highfrequency regime is considered where the mesh of the grid in the time
variable goes to zero. Focusing on volatility estimation, we provide an
explicit and easy to implement method of moments estimator based on the
squared increments of the process. The estimator is consistent and admits a
central limit theorem. Starting from a representation of the solution as an
infinite factor model, the theory considerably differs from the
statistics for semimartingales literature. The performance of the
method is illustrated in a simulation study.




Oct 2 2017 
Nicolas Marie Modal’X (Paris 10)/ ESME
Sudria 
Title :
Estimation nonparamétrique dans les équations différentielles dirigées par
le mouvement brownien fractionnaire. Abstract : Après avoir introduit quelques notions de calcul stochastique
trajectoriel, l’exposé présentera un estimateur type NadarayaWatson de la
fonction de drift d’une équation différentielle dirigée par un bruit
multiplicatif fractionnaire. Afin d’établir la consistance de l’estimateur,
les résultats d’ergodicité de Hairer et Ohashi (2007) seront énoncés et
expliqués. Une fois sa consistence établie, la question de la vitesse de
convergence de l’estimateur sera abordée. Il s’agit d’un travail en
collaboration avec F. Comte. 
Zoltan Szabo Ecole Polytechnique 
Title : Characteristic Tensor Kernels Abstract : Maximum mean discrepancy (MMD) and
HilbertSchmidt independence criterion (HSIC) are popular techniques in
data science to measure the difference and the independence of random
variables, respectively. Thanks to their kernelbased foundations, MMD and HSIC are applicable
on a variety of domains including documents, images, trees, graphs,
time series, mixture models, dynamical systems, sets, distributions,
permutations. Despite their tremendous practical success, quite
little is known about when HSIC characterizes independence and MMD with
tensor kernel can discriminate probability distributions, in terms of
the Preprint: https://arxiv.org/abs/1708.08157] 

Oct 16 2017 
Séminaire Parisien de
Statistique  IHP 

Oct 23 2017 
Philip Thompson ENSAE 
Cancelled 
Oct 30 2017 
Vacances 




Nov 6 2017 
Martin Kroll ENSAE 
Title : On minimax
optimal and adaptive estimation of linear functionals in inverse Gaussian
sequence space models Abstract : We consider an inverse problem in a Gaussian
sequence space model where the multiplication operator is not known but
only available via noisy observations. Our aim is not to reconstruct the
solution itself but the value of a linear functional of the solution. In our
setup the optimal rate depends on two different noise levels, the noise
level concerning the observation of the transformed solution and the noise
level concerning the noisy observation of the operator. We
consider this problem from a minimax point of view and obtain upper and lower
bounds under smoothness assumptions on the multiplication operator and the
unknown solution. Finally, we sketch an approach to
the adaptive estimation in the given model using a method combining both
model selection and the GoldenshlugerLepski method.

Séminaire Parisien de
Statistique  IHP 


Nov 20 2017 
Olivier Collier Université
Paris Nanterre 
Title : Estimation
robuste de la moyenne en temps polynômial Abstract : Il
s'agit de résultats obtenus en collaboration avec Arnak Dalalyan. Quand les observations
sont polluées par la présence d'outliers, il n'est plus souhaitable d'estimer
l'espérance par la moyenne empirique. Des méthodes optimales ont été
trouvées, comme la profondeur de Tuckey dans le modèle dit de contamination.
Cependant, ce dernier estimateur n'est pas calculable en temps polynômial.
Dans un modèle gaussien, nous remarquerons que l'estimation de la moyenne
revient à l'estimation d'une fonctionnelle linéaire sous contrainte de
sparsité de groupe. Il est alors naturel d'utiliser grouplasso. Nous
pourrons alors noter plusieurs phénomènes intéressants : dans ce contexte, la
sparsité par groupe permet un gain polynômial par rapport à la seule
sparsité, alors que les études précédentes montraient au mieux un gain
logarithmique, et il semble que l'estimation en temps polynômial ne puisse
pas atteindre la performance optimale des méthodes en temps exponentiel. 
Nov 27 2017 
Elisabeth Gassiat Université
ParisSud 
Title : Estimation of the proportion of explained variation
in high dimensions. Abstract : Estimation of heritability of a phenotypic trait based on genetic data
may be set as estimation of the proportion of explained variation in high
dimensional linear models. I will be interested in understanding the impact
of: — not knowing
the sparsity of the regression parameter, — not knowing
the variance matrix of the covariates on minimax
estimation of heritability. In the
situation where the variance of the design is known, I will present an estimation
procedure that adapts to unknown sparsity. when the
variance of the design is unknown and no prior estimator of it is available,
I will show that consistent estimation of heritability is
impossible. (Joint work
with N. Verzelen, and PHD thesis of A. Bonnet). 



Dec 4 2017 
Philip Thomson ENSAE 
Title : Stochastic approximation
with heavier tails Abstract : We consider the solution of convex
optimization and variational inequality problems via the stochastic
approximation methodology where the gradient or operator can only be accessed
through an unbiased stochastic oracle. First, we show that (nonasymptotic)
convergence is possible with unbounded constraints and a "multiplicative
noise" model: the oracle is Lipschitz continuous with a finite pointwise
variance which may not be uniformly bounded (as classically assumed). In this
setting, our bounds depend on local variances at solutions and the method
uses noise reduction in an efficient manner: given a precision, it respects a
nearoptimal sample and averaging complexities of PolyakRuppert's method but
attains the order of the (faster) deterministic iteration complexity. Second,
we discuss a more "robust" version where the Lipschitz constant L
is unknown but, in terms of error precision, nearoptimal complexities are
maintained. A price to pay when L is unknown is that a large sample regime is
assumed (still respecting the complexity of the SAA estimator) and
"nonmartingalelike" dependencies are introduced. These
dependencies are coped with an "iterative localization" argument
based on empirical process theory and selfnormalization. Joint work with A. Iusem
(IMPA), A. Jofré (CMMChile) and R.I. Oliveira (IMPA). 
Jamal Najim CNRS
UPEM 
Title : Grandes matrices de covariance empiriques Abstract : Les modèles de grandes matrices de
covariance empirique ont été énormément étudiés depuis l’article fondateur de
Marchenko et Pastur en 1967, qui ont décrit le comportement du spectre de
telles matrices quand les deux dimensions (dimension des observations et
taille de l’échantillon) croissent vers l’infini au même rythme. L’objectif
de cet exposé sera dans un premier temps de présenter les résultats standards
(et moins standards!) liés à ces modèles et les outils mathématiques
d’analyse (principalement la transformée de Stieltjes). On insistera ensuite
sur le cas particulier des « spiked models », modèles de grandes
matrices de covariance dans lesquels quelques valeurs propres sont éloignées
de la masse des valeurs propres de la matrice considérée. Ces « spiked
models » sont très populaires en finance, économétrie, traitement
du signal, etc. Enfin,
on s’intéressera à des grandes matrices de covariance dont les observations
sont issues d’un processus à mémoire longue. Pour de telles observations, on
décria le comportement asymptotique et des fluctuations de la plus grande
valeur propre. Ce travail est issu d’une collaboration avec F. Merlevède et
P. Tian. 

Dec 18 2017 
Pas de séminaire 




Eric Moulines Ecole
Polytechnique 
Title : Algorithmes de simulation
de Langevin Abstract : Les algorithmes de Langevin
ont connu récemment un vif regain d’intérêt dans la communauté de
l’apprentissage statistique, suite aux travaux de M. Welling et Y.W. Teh
(‘Bayesian learning via Stochastic gradient Langevin dynamics’, ICML, 2011).
Cette méthode couplant approximation stochastique et méthode de simulation
permet d’envisager la mise en œuvre de méthodes de simulation en grande
dimension et pour des grands ensembles de données. Les applications sont très
nombreuses à la fois dans les domaines «classiques » des statistiques
bayésiennes (inférence bayésienne, choix de modèles) mais aussi en
optimisation bayésienne. Dans cet exposé, nous présenterons quelques travaux récents sur l’analyse
de convergence de cet algorithme. Nous montrerons comment obtenir des bornes
explicites de convergence en distance de Wasserstein et en variation totale
dans différents cadres (fortement convexe, convexe différentiable,
superexponentiel, etc.). Nous nous intéresserons tout particulièrement à la
dépendance de ces bornes dans la dimension du paramètre. Nous montrerons
aussi comment étendre ces méthodes pour des fonctions convexes mais non
différentiables en nous inspirant des méthodes de gradient proximaux. 

Jan 15 2018 
Séminaire Parisien de
Statistique – IHP 

Jan 22 2018 

Data Science 
Jan 29 2018 
Anatoli Juditsky Université de GrenobleAlpes 
Titre : Aggrégation des
estimateurs à partir d’observations indirectes Nous considérons le problème d’agrégation
d'estimation adaptative dans le cas où des observations indirectes du signal
sont disponibles. Nous proposons une approche au problème d’agrégation par
tests quasioptimaux d'hypothèses convexes basée sur la réduction du problème
statistique d’agrégation à des problèmes d'optimisation convexe admettant une
analyse et une mise en œuvre efficace. On montre que cette approche conduit aux algorithmes
quasioptimaux dans le problème classique de l’agrégation  L_2 pour différents
schémas d'observation (par exemple, observations gaussiennes indirectes,
modèle d'observations de Poisson et échantillonnage à partir d’une loi
discrète). Nous discutons également le lien avec le problème lié d’estimation
adaptative. 



2018 
Frédéric Chazal INRIA 
Title: An
introduction to persistent homology in Topological Data Analysis and the
density of expected persistence diagrams. Abstract:
Persistence diagrams play a fundamental role in Topological Data Analysis
(TDA) where they are used as topological descriptors of data represented as
point cloud. They consist in discrete multisets of points in the plane $\R^2$
that can equivalently be seen as discrete measures in $\R^2$. In a first part
of the talk, we will introduce the notions of persistent homology and
persistence diagrams and show how they are built from point cloud data (so no
knowledge in TDA is required to follow the talk). In the second part of the
talk we will show a few properties of persistence diagrams when the data come
as a random point cloud. In this case, persistence diagrams become random
discrete measures and we will show that, in many cases, their expectation has
a density with respect to Lebesgue measure in the plane and we will discuss
its estimation. This is a
joint work with Vincent Divol (ENS Paris / Inria DataShape team) 
Feb 12 2018 
Séminaire Parisien de
Statistique  IHP 

Feb 19 2018 
Laëtitia Comminges Université
ParisDauphine 
Some
effects in adaptive robust estimation under sparsity Abstract: Adaptive
estimation in the sparse mean model and in sparse regression exhibits some
interesting effects. This paper
considers estimation of a sparse target vector, of its $\ell_2$norm and of
the noise variance in the sparse linear model. We establish the optimal rates
of adaptive estimation when adaptation is considered with respect to
the triplet "noise level  noise distribution  sparsity". These
rates turn out to be different from the minimax nonadaptive rates when the
triplet is known. A crucial issue is the ignorance of the noise level.
Moreover, knowing or not knowing the noise distribution can also influence
the rate. For example, the rates of estimation of the noise level can differ
depending on whether the noise is Gaussian or subGaussian without a precise
knowledge of the distribution. Estimation of noise level in our setting
can be viewed as an adaptive variant of robust estimation of scale in
the contamination model, where instead of fixing the "nominal"
distribution in advance we assume that it belongs to some class of
distributions. We also show that in the problem of estimation of a sparse
vector under the $\ell_2$risk when the variance of the noise in unknown, the
optimal rate depends dramatically on the design. In particular, for noise
distributions with polynomial tails, the rate can range from subGaussian to
polynomial depending on the properties of the design. 
Feb 26 2018 
Vacances 




2018 
Rajarshi Mukherjee Berkeley 
Global Testing Against
Sparse Alternatives under Ising Models Abstract:
We study the effect of dependence on detecting sparse signals. In particular,
we focus on global testing against sparse alternatives for the magnetizations
of an Ising model and establish how the interplay between the strength and
sparsity of a signal determines its detectability under various notions of
dependence (i.e. the coupling constant of the Ising model). The impact of
dependence is best illustrated under the CurieWeiss model where we observe
the effect of a "thermodynamic" phase transition. In particular,
the critical state exhibits a subtle "blessing of dependence"
phenomenon in that one can detect much weaker signals at criticality than
otherwise. Furthermore, we develop a testing procedure that is broadly
applicable to account for dependence and show that it is asymptotically
minimax optimal under fairly general regularity conditions. This talk is
based on joint work with Sumit Mukherjee and Ming Yuan. 
Mars 12 2018 
Pas
de séminaire 

Mars 19 2018 
Yihong Wu 10h3012h30 Yale 
Polynomial
method in statistical estimation : from large domain to mixture models –
1 
Mars 22 2018 
Yihong Wu 14h17h Yale 
2 
Mars 26 2018 
Yihong Wu 10h3012h30 Yale 14h00
– 17h00 
3 Ecole doctorale – exposés des
doctorants 
Mars 29 2018 
Yihong Wu 14h17h Yale 
4 



Avril 2 2018 
Férié 

April 5 2018 
Angelika
Rohde 14h15h15 Freiburg Universität 
Geometrizing rates
of convergence under privacy constraints Abstract : We study estimation of a
functional $\theta(\Pr)$ of an unknown probability distribution $\Pr \in\P$
in which the original iid sample $X_1,\dots, X_n$ is kept private even from
the statistician via an $\alpha$local differential privacy constraint. Let
$\omega_1$ denote the modulus of continuity of the functional $\theta$ over
$\P$, with respect to total variation distance. For a large class of loss
functions $l$, we prove that the privatized minimax risk is equivalent to
$l(\omega_1((n\alpha^2)^{1/2}))$ to within constants, under regularity
conditions that are satisfied, in particular, if $\theta$ is linear and $\P$
is convex. Our results extend the theory developed by Donoho and Liu (1991)
to the nowadays highly relevant case of privatized data. Somewhat
surprisingly, the difficulty of the estimation problem in the private case is
characterized by $\omega_1$, whereas, it is characterized by the Hellinger
modulus of continuity if the original data $X_1,\dots, X_n$ are available. We
also provide a general recipe for constructing rate optimal privatization
mechanisms and illustrate the general theory in numerous examples. Our theory
allows to quantify the price to be paid for local differential privacy in a
large class of estimation problems. 
April 6 2018 
Cheng
Mao – 14h15h15 MIT 
Breaking the n^{1/2} barrier for
permutationbased ranking models Abstract : The task of ranking from
pairwise comparison data arises frequently in various applications, such as recommender
systems, sports tournaments and social choice theory. There has been a recent
surge of interest in studying permutationbased models, such as the noisy
sorting model and the strong stochastic transitivity model, for ranking from
pairwise comparisons. Although permutationbased ranking models are richer
than traditional parametric models, a wide gap exists between the
statistically optimal rate n^{1} and the rate n^{1/2} achieved by the
stateoftheart computationally efficient algorithms. In this talk, I will
discuss new algorithms that achieve rates n^{1} and n^{3/4} for the noisy
sorting model and the more general strong stochastic transitivity model
respectively. The talk is based on joint works with Jonathan Weed, Philippe Rigollet,
Ashwin Pananjady and Martin J. Wainwright. 
Avril 9 2018 
Séminaire Parisien de
Statistique  IHP 

Avril 13 2018 
Eric Kolaczyk – 14h15h15 Boston University 
Title:
Dynamic Networks with Multiscale Temporal Structure Abstract: We describe
a novel method for modeling nonstationary multivariate time series, with
timevarying conditional dependencies represented through dynamic networks.
Our proposed approach combines traditional multiscale modeling and network
based neighborhood selection, aiming at capturing temporally local structure
in the data while maintaining sparsity of the potential interactions. Our
multiscale framework is based on recursive dyadic partitioning, which
recursively partitions the temporal axis into finer intervals and allows us
to detect local network structural changes at varying temporal resolutions.
The dynamic neighborhood selection is achieved through penalized likelihood
estimation, where the penalty seeks to limit the number of neighbors used to
model the data. We present theoretical and numerical results describing the
performance of our method, which is motivated and illustrated using
taskbased magnetoencephalography (MEG) data in neuroscience. This is
joint work with Xinyu Kang
and Apratim Ganguly. 
Avril 23 2017 
Vacances 

Avril 30 2018 
Pas
de séminaire 




May 7 2018 
Pas
de séminaire 

May 14 2018 


May 21 2018 
Jour férié 

May 22 2018 
Mikhail Belkin 10h13h Ohia State University 
The
Differential Geometry of Data  1 
May 23 2018 
Mikhail Belkin 16h18h Ohia State University 
2 
May 28 2018 
Sivaraman Balakrishnan Carnegie Melon 

May 29 2018 
Mikhail Belkin 10h13h salle2003 Ohia State University 
3 
May
30 2018 
Mikhail Belkin 16h18h sale2040 Ohia State University 
4 



June 4 
Séminaire Parisien de Statistique  IHP 

June
11 2018 
Jeremy Heng Harvard
University 
Title: Controlled sequential Monte Carlo Sequential Monte Carlo methods, also known as particle methods, are a
popular set of techniques to approximate highdimensional probability
distributions and their normalizing constants. They have found numerous
applications in statistics and related fields as they can be applied to
perform state estimation for nonlinear nonGaussian state space models and
Bayesian inference for complex static models. Like many Monte Carlo sampling
schemes, they rely on proposal distributions which have a crucial impact on
their performance. We introduce here a class of controlled sequential Monte
Carlo algorithms, where the proposal distributions are determined by
approximating the solution to an associated optimal control problem using an
iterative scheme. We provide theoretical analysis of our proposed methodology
and demonstrate significant gains over stateoftheart methods at a fixed
computational complexity on a variety of applications. 
June
18 2018 
Mihai Cucuringu Oxford
University 
Title:
Laplacianbased methods for ranking and constrained clustering We consider
the classic problem of establishing a statistical ranking of a set of n items
given a set of inconsistent and incomplete pairwise comparisons between such items.
Instantiations of this problem occur in numerous applications in data
analysis (e.g., ranking teams in sports data), computer vision, and machine
learning. We formulate the above problem of ranking with incomplete noisy
information as an instance of the group synchronization problem over the
group SO(2) of planar rotations, whose usefulness has been demonstrated in
numerous applications in recent years. Its least squares solution can be
approximated by either a spectral or a semidefinite programming relaxation,
followed by a rounding procedure. We perform extensive numerical simulations
on both synthetic and realworld data sets, showing that our proposed method
compares favorably to other algorithms from the recent literature. We also
briefly discuss ongoing work on extensions and applications of the group
synchronization framework to kway synchronization, list synchronization,
synchronization with heterogeneous information and partial rankings, and
phase unwrapping. We also present
a simple spectral approach to the wellstudied constrained clustering
problem. It captures constrained clustering as a generalized eigenvalue
problem with graph Laplacians. The algorithm works in nearlylinear time and
provides concrete guarantees for the quality of the clusters, at least for
the case of 2way partitioning, via a generalized Cheeger inequality. In
practice this translates to a very fast implementation that consistently
outperforms existing spectral approaches both in speed and quality. 
June
27 2018 Wednesday
14h 
Emtiyaz Khan Riken,
Tokyo 
Title: Fast yet Simple
NaturalGradient Variational Inference in Complex Models Approximate
Bayesian inference is promising in improving generalization and reliability of
deep learning, but is computationally challenging. Modern
variationalinference (VI) methods circumvent the challenge by formulating
Bayesian inference as an optimization problem and then solving it using
gradientbased methods. In this talk, I will argue in favor of
naturalgradient approaches which can improve convergence of VI by exploiting
the information geometry of the solutions. I will discuss a fast yet simple
naturalgradient method obtained by using a duality associated with
exponentialfamily distributions. I will summarize some of our recent results
on Bayesian deep learning, where naturalgradient methods lead to an approach
which gives simpler updates than existing VI methods while performing
comparably to them. Joint
work with Wu Lin (UBC), Didrik Nielsen (RIKEN), Voot Tangkaratt (RIKEN),
Yarin Gal (UOxford), Akash Srivastva (UEdinburgh), Zuozhu Liu (SUTD). Based
on: https://arxiv.org/abs/1806.04854 https://arxiv.org/abs/1703.04265 