PhD Proposal - UQAM, Canada
The use of Determinantal point processes for Monte-Carlo experiments and computer experiments
Jean-François Coeurjolly & Pierre-Olivier Amblard

Introduction

Point processes model random sets of points or events in interaction. A point process X on S (where S is usually a subset of $\mathbb{R}^d$ or of $\{1, \dots, N\}$) is a random locally finite counting measure on S whose realization is of the form $x = \{x_1, \dots, x_n\}$, where n is random and the $x_i \in S$ are the events. Such mathematical objects have a long history: see e.g. the pioneering work of John Snow, who studied the locations of cholera cases observed in a district of London and connected them to public water pumps (see e.g. [7]). The formal definition and the development of appropriate statistical methodologies are much more recent and date back to the 1990s ([2, 3, 11, 6, 4]), which partly explains why this domain is still expanding and requires the attention of statisticians.

There exist many classes of spatial point process models, which can generate attractive patterns, repulsive patterns, or both. We can cite Poisson point processes (which generate points without interaction), Cox point processes, Gibbs point processes and determinantal point processes. The last class, DPP for short, will be the focus of the thesis. DPP have been introduced in the statistics community very recently (see [9]). When $S = \mathbb{R}^d$ and given a symmetric kernel function $K : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, the k-th order intensity function of a DPP is given by

$$\rho^{(k)}(u_1, \dots, u_k) = \det\big[K(u_i, u_j)\big]_{1 \le i, j \le k},$$

where, roughly speaking, $\rho^{(k)}(u_1, \dots, u_k)\, du_1 \dots du_k$ is the probability of observing k points in infinitesimal balls at locations $u_1, \dots, u_k$. Conditions on K are of course required to ensure the existence of such models. DPP form an interesting class due to their tractability: the intensity functions as well as the Papangelou conditional intensity are explicit. Finally, note that DPP produce only repulsive point patterns.

In the field of computer experiments, simulated designs replace the real data-generating process. When little is known about how inputs are linked to outputs, one strategy is to spread points evenly throughout the experimental region so as to cover the whole input space. This technique is called space-filling design (see [12, 13]), and the goal can be summarized as generating n points in $[0, 1]^d$ which “nicely cover” $[0, 1]^d$ (Figure 1 gives an idea when d = 3). Latin hypercubes, low-discrepancy sequences and quasi-Monte-Carlo methods are standard methods to generate designs. It has recently been proposed by [5] to use one specific class of spatial point processes for this task.

[Figure 1: 3-dimensional configuration of points illustrating a space-filling design]

DPP for computer experiments
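To make the determinantal formula for the intensity functions concrete, the following Python sketch evaluates $\rho^{(k)}$ as the determinant of the matrix $[K(u_i, u_j)]$. The Gaussian kernel, its parameters `rho` and `alpha`, the two test locations and all function names are illustrative assumptions and are not taken from this proposal; the existence conditions on K are not checked here.

```python
import numpy as np

def gaussian_kernel(u, v, rho=1.0, alpha=0.5):
    """Hypothetical kernel K(u, v) = rho * exp(-||u - v||^2 / alpha^2)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return rho * np.exp(-np.sum((u - v) ** 2) / alpha ** 2)

def kth_order_intensity(points, kernel):
    """Return det[K(u_i, u_j)] for the k locations in `points`."""
    k = len(points)
    gram = np.array([[kernel(points[i], points[j]) for j in range(k)]
                     for i in range(k)])
    return float(np.linalg.det(gram))

# Two nearby locations in the plane (arbitrary illustrative values).
u1, u2 = (0.2, 0.3), (0.25, 0.35)

rho1 = kth_order_intensity([u1], gaussian_kernel)      # first-order intensity at u1
rho2 = kth_order_intensity([u1, u2], gaussian_kernel)  # second-order intensity at (u1, u2)

print("rho(u1)           =", rho1)
print("rho2(u1, u2)      =", rho2)
print("rho(u1) * rho(u2) =", rho1 * kth_order_intensity([u2], gaussian_kernel))
```

Under these assumed values, the second-order intensity of the two nearby points is well below the product of the first-order intensities, which is the repulsion property of DPP described in the Introduction.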

As illustrated by Figure 1, point processes have recently been used to generate space-filling designs (see [5]). The authors made use of a specific Gibbs point process, namely the Strauss model, with log-Papangelou conditional intensity

$$\log \lambda(u, x) = \theta_1 + \theta_2 \sum_{v \in x} \mathbf{1}(\|v - u\| \le R),$$

where $\|\cdot\|$ is the d-dimensional Euclidean norm and R > 0 is some fixed number. To give a quick interpretation, when $\theta_2 < 0$, it is less likely to have a point at u with a high number of R-close neighbors than for the Poisson point process. This results in quite regular point patterns for large values of $|\theta_2|$. Moments for Gibbs point processes are not explicit, and these Gibbs point processes are not repulsive in the sense of [11] or [6, Section 6.5], namely that the probability of observing a pair of distinct points should be smaller than the equivalent under independence.

Another problem concerns projections: a desirable property, which Latin hypercubes are able to handle, is that the configuration of projected points on every margin retains properties similar to those of the initial design. Figure 2 shows planar projections of the 3-dimensional configuration of points from Figure 1. The previous model clearly cannot satisfy this constraint.

[Figure 2: planar projections of the 3-dimensional configuration of points from Figure 1]

To overcome this, [5] proposed to add terms like $\sum_{j=1}^{d} \sum_{v \in x} f(|u_j - v_j|)$, where f is some nonnegative increasing function and $u_j$ is the j-th coordinate of a point u. The resulting projected point patterns seem to be more regular; however, from a theoretical point of view, it is unclear what the properties of the final design are. The current approach is therefore not satisfactory.

One objective of the PhD is to investigate the use of DPP, a flexible and tractable class of repulsive patterns, for space-filling designs. The idea is the following: find a kernel K such that any projection has intensity functions similar to those of a DPP. If such a construction is possible, then the projected point pattern is necessarily a DPP and will therefore exhibit regularity. The idea is promising and innovative, and deserves to be studied in depth both from a theoretical point of view (optimization of the kernel K) and from a computational point of view (DPP are costly to simulate, see e.g. [8, 9]). It also has to be compared with standard methods. From there, many extensions could be considered (incorporation of global sensitivity information, sequential sampling, ...).
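The Strauss log-Papangelou conditional intensity discussed in this section can be sketched in Python as follows. This is only an illustration of the formula as written, with a plus sign in front of $\theta_2$, so that a negative $\theta_2$ lowers the conditional intensity at crowded locations. The function name, the example configuration and the parameter values are assumptions made for the example; this is not the implementation used in [5].

```python
import numpy as np

def strauss_log_papangelou(u, x, theta1, theta2, R):
    """log lambda(u, x) = theta1 + theta2 * #{v in x : ||v - u|| <= R}."""
    u = np.asarray(u, dtype=float)
    # Reshape so that an empty configuration is handled as a (0, d) array.
    x = np.asarray(x, dtype=float).reshape(-1, u.size)
    n_close = int(np.sum(np.linalg.norm(x - u, axis=1) <= R))
    return theta1 + theta2 * n_close

# Hypothetical configuration of three points in [0, 1]^3.
x = [[0.10, 0.10, 0.10],
     [0.15, 0.10, 0.10],
     [0.90, 0.90, 0.90]]

theta1, theta2, R = 0.0, -1.0, 0.1  # illustrative values only

crowded = strauss_log_papangelou([0.12, 0.10, 0.10], x, theta1, theta2, R)
isolated = strauss_log_papangelou([0.50, 0.50, 0.50], x, theta1, theta2, R)

print("log-intensity near two existing points:", crowded)
print("log-intensity at an isolated location: ", isolated)
```

With these assumed values, the candidate location that has two R-close neighbors receives a lower log-intensity than the isolated one, which is the inhibition mechanism that produces regular patterns.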
Subsampling and machine learning

Let us illustrate this part with the following simple computations. Consider a population of N individuals, over which a characteristic $y_1, \dots, y_N$ is measured, and consider the problem of estimating $\theta = \sum_i y_i$. Subsampling consists in picking n individuals out of N. From a point process point of view, let X be a point process on $\{1, \dots, N\}$. An estimate of θ is given by $\hat\theta = \sum_{i \in X} y_i / \pi_i$, where $\pi_i = P(i \in X)$. It can easily be shown that $\hat\theta$ is an unbiased estimator of θ with variance

$$\mathrm{Var}(\hat\theta) = \sum_i y_i^2 \, \frac{1 - \pi_i}{\pi_i} + \sum_{i \ne j} y_i y_j \, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j},$$

where, for $i \ne j$, $\pi_{ij} = P(\{i, j\} \subseteq X)$. Such an equation is the main reason why point processes, and in particular DPP, have gained interest in survey sampling and machine learning ([8, 10]). Indeed, the use of a DPP ensures that $\pi_{ij} \le \pi_i \pi_j$. This implies, if the $y_i$'s have the same sign, that the variance of $\hat\theta$ under DPP sampling is no larger than $\sum_i y_i^2 (1 - \pi_i)/\pi_i$, which corresponds to the variance under Poisson sampling. This is the key point of the second objective of the PhD: investigate point process models to improve subsampling procedures.

Let us cite two key objectives which will be considered. First, a recent and promising result obtained by [1] concerns Monte-Carlo integration of a smooth function, say h. [1] constructed a smart DPP and were able to obtain an estimator of $\int_{[-1,1]^d} h(x)\,dx$ with rate of convergence $\sqrt{n^{1+1/d}}$, where n is the number of sampling points. It is worth understanding how the theory developed by [1] could be transferred to discrete point processes. Second, assume $y_i$ is now a centered stationary process with variance 1 and correlation function c. Then, in expectation, the term $y_i y_j$ in the previous double sum reduces to $c(j - i)$, which for some processes can be negative. In that case, models for which $\pi_{ij} > \pi_i \pi_j$, i.e. discrete clustered point processes, are needed. As far as we know, such models have never been studied in the literature.
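The variance comparison between DPP sampling and Poisson sampling can be checked numerically with a short Python sketch. It computes the variance of $\hat\theta = \sum_{i \in X} y_i/\pi_i$ directly as a quadratic form in the covariance matrix of the inclusion indicators. The sketch assumes a discrete DPP on $\{1, \dots, N\}$ with a symmetric kernel matrix K whose eigenvalues lie in [0, 1], for which $\pi_i = K_{ii}$ and $\pi_{ij} = K_{ii} K_{jj} - K_{ij}^2$. The kernel, the values $y_i$, the population size and the helper name `ht_variance` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def ht_variance(y, pi, pi2):
    """Variance of sum_{i in X} y_i / pi_i.

    pi  : first-order inclusion probabilities P(i in X)
    pi2 : matrix of pair inclusion probabilities P({i, j} in X) for i != j
          (its diagonal is ignored)
    """
    w = y / pi
    cov = pi2 - np.outer(pi, pi)            # Cov of inclusion indicators, i != j
    np.fill_diagonal(cov, pi * (1.0 - pi))  # Var of each inclusion indicator
    return float(w @ cov @ w)

rng = np.random.default_rng(0)
N = 6

# Hypothetical DPP kernel: symmetric, eigenvalues in [0.1, 0.9].
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
lam = rng.uniform(0.1, 0.9, size=N)
K = Q @ np.diag(lam) @ Q.T
K = (K + K.T) / 2.0                         # remove round-off asymmetry

pi = np.diag(K).copy()
pi2_dpp = np.outer(pi, pi) - K ** 2         # DPP pair inclusion probabilities
pi2_poisson = np.outer(pi, pi)              # independent inclusions, same pi

y = rng.uniform(1.0, 2.0, size=N)           # characteristics with the same sign

var_dpp = ht_variance(y, pi, pi2_dpp)
var_poisson = ht_variance(y, pi, pi2_poisson)
print("variance under DPP sampling:    ", var_dpp)
print("variance under Poisson sampling:", var_poisson)
```

Because the off-diagonal covariances are $-K_{ij}^2 \le 0$ and all $y_i/\pi_i$ are positive here, the DPP variance cannot exceed the Poisson one for the same first-order inclusion probabilities. The sketch does not simulate the DPP itself, which is the costly step mentioned earlier.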
Prerequisites: Strong background in statistics, probability and mathematics; master's degree in statistics; programming skills in R (or C++, Python, MATLAB).

PhD Grant: To be discussed. Apart from the PhD grant, the candidate may be awarded an excellence grant from UQAM (excellence program FARE, $6,000 a year) and may work as a teaching assistant for one or two courses (approximately $24 per hour for 60 hours). The candidate is also expected to take part in seeking funding.

Context: The PhD will take place at UQAM (Université du Québec à Montréal) and will be supervised by Jean-François Coeurjolly, [email protected] (UQAM, Pavillon Kennedy, PK-5125) and Pierre-Olivier Amblard (Gipsa-lab, Université de Grenoble Alpes, France). The standard duration of a PhD in Canada is four years, including a series of courses the candidate has to follow during the first year (note also that during the first year the candidate has to pass two exams in statistics to be allowed to continue the doctoral program).

References

[1] R. Bardenet and A. Hardy. Monte Carlo with determinantal point processes. arXiv preprint arXiv:1605.00361, 2016.
[2] N. A. C. Cressie. Statistics for Spatial Data. Wiley, New York, second edition, 1993.
[3] D. J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes, Volume I: Elementary Theory and Methods. Springer, New York, second edition, 2003.
[4] P. Diggle. Statistical Analysis of Spatial and Spatio-Temporal Point Patterns. CRC Press, 2013.
[5] D. Dupuy, C. Helbert, and J. Franco. DiceDesign and DiceEval: two R packages for design and analysis of computer experiments. Journal of Statistical Software, 65(11):1–38, 2015.
[6] J. Illian, A. Penttinen, H. Stoyan, and D. Stoyan. Statistical Analysis and Modelling of Spatial Point Patterns. Statistics in Practice. Wiley, Chichester, 2008.
[7] S. Johnson. The Ghost Map: The Story of London's Most Terrifying Epidemic, and How It Changed Science, Cities, and the Modern World. Penguin, 2006.
[8] A. Kulesza and B. Taskar. Determinantal point processes for machine learning. arXiv preprint arXiv:1207.6083, 2012.
[9] F. Lavancier, J. Møller, and E. Rubak. Determinantal point process models and statistical inference. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77(4):853–877, 2015.
[10] V. Loonis and X. Mary. Determinantal sampling designs. arXiv preprint arXiv:1510.06618, 2015.
[11] J. Møller and R. P. Waagepetersen. Statistical Inference and Simulation for Spatial Point Processes. Chapman and Hall/CRC, Boca Raton, 2004.
[12] J. Sacks, W. Welch, T. Mitchell, and H. Wynn. Design and analysis of computer experiments. Statistical Science, pages 409–423, 1989.
[13] T. J. Santner, B. J. Williams, and W. I. Notz. The Design and Analysis of Computer Experiments. Springer Science & Business Media, 2013.
