LECTURE 3: MORE STATISTICS AND INTRO TO DATA MODELING

Summarizing the posterior
• Point summaries: mean or mode, and variance
• Typically we are interested in more than the mean and variance
• Posterior intervals: e.g. a 95% credible interval can be constructed as a central interval (equal tails relative to the median) or as the highest posterior density interval. For unimodal, roughly symmetric posteriors these agree, but for skewed or multimodal posteriors they can differ

How to choose informative priors?
• Conjugate prior: the posterior takes the same functional form as the prior
• Example: the beta distribution is conjugate to the binomial (HW 2)
• A conjugate prior can be interpreted as additional data
• For a Gaussian likelihood with known σ and prior μ ~ N(μ0, τ0²), completing the square gives a Gaussian posterior μ ~ N(μ1, τ1²) with
  1/τ1² = 1/τ0² + n/σ²,  μ1 = τ1² (μ0/τ0² + n ȳ/σ²)
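The beta–binomial conjugate update mentioned above can be written in a couple of lines. This is a minimal sketch (the specific prior and data values are illustrative, not from the lecture): observing k successes in n trials updates Beta(a, b) to Beta(a + k, b + n − k), which is exactly the "prior as additional data" interpretation.

```python
# Conjugate beta-binomial update: the posterior is again a beta
# distribution, so the prior acts like extra successes and failures
# added to the observed data.

def beta_binomial_update(a, b, k, n):
    """Posterior Beta(a', b') after observing k successes in n trials."""
    return a + k, b + (n - k)

# Start from a flat Beta(1, 1) prior and observe 7 heads in 10 flips.
a_post, b_post = beta_binomial_update(1, 1, 7, 10)
print(a_post, b_post)              # Beta(8, 4)
print(a_post / (a_post + b_post))  # posterior mean = 8/12
```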

Posterior predictive distribution
• Predicting a future observation ỹ conditional on the current data y:
  p(ỹ|y) = ∫ p(ỹ|θ) p(θ|y) dθ

Two sources of uncertainty: the posterior uncertainty in the parameter θ, and the measurement noise in the new observation ỹ!
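The two sources of uncertainty can be seen numerically by Monte Carlo: draw a parameter from the posterior, then a new observation given that parameter. A sketch for the known-σ Gaussian case, assuming the conjugate-normal posterior from the prior slide (all numbers here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Known-sigma Gaussian data with a conjugate prior mu ~ N(mu0, tau0^2).
sigma, mu0, tau0 = 1.0, 0.0, 10.0
y = rng.normal(2.0, sigma, size=50)

# Conjugate posterior for mu (precision-weighted average).
tau1_sq = 1.0 / (1.0 / tau0**2 + len(y) / sigma**2)
mu1 = tau1_sq * (mu0 / tau0**2 + y.sum() / sigma**2)

# Posterior predictive: first draw mu from the posterior,
# then draw y_new given mu.
mu_draws = rng.normal(mu1, np.sqrt(tau1_sq), size=100_000)
y_new = rng.normal(mu_draws, sigma)

# The predictive variance combines both sources: tau1^2 + sigma^2.
print(y_new.var(), tau1_sq + sigma**2)
```

The sample variance of `y_new` matches the analytic predictive variance τ1² + σ², i.e. parameter uncertainty and noise add in quadrature.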

Non-informative priors

• No prior is truly non-informative, because a transformation of variables changes it
• Priors can be improper (do not integrate to 1), but posteriors must be proper; this must be checked
• Jeffreys' prior is based on the Fisher information matrix (to be discussed later): not a universal recipe
• A pivotal quantity has a distribution independent of the data y and the parameter λ. If the pivot is y − λ, then λ is a location parameter and the non-informative prior is uniform in λ (e.g. the mean of a Gaussian)
• If the pivot is y/λ, then λ is a scale parameter and the prior is uniform in log λ (e.g. the variance of a Gaussian)
• The prior is rarely an issue in 1-d: either the data are good, in which case the prior does not matter, or they are not (so get more data!)
• Priors can become problematic in many dimensions, especially if we have more parameters than the data require: posteriors can be a projection of multi-dimensional priors without us knowing it. Care must be taken to avoid this (we will discuss further)
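The first bullet, that "non-informative" is not invariant under reparametrization, is easy to demonstrate numerically. A toy sketch (the transformation φ = θ² is my choice, not from the lecture): a prior flat in θ is strongly non-flat in φ.

```python
import numpy as np

rng = np.random.default_rng(1)

# A prior flat in theta on (0, 1) is NOT flat in phi = theta**2:
# the Jacobian gives p(phi) = 1 / (2 sqrt(phi)), which piles up near 0.
theta = rng.uniform(0.0, 1.0, size=200_000)
phi = theta**2

# A flat prior in phi would put 25% of its mass below phi = 0.25,
# but the induced prior puts P(phi < 0.25) = P(theta < 0.5) = 0.5 there.
frac = (phi < 0.25).mean()
print(frac)  # ~0.5, not 0.25
```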

Modern statistical methods (Bayesian or not): Gelman et al., Bayesian Data Analysis, 3rd edition

INTRO TO MODELING OF DATA
• We are given N data measurements (xᵢ, yᵢ)
• Each measurement comes with an error estimate σᵢ
• We have a parametrized model for the data, y = y(xᵢ)
• We assume the error probability is Gaussian and the measurements are uncorrelated:
  p({yᵢ}|model) ∝ ∏ᵢ exp[−(yᵢ − y(xᵢ))² / (2σᵢ²)]

• We can parametrize the model in terms of M free parameters: y(xᵢ|a₁, a₂, a₃, …, a_M)
• The Bayesian formalism gives us the full posterior information on the parameters of the model

• We can assume a flat prior p(a₁, a₂, a₃, …, a_M) = const
• In this case the posterior is proportional to the likelihood
• The normalization (evidence, marginal likelihood) p(yᵢ) is not needed if we only need the relative posterior density

Maximum likelihood estimator (MLE)
• Instead of the full posterior we can ask for the best-fit values of the parameters a₁, a₂, a₃, …, a_M
• "Best fit" can be defined in different ways: mean, median, mode
• Choosing the mode (peak posterior or peak likelihood) means we want to maximize the likelihood: the maximum likelihood estimator (or MAP for a non-uniform prior)

Maximum likelihood estimator

Since σᵢ does not depend on the parameters aₖ, the MLE amounts to minimizing
  χ² = Σᵢ [yᵢ − y(xᵢ|a₁, …, a_M)]² / σᵢ²

Setting ∂χ²/∂aₖ = 0 gives a system of M (generally nonlinear) equations for M unknowns
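For a nonlinear model, these M coupled equations are usually solved numerically by minimizing χ² directly. A minimal sketch, assuming SciPy is available (the model y = a·exp(−b·x) and all numbers are made up for illustration):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# Nonlinear model y(x|a, b) = a * exp(-b * x) with Gaussian errors sigma_i.
x = np.linspace(0.0, 4.0, 40)
sigma = 0.05 * np.ones_like(x)
y = 2.0 * np.exp(-0.7 * x) + rng.normal(0.0, sigma)

def chi2(params):
    a, b = params
    return np.sum(((y - a * np.exp(-b * x)) / sigma) ** 2)

# Minimizing chi^2 numerically solves the M coupled equations
# d(chi^2)/da_k = 0 without writing them down explicitly.
fit = minimize(chi2, x0=[1.0, 1.0])
print(fit.x)  # close to the true (2.0, 0.7)
```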

Fitting data to a straight line

For the straight-line model y = a + b x the equations are linear in (a, b): solve with linear algebra
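The straight-line case can be solved in closed form with the weighted sums used in Numerical Recipes Ch. 15.2. A sketch with made-up data (the sum names S, Sx, … follow NR's notation):

```python
import numpy as np

rng = np.random.default_rng(3)

# Straight line y = a + b x with per-point errors sigma_i.
x = np.linspace(0.0, 10.0, 30)
sigma = 0.5 * np.ones_like(x)
y = 1.0 + 2.0 * x + rng.normal(0.0, sigma)

# Weighted sums (Numerical Recipes Ch. 15.2 notation).
w = 1.0 / sigma**2
S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
Sxx, Sxy = (w * x * x).sum(), (w * x * y).sum()

# The 2x2 normal equations solved in closed form.
Delta = S * Sxx - Sx**2
a = (Sxx * Sy - Sx * Sxy) / Delta
b = (S * Sxy - Sx * Sy) / Delta
print(a, b)  # close to the true (1.0, 2.0)
```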

What about the errors?
• We approximate the log posterior around its peak with a quadratic function
• The posterior is thus approximated as a Gaussian
• This goes under the name Laplace approximation
• Note that for several parameters the errors must be described by a matrix

The inverse covariance C⁻¹ = α is called the precision matrix

Asymptotic theorems (Le Cam 1953, adapted to Bayesian posteriors)

• Posteriors approach a multivariate Gaussian in the large-N limit (N: number of data points): this is because the 2nd-order Taylor expansion of ln L becomes more and more accurate in this limit, i.e. we can drop 3rd-order terms
• The marginalized means approach the true values, and the variance approaches the Fisher matrix, defined as the ensemble average of the precision matrix
• The likelihood dominates over the prior in the large-N limit
• There are counter-examples: e.g. when the data are not informative about a parameter or some linear combination of parameters, when the number of parameters M is comparable to N, when posteriors are improper or likelihoods are unbounded… Always exercise care!
• In practice the asymptotic limit is often not reached for nonlinear models, i.e. we cannot linearize the model across the region of non-zero posterior: this is why we often evaluate posteriors by sampling instead of the Gaussian approximation

Bayesian view
• The posterior p(a, b|yᵢ) is described by the 2-d precision matrix C⁻¹, whose constant-probability contours are ellipses in the (a, b) plane
• At any fixed value of a, the posterior of b is a Gaussian with variance [(C⁻¹)_bb]⁻¹ (and vice versa)
• If we want the error on b independent of a, we need to marginalize over a (and vice versa)
• This marginalization can be done analytically and gives C_bb as the variance of b
• Marginalization increases the error: C_bb > [(C⁻¹)_bb]⁻¹
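The inequality in the last bullet, conditional error versus marginal error, is easy to check numerically. A sketch with a hypothetical 2-d precision matrix (the numbers are arbitrary, chosen only to illustrate the point):

```python
import numpy as np

# Toy 2-d precision matrix alpha = C^{-1} for parameters (a, b);
# the values are hypothetical, for illustration only.
alpha = np.array([[5.0, 2.0],
                  [2.0, 3.0]])
C = np.linalg.inv(alpha)

# Conditional error on b at fixed a: [alpha_bb]^{-1/2}.
sigma_b_fixed_a = 1.0 / np.sqrt(alpha[1, 1])

# Marginal error on b (a integrated out): sqrt(C_bb).
sigma_b_marginal = np.sqrt(C[1, 1])

print(sigma_b_fixed_a, sigma_b_marginal)  # marginal is larger
```

The marginal error exceeds the conditional one whenever the off-diagonal term is non-zero, i.e. whenever a and b are correlated.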

Multivariate linear least squares
• We can generalize the model to a generic linear functional form: yᵢ = a₀X₀(xᵢ) + a₁X₁(xᵢ) + … + a_{M−1}X_{M−1}(xᵢ)
• The problem is linear in the a_j but can be nonlinear in xᵢ, e.g. X_j(xᵢ) = xᵢʲ
• We can define the design matrix A_ij = X_j(xᵢ)/σᵢ and the vector bᵢ = yᵢ/σᵢ

Design matrix (figure from Numerical Recipes, Press et al.)

Solution by normal equations: (AᵀA) a = Aᵀ b, so a = (AᵀA)⁻¹ Aᵀ b, and α = AᵀA is the precision matrix
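A minimal sketch of the normal-equations solution, assuming a polynomial basis X_j(x) = xʲ (the data and coefficient values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

# Polynomial basis X_j(x) = x**j; model y = sum_j a_j X_j(x).
x = np.linspace(-1.0, 1.0, 50)
sigma = 0.1 * np.ones_like(x)
true_a = np.array([0.5, -1.0, 2.0])  # coefficients, low order first
y = np.polynomial.polynomial.polyval(x, true_a) + rng.normal(0.0, sigma)

# Design matrix A_ij = X_j(x_i) / sigma_i and b_i = y_i / sigma_i.
A = np.vander(x, 3, increasing=True) / sigma[:, None]
b = y / sigma

# Normal equations: (A^T A) a = A^T b; alpha = A^T A is the precision matrix.
alpha = A.T @ A
a_fit = np.linalg.solve(alpha, A.T @ b)
C = np.linalg.inv(alpha)  # covariance of the fitted parameters
print(a_fit, np.sqrt(np.diag(C)))
```

Solving the linear system with `np.linalg.solve` is preferable to forming (AᵀA)⁻¹ explicitly; the inverse is computed only because we also want the covariance matrix.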

Gaussian posterior: marginalization over nuisance parameters

• If we want to know the error on the j-th parameter we need to marginalize over all the other parameters
• In analogy with the 2-d case this gives σⱼ² = Cⱼⱼ
• So we need to invert the precision matrix α = C⁻¹
• Analytic marginalization is only possible for a multivariate Gaussian distribution: a great advantage of using a Gaussian
• If the posterior is not Gaussian, it may be made more Gaussian by a nonlinear transformation of the variable

What about multi-dimensional projections?
• Suppose we are interested in n components of a, marginalizing over the remaining M − n components
• We take the rows and columns of C corresponding to those n parameters to form the n × n matrix C_proj
• Invert this matrix to get the projected precision matrix C_proj⁻¹
• The posterior is proportional to exp(−Δa_projᵀ C_proj⁻¹ Δa_proj / 2), i.e. Δχ² = Δa_projᵀ C_proj⁻¹ Δa_proj is distributed as χ² with n degrees of freedom

Credible intervals under gaussian posterior approximation

• We like to quote posteriors in terms of X% credible intervals
• For Gaussian posteriors the most compact credible regions correspond to a constant Δχ² relative to the MAP/MLE
• The Δχ² threshold depends on the dimension n of the projection: for X = 68.3, Δχ² = 1.00 (n = 1), 2.30 (n = 2), 3.53 (n = 3)
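These Δχ² thresholds are just quantiles of the χ² distribution with n degrees of freedom. A sketch, assuming SciPy is available:

```python
from scipy.stats import chi2

# Delta chi^2 threshold enclosing X% of a Gaussian posterior
# for a projection onto n parameters: the X% quantile of chi^2_n.
def delta_chi2(x_percent, n):
    return chi2.ppf(x_percent / 100.0, df=n)

for n in (1, 2, 3):
    print(n, delta_chi2(68.27, n))
# approximately 1.00, 2.30, 3.53 (the classic Numerical Recipes values)
```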

We rarely go above n=2 dimensions in projections (difficult to visualize)

To solve the normal equations for the best-fit values and the precision matrix we need numerical linear algebra methods: the topic of the next lecture

Literature
• Numerical Recipes, Press et al., Ch. 15 (http://apps.nrbook.com/c/index.html)
• Bayesian Data Analysis, Gelman et al., Ch. 1–4
