Bayesian Model Specification and Bayesian Hierarchical Modeling

• Instructor: David Draper (draper@ams.ucsc.edu, (831) 459-1295)
• Venue: one-day short course sponsored by the San Francisco Chapter of the American Statistical Association, as a follow-on to a previous introductory one-day short course (on 9 Oct 2010) on Bayesian modeling, inference and prediction
• Date: Sunday 6 Nov 2011
• Location: San Francisco Airport Marriott Waterfront, 1800 Old Bayshore Highway, Burlingame, CA 94010 (telephone: 650-692-9100).
• Prerequisites: Those who did not take the first course in Oct 2010 should consult the web page for that course to see the prerequisite background for this second course.
• Summary of Short Course Contents: This is a follow-on to the short course Bayesian Modeling, Inference and Prediction I gave on 9 Oct 2010, also sponsored by the San Francisco Chapter of the American Statistical Association. In this second course we'll concentrate on two topics: Bayesian model specification and Bayesian hierarchical modeling.
• Bayesian model specification: Suppose You (a generic person wishing to reason sensibly in the presence of uncertainty) are uncertain about something: let's call it θ (this could be almost anything, but for concreteness think of a vector of real numbers of length k). You have an information source (data set) D that You judge to be relevant to decreasing Your uncertainty about θ (D could again be almost anything, but for concreteness think of a vector of real numbers of length n), and You also have background information B summarizing problem context and how D was gathered.

A foundational theorem, proved independently by the statistician and actuary Bruno de Finetti (1937) and the physicist Richard Cox (1946), says that if You wish to summarize Your information about θ, in light of D and B, in a logically-consistent manner, then You need to specify two ingredients for inference about θ and prediction of future data D* -- namely, p(θ | B) (Your prior distribution for θ) and p(D | θ, B) (Your sampling distribution for D) -- and two more ingredients for decision-making: a set A of possible actions and a utility function U(a, θ) quantifying Your judgments about the rewards (monetary or otherwise) that would ensue if You chose action a and the unknown actually took the value θ. The theorem then goes on, through a series of corollaries, to tell You how these ingredients should be combined for logically-consistent inference, prediction and decision-making, using the basic rules we examined in the first short course in Oct 2010 (for example, Bayes's Theorem for inference).
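As a toy numerical sketch of how the first two ingredients combine via Bayes's Theorem (this is an illustration in Python rather than the course's R/WinBUGS materials, and the numbers are hypothetical): with a Bernoulli success probability θ and a conjugate Beta prior, the posterior is available in closed form.

```python
# Toy illustration: Bayes's Theorem for a Bernoulli success probability theta.
# Prior ingredient:    theta ~ Beta(a, b)
# Sampling ingredient: D = s successes in n independent trials
# Posterior (conjugacy): theta | D ~ Beta(a + s, b + n - s)

def beta_binomial_posterior(a, b, s, n):
    """Return the posterior Beta parameters after observing s successes in n trials."""
    return a + s, b + (n - s)

# Diffuse prior Beta(1, 1) (uniform on [0, 1]), then observe 7 successes in 10 trials.
a_post, b_post = beta_binomial_posterior(1.0, 1.0, 7, 10)
post_mean = a_post / (a_post + b_post)   # posterior mean = 8/12
print(a_post, b_post, post_mean)
```

The same update rule shows how a non-diffuse prior (say Beta(20, 20)) would pull the posterior mean toward 1/2, previewing the prior-specification topic below.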

The de Finetti-Cox theorem is great as far as it goes -- it tells You which four ingredients You have to specify -- but it's almost entirely silent on how to specify those ingredients. The first half of this short course will be about how to perform Bayesian model specification: how to go from {the context of the problem and the design of the data-gathering process} to good choices for the ingredients mentioned above.

Let's agree to call {p(θ | B), p(D | θ, B)} Your model M for Your uncertainty about θ -- such a model is needed for inference and prediction. In this short course I'll concentrate mainly on specifying M, not because decision theory is unimportant but for lack of time to do justice to both inference/prediction and decision-making. Topics to be covered on model specification will include the following:

• Four principles for good Bayesian model-building: the Calibration Principle, the Modeling-As-Decision Principle, the Prediction Principle and the Decision-Versus-Inference Principle.
• How to specify diffuse (low-information-content) and non-diffuse (high-information-content) prior distributions.
• How to deal with model uncertainty: Bayesian model averaging, Bayesian non-parametric methods, and {Bayesian cross-validation as a way to use Your data to find good models while paying the right price for Your data-driven search}.
• How to provide good Bayesian answers to the questions "Is model M1 better than model M2?" and "Is model M1 good enough to stop searching?". Answers to the first type of question involve Bayes factors and log scores; answers to the second type involve a properly-calibrated version of posterior predictive P-values.
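To make the last bullet concrete, here is a toy Python sketch (not the course's R/WinBUGS code, and with made-up data) of a log score comparing two fully specified Bernoulli models; because neither model has free parameters, the Bayes factor reduces to a simple likelihood ratio.

```python
# Toy illustration: comparing two fully specified models for binary data
# via the log score (mean log predictive density) and the Bayes factor.
import math

data = [1, 0, 1, 1, 0, 1, 1, 1]   # hypothetical binary outcomes

def log_score(p, xs):
    """Mean log predictive density of i.i.d. Bernoulli(p) for outcomes xs."""
    return sum(math.log(p if x == 1 else 1.0 - p) for x in xs) / len(xs)

# M1: Bernoulli(0.75); M2: Bernoulli(0.5).  With no free parameters,
# log p(D | M1) - log p(D | M2) is just the log likelihood ratio.
ls1, ls2 = log_score(0.75, data), log_score(0.5, data)
log_bayes_factor = len(data) * (ls1 - ls2)
print(ls1 > ls2, log_bayes_factor > 0)   # M1 is favored on these data
```

In realistic settings the models have unknown parameters, so p(D | M) involves integrating over the prior; that integration (and the calibration issues it raises) is exactly what this part of the course addresses.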
• Bayesian hierarchical modeling: Many data sets have a nested, or multilevel, or hierarchical character. Two fields in which this type of data arises frequently are education (when measuring educational quality, Your data set may have students nested in classrooms nested in schools nested in school districts) and medicine (when measuring quality of health care, Your data set may involve patients nested within hospitals nested within geographical regions).

Another example involves the discipline of meta-analysis, in which there are a number of studies of the same basic phenomenon and Your goal is to combine information across studies to produce a more accurate answer than that obtainable from any single study.
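As a minimal sketch of the combining-information idea (an illustrative Python fragment with hypothetical study results, not course material): precision-weighted pooling of study-level estimates -- the fixed-effects special case of a hierarchical meta-analysis -- already shows how the combined answer is more precise than any single study.

```python
# Toy illustration: combining K study-level estimates by inverse-variance
# (precision) weighting.  This is the fixed-effects limit of a Bayesian
# hierarchical meta-analysis; the full hierarchical model also estimates
# between-study heterogeneity.

def pooled_estimate(estimates, variances):
    """Precision-weighted mean across studies, and the variance of that mean."""
    weights = [1.0 / v for v in variances]
    w_sum = sum(weights)
    mean = sum(w * y for w, y in zip(weights, estimates)) / w_sum
    return mean, 1.0 / w_sum

# Three hypothetical studies of the same effect (estimate, variance):
est, var = pooled_estimate([0.30, 0.10, 0.25], [0.04, 0.01, 0.02])
print(est, var)   # the pooled variance is smaller than any single study's
```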

Even when Your data set is not hierarchical, there are many situations in which You'll find it useful to employ Bayesian hierarchical modeling, especially when You want to build a mixture model. Mixture models are helpful in settings (which occur frequently) in which You discover that Your data values exhibit unexplained heterogeneity and You want to realistically describe this heterogeneity to produce well-calibrated inferences and predictions.
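A small Python sketch (again illustrative, with assumed mixture parameters rather than anything from the course notes) shows the core mixture-model calculation: given the component weights and densities, Bayes's Theorem yields each observation's posterior probability of membership in each component.

```python
# Toy illustration: in a two-component Gaussian mixture, an observation's
# posterior component-membership probabilities ("responsibilities") follow
# from Bayes's Theorem applied to the component weights and densities.
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def responsibility(x, weights, mus, sigmas):
    """Posterior probability that observation x came from each component."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
    total = sum(joint)
    return [j / total for j in joint]

# Assumed mixture: 70% N(0, 1) plus 30% N(4, 1); classify x = 3.
r = responsibility(3.0, [0.7, 0.3], [0.0, 4.0], [1.0, 1.0])
print(r)   # the second component is far more plausible for x = 3
```

In the hierarchical implementation the weights, means, and variances are themselves unknown and get their own prior distributions, which is the subject of Session 7.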

As usual I'll motivate each unit of methodology in the context of a case study. The case studies in this short course will be drawn from health policy, medicine and the physical sciences, but the methods illustrated will apply to a broad range of subject areas in the natural and social sciences, business (including topics of direct relevance to pharmaceutical companies), and public policy.

The course will liberally illustrate user-friendly implementations of the methods covered via the freeware Bayesian analysis program WinBUGS and the freeware statistical computing package R.

The course is intended mainly for people who often use statistics in their research; some graduate coursework in statistics will provide sufficient mathematical background, and -- since this is a follow-on course from the one a year ago -- I'll assume some familiarity with Bayesian reasoning and methods. To get the most out of the course, participants should be comfortable hearing the presenter mention (at least briefly) (a) differentiation and integration of functions of several variables and (b) discrete and continuous probability distributions (joint, marginal, and conditional) for several variables at a time. All necessary concepts will be approached in a sufficiently intuitive manner that rustiness on these topics will not prevent understanding of the key ideas.

• Biography of Instructor: David Draper is a Professor of Statistics in the Department of Applied Mathematics and Statistics at the University of California, Santa Cruz (USA).

He is a Fellow of the American Association for the Advancement of Science, the American Statistical Association (ASA), the Institute of Mathematical Statistics, and the Royal Statistical Society; from 2001 to 2003 he served as the President-Elect, President, and Past President of the International Society for Bayesian Analysis (ISBA).

He is the author or co-author of more than 100 contributions to the methodological and applied statistical literature, including articles in the Journal of the Royal Statistical Society (Series A, B and C), the Journal of the American Statistical Association, the Annals of Applied Statistics, Bayesian Analysis, Statistical Science, the New England Journal of Medicine, and the Journal of the American Medical Association; his 1995 JRSS-B article on assessment and propagation of model uncertainty has been cited more than 900 times.

His research is in the areas of Bayesian inference and prediction, model uncertainty and empirical model-building, hierarchical modeling, Markov Chain Monte Carlo methods, and Bayesian nonparametric methods, with applications mainly in medicine, health policy, education, and environmental risk assessment.

When he gave an earlier version of the Oct 2010 short course at the Anaheim Joint Statistical Meetings (JSM) in 1997, it received the 1998 ASA Excellence in Continuing Education Award, and a short course he gave on intermediate and advanced-level topics in Bayesian hierarchical modeling at the San Francisco JSM in 2003 received the 2004 ASA Excellence in Continuing Education Award.

He has won or been nominated for major teaching awards everywhere he has taught (the University of Chicago; the RAND Graduate School of Public Policy Studies; the University of California, Los Angeles; the University of Bath (UK); and the University of California, Santa Cruz).

He has a particular interest in the exposition of complex statistical methods and ideas in the context of real-world applications.

• Approximate Structure of the Short Course:

• 8.00-8.30am: Check-in and continental breakfast

• 8.30-9.30am: Session 1: Probability as a way of quantifying information about true/false propositions; the four ingredients in Bayesian model specification
• 9.30-9.40am: Short break
• 9.40-10.40am: Session 2: The Calibration Principle, the Modeling-As-Decision Principle, the Prediction Principle and the Decision-Versus-Inference Principle
• 10.40-10.50am: Coffee break
• 10.50-11.50am: Session 3: How to specify diffuse (low-information-content) and non-diffuse (high-information-content) prior distributions
• 11.50am-1.00pm: Lunch
• 1.00-2.00pm: Session 4: How to deal with model uncertainty: Bayesian model averaging, Bayesian non-parametric methods, and Bayesian cross-validation
• 2.00-2.10pm: Short break
• 2.10-3.10pm: Session 5: How to provide good Bayesian answers to the questions "Is model M1 better than model M2?" and "Is model M1 good enough to stop searching?": Bayes factors, log scores, and a properly-calibrated version of posterior predictive P-values
• 3.10-3.20pm: Coffee break
• 3.20-4.20pm: Session 6: Bayesian hierarchical models for data with a nested or clustered character
• 4.20-4.30pm: Short break
• 4.30-5.30pm: Session 7: Bayesian mixture modeling, and how it can be implemented hierarchically

• Lecture notes: (PDF format)

• Lecture notes: part 1 (Bayesian model specification) (140 pages)
• Lecture notes: part 2 (Bayesian hierarchical modeling) (79 pages)
• A paper fitting Bayesian parametric and nonparametric models to the IHGA data: Krnjajic, Kottas, and Draper (2008, Parametric and nonparametric Bayesian model specification: A case study involving models for count data, Computational Statistics and Data Analysis, 52, 2110-2128).
• Material on log scores and a well-calibrated answer to the question "Could the data have come from model M?", from a short course I gave at Yahoo a few years ago.
• Documentation for the CRAN package DPpackage for fitting Dirichlet-process mixture models and Polya-tree models.
• Datasets and WinBUGS Code: (text format)

• R code for the Bayesian qualitative-quantitative inferential analysis of the IHGA data.
• R code for the Polya tree simulations.
• R code for the NB10 Polya-tree analysis.
• NB10 example: datasets and WinBUGS code
• Aspirin example: datasets and WinBUGS code
• IHGA example: datasets and WinBUGS code
