Young Statisticians’ Meeting 2007

University of the West of England Bristol

11 – 12 April 2007

Organised by Joanne Blackwell, Alex Hudson, Sue Pioli and Helen Thomas, with thanks to the other young statisticians at UK Transplant

SPONSORS

The YSM team would like to thank all our sponsors for their very generous support.

Medical Research Council

www.mrc.ac.uk/Careers

Winton Capital Management

www.wintoncapital.com

The Office for National Statistics

www.statistics.gov.uk

The Royal Statistical Society

www.rss.org.uk

Mango Solutions

www.mango-solutions.com

GlaxoSmithKline

www.gsk.com

Man Investments

www.mangroupplc.com

Pfizer

www.pfizer.co.uk

The Institute of Mathematics and its Applications

www.ima.org.uk

Blackwell Publishing

www.blackwellpublishing.com

National Foundation for Educational Research

www.nfer.ac.uk

ATASS Sports

www.atassltd.co.uk

UK Transplant

www.uktransplant.org.uk

The University of the West of England

www.uwe.ac.uk

Exhibiting sponsors: Medical Research Council, Winton Capital Management, The Office for National Statistics, The Royal Statistical Society, Mango Solutions, Pfizer, Man Investments, GlaxoSmithKline, National Foundation for Educational Research and UK Transplant


YSM 2007 PROGRAMME

Wednesday 11 April 2007

Time            Event                                                 Location
13:45 – 14:30   Registration & refreshments                           Street Corridor & Cafe
14:30 – 15:40   Welcome & plenary session with Prof. Dave Collett     Lecture theatre 2B020
15:40 – 16:00   Refreshments                                          Street Corridor & Cafe
16:00 – 17:30   Parallel session 1                                    Rooms 2S601 & 2S603
17:30 – 18:00   Sponsors’ plenary session                             Lecture theatre 2B020
18:00 – 19:00   Sponsors’ wine reception & poster session             Street Corridor & Cafe
19:00 – 20:30   Hot buffet dinner                                     OneZone dining area
20:30 – 20:45   Luggage collection                                    Room 2B026
20:45           Coaches to Ibis Hotel (30 mins)                       UWE main bus stops

Thursday 12 April 2007

Time            Event                                                 Location
08:40           Coaches to UWE (30 mins)                              Ibis Hotel (central Bristol)
09:30 – 11:10   Parallel session 2                                    Rooms 2S601 & 2S603
11:10 – 11:40   Refreshments                                          Street Corridor & Cafe
11:40 – 13:00   Parallel session 3                                    Rooms 2S601 & 2S603
13:00 – 14:00   Buffet lunch                                          Street Corridor & Cafe
14:00 – 15:00   Plenary session with Dr John Haigh                    Lecture theatre 2B020
15:00 – 15:20   Refreshments                                          Street Corridor & Cafe
15:20 – 17:00   Parallel session 4                                    Rooms 2S601 & 2S603
17:00 – 18:00   RSS debate with Prof. Peter Green                     Lecture theatre 2B020
18:00           Coaches to Ibis Hotel (30 mins)                       UWE main bus stops
20:00 – 02:00   Pre-dinner drinks, conference dinner & prize giving   Byzantium (central Bristol)

PROGRAMME FOR PLENARY SESSIONS

Plenary session with Prof. Dave Collett – Wednesday 11 April, 14:30, Lecture theatre 2B020
Chaired by Joanne Blackwell
Some Statistical Problems in Organ Donation and Transplantation
Prof. Dave Collett, Director of Statistics and Audit at UK Transplant

Sponsors’ plenary session – Wednesday 11 April, 17:30, Lecture theatre 2B020
Chaired by Sue Pioli
Sponsor talks from:

Medical Research Council, Winton Capital Management, The Office for National Statistics

Plenary session with Dr John Haigh – Thursday 12 April, 14:00, Lecture theatre 2B020
Chaired by Helen Thomas
Uses and Limitations of Mathematics in Sport
Dr John Haigh, Reader of Statistics at the University of Sussex

RSS Debate with Prof. Peter Green – Thursday 12 April, 17:00, Lecture theatre 2B020
Chaired by Alex Hudson
A number of general interest topics relevant to today’s young statisticians will be put forward for debate amongst the panel and opened up to the floor for comment.


PROGRAMME FOR PARALLEL SESSIONS

Parallel session 1A – Wednesday 11 April, 16:00, Room 2S601
Chaired by Mark Chatfield

16:00  Atopic dysfunction and risk of Central Nervous System Tumours in children – Nick Harding
16:20  A sensitivity analysis for exploring misclassification in self-reported exposures for the study of association between parental occupational exposure and childhood cancer – Olaide Raji
16:40  Psychotic-like symptoms in the ALSPAC birth cohort at age 12 years – Kate Thomas
17:00  The relationship between Hypnotics/Anxiolytics and Antidepressant prescribing in Scotland – Mary-Jane Anderson

Parallel session 1B – Wednesday 11 April, 16:00, Room 2S603
Chaired by Tiffany Lay

16:00  Flexible Parametric Models for Relative Survival, with Application in Coronary Heart Disease – Christopher Nelson
16:20  Model selection for high-dimensional, censored survival data – Claudia-Martina Messow
16:40  Factors affecting time to and development of a malignancy after a solid organ transplant – Lisa Mumford
17:00  Applications of Survival Analysis in Financial Services – Mohit Dhillon


Parallel session 2A – Thursday 12 April, 09:30, Room 2S601
Chaired by Patrick Phillips

09:30  Statistical Process Control Methods for Monitoring Blood Components – Elinor Curnow
09:50  Local & Marginal Cumulative Sum Charts for Comparative Monitoring of Methicillin-Resistant Staphylococcus Aureus Rates in UK NHS Trusts – Olivia Grigg
10:10  A classification method for diagnostic settings with a repeatedly measured biomarker: Longitudinal quadratic discriminant analysis – Mareike Kohlmann
10:30  Semi-Markov and hidden semi-Markov models for interval censored multistate data – Andrew Titman
10:50  Do people eat more fish on Friday? An introduction to a non-parametric repeated measures test when there are missing data, or an unbalanced design, applied to nutritional data – Mark Chatfield

Parallel session 2B – Thursday 12 April, 09:30, Room 2S603
Chaired by Matthew Nunes

09:30  The use of ordinal regression models in epidemiology: an example from a study of mobility limitation – Sara Mottram
09:50  Some Aspects of Errors in Variables Modelling: Fitting a Straight Line – Jonathan Gillard
10:10  Model averaging in multivariate regression – Edward Cripps
10:30  Combining Two or More Graphical Gaussian Models: a Missing Data Perspective – Maria Sofia Massa
10:50  On the impact of contaminations in Graphical Gaussian models – Simona Pacillo


Parallel session 3A – Thursday 12 April, 11:40, Room 2S601
Chaired by Edward Cripps

11:40  Modelling longitudinal spatial curve data – Sarah Barry
12:00  On intensity-dependence of tree properties in a marked point pattern of trees – Mari Myllymäki
12:20  Bayesian estimation of space and size distribution of trees from LIDAR measurements – Aki Niemi
12:40  Spatial point processes and graphs – Tuomas Rajala

Parallel session 3B – Thursday 12 April, 11:40, Room 2S603
Chaired by Maria Sofia Massa

11:40  Design of Experiments for Data Networks – Ben Parker
12:00  Identification and Bayesian Inference of the MAPK Pathway Using Western Blot Data – Vilda Purutçuoğlu
12:20  A Nondecimated Second Generation Wavelet Transform – Marina Knight
12:40  A Variance-stabilising algorithm for Binomial intensity estimation – Matthew Nunes


Parallel session 4A – Thursday 12 April, 15:20, Room 2S601
Chaired by Olaide Raji

15:20  Meta Analysis of Pre-Clinical/Experimental Stroke Studies – Laura Gray
15:40  Variations in Primary Open-Angle Glaucoma Prevalence by Age, Gender, and Race: A Bayesian Meta-Analysis – Shahrul Mt-Isa
16:00  Meta-analysis of Mendelian randomization studies – Tom Palmer
16:20  Using Statistical Models to Identify Factors that have a Role in Defining the Abundance of Ions Produced by Tandem Mass Spectroscopy – Sheila Barton
16:40  Identifying and Evaluating Prognostic and Surrogate Markers for Response in the Treatment of Tuberculosis – Patrick Phillips

Parallel session 4B – Thursday 12 April, 15:20, Room 2S603
Chaired by Sarah Barry

15:20  Design Issues in Drug Court Trials – Elizabeth Merrall
15:40  A Statistical and Multivariate Longitudinal Analysis of Poverty Indices in Middle East & North Africa and Africa – Yarim Shamsan
16:00  Offshore Compliance: Using Regression and Risk Modelling to Select Cases from Large Datasets – Nadeer Khan
16:20  What does it mean to be a General Insurance Statistician? – Laura Williams & Peter Grahame
16:40  WTS (Web Technologies for Statisticians) – Romain Francois


POSTER SESSION – Wednesday 11 April, 18:00, Street Corridor

Automatic and Selective Editing in the Office for National Statistics – Robert Bucknall
A Multiple Regression Model for Country Risk Assessment – Aniela Danciu
Applications of Survival Analysis in Financial Services – Mohit Dhillon
Confidence Intervals and P-values for Meta-Analysis with Publication Bias – Masayuki Henmi
SBV Discriminant Analysis – Hayley Johnson
Using satellite data to validate integral processes within ecosystem models – Laura Limer
Quasi-stationarity of stochastic models for the spread of infectious diseases – Sang Mendy
Drugs-related deaths in the fortnight after release from prison: a meta-analysis – Elizabeth Merrall
A Study of Chaotic Intermittency Maps and an Analysis of Consumer Data – David Natsios
Identifying and Evaluating Prognostic and Surrogate Markers for Response in the Treatment of Tuberculosis – Patrick Phillips
A Statistical and Multivariate Longitudinal Analysis of Poverty Indices in Middle East & North Africa and Africa – Yarim Shamsan
Does Population Mixing Measure Infectious Exposure at the Community Level? – John Taylor


Abstracts


Atopic dysfunction and risk of Central Nervous System Tumours in children Harding NJ1, Birch JM2, Hepworth SJ1, McKinney PA1, on behalf of the UKCCS investigators 1 Paediatric Epidemiology Group, 30-32 Hyde Terrace, University of Leeds, LS2 9LN 2 Cancer Research UK Paediatric and Familial Cancer Research Group, Central Manchester and Manchester Children’s University Hospitals NHS Trust, Manchester Presented by: Nick Harding Central Nervous System (CNS) tumours comprise approximately 20% of all childhood cancer, making them the second most common group of childhood cancers after leukaemia. Only ionising radiation has been conclusively linked to an increased risk of childhood CNS tumours. Rare genetic disorders such as neurofibromatosis I, Turcot’s syndrome and Gorlin’s syndrome can also predispose children to CNS tumours, but are observed in less than 5% of cases. Atopy is allergic hypersensitivity stemming from overproduction of IgE antibodies, typically associated with a Th2 response, against common environmental allergens. The prevalence of atopy has increased in the UK in recent years, and is an important causal factor of the allergic diseases asthma and eczema. Several case control studies have reported a negative association between allergic disease and CNS tumours. One popular explanation for the observed association is that atopic individuals have raised immuno-surveillance and so are better equipped to purge nascent tumour cells when they arise, but others include confounding, reverse causality and bias. This study is based on questionnaire data from the UK Childhood Cancer Study (http://www.ukccs.org), a nationwide case control study with 686 CNS tumour patients and 7,621 controls.


A sensitivity analysis for exploring misclassification in self-reported exposures for the study of association between parental occupational exposure and childhood cancer
Olaide Raji, Richard Feltbower, and Patricia McKinney
Paediatric Epidemiology, Centre for Epidemiology and Biostatistics, University of Leeds
Presented by: Olaide Raji

Misclassification is the erroneous measurement of categorical variable(s); it occurs when observed measurements fail to reflect the truth. In epidemiological case-control studies, misclassification occurs more frequently because exposure data are usually collected retrospectively from a variety of sources. One source often used is self-reported occupational information, which may suffer from recall bias. Although misclassification is a well-recognised problem in case-control studies, the analysis of the data rarely acknowledges the scale of the problem. Many studies carry out only a conventional analysis, which does not reflect any source of uncertainty other than random error. This type of analysis allows only informal judgements regarding the magnitude and the effect of misclassification bias on the risk estimates or on measures of uncertainty such as standard errors. We will use the occupational data from the United Kingdom Childhood Cancer Study (UKCCS) to explore possible misclassification bias in the study of association between childhood leukaemia and parental occupational exposures. The results of an exploratory analysis of recall bias in the self-reported exposure to specific agents will be presented. A semi-automated sensitivity analysis, which reconstructs the data that would have been observed had the misclassified variable been correctly classified, will be used to explore the magnitude of the bias on the risk estimates. The results from these analyses will provide insight into the extent of a problem that is often overlooked when only a conventional analysis is carried out. Necessary extensions of the model will be discussed, and the future direction of this work will be highlighted during the talk.
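For a single binary exposure, the core back-correction behind such a sensitivity analysis can be sketched in a few lines: assume a sensitivity and specificity for the self-reported exposure, invert the misclassification matrix, and recompute the odds ratio. The Python sketch below uses entirely hypothetical counts and classification probabilities; it is not the semi-automated procedure described in the talk.

```python
import numpy as np

def corrected_odds_ratio(obs_cases, obs_controls, se, sp):
    """Back-correct exposed/unexposed counts for non-differential exposure
    misclassification with assumed sensitivity (se) and specificity (sp),
    then recompute the odds ratio.  obs_* = [exposed, unexposed] counts."""
    M = np.array([[se, 1 - sp],
                  [1 - se, sp]])                   # observed counts = M @ true counts
    true_cases = np.linalg.solve(M, obs_cases)
    true_controls = np.linalg.solve(M, obs_controls)
    return (true_cases[0] * true_controls[1]) / (true_cases[1] * true_controls[0])

# Hypothetical case-control counts and assumed classification probabilities
# for a self-reported occupational exposure.
print(corrected_odds_ratio([80, 606], [500, 7121], se=0.80, sp=0.95))
```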


Psychotic-like symptoms in the ALSPAC birth cohort at age 12 years
Kate Thomas
University of Bristol
Presented by: Kate Thomas

Aims: Hallucinations (such as hearing voices) and delusions (false beliefs) are common psychotic-like symptoms (PLIKS). It has been proposed that PLIKS in otherwise healthy children could be considered as risk factors for developing schizophrenia, which manifests in adulthood and has a prevalence of approximately 1%. Research suggests that patients diagnosed with schizophrenia are also likely to have other disorders such as depression, anxiety disorders and social problems. There is currently little information on the occurrence of PLIKS in children. The aim of this study is to investigate the prevalence of PLIKS in a group of children from a non-clinical population-based birth cohort known as ALSPAC (Avon Longitudinal Study of Parents and Children).
Methods: As part of an annual assessment held at Bristol University, children participated in a face-to-face interview with trained psychologists investigating their experience of PLIKS in the previous 6 months. 6455 children completed the PLIKS interview, with a mean age of 12.9 years. The instrument comprised 12 questions eliciting key psychotic symptoms, including hallucinations, delusions and thought disorders. Results of this work will be presented.


The relationship between Hypnotics/Anxiolytics and Antidepressant prescribing in Scotland
Mary-Jane Anderson
NHS National Services Scotland
Presented by: Mary-Jane Anderson

Aim: This paper aims to investigate whether the rise in antidepressant prescribing is due in part to the fall in benzodiazepine prescribing. Three different ways of measuring prescribing were used, successively removing confounding variables.
Method: This was an analysis of routinely collected NHS data held by Information Services Division Scotland. The amounts of hypnotics/anxiolytics and antidepressants prescribed from 1992 to 2006 were compared. Practice-level correlations for prescribing data for the year 2005/06 were calculated, using three different measures of prescribing: defined daily doses, standardised prescribing rates and regression residuals. This successively reduces the number of confounding variables.
Results: The total amount of hypnotics/anxiolytics prescribed has fallen over the period 1992 to 2006, and the total amount of antidepressants has risen. The combined total has increased. Correlation results: comparing the volume of drugs: +0.059; using the standardised prescribing rates to factor out age and sex: +0.185; using the regression residuals to factor out all population, GP and practice characteristics: +0.084. All correlations are significant at the 0.01 level.
Conclusions: The weak, positive correlation between hypnotics/anxiolytics and antidepressant prescribing found when comparing defined daily doses increased when the influence of population, GP and practice characteristics was removed. This shows that the same variables act in different ways on the two different groups of drugs. A weak positive correlation remains when all the known confounding variables are removed. Despite the fall in hypnotics/anxiolytics prescribing and the rise in antidepressant prescribing, practices which are high prescribers for one drug category tend to be slightly higher prescribers for the other.


Flexible Parametric Models for Relative Survival, with Application in Coronary Heart Disease
Christopher P Nelson¹, Paul C Lambert¹, Iain B Squire² & David R Jones¹
¹ Centre for Biostatistics and Genetic Epidemiology, University of Leicester
² Department of Cardiovascular Sciences, University of Leicester

Presented by: Christopher Nelson Deaths in a cohort of subjects suffering from a particular disease can be associated with a variety of causes that are not necessarily due to the disease under study. Relative survival methods are used to estimate the mortality rate for a particular disease after correction for mortality due to other causes. Relative survival is estimated from the ratio of the observed (all-cause) survival with the expected survival obtained from a comparable group in the general population. In relative survival models the observed mortality rate, h(t) can be expressed as, h(t) = h∗(t) + λ(t), where h∗(t) is the background mortality, made up from death due to other causes (obtained from routine data), and λ(t) is the excess mortality, associated with the disease of interest. Relative survival is becoming a standard method in population based cancer studies and can also be used in other clinical areas. There is a growing interest in the modelling of relative survival. Most current models split the time scale; a number of models are appropriate only for grouped data [1] and fit piecewise estimates for the excess mortality rate. Splitting the time scale too finely may cause zero cells in subgroups which can prevent model convergence and unduly influence inferences in later time periods. In this paper we develop a flexible approach to modelling relative survival using restricted cubic splines. These can be fitted to individual level data without the need to split the time scale. We extend the flexible parametric survival models proposed by Royston and Parmar [2] to incorporate background mortality and thus make them suitable for modelling of relative survival. These models make use of restricted cubic splines on the log cumulative excess hazard scale for baseline effects. The flexible parametric model was motivated when attempting to model a heart disease dataset. Heart disease is the single biggest killer in the UK and yet relative survival analysis methods have yet to be widely adopted despite their potential in this area. We therefore illustrate the method on 4748 patients admitted to a coronary care unit in Leicester, UK, following acute myocardial infarction. By including covariates in the restricted cubic spline terms the model can allow flexible time-dependent effects for a variety of covariates allowing the modelling of non-proportional excess hazards. Parameter estimates are obtained using maximum likelihood using the Newton Raphson technique. References 1. Dickman PW, Sloggett A, Hills M, Hakulinen T. Regression models for relative survival. Statistics in Medicine 2004; 23: 51-64. 2. Royston, P. and Parmar MKB. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine, 2002; 21: 2175-97
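In the excess-hazard decomposition above, each subject's log-likelihood contribution is d log(h*(t) + λ(t)) − H*(t) − Λ(t), where Λ is the cumulative excess hazard. The Python sketch below maximises such a likelihood for a deliberately simple parametric λ(t); it is only a toy stand-in for the restricted cubic spline model described in the talk, and the data and background hazards are invented.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical follow-up times (years), death indicators and background
# (expected) hazards for a comparable general-population group.
t = np.array([1.2, 3.5, 0.7, 4.9, 2.1])
d = np.array([1, 0, 1, 0, 1])                      # 1 = died, 0 = censored
h_star = np.array([0.02, 0.03, 0.02, 0.04, 0.02])  # background hazard at t
H_star = h_star * t                                # background cumulative hazard (constant-rate toy assumption)

def neg_loglik(theta):
    """Excess hazard lambda(t) = exp(a + b*log t): a Weibull-type stand-in
    for the restricted-cubic-spline form used in the talk."""
    a, b = theta
    if b <= -1:                                    # keep the cumulative excess hazard finite
        return np.inf
    lam = np.exp(a + b * np.log(t))                # excess hazard at each subject's time
    Lam = np.exp(a) * t ** (b + 1) / (b + 1)       # integral of lambda from 0 to t
    loglik = d * np.log(h_star + lam) - (H_star + Lam)
    return -loglik.sum()

fit = minimize(neg_loglik, x0=[-3.0, 0.0], method="Nelder-Mead")
print(fit.x)   # estimated (a, b)
```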


Model selection for high-dimensional, censored survival data Messow CM, Victor A, Hommel G, Blettner M Institut für Medizinische Biometrie, Epidemiologie und Informatik, Universität Mainz, Deutschland Presented by: Claudia-Martina Messow Introduction: Model selection for a proportional-hazard model cannot be performed using standard methods if the number of covariates is large compared to the number of patients. However, this is commonly found in the context of gene expression studies. Several attempts to overcome this problem have been published. One of the most frequently cited papers in the context of prognosis using gene expression microarray data has been published by Van ‘t Veer et al. [1]. They propose a rather simple approach which consists of dividing the patients into two groups according to their five-year-survival. They then select genes to form a gene signature based on the correlation of the expression values with the outcome. The number of genes used for the signature is determined by leave one out cross validation. The classification of new patients into the good (or bad) prognosis group is based on their correlation with the mean expression in the good and the bad prognosis group. Wang et al. [2] have proposed a method based on using univariate proportional hazards regression and ROC (Receiver-Operating-Characteristic)-Curves to generate a risk score. In order to ensure robustness, bootstrapping is introduced into the model building process. Methods: In order to assess the methods, we have applied them to a real data set of biological markers and to simulated data with different assumptions on correlations and the strength of the relationship between the variables and the survival. Results: Whilst it seems that the method proposed by Wang et al. can be a useful tool in some circumstances, it has some serious shortcomings. Survival information has to be dichotomised several times during the process, causing a loss of information, but also requiring the choice of a threshold at which to dichotomise. Several other parameters have to be chosen by the user. This makes the method adaptable to different situations, but it also increases the risk of overfitting and/or biasing the results. Most importantly, the method does not take the dependencies of the variables into account, as variables are entered into the model based on their p-value in a univariate proportional hazards regression. The method used by Van ‘t Veer et al. shows a substantially poorer performance than the method of Wang et al.. Literature: [1]

Van ‘t Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AAM, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards R, Friend SH. Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002; vol. 415, Issue 6871: 530-36.
[2] Wang Y, Klijn JGM, Zhang Y, Sieuwerts AM, Look MP, Yang F, Talantov D, Timmermans M, Meijer-van Gelder ME, Yu J, Jatkoe T, Berns EMJJ, Atkins D, Foekens JA. Gene expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 2005; vol. 365, Issue 9460: 671-9.


Factors affecting time to and development of a malignancy after a solid organ transplant Lisa Mumford University of the West of England / UK Transplant Presented by: Lisa Mumford The success of transplantation may be counterbalanced by the development of a malignancy. As a result of this the need to find out the causes of malignancy in transplant recipients is increasingly more important. Using data held on the National Transplant Database, 7785 Group 1 elective cadaveric solid organ transplants performed in England, Scotland and Wales between 01 January 1999 and 31 December 2003 were analysed (4876 renal, 1040 cardiothoracic, 1869 liver) of which 294 recipients had developed a malignancy within 5 years after transplantation. Using a Cox proportional hazards model, factors affecting time to the development of a malignancy were found. Nominal regression was also used on patients who had developed a malignancy to see what factors affected the development of certain types of malignancies. Recipient age was found to be significantly associated with the development of a malignancy (hazard ratio = 0.122, p=0.01) as well as recipient sex (hazard ratio = 0.175(male), p=0.001) and if the recipient was of an ethnic minority group (hazard ratio = 36.2(other), p=0.02). Immunosuppressant drugs did appear to be significantly contributing to the development of a malignancy, but due to a large amount of missing data no accurate conclusions could be drawn. The most common type of cancer found was skin cancer, with cancer of the digestive organs a close second. The immunosuppressant drug Tacrolimus was significantly contributing to the development of cancer of the digestive organs, while recipient age and ethnicity were contributing factors in the development of skin cancer.


Applications of Survival Analysis in Financial Services
Mohit Dhillon
Barclays Bank
Presented by: Mohit Dhillon

Most statisticians will be familiar with the importance and use of survival analysis in the biological sciences. Indeed, the majority of techniques used today have been developed and refined over many years to address issues faced within the life sciences. During this time, few may have envisaged that Cox’s Proportional Hazards model or Gamma regression, or any of the many other survival analysis techniques, critical, for example, to the continued success of the pharmaceutical industries, would have important applications within financial services as well.
The financial services industry is facing a period of continuous change. The pace of this change – fuelled by increasing regulation, the demands for transparency, and the need to continue to deliver shareholder value – has added to the increasing pressure on financial institutions to seek alternative ways of gaining and maintaining competitive advantage. Survival analysis is one such avenue under investigation.
The speaker will describe briefly the research completed for his MSc, what the major data issues were, how these were overcome, and how these may differ from real-life practice. He will then explain how these same techniques have potential applications within financial services. The speaker will give a brief overview of the new regulatory regime being introduced in financial services aimed at strengthening capital adequacy standards – the so-called ‘Basel regulations’. The three main areas of risk covered by the regulations are market risk, operational risk and credit risk. The speaker will cover some of the tools and techniques being considered within the credit risk arena, the data issues faced, including missing and censored observations, and the similarities and differences found. The presentation may be useful in explaining to graduate and undergraduate statisticians alike why a career in financial services may be as academically challenging and rewarding as one in the medical sciences.


Statistical Process Control Methods for Monitoring Blood Components
Elinor Curnow
UK Transplant
Presented by: Elinor Curnow

Quality control of blood components is of vital importance. Appropriate monitoring maintains sufficient and consistent concentrations of components such as platelets and red blood cells. In addition, monitoring of leucodepleted blood products ensures very low risk of transmission of life-threatening diseases such as variant Creutzfeldt-Jakob disease (vCJD) from donor to recipient. Statistical process control (SPC) techniques based on control charts were reviewed and compared. Simulations from a normal distribution were created to represent samples of typical platelet data. Similarly, a binary distribution was used to represent leucocyte data where machine sensitivity was limited. Simulated data were used to describe the effect of different forms of process change on Shewhart (X-bar and R) and CUSUM charts. In addition, average run lengths were estimated and used to evaluate detection rates for each SPC technique in the presence of an in-control or out-of-control process. Results are discussed within the context of blood component production. In particular, the different requirements when monitoring platelets and leucocytes are described and used to inform the choice of SPC technique.
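As a rough illustration of the run-length comparison described above, the Python sketch below simulates an upper one-sided tabular CUSUM for a normally distributed platelet-type measurement and estimates in-control and out-of-control average run lengths. The target, shift and chart parameters are invented, not those used in the study.

```python
import numpy as np

rng = np.random.default_rng(1)

def cusum_run_length(mu, sigma, target, k, h, max_n=10000):
    """Upper one-sided tabular CUSUM: S_i = max(0, S_{i-1} + x_i - target - k);
    returns the first sample index at which S exceeds the decision limit h."""
    s = 0.0
    for i in range(1, max_n + 1):
        x = rng.normal(mu, sigma)
        s = max(0.0, s + x - target - k)
        if s > h:
            return i
    return max_n

# Hypothetical platelet process in control (mu = target) versus shifted upwards.
target, sigma = 300.0, 20.0
k, h = 0.5 * sigma, 5.0 * sigma          # common choices: reference value sigma/2, limit 5*sigma
arl_in  = np.mean([cusum_run_length(300.0, sigma, target, k, h) for _ in range(500)])
arl_out = np.mean([cusum_run_length(315.0, sigma, target, k, h) for _ in range(500)])
print(arl_in, arl_out)   # the in-control ARL should be much longer than the out-of-control ARL
```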


Local & Marginal Cumulative Sum Charts for Comparative Monitoring of Methicillin-Resistant Staphylococcus Aureus rates in UK NHS Trusts Olivia Grigg Medical Research Council Biostatistics Unit Presented by: Olivia Grigg I will present a sequential analysis of the six-monthly mandatorily reported MethicillinResistant Staphylococcus Aureus (MRSA) rates in UK NHS Trusts over the period April 2001 – September 2005 [1]. We apply multiple Local & Marginal Cumulative Sum (L&M CUSUM) charts to test for change relative to initial local levels and to the standard level across trusts of a similar type. The statistic fed into the CUSUMs is of observed-expected (O-E) form and adjusts for hospital volume, measured by bed-days, and trust type. We set up separate null models for the local and marginal tests, using the first four data points to robustly specify each. For the local null, we assume that the trust effects, θ, are known and fixed, and drawn from the same distribution, so fit a normal random-effects model that specifies the null θ’s and within- and between-trust variance components. For the marginal null, we do not assume fixed values for the individual θ’s, just that they are similar across trusts, so fit a marginal model where the θ’s are integrated out and a common variance is estimated. A robust fit is sought by trimming trusts with extreme estimated effects, and (for the local null) shrinking the estimated θ’s towards the group mean. Aiming to control the multiplicity at each time point of false signals amongst all signals across trusts, we assign time-marginal p-values to CUSUM values [2] and apply a False Discovery Rate controlling procedure [3] to those p-values. I will show how the L&M CUSUMs pick out relationships between the observed and expected counts, and give summary results. [1] The Department of Health. MRSA surveillance data April 2001 – September 2005. Available from http://www.dh.gov.uk/assetRoot/04/12/79/13/04127913.pdf Accessed on 08/12/06. [2] Grigg, O. & Spiegelhalter, D. (2006) ‘The null steady-state distribution of the CUSUM statistic’. Tech. Report. Available from http://www/mrc-bsu.cam.ac.uk/BSUsite/Publications/ [3] Benjamini, Y. and Hochberg, Y. (1995) `Controlling the False Discovery Rate: a practical and powerful approach to multiple testing’. J. R. Statist. Soc. B, 57(1), 289―300.
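The false discovery rate step mentioned above is the standard Benjamini-Hochberg procedure applied to the time-marginal p-values. A minimal Python sketch, assuming the per-trust p-values have already been computed (the values shown are hypothetical):

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean array marking the p-values rejected at FDR level q."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()      # largest i with p_(i) <= q*i/m
        reject[order[:k + 1]] = True
    return reject

# Hypothetical time-marginal p-values attached to the CUSUM values of ten trusts.
pvals = [0.001, 0.008, 0.039, 0.041, 0.09, 0.20, 0.35, 0.52, 0.74, 0.90]
print(benjamini_hochberg(pvals, q=0.05))
```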


A classification method for diagnostic settings with a repeatedly measured biomarker: Longitudinal quadratic discriminant analysis
Mareike Kohlmann¹, Veit P. Grunert², Leonhard Held³
¹ Statistics Department, University of Munich, Germany
² Biostatistics Department, Roche Diagnostics GmbH, Penzberg, Germany
³ Institute of Social and Preventive Medicine, University of Zurich, Switzerland

Presented by: Mareike Kohlmann We propose an adaptation of the quadratic discriminant analysis (QDA) which is suitable for diagnostic classification problems based on a repeatedly measured biomarker. The main idea of this longitudinal quadratic discriminant analysis is to account for the temporal structure by estimating the class-specific means and covariance matrices by linear mixed models. The linear mixed models under consideration are those with random effects or with an additional continuous AR(1) structure in the covariance matrix. The resulting estimates are plugged in the classic quadratic discriminant rule afterwards. We use Monte Carlo cross validation to establish the discriminant rule on training sets and to validate it independently on test sets. The classification performance of a biomarker is evaluated by the area under the curve (AUC) and the Brier Score with its decompositions, complemented by graphical assessment via the receiver operating characteristic (ROC) curve and the calibration plot. Simulation studies illustrate the need to apply the longitudinal variant of quadratic discriminant analysis for repeatedly measured biomarkers in contrast to the classic QDA and we explore the characteristics of a good time-dependent classifier. In addition, we examine the importance of choosing the appropriate linear mixed model within the estimation of the means and the covariances and its influence on the classification performance.
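Once the class-specific means and covariance matrices have been estimated (in the talk, from linear mixed models), the classification step is the usual quadratic discriminant rule. A minimal Python sketch with made-up class parameters standing in for the mixed-model estimates:

```python
import numpy as np

def qda_score(x, mean, cov, prior):
    """Quadratic discriminant score: log prior + log multivariate-normal density (up to a constant)."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return np.log(prior) - 0.5 * logdet - 0.5 * diff @ np.linalg.solve(cov, diff)

# Hypothetical class-specific means/covariances for a biomarker measured at 3 visits.
mu = {"diseased": np.array([1.0, 1.4, 1.9]), "healthy": np.array([0.9, 1.0, 1.1])}
Sigma = {"diseased": np.array([[0.30, 0.15, 0.10],
                               [0.15, 0.30, 0.15],
                               [0.10, 0.15, 0.30]]),
         "healthy": np.eye(3) * 0.2}
prior = {"diseased": 0.3, "healthy": 0.7}

x_new = np.array([1.1, 1.3, 1.6])                 # a new patient's biomarker trajectory
scores = {k: qda_score(x_new, mu[k], Sigma[k], prior[k]) for k in mu}
print(max(scores, key=scores.get), scores)        # classify to the class with the largest score
```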


Semi-Markov and hidden semi-Markov models for interval censored multi-state data Andrew Titman MRC Biostatistics Unit, Cambridge Presented by: Andrew Titman Multi-state models are used extensively to model categorical longitudinal data in continuous time. In many studies subjects are only observed at discrete irregular time points, meaning the data are incomplete. In this situation it is usually assumed that the data are Markov, often with the further assumption of time homogeneity so that the sojourn times in each state are exponentially distributed and the transition intensities between states are constant through time. Semi-Markov models allow the sojourn times in each state to have arbitrary distributions so that transition intensities depend on the amount of time spent in the current state. Often this may be a more appropriate assumption, for instance in modelling a disease where the hazard of death increases with time since onset of disease. Semi-Markov models have rarely been used in modelling this type of interval censored data because in general the likelihood is intractable. However, I will show that if the sojourn times in the semi-Markov model are given phase-type distributions, the likelihood takes the form of a hidden Markov model, for which inference methods are well known. The observed states in studies may be based on imperfect measures like angiograms and will therefore be subject to misclassification error, producing a class of hidden semi-Markov model. This situation can also be accommodated. The methodology will be applied to data on post-heart-transplantation patients who are at risk of cardiac allograft vasculopathy. This example highlights some issues with parameter estimation and identifiability. These issues will be discussed.
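The key computational point is that, with phase-type sojourn distributions, the likelihood can be evaluated with standard hidden Markov machinery. The Python sketch below shows the (scaled) forward recursion for a small, hypothetical discrete-time two-state chain observed with misclassification; it is illustrative only and not the interval-censored, continuous-time likelihood used in the talk.

```python
import numpy as np

def hmm_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete hidden Markov model via the scaled forward algorithm.
    pi: initial state probabilities, A: transition matrix, B[state, symbol]: emission probabilities."""
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for y in obs[1:]:
        alpha = (alpha @ A) * B[:, y]
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return loglik

# Hypothetical two-state illness process observed with misclassification error.
pi = np.array([1.0, 0.0])
A  = np.array([[0.90, 0.10],
               [0.05, 0.95]])
B  = np.array([[0.95, 0.05],      # P(observed state | true state)
               [0.10, 0.90]])
print(hmm_loglik([0, 0, 1, 1, 1], pi, A, B))
```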


Do people eat more fish on Friday? An introduction to a non-parametric repeated measures test when there are missing data, or an unbalanced design, applied to nutritional data Mark D. Chatfield, Christopher W. Thane, Adrian P. Mander & Alison M. Stephen MRC Human Nutrition Research, Cambridge Presented by: Mark Chatfield The Friedman test is a non-parametric alternative to repeated measures ANOVA. When the same individuals are measured over multiple (>2) testing conditions/occasions, the Friedman test can be used to test whether the measures differ between testing conditions/occasions. However, the Friedman test will not use all of the available information if there are missing data, or in an unbalanced design. Instead, the Skillings-Mack statistic, as described in Hollander and Wolfe’s book “Nonparametric Statistical Methods” (2nd edition) can be used. Its simple construction will be described, and how it has been applied to examine whether food consumption varies between days of the week. The Skillings-Mack statistic has now been implemented (in STATA). In addition, when there are many ‘ties’, the package simulates the null distribution of the Skillings-Mack statistic (equivalent to the Friedman statistic when there are no missing data, and a balanced design), and uses this in place of the χ2 distribution (which is a poor approximation in this particular case). Summaries of within-person ranks will indicate where the differences lie in day-to-day variation of food consumption.
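The Skillings-Mack statistic reduces to the Friedman statistic when the design is complete and balanced. As a point of reference, the sketch below runs the ordinary Friedman test in Python on hypothetical fish-intake data; it does not implement the Skillings-Mack extension for missing data or unbalanced designs.

```python
from scipy.stats import friedmanchisquare

# Hypothetical fish intake (g) for five people on three days of the week; each argument
# is one "treatment" (day) measured on the same subjects, in the same order.
monday    = [10, 0, 35, 5, 20]
wednesday = [15, 5, 30, 0, 25]
friday    = [40, 20, 60, 25, 45]

stat, p = friedmanchisquare(monday, wednesday, friday)
print(stat, p)   # small p suggests intake differs between days
```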


The use of ordinal regression models in epidemiology: an example from a study of mobility limitation Sara Mottram, George Peat, Elaine Thomas, Ross Wilkie, Peter Croft Primary Care Musculoskeletal Research Centre, Primary Care Sciences, Keele University, Keele, Staffordshire, ST5 5BG Presented by: Sara Mottram Epidemiology often deals with binary outcomes (e.g. disease present/absent). However, health status measures frequently have several categories of ordered response (e.g. mild, moderate, severe). Such ordinal outcomes are often dichotomised for perceived ease of analysis and interpretation. This can result in arbitrary decisions about where the dichotomy lies, and loss of statistical power, efficiency and information. We explore the potential value of ordinal regression models by comparing a partial proportional odds model (PPOM) against standard binary logistic regression model in a large cross-sectional study of mobility limitation. Limitation was measured by a single item with three ordered response categories (not limited, limited a little, limited a lot). Determinants of mobility limitation included gender, age, occupation, education, adequacy of income, living arrangement and pain location. The PPOM is an extension of the proportional odds model, allowing the assumption of proportional odds to be relaxed for some or all of the explanatory variables as necessary. A binary logistic regression model, dichotomising the outcome between “a little” and “a lot” of limitation, was used as the basis for comparison with the PPOM. We consider the significance of different variables, the conclusions drawn and the ease of interpretation. Some variables found to be non-significant in the binary model were significant in the PPOM. The strength of association was generally similar in the two models. However, weaker nonproportional effects identified in the PPOM were not found in the binary model. PPOMs can provide more detailed information on the association between risk factors and ordinal outcomes than a binary model. Although more parameters are estimated, the PPOM is easily fitted and its interpretation is little different to the binary model. Ordinal data need not be forced into a binary model, the choice of model largely depending on the research question and the nature of the data.


Some Aspects of Errors in Variables Modelling: Fitting a Straight Line
Jonathan Gillard
School of Mathematics, Cardiff University
Presented by: Jonathan Gillard

There is an abundance of scientific papers that use simple linear regression to fit a straight line. Simple linear regression assumes that error is present in only one variable, the dependent variable, and so the usual method of attack is to minimise some function of the residuals. Least squares theory, for example, suggests that we minimise the sum of these squared residuals, and we obtain the usual least squares estimators. All available statistical packages can deal with this situation, and solutions are readily obtained. What if, however, as well as having error in the dependent variable (y) there is error in the independent variable (x)? This is referred to as an ‘errors in variables’ situation, and proves to be a much more complex model. This talk will look at some of the quirks of errors in variables modelling, and how one might get around the problem in practice. Due to the extra error component, simple linear regression is not appropriate and provides biased results. The talk is motivated by an example from Down’s syndrome screening.
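In the simplest errors-in-variables setting, with a known ratio of error variances, the maximum likelihood (Deming) slope has a closed form. A minimal Python sketch with hypothetical data and an assumed variance ratio of one; the talk itself covers a wider range of approaches:

```python
import numpy as np

def deming_fit(x, y, delta=1.0):
    """Errors-in-variables straight-line fit assuming var(error in y)/var(error in x) = delta."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxx = np.var(x, ddof=1)
    syy = np.var(y, ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    term = syy - delta * sxx
    slope = (term + np.sqrt(term ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    intercept = np.mean(y) - slope * np.mean(x)
    return slope, intercept

x = [1.0, 2.1, 2.9, 4.2, 5.1]
y = [1.2, 2.0, 3.2, 3.9, 5.3]
print(deming_fit(x, y, delta=1.0))   # compare with ordinary least squares, which attenuates the slope
```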


Model averaging in multivariate regression Edward Cripps1, Chris Carter2, Robert Kohn3 1 University of Sheffield 2 Australian Graduate School of Management 3 University of New South Wales Presented by: Edward Cripps This presentation outlines a general framework for Bayesian variable selection and covariance selection in a multivariate regression model with Gaussian errors. By variable selection we mean allowing certain regression coefficients to be zero. By covariance selection we mean allowing certain elements of the inverse covariance matrix to be zero. We estimate all the model parameters by model averaging using a Markov chain Monte Carlo simulation method. The methodology is illustrated by applying it to longitudinal data and cross-sectional data. The effectiveness of variable selection and covariance selection in estimating the multivariate regression model is assessed by using several loss functions and simulated data based on parameter estimates obtained from corresponding real data.


Combining Two or More Graphical Gaussian Models: a Missing Data Perspective
Maria Sofia Massa¹, Steffen Lauritzen²
¹ Department of Statistics, University of Padova (Italy)
² Department of Statistics, University of Oxford
Presented by: Maria Sofia Massa

We present the general problem of combining the information given by two or more Gaussian graphical models. It arises in several contexts that are the motivation of this study. If different laboratories study the same phenomenon using quite different approaches, the interest could lie in elaborating a unified model that merges the knowledge carried by the single ones. Given two biological networks with some genes in common, it is interesting to describe a possible joint network that unifies the information acquired through the marginal models. We give the precise specification of the problem and the necessary consistency assumptions to build a maximum entropy joint model. We provide examples of combining two or three marginal models and estimate the parameters of the joint model, exploiting a missing data perspective.


On the impact of contaminations in Graphical Gaussian models
Simona Pacillo¹, Anna Gottard²
¹ University of Sannio
² University of Florence
Presented by: Simona Pacillo

The effects of some kinds of contaminant on model selection procedures for undirected graphical models are examined. To explore the kinds of contaminant against which a model selection procedure behaves most robustly, different sources of distortion are considered, in order to measure the effect of gross errors, model deviations and model misspecification. The chosen model selection procedure is based on the SINful approach (Drton and Perlman, 2004). This procedure uses simultaneous confidence intervals for the partial correlation coefficients, in order to control the overall error rate for incorrect edge exclusion. The analysis is based on a simulation study in which a covariance selection model with a given graph is fitted to variables generated from a multivariate normal joint distribution. Outliers are generated according to the following scheme. The first case assumes that observations are measured with error for only one variable X_v (v = 1, ..., p). Afterwards, some cases of model deviation are considered: i) observations come from a population having the same conditional independence structure but a different multivariate normal distribution; ii) observations are drawn from a different conditional independence structure with a different multivariate normal distribution; iii) observations come from a non-normal distribution (in particular, a Skew Normal distribution). As possible solutions to the problem, robust methods for estimating the covariance matrix, such as the MCD and robustified maximum likelihood estimation (Miyamura and Kano, 2006), are considered and compared.

References
Drton M, Perlman MD (2004) A SINful approach to Gaussian graphical model selection. Available at http://www.stat.washington.edu/drton/Papers/2005statsci.pdf
Gottard A, Pacillo S (2006) On the impact of contaminations in graphical Gaussian models. Statistical Methods & Applications, forthcoming.
Miyamura M, Kano Y (2006) Robust Gaussian graphical modelling. Journal of Multivariate Analysis 97, 1525–1550.
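One of the robust alternatives mentioned above, the MCD estimator, is readily available in standard software. The Python sketch below contrasts partial correlations computed from the classical and MCD covariance estimates on artificially contaminated Gaussian data; it uses scikit-learn's MinCovDet and is only a rough illustration of the idea, not the authors' simulation design.

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance, MinCovDet

rng = np.random.default_rng(0)

# Hypothetical independent Gaussian data with gross errors added to one variable.
n, p = 200, 4
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)
X[:10, 0] += 8.0                                   # contaminate the first variable

def partial_correlations(cov):
    """Partial correlations from the inverse covariance (concentration) matrix."""
    K = np.linalg.inv(cov)
    d = np.sqrt(np.diag(K))
    return -K / np.outer(d, d) + 2 * np.eye(len(d))   # put 1s back on the diagonal

print(np.round(partial_correlations(EmpiricalCovariance().fit(X).covariance_), 2))
print(np.round(partial_correlations(MinCovDet(random_state=0).fit(X).covariance_), 2))
```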


Modelling longitudinal spatial curve data Sarah Barry University of Glasgow Presented by: Sarah Barry Shape data arise in several different fields, such as computer science, medicine and statistics. Though much work has been done in the area of modelling two-dimensional shape data in statistics, there has been relatively little analysis of three-dimensional shape data, particularly when they are of a longitudinal nature. We present a pairwise mixed effects modelling approach for longitudinal data of high dimension, introduced in Fieuws and Verbeke (2006), and apply it to data from a study of the facial shapes of infants suffering from cleft-lip and palate. Both landmarks and curves have been used to describe the facial shapes, and to demonstrate the applicability and benefits of using the pairwise approach for such kinds of data, but here we focus solely on curves. The approach of Fieuws and Verbeke (2006) is extended to include a quadratic test of model fixed effects, which may be applied to the analysis of either landmarks or curves, and parametric bootstrapping is employed to verify the accuracy of such a test. Analysis of the facial curves proceeds by fitting a B-spline to the data and using the spline coefficients as the model responses. Informal 95% confidence intervals for the curve model estimates are presented, followed by a discussion of the appropriateness of this approach and a comparison with the results obtained from the quadratic test of the fixed effects. We will go on to compare the pairwise mixed effects modelling approach with a similar one assuming independent random effects and a correlated random error matrix by exploiting the spatial nature of the data.


On intensity-dependence of tree properties in a marked point pattern of trees
Mari Myllymäki
University of Jyväskylä, Finland
Presented by: Mari Myllymäki

A marked point pattern is a collection of random locations of objects, ‘points’ (for example trees), each provided with a measured quantity called a ‘mark’ (for example tree size). An aim is to provide models for such data with the property that the marks depend on the local point intensity. Intensity-dependence of marks is considered through an example in forestry. A model for marks that depend negatively on the point intensity is introduced and applied to a marked point pattern of trees.


Bayesian estimation of space and size distribution of trees from LIDAR measurements
Aki Niemi¹,², Erkki Tomppo², Antti Penttinen¹
¹ University of Jyväskylä, Finland
² Finnish Forest Research Institute
Presented by: Aki Niemi

A general aim in developing new forest inventory methods is to utilise technological advances so that the amount of expensive field sampling effort can be reduced. One of the most active areas of current research is applications of LIDAR (Light Detection and Ranging). LIDAR is a laser-based technology which, in forest inventory, is used for airborne height measurement of trees. For the upper canopy layer, i.e. big trees, dense LIDAR data give tree locations and heights with very high accuracy. However, most of the small trees will not be detected, since they are shadowed under the big trees. Hence, if LIDAR data are to be used for estimation of forest variables, the effect of missing small trees needs to be corrected somehow. This talk presents a Bayesian approach with marked point processes, aiming to tackle the problem of small trees missing from LIDAR data. For big (detected) trees, we construct an explicit marked point process model, and the summarised effect of small trees, conditioned on big trees, is modelled by a random field. A crucial part of our modelling is elicitation of prior distributions from extensive data sets at the Finnish Forest Research Institute. Posterior inference is calculated via MCMC simulation.


Spatial point processes and graphs Tuomas Rajala University of Jyväskylä, Finland Presented by: Tuomas Rajala The study of spatial point processes aims to understand dependency structures present in spatial point pattern data. So far the second order analysis methods have been popular but new analysis tools of different nature are needed for detecting features other than clustering and regularity. One of the topics of recent interest is the theory of random graphs. These are graphs generated from point patterns by deterministic rules. This talk is about harnessing random graphs for the use of spatial point pattern analysis: The data generates a graph which allows the introduction of new point process summaries. Such summaries are discussed together with new graph structures and modelling. Examples and algorithms will also be given.


Design of Experiments for Data Networks Ben M. Parker Queen Mary, University of London Presented by: Ben Parker Network simulation is a tool which is widely used in engineering, for example when commissioning computer networks, or investigating problems in existing ones. Evaluating performance of networks becomes harder as the underlying systems and the simulations which represent them become more complex, so it is important to develop techniques to select inputs (experimental points at which to perform the simulation) which will provide the most accurate information about the system whilst reducing the time taken by the experiment. The talk provides some background on the statistical theory of design of experiments and on the statistics of queueing theory, and outlines work done so far to find effective designs for experiments on data networks, and the interesting statistical challenges this raises.


Identification and Bayesian Inference of the MAPK Pathway using Western Blot Data
Vilda Purutçuoğlu and Ernst Wit
Lancaster University, Department of Mathematics and Statistics, Fylde College, LA1 4YF, Lancaster, United Kingdom
Presented by: Vilda Purutçuoğlu

We implement a Bayesian method for estimating the model parameters, i.e. the stochastic rate constants, of a realistic MAPK/ERK signalling pathway via MCMC and data augmentation, and then model the pathway using real data. In inference for this network structure we use the Euler approximation, which is the discretised version of a diffusion process. In modelling such a complex system we face several major challenges. Firstly, there is dependency at every stage of the updates of the reaction rates and the augmented (unobserved) state values, resulting in significant computational redundancy on top of the computational cost of an ordinary Bayesian inference for a network. Secondly, the data are very sparse because of technical limitations on laboratory measurements, so we generally have to deal with insufficient information about the actual system, in the sense that the data sets used in our algorithms have a large number of missing values. Finally, the available observed substrates are subject to a convolution problem. Taking all of these challenges into account, we define different models for the MAPK/ERK pathway. In these alternative computational models we reduce the overall number of model parameters by simplifying the structure under distinct assumptions. We then compare the alternatives via various model selection criteria, including BIC and DIC.
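The Euler approximation referred to above discretises the diffusion so that each small time step contributes a Gaussian increment (the Euler-Maruyama scheme). A generic Python sketch for a scalar birth-death example; the drift and diffusion functions are illustrative and are not the MAPK/ERK model:

```python
import numpy as np

rng = np.random.default_rng(42)

def euler_maruyama(x0, drift, diffusion, dt, n_steps):
    """Euler discretisation of dX_t = drift(X)dt + diffusion(X)dW_t for a scalar state."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt))          # Gaussian increment over one step
        x[i + 1] = x[i] + drift(x[i]) * dt + diffusion(x[i]) * dw
    return x

# Hypothetical single-species birth-death process: birth rate c1*x, death rate c2*x.
c1, c2 = 0.5, 0.3
drift = lambda x: (c1 - c2) * x
diffusion = lambda x: np.sqrt(max((c1 + c2) * x, 0.0))
path = euler_maruyama(x0=50.0, drift=drift, diffusion=diffusion, dt=0.01, n_steps=1000)
print(path[-1])
```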


A Nondecimated Second Generation Wavelet Transform Marina Knight School of Mathematics, University of Bristol Presented by: Marina Knight Stationary or nondecimated wavelet transforms are attractive for many applications. When the data comes from complex or irregular designs, the usage of second generation wavelets can prove superior to that of classical wavelets. However, the construction of a nondecimated second generation wavelet transform is not obvious. While some examples can be found in the literature, the properties of the proposed constructions are not investigated in depth. We propose a new nondecimated lifting transform based on the lifting algorithm with removal of one coefficient at a time. Simulations show that the usage of such an approach in nonparametric regression problems significantly improves the denoising performance. Our construction also opens avenues for generating a “best” basis, which we shall explore.
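For orientation, the classical Haar transform on a regular grid can itself be written as lifting steps (split, predict, update), as the Python sketch below shows. The nondecimated, second generation construction proposed in the talk generalises this idea to irregular designs, removing one coefficient at a time.

```python
import numpy as np

def haar_lifting_forward(x):
    """One level of the Haar transform as lifting steps on an even-length signal."""
    x = np.asarray(x, float)
    even, odd = x[0::2], x[1::2]        # split
    detail = odd - even                 # predict: detail coefficients
    approx = even + detail / 2.0        # update: running averages (preserves the mean)
    return approx, detail

def haar_lifting_inverse(approx, detail):
    even = approx - detail / 2.0
    odd = detail + even
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd
    return x

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 7.0])
a, d = haar_lifting_forward(x)
print(np.allclose(haar_lifting_inverse(a, d), x))   # lifting steps are exactly invertible
```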


A Variance-stabilising algorithm for Binomial intensity estimation Matthew Nunes School of Mathematics, University of Bristol Presented by: Matthew Nunes There exist many different methods for classical nonparametric regression in the statistical literature. However, techniques specifically designed for binomial intensity estimation are relatively uncommon. This talk proposes a new technique for the estimation of the proportion of a binomial process. The technique, called the Haar-NN transformation, uses a combination of a wavelet technique and the asymptotic properties of a certain function of binomial random variables. The method successfully transforms the data to be approximately normal with constant variance. This reduces the binomial proportion problem to the usual ‘function plus normal noise’ regression model and thus any suitable denoising method can be used for the intensity estimation. Simulations demonstrate that the methodology possesses good Gaussianization and variance-stabilizing properties when compared with traditional transformations.
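The Haar-NN transform itself is not reproduced here, but the goal of variance stabilisation for binomial proportions can be seen with the classical arcsine square-root transform, whose variance is approximately 1/(4n) whatever the true proportion. A quick Python simulation with arbitrary n and p:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical binomial data: n trials per bin, success probability varying across bins.
n = 50
p = np.linspace(0.1, 0.9, 9)
counts = rng.binomial(n, p, size=(5000, p.size))
p_hat = counts / n

raw_var = p_hat.var(axis=0)                         # depends strongly on p, roughly p(1-p)/n
vst_var = np.arcsin(np.sqrt(p_hat)).var(axis=0)     # approximately constant, about 1/(4n)
print(np.round(raw_var, 4))
print(np.round(vst_var, 4), 1 / (4 * n))
```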


Meta Analysis of Pre-Clinical/Experimental Stroke Studies
Laura J Gray, Claire Gibson, Philip MW Bath
Division of Stroke Medicine, University of Nottingham, Clinical Sciences Building, Nottingham City Hospital Campus, Nottingham, NG5 1PB
Presented by: Laura Gray

Introduction: Carrying out meta analyses of clinical trials in humans is commonplace, and the Cochrane Library publishes high quality results from this type of research. Meta analysis of animal work is relatively rare. Experimental work in animals is very important in stroke research. Although many clinical trials have been carried out in stroke, the treatment options available for patients remain limited. It is therefore highly important to maximise the results from animal studies so that only the most promising agents are carried forward into clinical trials in man. This can be achieved through meta analysis of the results from animal work.
Methods: We performed a meta analysis of controlled pre-clinical studies that administered progesterone in acute cerebral injury. Relevant studies were found by searching PubMed, Embase and Web of Science. From 119 identified publications, 18 studies used 480 experimental subjects. The quality of each study was rated on a 0-5 point scale (with 5 best).
Results: Following cerebral ischaemia, progesterone reduced lesion volume regardless of quality score: standardised mean difference (SMD) –0.5, 95% confidence interval (95% CI) –0.8 to –0.2 (p=0.0002); in contrast, following traumatic brain injury (TBI) there was no overall effect, although a significant decrease in lesion volume was seen in 4 high quality studies: SMD –0.67, 95% CI –1.21 to –0.12 (p=0.02). Progesterone treatment was effective when administered up to 2 hours after cerebral ischaemia (p=0.001); no studies assessed the effect of administration after this time. Data were limited for studies investigating age/hormonal status and the dose-response relationship.
Conclusion: Progesterone is neuroprotective in models of experimental ischaemia and possibly TBI. However, future studies are needed to investigate prolonged time to treatment, aged and female animals, and dose response. This example highlights the use of meta analysis in experimental research; progesterone is not identified as a potential agent for testing in humans, but further areas of pre-clinical research are identified, e.g. using aged animals.
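For readers unfamiliar with the pooling step, a fixed-effect inverse-variance combination of standardised mean differences looks like the Python sketch below. The study-level values are invented, and the subgroup and quality-score analyses in the talk are not reproduced.

```python
import numpy as np

# Hypothetical per-study standardised mean differences (SMD) and their standard errors.
smd = np.array([-0.42, -0.55, -0.31, -0.70])
se  = np.array([0.15, 0.20, 0.18, 0.25])

w = 1.0 / se ** 2                                   # inverse-variance weights (fixed-effect model)
pooled = np.sum(w * smd) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))
ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
print(round(pooled, 3), tuple(round(c, 3) for c in ci))
```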


Variations in Primary Open-Angle Glaucoma Prevalence by Age, Gender, and Race: A Bayesian Meta-Analysis Shahrul Mt-Isa Wolfson Institute of Preventive Medicine, Queen Mary, University of London Presented by: Shahrul Mt-Isa Many large population-based studies have been conducted to determine the prevalence of glaucoma. Most have been carried out in white populations. Studies have also been conducted on Asian, multiracial, and black communities. Glaucoma is a leading cause of vision loss worldwide and is of major public health importance: it is estimated to affect 66 million people worldwide, with at least 6.8 million people bilaterally blind from the condition. A recent meta-analysis of six studies confirmed a higher prevalence in black Americans than in white Americans. Estimates of the prevalence of any glaucoma in white populations ranged from 1% to 1.5% in those aged 40 to 65 years, rising to 2% to 7% in those older than 65; estimates in black Americans ranged from 1.5% to 3.6% and 4.6% to 9.8%, respectively, for similar age groups. These racial differences are of particular interest because they may allude to mechanisms of disease. This article is a systematic review of the published literature on OAG prevalence. The aim was to quantify the relation of OAG prevalence to age and gender and how these relationships vary by racial group. Heterogeneity in prevalence between studies attributed to differences in survey methodology and year of publication was also explored. A Bayesian logistic meta-regression of the log odds of OAG was constructed based on age-specific data extracted. As some studies contributed more age groups than the others, producing jagged data, the model accounted for this hierarchical structure by estimating the odds (prevalence) at age 40 separately within each study but the change in log odds of OAG with age was estimated across all studies. Other variables were included as study level covariates. WinBUGS was used in analysis to allow for variations at all levels. Reference: Rudnicka AR, Mt-Isa S, Owen CG, Cook DG, Ashby D. Variations in Primary Open-Angle Glaucoma Prevalence by Age, Gender, and Race: A Bayesian Meta-Analysis. Invest Ophthalmol Vis Sci 2006; 47(10):4254-4261.

38

Parallel Session 4A

Meta-analysis of Mendelian randomization studies Tom Palmer Department of Health Sciences, University of Leicester Presented by: Tom Palmer An increasing number of epidemiological studies now contain a genetic component. An individual’s genotype consists of two alleles, one from each parent, which results in three possible genotype combinations from the two alleles. Information is therefore often available about a patient’s disease status, genotype and a measure of a biological phenotype on the pathway between gene and disease. It is of clinical relevance to make inferences about the relationship between the level of the phenotype and the risk of disease; however, this relationship could be confounded by lifestyle or environmental factors about which information is not available. The method known as Mendelian randomization has been proposed as a way of overcoming this problem. Mendel’s second law states that genes segregate independently; therefore, individuals are randomized to a particular genotype irrespective of other confounding factors. It is this random allocation which means that individuals’ genotypes may be used as an instrumental variable in the estimation of the relationship between the phenotype and the disease. Statistical models for the meta-analysis of Mendelian randomization studies will be presented using full likelihoods and Normal approximations for the study effect size estimates. These models will also demonstrate a method of estimating a common phenotype-disease relationship across both genotype comparisons of the three genotypes of a biallelic polymorphism. Estimation methods will be outlined using both maximum likelihood and Bayesian approaches.
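The instrumental-variable idea at the heart of Mendelian randomization can be illustrated with the simple ratio (Wald-type) estimator for a single study. The sketch below is a toy example with invented genotype-group summaries; it is not the full-likelihood or Bayesian meta-analysis models to be presented.

```python
import numpy as np

# Hypothetical per-genotype summaries for one study of a biallelic polymorphism.
# Genotype groups: common homozygote, heterozygote, rare homozygote.
mean_phenotype = np.array([2.10, 2.45, 2.80])        # e.g. mean biomarker level
log_odds_disease = np.array([-1.20, -1.05, -0.90])   # log odds of disease per group

# Ratio (Wald-type) estimate from each pairwise genotype comparison with the
# common homozygote as reference: change in log odds of disease per unit change
# in phenotype, using genotype as the instrument.
for g in (1, 2):
    beta = (log_odds_disease[g] - log_odds_disease[0]) / \
           (mean_phenotype[g] - mean_phenotype[0])
    print(f"Genotype comparison {g} vs 0: log odds ratio per unit phenotype = {beta:.3f}")
```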

39

Parallel Session 4A

Using Statistical Models to Identify Factors that have a Role in Defining the Abundance of Ions Produced by Tandem Mass Spectroscopy Sheila J. Barton University of Southampton / London School of Hygiene and Tropical Medicine Presented by: Sheila Barton A proteome refers to all the proteins produced by an organism, in the same way that a genome is the entire set of genes of an organism. Proteomic technologies will play an important role in drug discovery, diagnostics and medicine as they provide the link between genes, proteins and disease. The classic workflow in proteomics consists of separation of proteins followed by identification of the individual proteins using mass spectrometry. Peptides or proteins fragment in a mass spectrometer to produce ions composed of shorter chains of amino acids. The intensity or height of a particular peak for one type of ion is a measure of how much of that ion has been formed. Different types of ions are produced according to the bond broken on the original peptide during fragmentation. In this study around five thousand peptides were subjected to mass spectrometry and the resultant spectra studied in order to discover factors influencing the intensity of ion peaks. Several factors were found to have a highly significant influence (p<0.0001) on the intensity of ions formed. These include the actual mass of the ion formed after fragmentation as well as its percentage of the original peptide mass. The composition of the fragmenting peptide was also found to be important. For example, peptides containing a basic amino acid, such as arginine, which is capable of sequestering a proton, can fragment differently from other peptides, thus influencing the intensity of ions observed. Amino acids either side of the fragmentation site also influenced the intensity of ions produced. Several amino acids observed to increase intensity if they were adjacent to the fragmentation site agreed with those reported in the literature. In conclusion, the models formulated provide useful information about the fragmentation process that could be used to develop improved algorithms for peptide identification.

40

Parallel Session 4A

Identifying and Evaluating Prognostic and Surrogate Markers for Response in the Treatment of Tuberculosis Patrick PJ Phillips MRC Clinical Trials Unit, London / London School of Hygiene and Tropical Medicine Presented by: Patrick Phillips A surrogate marker is used in the context of a clinical trial to substitute for the final endpoint. To be a perfect substitute, it should fully capture the treatment effect on this final endpoint – the test of the null hypothesis of no treatment effect on the surrogate endpoint should also be a valid test of the corresponding null hypothesis based on the true endpoint. A properly validated surrogate marker can shorten trial duration, reduce sample size and save money. Tuberculosis (TB) is a curable disease, but incidence is still growing by 1% a year globally, with 20,000 people developing active disease and 5000 people dying of TB every day. New drugs to increase efficacy, shorten treatment regimens or combat drug resistant strains are in various stages of development, but this is a long process with Phase III trials taking more than five years to complete and requiring large numbers of individuals in order to show a treatment effect. A surrogate marker for long-term response would be an invaluable weapon in the fight against TB. A number of markers have been proposed as possible predictors of treatment response, but no formal statistical analysis has been attempted to evaluate these markers as surrogates. The aim of this research project is to identify and evaluate new and existing markers as surrogate endpoints for long-term response to treatment for TB using data from twelve highly influential TB clinical trials conducted by the MRC across East Africa and East Asia during the 1970s and 80s involving nearly 10,000 participants. Several candidate markers, including longitudinal measures, have been identified and existing statistical methods will be extended to evaluate these as surrogates. The analysis is ongoing, but results from the most promising marker, the two month culture result, will be presented.
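One classical way to quantify how much of a treatment effect a candidate surrogate captures is the Freedman-style "proportion of treatment effect explained": compare the treatment coefficient before and after adjusting for the marker. The sketch below applies this to simulated data with invented effect sizes; it is illustrative only and not the extended methodology the project proposes.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
treat = rng.integers(0, 2, n)                       # randomised treatment arm
# Simulated surrogate (e.g. 2-month culture conversion) partly driven by treatment
surrogate = rng.binomial(1, 0.4 + 0.3 * treat)
# Simulated final endpoint driven mostly through the surrogate
p_fail = 1 / (1 + np.exp(-(-1.0 - 0.2 * treat - 1.0 * surrogate)))
failure = rng.binomial(1, p_fail)

X_unadj = sm.add_constant(treat.astype(float))
X_adj = sm.add_constant(np.column_stack([treat, surrogate]).astype(float))

b_unadj = sm.Logit(failure, X_unadj).fit(disp=0).params[1]
b_adj = sm.Logit(failure, X_adj).fit(disp=0).params[1]

pte = 1 - b_adj / b_unadj   # proportion of treatment effect "explained" by the marker
print(f"Treatment effect: unadjusted {b_unadj:.3f}, adjusted {b_adj:.3f}, PTE {pte:.2f}")
```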

41

Parallel Session 4A

Design Issues in Drug Court Trials Elizabeth L.C. Merrall MRC Biostatistics Unit Presented by: Elizabeth Merrall Drug treatment courts have been a revolutionary concept for dealing with crimes fuelled by drug dependencies; they combine the standard judicial process with rehabilitative treatment. For public health researchers and criminologists, this concept has presented a new challenge: court-based experimentation to evaluate the intervention’s effects on recidivism. To date, there has been an absence of randomised drug court evaluations in the UK and a rather meagre presence worldwide. Farrington and Welsh’s (2006) review of randomised experiments on crime and justice identified 22 court-based experiments, of which only three evaluated drug courts’ effects on recidivism. Meanwhile, Wilson et al. (2006) retrieved some 55 evaluations of drug court effects on recidivism, of which only five were identified as having employed randomisation. This paper considers each of these five studies from a design perspective and highlights some key learning points for future studies. These include:

• randomisation requires effective communication and understanding to win over clinical and court personnel and to ensure its correct implementation;
• study outcomes should be comparable across study arms;
• specify both the primary outcome and its method of analysis, and the associated plausible effect size that the drug court intervention could deliver;
• calculate the study size so that the RCT has at least 80% power to yield a robust evaluation of whether the plausible effect size actually applies (a minimal sample-size sketch follows this list);
• avoid exclusions from analysis post-randomisation;
• follow-up should commence from the point of randomisation and should be of equal duration for all participants, or take account of censoring appropriately.
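As a concrete illustration of the sample-size point above, the sketch below computes the participants needed per arm for 80% power in a two-sided 5% test of two proportions; the assumed reconviction rates (50% under standard processing, 40% under the drug court) are invented for illustration.

```python
import math
from scipy.stats import norm

def n_per_arm(p_control, p_treat, alpha=0.05, power=0.80):
    """Approximate sample size per arm for comparing two proportions."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    p_bar = (p_control + p_treat) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar)) +
           z_b * math.sqrt(p_control * (1 - p_control) + p_treat * (1 - p_treat))) ** 2
    return math.ceil(num / (p_control - p_treat) ** 2)

# Hypothetical: reconviction falls from 50% to 40% under the drug court
print(n_per_arm(0.50, 0.40))   # about 388 per arm with these assumptions
```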

Farrington, D.P. & Welsh, B.C. (2006). A Half Century of Randomized Experiments on Crime and Justice. Crime and Justice, 34, 55-132.
Wilson, D.B., Mitchell, O. & MacKenzie, D.L. (2006). A systematic review of drug court effects on recidivism. Journal of Experimental Criminology, 2, 459-487.

42

Parallel Session 4B

A Statistical and Multivariate Longitudinal Analysis of Poverty Indices in Middle East & North Africa and Africa Yarim Shamsan Statistics Research and Analysis Group (SRAG) / National Foundation for Educational Research (NFER), The Mere, Upton Park, Slough, Berkshire SL1 2DQ Presented by: Yarim Shamsan The subject of international poverty has attracted a number of leading international organisations over the years. These organisations are trying to tackle poverty in every realistic and possible way. An analysis of poverty indices for some 66 countries in the Middle East & North Africa and Africa has revealed some differences and trends, and country-specific trends not simply explained by geographical location. Trends in the Gross National Product (GNP) poverty indicator show that the top 25 per cent of the 66 countries are increasing in GNP and diverging away from the bottom 50 per cent. A generally worsening position in GNP was found for Burundi, Congo Democratic Republic and Sierra Leone. The fertility rate indicator shows an overall improvement over time, but there are individual African countries (Angola, Chad, Equatorial Guinea and Gabon) that show a worsening trend and divergent behaviour from other countries. In addition, three Middle East & North African countries (Djibouti, Iraq and Yemen) have profiles more consistent with African countries when considering fertility rate. There was found to be a steady increase in the life expectancy indicator, with the top 25 per cent of the 66 countries moving away from the bottom 50 per cent. There are 23 countries with decreasing life expectancy (predominantly African), in addition to Djibouti and Yemen (Middle East & North Africa), which cluster with the African countries. Mortality rate seems to be improving in general but there is a worsening position for 12 countries (predominantly African). Three Middle East & North African countries (Djibouti, Iraq and Yemen) have mortality rates similar to those of African countries. These conclusions indicate that government targets for tackling world poverty are unlikely to be universally met.

43

Parallel Session 4B

Offshore Compliance: Using Regression and Risk Modelling to Select Cases from Large Datasets Damian Pritchard1 and Nadeer Khan2 1 HM Revenue & Customs 2 HM Treasury Presented by: Nadeer Khan There has been a growing concern within the UK Inland Revenue that individuals are using offshore tax havens mainly as a tax avoidance measure and that large sums of tax revenue are being lost as a result. This concern is reinforced by independent studies showing vast amounts of deposits held by individuals in offshore jurisdictions. Tax Justice Network, a group of accountants and economists, recently estimated that wealth held in tax havens is costing governments around the world at least $255 billion every year in lost tax revenue. Figures published by Datamonitor, a business analysis company, indicate that approximately $150 billion is held by UK residents in offshore deposits, with just over half in the Channel Islands and the Isle of Man. At the same time, the UK Inland Revenue has obtained information on hundreds of thousands of UK individuals with offshore bank accounts or trusts. Checking these against our internal systems, we can identify which individuals have declared these accounts for tax purposes and which have not. There is nothing illegal about placing capital offshore, provided tax has been paid on the source, but UK domiciled individuals should also declare the income generated by these assets for tax consideration. The UK operates a tax system whereby the majority of employees have their incomes taxed at source by the employer (pay as you earn/ PAYE), but, for people with more complex affairs such as high earners and the self-employed, the system is based on self-assessment (SA). It is a requirement of the UK tax system that UK residents with income that has been generated offshore, such as interest on savings or from trusts, report the existence of such income on a self-assessment form even if they are normally PAYE taxpayers. In many cases, however, there is evidence that the tax liability is not being declared, either on the savings interest or on the source of the capital. The problem we are presented with is how best to manage the information contained in these large sets of data. We have to consider both the short-term goal of collecting yield and the long-term strategy of enabling taxpayers to comply with their tax obligations. We also want to form a picture of taxpayer behaviour and the main characteristics associated with noncompliance. This allows us to carry out rigorous behavioural analysis of all UK taxpayers with offshore accounts to help us understand their motivations for noncompliance, resulting in a significant reduction in the tax gap. http://www.irs.gov/pub/irs-soi/05pritchard.pdf
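A common pattern for this kind of case selection is to fit a model of non-compliance on historical, already-worked cases and then score the remaining population, working the highest-risk cases first. The sketch below is a generic logistic-regression illustration with invented variables and simulated data; it is not HMRC's actual risk model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
# Invented historical data: offshore cases already worked, with the enquiry outcome known
worked = pd.DataFrame({
    "log_offshore_balance": rng.normal(10, 1, n),
    "on_self_assessment": rng.integers(0, 2, n),
    "declared_foreign_income": rng.integers(0, 2, n),
})
# Invented outcome for illustration: undeclared liability found on enquiry
logit = (0.6 * (worked["log_offshore_balance"] - 10)
         - 2.0 * worked["declared_foreign_income"] - 0.5)
worked["noncompliant"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

features = ["log_offshore_balance", "on_self_assessment", "declared_foreign_income"]
model = LogisticRegression().fit(worked[features], worked["noncompliant"])

# Score an (invented) unworked population and pick the highest-risk cases for enquiry
unworked = pd.DataFrame({
    "log_offshore_balance": rng.normal(10, 1, 1000),
    "on_self_assessment": rng.integers(0, 2, 1000),
    "declared_foreign_income": rng.integers(0, 2, 1000),
})
unworked["risk_score"] = model.predict_proba(unworked[features])[:, 1]
print(unworked.sort_values("risk_score", ascending=False).head(10))
```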

44

Parallel Session 4B

What does it mean to be a General Insurance Statistician? Laura Williams and Peter Grahame Legal & General Presented by: Laura Williams and Peter Grahame How do insurers decide how much to charge you for your insurance? What role do statisticians have in this process? For a given premium, insurers offer protection against life’s little uncertainties. The general insurance industry must therefore balance the risk of a claim with the potential cost of that claim in order to set profitable premiums. For example, the probability of your house burning down is quite small, but if it does happen, the cost will be very high. Focusing on household insurance, we will discuss how generalised linear models are used to model historic claims experience for different perils (e.g. fire, theft etc.) in order to produce a set of premiums. We will review the types of data available (both internal and external) and the advantages and disadvantages of the size of the data sets we work with. We will also talk briefly about conversion (whether or not a customer takes out a policy when given a quote) and retention (whether or not a customer renews their policy) modelling and their implications for marketing campaigns. We will conclude that there are many interesting challenges facing the statistician in the insurance industry.
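In practice the frequency and severity of claims are often modelled separately, for example a Poisson GLM for claim counts (with an exposure offset) and a Gamma GLM for claim amounts, which are then combined into a risk premium for each rating cell. The sketch below is a minimal illustration with an invented rating factor and simulated data, not Legal & General's models.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 10000
# Invented policy-level data: exposure in years, a banded rating factor, claim counts
df = pd.DataFrame({
    "exposure": rng.uniform(0.5, 1.0, n),
    "property_age": rng.integers(0, 3, n),
    "num_claims": rng.poisson(0.05, n),
})
df["claim_cost"] = np.where(df["num_claims"] > 0,
                            rng.gamma(2.0, 800.0, n) * df["num_claims"], np.nan)

# Frequency: Poisson GLM with a log(exposure) offset
freq = smf.glm("num_claims ~ C(property_age)", data=df,
               family=sm.families.Poisson(),
               offset=np.log(df["exposure"])).fit()

# Severity: Gamma GLM (log link) on policies with at least one claim
sev = smf.glm("claim_cost ~ C(property_age)", data=df[df["num_claims"] > 0],
              family=sm.families.Gamma(link=sm.families.links.Log())).fit()

# Expected risk premium per year of exposure for each rating cell
cells = pd.DataFrame({"property_age": [0, 1, 2], "exposure": 1.0})
premium = freq.predict(cells, offset=np.log(cells["exposure"])) * sev.predict(cells)
print(premium)
```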

45

Parallel Session 4B

WTS (Web Technologies for Statisticians) Romain Francois Statistical Consultant, Mango Solutions Presented by: Romain Francois The past few years have seen many new internet technologies (and many new acronyms!): XML, AJAX, SVG and XUL, to name just a few. In this talk, we will review some of these technologies and focus on how a statistician might take advantage of them to produce automatic reports or interactive graphics in order to be more productive in his/her daily workflow.

46

Parallel Session 4B

Automatic and Selective Editing in the Office for National Statistics Robert Bucknall Office for National Statistics Presented by: Robert Bucknall Some data collected by ONS business surveys are in error. To find these errors, data validation rules identify suspect data. Rule failures are subsequently queried with respondents, and either confirmed or changed. Due to respondent recontact, validation is a time consuming and expensive process, in terms of ONS resource and respondent burden. Two methods of reducing validation costs are: automatic editing, which corrects systematic errors without the need for recontact; and selective editing, which scores validation failures according to their impact on estimates (only ‘high’ scores are validated). Experience has shown that the presence of systematic errors undermines the setting of selective editing thresholds. In practice, if automatic editing is not in place, these errors are removed by eye. This is clearly inefficient. This paper explores the ‘interaction’ benefits achieved from including both editing processes in the same survey. The outcome is a marked increase in the efficiency of the editing process.
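Selective editing scores are typically built from the gap between a reported value and an expected value (such as the previous period's return), weighted by the unit's contribution to the published estimate; only returns whose score crosses a threshold are re-contacted. The sketch below uses a generic textbook-style score with invented figures, not the ONS's actual scoring function.

```python
import pandas as pd

# Invented survey returns: reported turnover, a predicted value (e.g. last period),
# and the design weight of each business in the published estimate
returns = pd.DataFrame({
    "unit": ["A", "B", "C", "D"],
    "reported": [1200, 90, 15000, 310],
    "predicted": [1150, 450, 14800, 300],
    "design_weight": [10, 25, 1.2, 40],
})

total_estimate = (returns["design_weight"] * returns["predicted"]).sum()

# Score: weighted absolute deviation from the predicted value, relative to the total
returns["score"] = (returns["design_weight"]
                    * (returns["reported"] - returns["predicted"]).abs()) / total_estimate

THRESHOLD = 0.01   # only returns scoring above this are followed up with respondents
print(returns[returns["score"] > THRESHOLD])
```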

47

Poster Session

A Multiple Regression Model for Country Risk Assessment Aniela Raluca Danciu Academy of Economic Studies Bucharest Presented by: Aniela Danciu The globalization of the world economies, and in particular the internationalization of financial markets in the last decades, have dramatically expanded and diversified investment possibilities, leading to numerous new opportunities, accompanied by new risks. Consequently, there has been growing interest in obtaining reliable estimates of the risk of investing in different countries. These concerns have led to the development of the concept of country risk, and even to the regular publication of country risk ratings by various agencies. Country risk has become a topic of major concern for the international financial community over the last few decades. Various risk rating agencies employ different methods to determine country risk ratings, combining a range of qualitative and quantitative information regarding alternative measures of political, economic and financial risk into associated composite risk ratings. The main objective of this paper is to develop a transparent and a consistent country risk rating system, closely approximating one of the major existing ones (Standard & Poor’s). The proposed model uses economic-financial and political variables, is non-recursive (i.e., it does not rely on the previous year’s ratings) and is constructed using multiple regression.

48

Poster Session

Applications of Survival Analysis in Financial Services Mohit Dhillon Barclays Bank Presented by: Mohit Dhillon Most statisticians will be familiar with the importance and use of survival analysis in the biological sciences. Indeed, the majority of techniques used today have been developed and refined over many years to address issues faced within the life sciences. During this time, few may have envisaged that Cox’s Proportional Hazards model or Gamma regression, or any of the many other survival analysis techniques critical, for example, to the continued success of the pharmaceutical industry, would have important applications within financial services as well. The financial services industry is facing a continuous period of change. The pace of this change – fuelled by increasing regulation, the demands for transparency, and the need to continue to deliver shareholder value – has added to the increasing pressure on financial institutions to seek alternative ways of gaining and maintaining competitive advantage. Survival analysis is one such avenue which is under investigation. The speaker will describe briefly the research completed for his MSc, what the major data issues were, how these were overcome, and how these may differ from real-life practice. He will then explain how these same techniques have potential applications within financial services. The speaker will give a brief overview of the new regulatory regime being introduced in financial services aimed at strengthening capital adequacy standards – the so-called ‘Basel regulations’. The three main areas of risk covered by the regulations are market risk, operational risk and credit risk. The speaker will cover some of the tools and techniques being considered within the credit risk arena, the data issues faced, including missing and censored observations, and the similarities and differences found. The presentation may be useful in explaining to graduate and undergraduate statisticians alike why a career in financial services may indeed be as academically challenging and rewarding as one in the medical sciences.
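In the credit-risk setting, the survival-analysis framing treats time on book until default as the event time, with accounts still performing at the analysis date treated as censored. A minimal sketch using the lifelines package on simulated loan data (the covariates and hazard below are invented):

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(4)
n = 500
loans = pd.DataFrame({
    "bureau_score":   rng.normal(650, 60, n),
    "loan_to_income": rng.uniform(0.1, 1.0, n),
})
# Invented hazard: worse scores and higher loan-to-income default sooner
rate = 0.01 * np.exp(-0.01 * (loans["bureau_score"] - 650) + 1.5 * loans["loan_to_income"])
event_time = rng.exponential(1 / rate)
censor_time = rng.uniform(12, 60, n)          # end of the observation window, in months
loans["months_on_book"] = np.minimum(event_time, censor_time)
loans["defaulted"] = (event_time <= censor_time).astype(int)

cph = CoxPHFitter()
cph.fit(loans, duration_col="months_on_book", event_col="defaulted")
cph.print_summary()   # hazard ratios for bureau score and loan-to-income

# Relative default-risk scores for two hypothetical applicants
new = pd.DataFrame({"bureau_score": [550, 720], "loan_to_income": [0.9, 0.3]})
print(cph.predict_partial_hazard(new))
```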

49

Poster Session

Confidence Intervals and P-values for Meta-Analysis with Publication Bias Masayuki Henmi1, John Copas1, Shinto Eguchi2 1 Department of Statistics, University of Warwick, Coventry 2 Institute of Statistical Mathematics, Tokyo, Japan Presented by: Masayuki Henmi We study publication bias in meta analysis by supposing there is a population (y, sigma) of studies which give treatment effect estimates y ~ N(theta, sigma^2). A selection function describes the probability that each study is selected for review. The overall estimate of theta depends on the studies selected, and hence on the (unknown) selection function. A popular way to handle publication bias is to model the selection function. However, the choice of the model is often problematic because it requires strong assumptions which are not verifiable from the available data. Our previous paper, Copas and Jackson (Biometrics, 2004), studied the maximum bias over all possible selection functions which satisfy the weak condition that large studies (small sigma) are as likely, or more likely, to be selected than small studies (large sigma). This led to a worst-case sensitivity analysis, controlling for the overall fraction of studies selected. However, no account was taken of the effect of selection on the uncertainty in estimation. This paper extends the previous work by finding corresponding confidence intervals and P-values, and hence a new sensitivity analysis for publication bias. Using our method, we re-analyze the data used in the meta analysis of Hackshaw et al. (BMJ, 1997) on the lung cancer risk of passive smoking. The possibility of publication bias in this example has been a matter of some dispute in the literature: our analysis shows that although study selection would imply that the relative risk has been exaggerated, it is unlikely to be sufficient to negate the main conclusion in Hackshaw et al. (1997) that passive smoking does pose a health risk, albeit at a more modest level than has been claimed.

50

Poster Session

SBV Discriminant Analysis Hayley Johnson, Paul White School of Mathematical Sciences, University of the West of England Presented by: Hayley Johnson Human faecal stool samples have been analysed by gas chromatography with mass spectrometry and the presence of volatile organic compounds recorded (see Garner et al., 2007). Four different types of stool were analysed; these being (i) from healthy donors (n=30), (ii) from donors with ulcerative colitis (n=18), (iii) from donors with Campylobacter jejuni (n=31), (iv) from Clostridium difficile donors (n=22). In total 312 different volatiles have been identified in the 101 specimens (see Garner et al., 2007). Analysis of the raw data has been undertaken and volatiles that effectively discriminate between stool types have been sought. Forward stepwise linear discriminant analysis provides an unwieldy solution and suffers from technical difficulties in using binary discriminators. Four different lists of volatiles have been created to maximally separate between stool types; in each case the decision of stool type is made by a simple count of volatiles present, i.e. an equally weighted sum of binary variables, SBV (see Langbehn & Woolson, 1997). We show that volatiles that are good univariate discriminators may not be useful additions to lists depending on the correlation structure between volatiles. For this reason a forward selection heuristic has been developed and implemented to rapidly create the lists. The heuristic works well and SBV totals have been subsequently used as discriminators in classical linear discriminant analysis. References: Garner et al. (2007) Volatile organic compounds from faeces and their potential for gastrointestinal disease diagnoses, Journal of the Federation of American Societies for Experimental Biology. Langbehn, D.R. & Woolson, R.F. (1997) Discriminant analysis using the unweighted sum of binary variables: A comparison of model selection methods, Statistics in Medicine, 16, 2679-2700.
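The core of the SBV idea is straightforward to sketch: classify by an unweighted count of "present" volatiles from a list, and grow the list greedily, at each step adding the binary variable that most improves separation between two groups. The code below is a toy re-implementation of that general idea on simulated presence/absence data; it is not the authors' heuristic or dataset.

```python
import numpy as np

rng = np.random.default_rng(5)
n_per_group, n_volatiles = 30, 50
# Invented presence/absence data: group 1 has elevated prevalence for the first 8 volatiles
p0 = np.full(n_volatiles, 0.2)
p1 = p0.copy()
p1[:8] = 0.7
X0 = rng.binomial(1, p0, size=(n_per_group, n_volatiles))
X1 = rng.binomial(1, p1, size=(n_per_group, n_volatiles))

def separation(selected):
    """Standardised difference in mean SBV score between the two groups."""
    if not selected:
        return 0.0
    s0 = X0[:, selected].sum(axis=1)
    s1 = X1[:, selected].sum(axis=1)
    pooled_sd = np.sqrt((s0.var(ddof=1) + s1.var(ddof=1)) / 2)
    return abs(s1.mean() - s0.mean()) / (pooled_sd + 1e-9)

# Greedy forward selection of volatiles for the SBV list
selected, remaining = [], list(range(n_volatiles))
for _ in range(10):
    gains = [(separation(selected + [j]) - separation(selected), j) for j in remaining]
    best_gain, best_j = max(gains)
    if best_gain <= 0:
        break
    selected.append(best_j)
    remaining.remove(best_j)

print("Chosen volatiles:", selected)
print("Mean SBV score, group 0:", X0[:, selected].sum(axis=1).mean())
print("Mean SBV score, group 1:", X1[:, selected].sum(axis=1).mean())
```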

51

Poster Session

Using satellite data to validate integral processes within ecosystem models Laura Limer Department of Probability & Statistics, University of Sheffield / PDRA for the NERC Centre for Terrestrial Carbon Dynamics Presented by: Laura Limer Soil moisture accounts for only 0.005% of global water sources (1), yet it is a controlling factor for interactions between the hydrosphere, biosphere and atmosphere (2). As such, soil moisture is important in determining vegetation dynamics and consequently carbon dynamics also. There is great international interest in climate change (in which the carbon cycle acts both as a major driver of climate warming and as a partial control), with much investment in the development of earth system models. Modellers have the issue of trying to scale up processes which have been reasonably described at the small scale to a much larger scale. The soil water components of some of these models have been compared to ground measurements and found to be lacking (3). Making certain assumptions about properties of the Earth’s surface enables scientists to derive “Earth observation (EO) products” from satellite measurements; the product is an estimate of some physical value of interest. An EO soil moisture product has been favourably compared to ground measurements (4, 5), and is expected to facilitate our understanding of how to model global soil water processes, highlighting weaknesses in both models and data. In the NERC Centre for Terrestrial Carbon Dynamics, research is being carried out to compare EO soil moisture data with simulated soil moisture data from the Sheffield Dynamic Global Vegetation Model, SDGVM (6). Since both data sets are derived products, in the sense that inherent assumptions have been made in generating them, statistical techniques are being employed to examine the sensitivities of the data to these assumptions. A maximin Latin hypercube is used in the parameter input design for this analysis, with both traditional and Bayesian methods employed to analyse the outputs. More confidence can then be given to regional comparisons, with work initially focussing on the UK. References 1. W. Wagner, Vienna University of Technology (1998). 2. K. Scipal, Vienna University of Technology (2002). 3. A. Robock et al., Global Planetary Change 19, 181 (1998). 4. W. Wagner, J. Noll, M. Borgeaud, H. Rott, IEEE Transactions Geoscience and Remote Sensing 37, 206 (1999). 5. C. Prigent, F. Aires, W. B. Rossow, A. Robock, Journal of Geophysical Research 110 (2005). 6. D. J. Beerling, F. I. Woodward, M. Lomas, A. J. Jenkins, Global Ecology & Biogeography 6, 439 (Nov, 1997).
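A maximin Latin hypercube design can be generated by drawing many random Latin hypercube samples and keeping the one whose smallest pairwise distance is largest. The sketch below does this directly on the unit cube; it illustrates the design idea only and is not the Centre's actual parameter set-up.

```python
import numpy as np
from scipy.spatial.distance import pdist

def latin_hypercube(n_points, n_dims, rng):
    """One random Latin hypercube sample on [0, 1]^d."""
    u = rng.uniform(size=(n_points, n_dims))
    perms = np.column_stack([rng.permutation(n_points) for _ in range(n_dims)])
    return (perms + u) / n_points

def maximin_lhs(n_points, n_dims, n_candidates=500, seed=0):
    """Pick the candidate design whose smallest pairwise distance is largest."""
    rng = np.random.default_rng(seed)
    best, best_dist = None, -np.inf
    for _ in range(n_candidates):
        x = latin_hypercube(n_points, n_dims, rng)
        d = pdist(x).min()
        if d > best_dist:
            best, best_dist = x, d
    return best

# e.g. 20 parameter combinations for a hypothetical 5-parameter sensitivity run,
# later rescaled from [0, 1] to each parameter's plausible range
design = maximin_lhs(20, 5)
print(design.shape, "minimum pairwise distance:", pdist(design).min())
```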

52

Poster Session

Quasi-stationarity of stochastic models for the spread of infectious diseases Sang Taphou Mendy, D Clancy University of Liverpool Presented by: Sang Mendy Many mathematical models for the spread of infectious disease predict that, eventually, the disease will die out of the population. However, for realistic parameter values the expected time until extinction can be long. When this is the case, interest focuses upon the long-term behaviour of the disease process prior to extinction, which is described by the quasi-stationary distribution. This project looks at the quasi-stationary distribution of the basic SIS model. Cumulant equations are derived and used to approximate the quasi-stationary distribution. The basic SIS model is then extended to a two-group SIS model. Deterministic and diffusion approximations are used to derive expressions for the expectations of the number of infected individuals in each group. The variances and covariance are calculated for the diffusion process. Cumulant equations are derived and solved, and the results compared with results from the deterministic and diffusion approximations.
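The quasi-stationary distribution of the basic SIS model can also be explored numerically by simulating the Markov chain and, whenever the infection dies out, restarting from a state drawn from the trajectory so far; this resampling scheme is a standard approximation to sampling conditional on non-extinction. A minimal sketch with invented parameter values:

```python
import numpy as np

def sis_quasi_stationary(N=100, beta=1.5, gamma=1.0, i0=10, t_max=1000.0, seed=0):
    """Gillespie simulation of the SIS model; on extinction, restart from a state
    drawn from the trajectory so far (an approximation to the quasi-stationary
    distribution). Returns the time-averaged occupancy of states 1..N."""
    rng = np.random.default_rng(seed)
    i, t = i0, 0.0
    occupancy = np.zeros(N + 1)            # time spent in each state
    history = [i0]
    while t < t_max:
        rate_inf = beta * i * (N - i) / N  # new infections
        rate_rec = gamma * i               # recoveries
        total = rate_inf + rate_rec
        dt = rng.exponential(1 / total)
        occupancy[i] += dt
        t += dt
        i += 1 if rng.uniform() < rate_inf / total else -1
        if i == 0:                         # extinct: resample a past state
            i = int(rng.choice(history))
        history.append(i)
    return occupancy[1:] / occupancy[1:].sum()

qsd = sis_quasi_stationary()
mean_infected = np.sum(np.arange(1, 101) * qsd)   # states 1..N with default N=100
print("Approximate quasi-stationary mean number infected:", round(mean_infected, 1))
# Deterministic endemic level for comparison: N * (1 - gamma/beta) = 100 * (1 - 1/1.5) ≈ 33
```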

53

Poster Session

Drugs-related deaths in the fortnight after release from prison: a meta-analysis Elizabeth L.C. Merrall MRC Biostatistics Unit Presented by: Elizabeth Merrall Being behind bars might be viewed as grim; but coming back out again can be just as tough. This is particularly the case for those with a history of heroin injection, who have had reduced access to opiates whilst incarcerated. Freedom and a reduced tolerance can be a lethal combination. This was first quantified by Seaman, Brettle & Gore (1998) when they showed that, for male HIV-infected injectors, the relative risk of overdose death was eight times higher (95% CI: 1.5-39) in the fortnight after release from prison than per fortnight during the subsequent 10 weeks. Similar findings have since been reported in wider, general prison populations worldwide. We have identified studies from Scotland (Bird & Hutchinson, 2002), France (Verger et al., 2003), England (Farrell et al., 2005), Australia (New South Wales, Kariminia et al., 2006), Denmark (Christensen et al., 2006) and USA (Washington State, Binswanger et al., 2007). This poster presents an international meta-analysis of drugs-related deaths in the 12 weeks post-release from prison, with particular emphasis on the relative risk in the first fortnight versus subsequent fortnights.

54

Poster Session

A Study of Chaotic Intermittency Maps and an Analysis of Consumer Data David Natsios, R. Bhansali, T. Cox University of Liverpool / Unilever Presented by: David Natsios Long-memory time series analysis is becoming increasingly popular in the time series literature. Several methods have been developed for estimating the long-memory parameter and these estimators are often tested for bias, consistency and prediction capability in simulation studies involving linear long-memory models such as an FAR(p,d) model. In our study, we make use of a relatively new range of chaotic intermittency maps. In previous studies it has been shown that these maps can simulate stationary time series with a full range of values for the long-memory parameter. Furthermore, asymptotic results are available that give the ‘true’ value of d, which can therefore be taken as known. We carry out a simulation study to test various long-memory estimation techniques when the assumptions of linearity and Gaussian distribution no longer hold. Our results help to reinforce the asymptotic expectations of d, although we show that bias can be quite large, particularly near the boundary values of 0 and 0.5. These biases tend to decrease slowly as we increase our series length to over a million observations. We also study a bivariate version of the polynomial map for which much less is known. Taking the x and y co-ordinates as two time series we apply the long-memory estimation methods. We explore the density structures of this new map and consider the possibility of fractional non-linear co-integration between x and y. Finally, we carry out an analysis of some new real data which show long-memory characteristics. These data concern the movement of seven sensors attached to a human subject whilst applying a deodorant stick to their underarm. As well as looking at the possible long-memory parameters of the data, we also look into methods of reducing dimensions, grouping the data and modelling it, with the aim of interpretation and simulation of further lifelike results.
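A well-known family of such maps is the Pomeau-Manneville type, x_{t+1} = x_t + c*x_t^(1+alpha) (mod 1), whose intermittent behaviour induces long memory for suitable alpha. The sketch below iterates such a map and applies a simple aggregated-variance estimate of the Hurst exponent; it is a generic illustration, not the specific polynomial maps or estimators used in the study.

```python
import numpy as np

def intermittency_map(n, alpha=0.7, c=1.0, x0=0.3):
    """Iterate a Pomeau-Manneville-type map x -> x + c*x**(1+alpha) (mod 1)."""
    x = np.empty(n)
    x[0] = x0
    for t in range(1, n):
        x[t] = (x[t - 1] + c * x[t - 1] ** (1 + alpha)) % 1.0
    return x

def hurst_aggregated_variance(series, block_sizes=(10, 20, 50, 100, 200, 500)):
    """Estimate the Hurst exponent from the scaling of block-mean variances."""
    log_m, log_v = [], []
    for m in block_sizes:
        n_blocks = len(series) // m
        means = series[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        log_m.append(np.log(m))
        log_v.append(np.log(means.var()))
    slope = np.polyfit(log_m, log_v, 1)[0]   # var of block means ~ m**(2H - 2)
    return 1 + slope / 2

x = intermittency_map(200_000)
H = hurst_aggregated_variance(x)
print(f"Estimated H = {H:.2f}, long-memory parameter d = H - 0.5 = {H - 0.5:.2f}")
```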

55

Poster Session

Identifying and Evaluating Prognostic and Surrogate Markers for Response in the Treatment of Tuberculosis Patrick PJ Phillips MRC Clinical Trials Unit, London / London School of Hygiene and Tropical Medicine Presented by: Patrick Phillips A surrogate marker is used in the context of a clinical trial to substitute for the final endpoint. To be a perfect substitute, it should fully capture the treatment effect on this final endpoint – the test of the null hypothesis of no treatment effect on the surrogate endpoint should also be a valid test of the corresponding null hypothesis based on the true endpoint. A properly validated surrogate marker can shorten trial duration, reduce sample size and save money. Tuberculosis (TB) is a curable disease, but incidence is still growing by 1% a year globally, with 20,000 people developing active disease and 5000 people dying of TB every day. New drugs to increase efficacy, shorten treatment regimens or combat drug resistant strains are in various stages of development, but this is a long process with Phase III trials taking more than five years to complete and requiring large numbers of individuals in order to show a treatment effect. A surrogate marker for long-term response would be an invaluable weapon in the fight against TB. A number of markers have been proposed as possible predictors of treatment response, but no formal statistical analysis has been attempted to evaluate these markers as surrogates. The aim of this research project is to identify and evaluate new and existing markers as surrogate endpoints for long-term response to treatment for TB using data from twelve highly influential TB clinical trials conducted by the MRC across East Africa and East Asia during the 1970s and 80s involving nearly 10,000 participants. Several candidate markers, including longitudinal measures, have been identified and existing statistical methods will be extended to evaluate these as surrogates. The analysis is ongoing, but results from the most promising marker, the two month culture result, will be presented.

56

Poster Session

A Statistical and Multivariate Longitudinal Analysis of Poverty Indices in Middle East & North Africa and Africa Yarim Shamsan Statistics Research and Analysis Group (SRAG) / National Foundation for Educational Research (NFER), The Mere, Upton Park, Slough, Berkshire SL1 2DQ Presented by: Yarim Shamsan The subject of international poverty has attracted a number of leading international organisations over the years. These organisations are trying to tackle poverty in every realistic and possible way. An analysis of poverty indices for some 66 countries in the Middle East & North Africa and Africa has revealed some differences and trends, and country-specific trends not simply explained by geographical location. Trends in the Gross National Product (GNP) poverty indicator show that the top 25 per cent of the 66 countries are increasing in GNP and diverging away from the bottom 50 per cent. A generally worsening position in GNP was found for Burundi, Congo Democratic Republic and Sierra Leone. The fertility rate indicator shows an overall improvement over time, but there are individual African countries (Angola, Chad, Equatorial Guinea and Gabon) that show a worsening trend and divergent behaviour from other countries. In addition, three Middle East & North African countries (Djibouti, Iraq and Yemen) have profiles more consistent with African countries when considering fertility rate. There was found to be a steady increase in the life expectancy indicator, with the top 25 per cent of the 66 countries moving away from the bottom 50 per cent. There are 23 countries with decreasing life expectancy (predominantly African), in addition to Djibouti and Yemen (Middle East & North Africa), which cluster with the African countries. Mortality rate seems to be improving in general but there is a worsening position for 12 countries (predominantly African). Three Middle East & North African countries (Djibouti, Iraq and Yemen) have mortality rates similar to those of African countries. These conclusions indicate that government targets for tackling world poverty are unlikely to be universally met.

57

Poster Session

Does Population Mixing Measure Infectious Exposure at the Community Level? John C Taylor Paediatric Epidemiology Group, University of Leeds Presented by: John Taylor Background and methods: There is growing evidence that some chronic diseases, including asthma, leukaemia, brain tumours and autoimmune diseases may be associated with exposure to infections. Infections may act through a direct mechanism with specific infections precipitating disease, or through a process such as the ‘hygiene hypothesis’, in which a lack of exposure to infections in early childhood fails to correctly prime the immune system thereby increasing future risk of secondary disease. A series of studies have used population mixing, a term used to describe a process by which contact between people is promoted by their spatial movement, as a proxy for the level of infectious disease circulating in a community. For this study, nine areal measures of population mixing were compared with routine hospital admissions data for infections covering the West Midlands and Eastern England to develop a valid and reproducible infectious disease proxy for future epidemiological studies. Results: Migration and commuting based distance measures showed a strong significant negative association with hospital admissions for childhood (0-14 years of age) infections and a weaker association in the older age group (over 14 years of age). The measure that showed the most consistent significant positive association with infections in both age groups was the diversity of origins of childhood migrants. Conclusions: The results suggest distance commuted is the most reliable measure of population mixing but is not a good proxy for the level of infectious disease. The most consistent measure of population mixing that reflects the level of infectious disease is the diversity of origins of childhood migrants. Previous epidemiological studies have often assumed that there is a positive association between population mixing and the level of infectious disease; our study shows that this is not the case for the majority of measures.

58

Poster Session

LIST OF DELEGATES Surname

Forename

Institution/ company

Alder Anandaciva Ancona Anderson Aniyeloye Barry Barton Blackwell Bucknall Chatfield Chelliah Coleman Cook Cripps Curnow Danciu Dhaliwal Dhillon Earle Edwards Field Finselbach Francois Gillard Gilmore Grahame Gray Grigg Hall Harding Harvey Henmi Hudson Ingle Jitlal Johnson Kanhere Kapoor Kenyon Khan Knight

Nicola Sivakumar Miguel Mary-Jane Deborah Sarah Sheila Joanne Robert Mark Brian Ruth Kate Edward Elinor Aniela Jen Mohit Denise Julia Jonty Hannah Romain Jonathan Fiona Peter Laura Olivia Seb Nick Paul Masayuki Alex Suzanne Mark Hayley Anagha Deepa David Nadeer Marina

Centre for Statistics in Medicine Department of Health Man Investments NHS National Services Scotland Communities and Local Government University of Glasgow University of Southampton UK Transplant / YSM 2007 team Office for National Statistics MRC Human Nutrition Research GlaxoSmithKline University of Oxford DASA University of Sheffield UK Transplant Academy of Economic Studies, Bucharest GlaxoSmithKline UWE / Barclays Nui Maynooth DASA Man Investments Office for National Statistics Mango Solutions Cardiff University GlaxoSmithKline Legal & General University of Nottingham MRC Biostatistics Unit DASA University of Leeds UK Transplant University of Warwick UK Transplant / YSM 2007 team University of Bristol Cancer Research UK / UCL University of the West of England University of Southampton Queen Mary, University of London DASA HM Treasury University of Bristol

Surname

Forename

Institution/ company

Kohlmann Kontos Lay Limer Massa Mendy Merrall Messow Morden Mottram Mt-Isa Mumford Myllymäki Natsios Nelson Niemi Nunes Omer Pacillo Palmer Panayiotou Parker Phillips Pioli Purutcuoglu Radford Rajala Raji Robinson Scott Shafe Shamsan Taylor Thomas Thomas Titman Vowler Vrotsou Waters Williams Woodhill Zapettis

Mareike Dimitris Tiffany Laura Maria Sofia Sang Elizabeth Claudia-Martina James Sara Shahrul Lisa Mari David Christopher Aki Matthew Showgi Simona Tom Nayia Ben Patrick Sue Vilda Lucy Tuomas Olaide Francesca Emma Anna Yarim John Helen Kate Andrew Sarah Kalliopi Rachel Laura Nicholas Amy

University of Munich / Roche Diagnostics GlaxoSmithKline DASA University of Sheffield University of Padova / Oxford University University of Liverpool MRC Biostatistics Unit University of Mainz UK Transplant Primary Care Musculoskeletal Research Centre Wolfson Institute of Preventive Medicine UK Transplant University of Jyväskylä University of Liverpool University of Leicester University of Jyväskylä University of Bristol ISD University of Sannio University of Leicester University of Bristol Queen Mary, University of London MRC Clinical Trials Unit UK Transplant / YSM 2007 team University of Lancaster University of Sheffield University of Jyväskylä University of Leeds UK Transplant National Foundation for Educational Research Medicines and Healthcare products Regulatory Agency National Foundation for Educational Research University of Leeds UK Transplant / YSM 2007 team University of Bristol MRC Biostatistics Unit University of Cambridge University of Cambridge University of Oxford Legal & General Office for National Statistics GlaxoSmithKline
