Quality of the Baseline Data

Considerable time and energy were invested in organizing and maintaining this large body of data, and extensive, thorough documentation has been prepared and maintained at our research facilities. To assess the validity of the electronic data, two studies were undertaken at the national level. In the first, twenty study registrants were selected at random; photocopies of all study forms were requested and compared with the computer-coded data. Of the 40,000 separate pieces of information examined, only 34 unique errors were identified, yielding an error rate of less than one-tenth of one percent. In the second study, the validity of obstetric information in the CPP records was assessed at two sites by comparing those records with the hospital records of the same Centers. The hospital and CPP records for eight percent of the total samples were obtained, reviewed and abstracted by a physician. The authors concluded "that the Collaborative Study records contained extensive and detailed information not available in the other hospital records and that the Study records had a high standard of completeness in the two Centers where they could be evaluated" (Niswander and Gordon, 1972).

We have further investigated the quality of the Boston and Providence cohort data by hand-checking randomly selected charts against the electronic data files, finding 99.8% correspondence. For all of our investigations, we and others have examined the quality and validity of key aspects of the baseline data through a variety of biological and statistical techniques. These examinations have covered the quality and integrity of the archived sera; maternal reports of cigarette smoking during pregnancy; microscopic examination of placental pathology; clinical rating of obstetrical complications; cognitive test results; behavioral ratings; neurological exams; and many other areas. Throughout, we have been struck by the quality of the data for these two sites.
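The error rate quoted for the national comparison of study forms with computer-coded data can be checked with simple arithmetic. The sketch below is illustrative only: the function name error_rate is our own, and the total of 40,000 items is taken at face value from the text.

```python
def error_rate(errors: int, items_checked: int) -> float:
    """Return the proportion of checked items found to be in error."""
    if items_checked <= 0:
        raise ValueError("items_checked must be positive")
    return errors / items_checked


# Figures reported for the comparison of study forms with coded data.
rate = error_rate(34, 40_000)

print(f"error rate: {rate:.3%}")          # 0.085%
print(f"correspondence: {1 - rate:.3%}")  # 99.915%
```

An error rate of 0.085% is below the one-tenth of one percent threshold stated in the text.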
This quality is largely explained by the level of research expertise at the Harvard and Brown sites, where central administrative NIH personnel have reported particularly high quality and compliance with study protocol. Space permits only two examples. First, in an analysis of 448 CPP participants there was significant agreement (kappa = 0.83) between serum cotinine concentration measured in 40-year-old frozen sera and maternal reports of smoking during pregnancy. When a serum cotinine concentration of >10 ng/ml was treated as an index of active smoking, 94.9% of women who denied smoking and 87.0% of women who stated that they smoked reported their status correctly (Klebanoff et al., 1998). Self-reported smoking and serum cotinine concentrations were comparably correlated with infant birthweight (0.20 and 0.24, respectively). We have conducted similar analyses demonstrating the quality of obstetrician reports of hypoxic conditions, validated against levels of erythropoietin, and of maternal/physician reports of fever, validated against serum cytokine levels (Buka et al., 1998).

Second, academic achievement tests, intelligence tests, personality measures, behavioral ratings, and tests of fine and gross motor function were administered to subjects at three ages (8 months, 4 years and 7 years). We have conducted analyses demonstrating the psychometric properties of several of these; here we report results for the cognitive measures used at age 7. We began by comparing published normative data with the results for the Boston and Providence cohorts. Reflecting the ethnic composition and slightly lower SES of these cohorts, the sample has somewhat lower mean scores on the Wechsler Intelligence Scale for Children (WISC) and the Wide Range Achievement Test (WRAT) (mean Full Scale IQ = 95.3, SD = 14.1; mean WRAT Reading = 99.0, SD = 15.6; ranges 45-165), but these values are consistent with expectations.
The intercorrelations between the WISC verbal and performance scales are essentially identical to those from the sample on which the measure was developed (Wechsler, 1949), as are the intercorrelations among the WRAT subtests (Jastak & Jastak, 1965). Finally, three-month test-retest results for Full Scale IQ (FSIQ) in the Providence cohort are identical to published data for a separate community sample of youth ages 5-14 (Pearson correlation coefficient = 0.82), and the stability of WRAT Reading and Spelling (0.91 and 0.84, respectively) is superior to the published values (Brown et al., 1989). These results demonstrate the psychometric integrity of cognitive testing conducted with the New England cohorts.
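The two examples above rest on two standard agreement statistics: Cohen's kappa (self-report versus a cotinine-based classification) and the Pearson correlation (test-retest stability). The following is a minimal sketch of how each is computed, not the code used in the cited analyses. The function names and all input values are hypothetical illustrations and are not study data.

```python
from math import sqrt


def cohens_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] is the number of subjects placed in category i by the
    first method and in category j by the second method.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical 2x2 table (NOT study data):
# rows = self-report (smoker, non-smoker),
# columns = cotinine-based classification (smoker, non-smoker).
example_table = [[40, 10],
                 [5, 45]]
kappa = cohens_kappa(example_table)
print(f"kappa = {kappa:.2f}")  # 0.70

# Hypothetical test and retest scores for five children (NOT study data).
first_testing = [90, 100, 110, 95, 105]
retest = [92, 98, 112, 97, 101]
print(f"test-retest r = {pearson_r(first_testing, retest):.2f}")
```

Kappa corrects raw percent agreement for the agreement expected by chance, which is why it is generally preferred over simple percent agreement when two classifications of the same subjects are compared.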
Finally, we conducted a series of analyses to examine the construct validity of indices of learning disability (LD) and attention deficits that we have derived from the age 7 assessment protocol (Buka et al., 1998). We reviewed the clinical and research literature and selected 15 known or suggested correlates of LD and ADD, including male gender, low SES, neurological soft signs, conduct problems, grade retention, and perinatal complications. We calculated the rates of each of these for subjects rated as LD, subjects rated as ADD, and the remaining normal control subjects, controlling for gender and SES. Our results (not shown due to space limitations) provide strong validation of the baseline data and our classifications.

Comparing LD subjects to normal controls, all of the predicted associations were confirmed at the p<.05 level. The LD group was twice as likely to be rated as suspect on the 7-year neurological examination, 3.7 times as likely to have repeated a grade by age 7, 3.5 times as likely to be in a special class by age 7, and 1.8 times as likely to receive a rating of "conduct disordered" at age 7. Follow-up information showed that the LD group was somewhat more likely to have repeat arrests for delinquency. More than half of the children with LD at age 7 had severe and persistent learning disorders, based on extensive neuropsychological examination, at age 35.

Children with early signs of ADD were 1.7 times as likely as normal controls to come from lower SES families and 1.5 times as likely to be of low birthweight, and they had significantly higher rates of grade retention, placement in special classes, learning disability and conduct problems. By grade 12 this group was more likely to be placed in a special class with a diagnosis of emotional or behavioral disorder and, by age 18, was twice as likely to have been arrested repeatedly for delinquent offenses.
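Comparisons of the form "N times as likely" can be read as ratios of rates between groups. The sketch below computes a crude (unadjusted) risk ratio. It assumes the reported figures are ratios of proportions, and it does not reproduce the adjustment for gender and SES described above. The function name and the counts are hypothetical, chosen only to illustrate the arithmetic, and are not study data.

```python
def risk_ratio(group_cases, group_total, control_cases, control_total):
    """Crude risk ratio: rate in the index group over rate in controls."""
    if group_total <= 0 or control_total <= 0:
        raise ValueError("group sizes must be positive")
    if control_cases <= 0:
        raise ValueError("control rate must be non-zero")
    group_rate = group_cases / group_total
    control_rate = control_cases / control_total
    return group_rate / control_rate


# Hypothetical counts (NOT study data): 37 of 100 children in the index
# group and 100 of 1,000 controls repeated a grade by age 7.
rr = risk_ratio(37, 100, 100, 1000)
print(f"risk ratio = {rr:.1f}")  # 3.7
```

An adjusted estimate would typically come from a stratified or regression-based analysis rather than from this two-group calculation.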
These and other analyses indicate that the administration and results of cognitive, academic and behavioral assessments in the New England NCPP cohorts are consistent with, and as reliable and valid as, those from other published normative samples.
References

Brown, S. J., Rourke, B. P., & Cicchetti, D. (1989). Reliability of tests and measures used in the neuropsychological assessment of children. Clinical Neuropsychologist, 3, 353-368.

Buka, S. L., Satz, P., Seidman, L., & Lipsitt, L. P. (1998). Defining learning disabilities: The role of longitudinal studies. Thalamus, 16, 14-29.

Buka, S. L., Yolken, R. H., Torrey, E. F., Klebanoff, M. A., & Tsuang, M. T. (1998, November). Viruses, fetal hypoxia and subsequent schizophrenia: A direct test of infectious agents using prenatal sera. Paper presented at the 4th Annual Symposium on the Neurovirology and Neuroimmunology of Schizophrenia and Bipolar Disorder, Bethesda, MD.

Klebanoff, M. A., Levin, R. J., Clemens, J. D., DerSimonian, R., & Wilkins, D. G. (1998). Serum cotinine concentration and self-reported smoking during pregnancy. American Journal of Epidemiology, 148, 259-262.

Niswander, K. R., & Gordon, M. (1972). The women and their pregnancies. Washington, DC: U.S. Government Printing Office.

Wechsler, D. (1949). Manual for the Wechsler Intelligence Scale for Children. New York: The Psychological Corporation.