Correspondence Analysis and Data Coding

Contents

1 Introduction . . . 1
  1.1 Data Analysis . . . 1
  1.2 Notes on the History of Data Analysis . . . 3
    1.2.1 Biometry . . . 4
    1.2.2 Era Piscatoria . . . 4
    1.2.3 Psychometrics . . . 5
    1.2.4 Analysis of Proximities . . . 7
    1.2.5 Genesis of Correspondence Analysis . . . 8
  1.3 Correspondence Analysis or Principal Components Analysis . . . 9
    1.3.1 Similarities of These Two Algorithms . . . 9
    1.3.2 Introduction to Principal Components Analysis . . . 10
    1.3.3 An Illustrative Example . . . 11
    1.3.4 Principal Components Analysis of Globular Clusters . . . 13
    1.3.5 Correspondence Analysis of Globular Clusters . . . 14
  1.4 R Software for Correspondence Analysis and Clustering . . . 17
    1.4.1 Fuzzy or Piecewise Linear Coding . . . 17
    1.4.2 Utility for Plotting Axes . . . 18
    1.4.3 Correspondence Analysis Program . . . 18
    1.4.4 Running the Analysis and Displaying Results . . . 20
    1.4.5 Hierarchical Clustering . . . 21
    1.4.6 Handling Large Data Sets . . . 27

2 Theory of Correspondence Analysis . . . 29
  2.1 Vectors and Projections . . . 29
  2.2 Factors . . . 32
    2.2.1 Review of Metric Spaces . . . 32
    2.2.2 Clouds of Points, Masses, and Inertia . . . 34
    2.2.3 Notation for Factors . . . 35
    2.2.4 Properties of Factors . . . 36
    2.2.5 Properties of Factors: Tensor Notation . . . 36
  2.3 Transform . . . 38
    2.3.1 Forward Transform . . . 38
    2.3.2 Inverse Transform . . . 38
    2.3.3 Decomposition of Inertia . . . 38
    2.3.4 Relative and Absolute Contributions . . . 39
    2.3.5 Reduction of Dimensionality . . . 39
    2.3.6 Interpretation of Results . . . 39
    2.3.7 Analysis of the Dual Spaces . . . 40
    2.3.8 Supplementary Elements . . . 41
  2.4 Algebraic Perspective . . . 41
    2.4.1 Processing . . . 41
    2.4.2 Motivation . . . 41
    2.4.3 Operations . . . 42
    2.4.4 Axes and Factors . . . 43
    2.4.5 Multiple Correspondence Analysis . . . 44
    2.4.6 Summary of Correspondence Analysis Properties . . . 46
  2.5 Clustering . . . 46
    2.5.1 Hierarchical Agglomerative Clustering . . . 46
    2.5.2 Minimum Variance Agglomerative Criterion . . . 49
    2.5.3 Lance-Williams Dissimilarity Update Formula . . . 49
    2.5.4 Reciprocal Nearest Neighbors and Reducibility . . . 52
    2.5.5 Nearest-Neighbor Chain Algorithm . . . 53
    2.5.6 Minimum Variance Method in Perspective . . . 54
    2.5.7 Minimum Variance Method: Mathematical Properties . . . 55
    2.5.8 Simultaneous Analysis of Factors and Clusters . . . 57
  2.6 Questions . . . 57
  2.7 Further R Software for Correspondence Analysis . . . 58
    2.7.1 Supplementary Elements . . . 58
    2.7.2 FACOR: Interpretation of Factors and Clusters . . . 61
    2.7.3 VACOR: Interpretation of Variables and Clusters . . . 64
    2.7.4 Hierarchical Clustering in C, Callable from R . . . 67
  2.8 Summary . . . 69

3 Input Data Coding . . . 71
  3.1 Introduction . . . 71
    3.1.1 The Fundamental Role of Coding . . . 72
    3.1.2 “Semantic Embedding” . . . 73
    3.1.3 Input Data Encodings . . . 75
    3.1.4 Input Data Analyzed Without Transformation . . . 76
  3.2 From Doubling to Fuzzy Coding and Beyond . . . 77
    3.2.1 Doubling . . . 77
    3.2.2 Complete Disjunctive Form . . . 79
    3.2.3 Fuzzy, Piecewise Linear or Barycentric Coding . . . 80
    3.2.4 General Discussion of Data Coding . . . 85
    3.2.5 From Fuzzy Coding to Possibility Theory . . . 86
  3.3 Assessment of Coding Methods . . . 92
  3.4 The Personal Equation and Double Rescaling . . . 98
  3.5 Case Study: DNA Exon and Intron Junction Discrimination . . . 99
  3.6 Conclusions on Coding . . . 103
  3.7 Java Software . . . 104
    3.7.1 Running the Java Software . . . 105

4 Examples and Case Studies . . . 111
  4.1 Introduction to Analysis of Size and Shape . . . 111
    4.1.1 Morphometry of Prehistoric Thai Goblets . . . 111
    4.1.2 Software Used . . . 116
  4.2 Comparison of Prehistoric and Modern Groups of Canids . . . 118
    4.2.1 Software Used . . . 130
  4.3 Craniometric Data from Ancient Egyptian Tombs . . . 135
    4.3.1 Software Used . . . 139
  4.4 Time-Varying Data Analysis: Examples from Economics . . . 140
    4.4.1 Imports and Exports of Phosphates . . . 140
    4.4.2 Services and Other Sectors in Economic Growth . . . 145
  4.5 Financial Modeling and Forecasting . . . 148
    4.5.1 Introduction . . . 148
    4.5.2 Brownian Motion . . . 149
    4.5.3 Granularity of Coding . . . 150
    4.5.4 Fingerprinting the Price Movements . . . 158
    4.5.5 Conclusions . . . 160

5 Content Analysis of Text . . . 161
  5.1 Introduction . . . 161
    5.1.1 Accessing Content . . . 161
    5.1.2 The Work of J.-P. Benzécri . . . 161
    5.1.3 Objectives and Some Findings . . . 163
    5.1.4 Outline of the Chapter . . . 164
  5.2 Correspondence Analysis . . . 164
    5.2.1 Analyzing Data . . . 164
    5.2.2 Textual Data Preprocessing . . . 165
  5.3 Tool Words: Between Analysis of Form and Analysis of Content . . . 166
    5.3.1 Tool Words versus Full Words . . . 166
    5.3.2 Tool Words in Various Languages . . . 167
    5.3.3 Tool Words versus Metalanguages or Ontologies . . . 168
    5.3.4 Refinement of Tool Words . . . 170
    5.3.5 Tool Words in Survey Analysis . . . 171
    5.3.6 The Text Aggregates Studied . . . 172
  5.4 Towards Content Analysis . . . 172
    5.4.1 Intra-Document Analysis of Content . . . 172
    5.4.2 Comparative Semantics: Diagnosis versus Prognosis . . . 174
    5.4.3 Semantics of Connotation and Denotation . . . 175
    5.4.4 Discipline-Based Theme Analysis . . . 175
    5.4.5 Mapping Cognitive Processes . . . 176
    5.4.6 History and Evolution of Ideas . . . 176
    5.4.7 Doctrinal Content and Stylistic Expression . . . 177
    5.4.8 Interpreting Antinomies Through Cluster Branchings . . . 179
    5.4.9 The Hypotheses of Plato on The One . . . 179
  5.5 Textual and Documentary Typology . . . 180
    5.5.1 Assessing Authorship . . . 180
    5.5.2 Further Studies with Tool Words and Miscellaneous Approaches . . . 184
  5.6 Conclusion: Methodology in Free Text Analysis . . . 186
  5.7 Software for Text Processing . . . 188
  5.8 Introduction to the Text Analysis Case Studies . . . 189
  5.9 Eight Hypotheses of Parmenides Regarding the One . . . 190
  5.10 Comparative Study of Reality, Fable and Dream . . . 197
    5.10.1 Aviation Accidents . . . 198
    5.10.2 Dream Reports . . . 198
    5.10.3 Grimm Fairy Tales . . . 199
    5.10.4 Three Jane Austen Novels . . . 199
    5.10.5 Set of Texts . . . 200
    5.10.6 Tool Words . . . 200
    5.10.7 Domain Content Words . . . 201
    5.10.8 Analysis of Domains through Content-Oriented Words . . . 205
  5.11 Single Document Analysis . . . 207
    5.11.1 The Data: Aristotle’s Categories . . . 207
    5.11.2 Structure of Presentation . . . 210
    5.11.3 Evolution of Presentation . . . 214
  5.12 Conclusion on Text Analysis Case Studies . . . 220

6 Concluding Remarks . . . 221

References . . . 223

Index . . . 229

Contents - multiresolutions.com
