Two Genealogy Models

Overview

This document briefly describes the Record and Conclusion Models within GEDCOMX. It also explains the need for two models, how they differ, and why they are kept separate.

Background

For decades, GEDCOM has remained the de facto standard for communicating genealogical data between people and systems. Although the format has significant limitations, it has been adequate enough to persist despite various attempts to replace it. GEDCOM was the first standard. Prior to the Internet, people would perform painstaking research and (even more painfully) attempt to assemble their evidence and conclusions into GEDCOM files. There wasn't much choice beyond:

1) Manually searching for and finding historical evidence at libraries, archives, and other institutions.
2) Entering conclusions based on the evidence they found.
3) Attempting to document their evidence in a GEDCOM file by:
   a. Referencing the evidence with a bibliographic citation, OR
   b. Photo-scanning the evidence and embedding the content (there was nowhere else to put it), OR
   c. Both.
4) Copying their file to a floppy disk.
5) Giving the copy to anyone who cared.

With the emergence of the web, email replaced steps 4 and 5 above, but two problems remained:

1) Conclusions - Since there was no common repository of conclusions, people weren't aware of research completed by others, so work was duplicated over and over by people related to the same ancestors.

2) Evidence - If you only had a bibliographic citation, you still couldn't see a faithful-enough representation of the evidence without either:

   a. Driving to the facility that kept the evidence to see it for yourself, OR
   b. Having a photocopy sent to you somehow, either through email or postal service.

So, websites were built to attempt to solve these problems. Specifically, they provided:

1) Conclusions - A common database for everyone to store and share their conclusions, AND

2) Evidence - A massive warehouse containing representations of all the world's historical documents! This is where we are today, but unfortunately, we still have two problems:



1) Conclusions - The idea of a common conclusion database is great, unless there are several of them. Today, if one wants to switch systems or use multiple systems, one must export (guess what?) a GEDCOM file from System A and import it into System B. In today's web-oriented world, this should be better facilitated with modern formats (e.g., XML, JSON) and modern technologies (e.g., web services).

2) Evidence - It will take decades for any single organization to complete the digitization of all the world's historical records, if ever. What do we do in the meantime? What if no single organization ever digitizes all the records itself? Also, some archives want to be the exclusive online repository for their own records so they can charge fees to cover their costs. How can data consumers easily integrate with these systems while still serving the interests of the producer?

These problems, like other genealogical problems before them, must be solved with new innovation.
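The modern-format exchange called for in point 1 can be sketched briefly. This is a minimal illustration of passing conclusion data between systems as JSON rather than as a GEDCOM file; the field names and structure below are illustrative only, not the actual GEDCOMX schema.

```python
import json

# Hypothetical conclusion-person structure; not the real GEDCOMX schema.
person = {
    "id": "p-1",
    "gender": "Female",
    "names": [{"value": "Marta Bjornsdottir", "primary": True}],
    "facts": [
        {"type": "Birth", "date": "1820-03-01", "place": "Iceland"},
    ],
}

payload = json.dumps(person)    # what System A would send over a web service
received = json.loads(payload)  # what System B would parse on arrival
print(received["names"][0]["value"])  # Marta Bjornsdottir
```

No lossy file export/import step is needed; the same payload could be served by a web service endpoint and consumed directly.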

The Solution

For conclusions, the GEDCOMX Conclusion Model addresses the need to share conclusion data by modernizing the integration format and technologies used between conclusion-based systems. GEDCOM's replacement is essentially the GEDCOMX Conclusion Model.

For evidence, significant integration barriers exist across disparate evidence repositories. Archives and genealogy companies need to share and reference each other's evidence data in a common format to overcome these barriers. The GEDCOMX Record Model is intended to be this format. It allows for the common exchange of record data (including digital transcriptions and URLs of evidence images). At the same time, an archive can keep its digital images behind a "pay wall" and let the record data it has published to other systems drive traffic to its images. The GEDCOMX Record Model has no predecessor within the legacy GEDCOM format. It is a new model for sharing and referencing genealogical evidence, not conclusions, in a world of individual online archives.

These two models, while targeting very different domains, are intended to be compatible. Conclusions naturally refer to evidence; so too, conclusion data should be able to reference, and even consume, evidence data. In this way, the Conclusion Model is aware of, references, and consumes the Record Model.
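The "aware of, references, and consumes" relationship can be sketched as follows. The class and field names here are hypothetical, chosen only to illustrate a conclusion holding URI references into record data published by an archive; they are not the actual GEDCOMX types.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ConclusionPerson:
    """Illustrative conclusion-side person; not the real GEDCOMX class."""
    id: str
    name: str
    # URIs pointing at record-model personas published elsewhere:
    evidence_refs: List[str] = field(default_factory=list)

person = ConclusionPerson(id="c-42", name="John Fitzgerald")

# Reference a persona in an archive's published record data. The archive
# keeps its images behind its pay wall; the conclusion only holds the link.
person.evidence_refs.append(
    "https://archive.example.org/records/1850-census/persona/77"
)

print(len(person.evidence_refs))  # 1
```

The dependency runs in one direction only: conclusion data points at record data, never the reverse.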

Why Not One?

Since all we're talking about is genealogy, why not have one "genealogy model"? Both models need the same type of information; why can't they use the same objects? On the surface, these models share a lot of common terms: there are Relationships, Names, Facts, Dates, Places, Genders, etc. in both. Additionally, the structures of the two models appear almost identical. For example, in both models a Person(a) has Names, Facts, and a Gender; a Relationship has Facts and references two Person(a)s; and a Fact has a Date and a Place. These are the same domain, no?

No. Common terms do not equate to a common domain. Name, Account, Dollar Amount, Tax, and Transaction Date are also common terms in accounting systems, but that doesn't mean Accounts Payable, Accounts Receivable, and Payroll systems should share the same schema.

The Record and Conclusion Models are, semantically, trying to accomplish very different things. The Conclusion Model is trying to justify the conclusion that a person existed, with associated information and specific relationships to other persons. The Record Model is documenting the existence of a record concerning some persona. If we define the domain as "genealogy" instead of "genealogical conclusions" and "genealogical evidence", then we could try to put all these terms into one all-purpose, generic "genealogy model" to handle the wide breadth of use cases and processes within the industry. Combining these domains would make for a smaller, more generic model that would confuse the "person existed" assertion with the "record exists" documentation. It would also increase implementation complexity: additional code must be written across multiple systems to differentiate between objects used for transcriptions and objects used for conclusions, because these objects have very different requirements. In the context of software design principles, merging these objects would violate the Single Responsibility Principle and the Interface Segregation Principle of SOLID object-oriented design (see http://en.wikipedia.org/wiki/SOLID_(object-oriented_design)).

Here's a list of some of the dichotomies between the two models:

| Record Model | Conclusion Model |
| --- | --- |
| Evidence-based | Conclusion-based |
| Generally, a snapshot in time | Across time |
| Static, with few re-edits | Dynamic, with many re-edits |
| Field class: is a transcription | Normalizable Interface: is a composite or conclusion of multiple transcriptions |
| Transcription-specific members of Field class: original, interpreted | Conclusion-specific members of Normalizable Interface: value, label |
| Record class | (nothing equivalent) |
| Can effectively model some non-person information | Not very conducive to non-person information |
| Records are self-contained; they don't reference each other | Conclusions reference records and other conclusions |
| Does not have negative assertions; only positively asserts what's on the record | Can have negative assertions |
| Age class | (nothing equivalent) |
| Facts exist on Records that are not at the Persona or Relationship level (e.g., film/image number, page number) | (nothing equivalent) |
| (nothing equivalent) | NameForm class |
| (nothing equivalent) | Name has one Primary Form and multiple Alternate Forms |
| Primary Fact and Principal Persona within a Record | (nothing equivalent) |
| Has DatePart, PlacePart, and AgePart for identifying when Fields on an image are separated and when they're combined | (nothing equivalent) |
| Can be used by archives and genealogy companies | Only used by genealogy companies |
| Processes are tailored to fast data-entry use cases. It is more valuable for users to key breadth over depth; that is, we're more interested in data-entry users keying more records with fewer, high-quality fields such as name, date, and place than in keying fewer records that go deeper into lower-value fields. Capturing how confident a user is in the transcription of each field, what they concluded, why they concluded it, etc. would give us deeper records, but significantly fewer of them. We're more interested in helping people find the evidence image than in rationalizing the existence of the evidence data. Yes, this may be heresy to some genealogists, but it's the most pragmatic approach for most organizations. | Processes are targeted at accurately capturing research and conclusions with strong attribution, citation, and confidence support. Genealogically sound processes are intended to be supported. |

The above table is based on version 0.10 of both models.

The challenge these two models present is that, while very different, one must consume the other. To this end, we've attempted to keep the Record Model very consumable by the Conclusion Model, which partially explains why they have common terms and similar structures: not because they are inherently the same, but because they are very different and we're trying to make one mappable to the other. While maintaining mappability is crucial, providing it by keeping the two models' classes in lock step would foster increased, harmful complexity.
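Mappability, as opposed to lock-step classes, amounts to a one-way conversion function. The sketch below is illustrative only: the dict shapes and the helper name are hypothetical, not part of GEDCOMX.

```python
# Hypothetical mapping from record-model data to seed conclusion-model data,
# keeping a back-reference to the evidence it came from.
def persona_to_conclusion(persona, record_uri):
    """Map a record persona (transcribed fields) to a starter conclusion person."""
    return {
        # Prefer the transcriber's interpreted reading; fall back to the original.
        "names": [f["interpreted"] or f["original"]
                  for f in persona.get("name_fields", [])],
        "evidence": [record_uri],
    }

persona = {
    "name_fields": [{"original": "Jno. Smith", "interpreted": "John Smith"}],
}
conclusion = persona_to_conclusion(
    persona, "https://archive.example.org/records/123"
)
print(conclusion["names"])  # ['John Smith']
```

Only the mapping function needs to know both shapes; each model remains free to evolve within its own domain.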

Industry Need and Best Practice

Most archives don't care what a researcher concludes about their records. They present the evidence in the most consumable format (e.g., first-hand viewing, photocopies, digital representations) and let viewers judge for themselves. Archives only need systems and schemas that present the evidence in its most objective form. We must provide a simple model that lets them participate in the genealogical industry and publish the world's records to all in a common format. Archives have no need to host systems or schemas that allow people to build conclusion towers on top of their evidence.

For conclusion towers, we rely on conclusion systems hosted by genealogical companies. Fortuitously, many of these same companies also host their own evidence-based systems to support their conclusion systems, and we can look to these industry-leading organizations for insight into the generally accepted best practice for combining or separating these systems and models. Today, all industry-leading genealogical websites separate their conclusion systems from their evidence systems, of course allowing conclusions to link to and reference evidence. Without enumerating them here, an entirely different set of processes, requirements, and use cases exists for transcribing a record than for documenting one's genealogical research. These different use cases have independently produced the same evolutionary outcome across multiple genealogy companies, demonstrating the need for two systems and two domains.

In other industries, such as the detective or investigative domains, a similar division exists between the evidence system and the hypothesis/theory system. It's simply not effective to mix the objective evidence (including recordings and measurements) with everything we're hypothesizing based on that evidence. Yes, in a purely academic sense, one can say even a transcription is a conclusion, particularly where the original record creator's handwriting is illegible; but this is more of an intellectually interesting stretch than an accurate portrayal of reality, since most hand-written records are quite legible and form-based, so names, dates, and places are clearly identifiable. Moreover, saying "transcriptions are conclusions" for records created with typewriters is downright silly, and it's beyond ridiculous for digitally-born evidence. The exceptional "bad handwriting" case shouldn't redefine a domain at the expense of the majority case.

Summary

This document has attempted to explain the rationale behind the existence and separation of the Conclusion and Record Models within GEDCOMX. While our industry benefits from compatibility between the two models, they must still be developed within their own semantic domains. As in other industries, segregating the domains is ultimately more effective at solving the unique needs of each.
