GEDCOM X defines an open data model and an open serialization format for exchanging the genealogical data essential to the genealogical research process.
See the site for more information.
Home Page: http://www.gedcomx.org
License: Apache License 2.0
I noticed that several of these classes have a “URI base” as well as a “Links links”. Since “Links” already has a “URI base” in it, should the “URI base” be removed from these classes?
The existence of a SourceReferenceType implies that SourceReferences can point to something other than a Source. If this is the case, then the two enums become one and the same (with the possible exception that Sources may not be allowed to point to other Sources). I would suggest merging the two types into a single type called “GedcomXType” (or something else that doesn’t imply its use in Source or SourceReference) and I would add all the GedcomX entities to the list, e.g.:
public enum GedcomXType {
source,
record,
persona,
person,
collection,
digital_artifact, // instead of image—more general
physical_artifact,
other
}
It is still unclear how all the different parts of a Record will be attributed as changes happen. We need to clarify this right away to be sure we have all we need in the model to support this. In an earlier issue I suggested we need "Attribution" in EventRole and Relationship (couple, parent-child, and other). I would further like to suggest that Attribution be added to Event and Persona (it already exists on Characteristic since that is a Field). I believe this will suffice for a basic set of change operations. To get the discussion rolling, I have created a list of change operations on a Record. As we talk through these operations, it should bring to light the underlying assumptions about how records may be modified and whether or not we have a model that supports what we need. Here are the operations:
Field in the Record profile needs:
private List sources;
like Record. This is to support the "rectangle of an image" functionality. You could argue that it could be a single SourceReference, since it is very unlikely that a single Field would ever span more than 1 image.
Now that “type” has been removed from “PersistentId”, it has nothing but a URI, like this:
public final class PersistentId {
URI value;
I wonder if we could eliminate the PersistentId class altogether and let objects with a persistent id just use URI, e.g.:
public class Person {
URI persistentId;
…
I realize that having a “PersistentId” class documents the intent of the member variable more explicitly, but might the fact that the member variable is called “persistentId” be enough to explain the intent? My reason for making this suggestion is to simplify the client code. Instead of:
person.getPersistentId().getValue();
you would have:
person.getPersistentId();
The extra “getValue()” just seems superfluous when there is nothing else in the class.
Currently, “Collection” looks like this:
public class Collection {
String id;
CollectionReference parent;
String title;
String description;
String publisher;
And Records tell which Collection they are part of, like this:
public class Record {
…
CollectionReference collection;
Here are some questions I have:
• What about Records that are part of more than one Collection?
• What about Collections of other things, like a Collection of Persons or Images? Currently in CDS, Collections have Records, SubmittableUnits, Images, Films, and Waypoints. SubmittableUnits are groupings of records, but are not currently exposed in the UI, so we needn’t concern ourselves with them right now. Films and Waypoints are groupings of Images and currently, we only expose one or the other, so that Images have a single containment hierarchy. However, I don’t believe this will be sufficient going forward. Still, we need a way of expressing the containment hierarchy of Images, and we have a containment hierarchy for LLS. LLS is a Collection consisting of AF, PRF and a Collection of Sources, and (I hate to have to mention it) Repositories. PRF is a collection of user-submitted GEDCOMs. Each user-submitted GEDCOM is also conceptually a Collection of Persons and Relationships. We need a way of expressing all of these things.
• How are the contents of a collection described—in the “Links” section?
• The list of “contents” of a collection can be very large. What is the general paging mechanism?
• CDS Collections also have “modifiedDate”, “coverages”, “RecordType,” and a “completeness” value for records, images, and waypoints. We need a way of expressing all of this information.
I propose a rename of 'SourceType' to the more generic 'ResourceType' to allow for its use in a broader context.
In the current model there is no way to know who created a Relationship or added an EventRole to a Persona. This is because these things have no Fields.
This is a little bit of a funny animal since some of the roles currently defined imply the Gender of the Person involved, and some do not. However, when the Gender of a Person is changed, to keep the RelationshipRole in agreement it would also have to be modified. That is a little problematic in a system that is tracking the “who”, “when”, “why” for every change. Would the user be forced to give a reason for the change to the RelationshipRole? My suggestion would be to remove all the RelationshipRoles that imply Gender, e.g. put “Spouse” instead of “Husband” or “Wife”, “Parent” instead of “Father” or “Mother”. In the case where there is no genderless English word for the relationship, we would be forced into something like, “AuntOrUncle”. This is, I admit, a little ugly, but these kinds of relationships are rarely modeled and the common ones don’t have this problem.
I’m still questioning a little bit whether we ought to have RelationshipRole at all because of the “denormalized” nature of the data, and because it is not used in Persona (you know how I hate differences between the two models that “don’t need to be there”).
Do we still need this?
While GitHub provides a lot of nice features for "pulling" updates, sometimes I find that I need to "push" a message or announcement to people who are subscribed.
What are people's thoughts about setting up a Google Group?
Because of the 'other' attributes.
As gedcomx becomes more mature, we might want to specify lengths on our String types in the xsd. This would give us the following benefits:
Of course, there's always the potential for legitimately needing to exceed the maxLength on a given attribute, so the maxLengths would need to be set very conservatively. An analysis of our current data might help us find the right length(s) to use.
Currently “PersonReference” looks like this:
public final class PersonReference {
QName role;
URI href;
PersonReference is currently used by CoupleRelationship, ParentChildRelationship, and OtherRelationship. For CoupleRelationship and ParentChildRelationship the “role” would seem to be redundant and has the problem that it could disagree with the Relationship. If we decide to have a single “Relationship” with a “type”, then it would still be redundant. I would suggest that it be removed. If we decide to keep the three relationship classes, then my suggestion would be to put “RelationshipType” on “OtherRelationship”, so you would still remove the “role” from PersonReference. Since PersonReference would then be nothing but an HREF, I would further suggest we eliminate the class and just use URIs to refer to Persons in Relationships.
Shouldn’t the enum values be all uppercase according to Java style conventions?
CharacteristicType defines “Person” and “Couple” classes where the characteristics that are appropriate for a Person/Couple are listed. How can this information be used to determine if a given Characteristic is appropriate for a Person/Couple?
I just downloaded NClass and the latest record.ncp file and I get an error that the NCP file cannot be opened.
Can we get an updated record.ncp - or can we get a JPG or PDF form of the class diagram for reference?
I guess this is me contradicting myself or changing my tune, but, if Relationships are going to be entities, maybe it would be best if there weren’t different root elements like CoupleRelationship, ParentChildRelationship, etc. It might be better to have a single root element “Relationship” that has a type, “ParentChild”, “Couple”, etc. The type would define the roles of the two people involved in the relationship, e.g. “ParentChild” would mean that person 1 is the parent and person 2 is the child. Of course, I guess I would then need to argue that we need to make this change in the “Record” world as well, because I hate it when they are different. Let’s discuss this.
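To make the proposal concrete, here is a minimal sketch of a single "Relationship" whose type fixes the roles of the two participants. The enum constants, role strings, and field names are assumptions made for illustration; they are not taken from the current model.

```java
import java.net.URI;

/** Hypothetical relationship types; each one fixes the roles of person 1 and person 2. */
enum RelationshipType {
    PARENT_CHILD("parent", "child"),
    COUPLE("spouse", "spouse");

    final String person1Role;
    final String person2Role;

    RelationshipType(String person1Role, String person2Role) {
        this.person1Role = person1Role;
        this.person2Role = person2Role;
    }
}

/** One root entity for every kind of relationship; the type says who is who. */
final class Relationship {
    final RelationshipType type;
    final URI person1;
    final URI person2;

    Relationship(RelationshipType type, URI person1, URI person2) {
        this.type = type;
        this.person1 = person1;
        this.person2 = person2;
    }
}
```

With this shape, a client that sees `PARENT_CHILD` knows that `person1` is the parent without a separate role stored on each person reference.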
I’m going to reveal a little of my ignorance with this, but currently GeoCode specifies longitude and latitude as “float”. I never use float anymore because I have had problems with the lack of precision. Are we sure we wouldn’t need “double”?
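As a rough way to quantify the concern, the sketch below measures what a coordinate loses when it is stored as a float. The class and method names are made up for this illustration.

```java
/** Sketch: how much precision a coordinate loses when stored as a float. */
final class GeoPrecisionDemo {

    /** Absolute error, in degrees, introduced by narrowing a coordinate to float. */
    static double floatRoundTripError(double degrees) {
        return Math.abs((double) (float) degrees - degrees);
    }

    /** Distance, in degrees, between adjacent float values near the coordinate. */
    static double floatSpacing(double degrees) {
        return Math.ulp((float) degrees);
    }
}
```

For a longitude between 64 and 128 degrees, adjacent float values are 2^-17 (about 7.6e-6) degrees apart, which is a little under a metre along the equator. That may or may not matter for genealogical places, but a double removes the question at the cost of four extra bytes per value.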
This approach of having a “www” flavor of an object that has “links” seems to be getting more and more problematic. It is exploding the number of classes (because lots of objects now have a “www” counterpart). Also, the WWW flavor of “Person” is hard to consume because it extends “ordinary Person” which has lists of “ordinary Name”, “ordinary Event”, etc. Your comment says you will change these to be lists of “? extends {object}”. This helps, but won’t entirely solve the problem because anyone working with a WWW Person, when iterating the Names, for example, will have to check each Name to see if it is a “WWW Name” or an “ordinary Name”. If it is a “WWW Name” then the links are available, but if “ordinary Name” they are not. Yuck.
I propose we just have one set of objects, with Links, and that we ask the Data-Framework guys to just swallow the pill that their Records have a place for “Links” that they won’t be populating. I know they have pushed back hard on this, but sometimes concessions need to be made for the overall good.
From what I can tell, “Record” uses “SourceReference” whereas “Person” and “Relationship” use “AttributedSourceReference”. It would seem to me that all SourceReferences could be “attributed”. This gets into the whole attribution model for records, but I would like to push for a unified model for attributions. If you take my previous suggestion about “Contribution”, SourceReferences would implement “Contribution”.
I would just like to say that I think “Lineage” as a type of “Characteristic” would be more ideal than as a member variable since, as a Characteristic it would automatically have all the “Conclusion” stuff (who, when, why) allowing it to be modified and tracked.
We need to either develop a simple model for expressing Contributor/User information, or adopt an existing specification.
We probably need a model for Notes on a Person (and possibly other entities).
We should probably change all mime subtypes to start with 'x-' to imply that they're not registered with IANA.
See http://en.wikipedia.org/wiki/Internet_media_type.
E.g. "application/x-gedcom-record-v1+xml"
Currently, when you serialize Attribution on Record and Field, it looks like this:
<attribution>
<gxa:contributor xlink:href="user/1"/>
<gxa:explanation>just because</gxa:explanation>
<gxa:timestamp>2011-06-08T16:53:58.511-06:00</gxa:timestamp>
</attribution>
I believe that if you would put the Attribution element in the "gxa" namespace, then it could look like this:
<gxa:attribution>
<contributor xlink:href="user/1"/>
<explanation>just because</explanation>
<timestamp>2011-06-08T16:53:58.511-06:00</timestamp>
</gxa:attribution>
Do you think this would look a little better?
This is how the record-profile ParentChildRelationship serializes to XML:
This is a request to provide Interfaces (and/or perhaps Abstract classes) in the Java reference implementation of GEDCOM X.
In trying to use the 0.1.0 release in another system a question arose as to whether we should Inherit or use Composition. Composition is often recommended/favored over Inheritance for a number of well documented reasons. Having the Java reference implementation provide a set of Interfaces would open up more design options.
FieldType currently has “household” and “batch_number”. “batch_number” would seem to be specific to our operations and not a general concept. Is the purpose of “household” for census records to be able to determine the household boundaries? If so, we would also need “relationship_to_head” since this is used to determine relationships in census records. We might also need “name”, “date”, “place”. Of course, fields often hold only part of a name, date, or place so we would need, “name_part”, “date_part”, “place_part” as well, but that is not sufficient because we would need to know which part “given”, “surname”, etc. RecordField specifies a “QName type.” Perhaps, the “QName” in these cases could be the QNames for NamePartType, DatePartType and PlacePartType?
Currently “Conclusion” looks like this:
public abstract class Conclusion {
String id;
AttributionReference attribution;
…
“AttributionReference” is an “href” to the attribution information for the conclusion. My concern is that, in a typical “person” (or “relationship” or whatever), there will be lots of conclusions and, thus, lots of “AttributionReferences”. Dereferencing all of these is likely to be tedious and slow. Suppose, instead, that we eliminate AttributionReference, and embed the attribution information within the object. Thus, “Conclusion” would look like this:
public abstract class Conclusion implements Contribution {
String id;
String reason;
Confidence confidence;
URI contributor;
java.util.Date timestamp;
Note that “Conclusion” implements “Contribution” which is an interface that could look something like this:
public interface Contribution {
URI getContributor();
void setContributor(URI contributor);
Date getTimestamp();
void setTimestamp(Date timestamp);
Confidence getConfidence();
void setConfidence(Confidence confidence);
String getReason();
void setReason(String reason);
}
The “Contribution” interface could be implemented by all the objects that currently have “AttributionReference” as a member. The 4 parts of a “Contribution”, the “who” (contributor), “when” (timestamp), “why” (reason), and “confidence” could, perhaps, all be attributes so they don’t impinge upon child elements of any given object that implements “Contribution”.
What is the general approach to getting the metadata of a GedcomX entity? I can think of a few options:
I believe the status quo of the model is option 2? There are a couple of things that still bother me about option 2:
First, it has the problem that you can't fetch the metadata without first fetching the entity. Often, you may want to inspect the metadata to decide if you want to fetch the (generally bigger) entity.
Second, there are a number of things that end up being redundant in the metadata with the original entity. As an “alternate representation” (option 3) this doesn’t bother me. But as “more information about this entity” (option 2), it seems wrong.
Let me give you some examples of the redundancy I am talking about:
• SourceReferences. Record and Person have a list of SourceReferences (called "sources"). The WWW version of these entities also has a list of "Links", which could also include them as <link rel="source" ...>. On top of that, the metadata has a list of dc:source elements. That's potentially 3 different places for the same information.
• Other Dublin Core "linking" elements that have GedcomX counterparts are: references (relationship.personReference), isReferencedBy (person.relationshipReference), replaces (alternateIds), contributor, isPartOf (record.collection, collection.collection, etc.), hasPart (collection.links.link{rel="content"}), identifier (person.persistentId, person.links.link{rel="self"})
• Other Dublin Core non-linking elements that have GedcomX counterparts are: bibliographicCitation (all entities need this), title (all entities need this), description (collection.description), coverage (collection.coverage, useful for all entities), publisher (collection.publisher), spatial (collection.coverage.spatial), temporal (collection.coverage.temporal)
This redundancy is one of the reasons that in SoRD we embedded Metadata in each entity (the other main reason being that many of the different types of metadata are needed on virtually every request, so that having to make another request to get it is onerous). Embedding the metadata within the response has precedent in both HTTP and HTML. In HTTP, a response consists of the response headers and the response body. The response headers are metadata about the requested entity. Clients can get just the metadata by doing a HEAD request. In HTML, metadata is available inside of the <HEAD> element. Interestingly, there was a need to get just HTML metadata (the stuff inside the <HEAD> element), so the proposed approach was to prepend "WWW-" to the element name and return it as an HTTP response header on a HEAD request. For example, <LINK> becomes the "WWW-Link:" response header, and <TITLE> becomes the "WWW-Title:" response header. We could potentially do something similar by prepending our own prefix to different metadata elements.
I propose that we either:
or...
bibliographicCitation
title
coverage
contributor
modified
sources
isPartOf
In the earlier models there has always been a characteristic on the event. This is where we put added descriptive elements for the event such as:
These are event specific characteristics, not persona specific - so where do I put them now?
Currently “PersonVitals” looks like this:
public final class PersonVitals {
String id;
Name name;
Gender gender;
Event birth;
Event christening;
Event death;
Event burial;
I know this is what Rontel has asked for, but it seems strange to me for this class to have 2 kinds of birth-like Events and 2 kinds of death-like events. In my opinion, when this object is being used it is most likely that the client is not wanting this level of detail, but only to know “about when and where the person was born, and about when and where he died.” For this purpose, a single “birth-like” Event and a single “death-like” Event is better since, as a client, I only need look in the birth Event for the best known birth information and in the death Event for the best known death information. The Event has an “EventType” to say if this is a “birth” or a “christening” or a “baptism”, or whatever, so that could be displayed to users along with the “when” and “where.” It is also strange that “christening” and “burial” are singled out for special representation in PersonVitals. Suppose the only “death-like” Event we have is “Cremation”. Where does it go? For these reasons, I suggest “christening” and “burial” be removed from PersonVitals:
public final class PersonVitals {
String id;
Name name;
Gender gender;
Event birth;
Event death;
In the org.gedcomx.record.Field class there is the "id" member which is intended to do the following:
It is proposed to update the documentation to reflect that, if there is agreement.
And there is a "fieldId" which is intended to do the following:
It is proposed to rename this "fieldId" to "fieldName" so that it will reflect its true usage and help differentiate from the "id".
Also, update the documentation, if there is agreement.
Consider adding the notion of confidence level to attribution to determine how confident the contributor is with the conclusion.
Some work needs to be done to define the different confidence levels and what each level means.
I find myself going back and forth between QNames and Enums in my code. I end up having to do stuff like this a lot:
XmlQNameEnumUtil.toQName(EventType.birth)
Yes, I know about the "getKnownType()" methods, but those only work when you have the object that has the type embedded. Often, I just have the type. I would like to suggest that each type have a "toQName" method and a static "fromQName" method. These methods would just call XmlQNameEnumUtil's toQName() and fromQName() methods.
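A minimal sketch of what the suggested methods could look like on one type follows. To keep the example self-contained it uses a plain lookup in place of the proposed delegation to XmlQNameEnumUtil; the namespace URI, the list of constants, and the fallback to "other" for unrecognized QNames are all assumptions made for illustration.

```java
import javax.xml.namespace.QName;

/** Sketch of an enum that can convert itself to and from a QName. */
enum EventType {
    birth, christening, death, burial, other;

    // Assumed namespace, for illustration only.
    static final String NAMESPACE = "http://example.org/gedcomx/types";

    /** Instance method: the QName that identifies this constant. */
    QName toQName() {
        return new QName(NAMESPACE, name());
    }

    /** Static factory: the constant for a QName, or 'other' if it is not a known type. */
    static EventType fromQName(QName qname) {
        if (NAMESPACE.equals(qname.getNamespaceURI())) {
            for (EventType candidate : values()) {
                if (candidate != other && candidate.name().equals(qname.getLocalPart())) {
                    return candidate;
                }
            }
        }
        return other;
    }
}
```

Client code that only holds the type could then write `EventType.birth.toQName()` and `EventType.fromQName(qname)` without reaching for a utility class.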
I just wanted to ask these questions and be sure we have considered them:
What is the purpose of the JulianDayRange on Date in the Conclusion profile?
Likewise, what is the purpose of the GeoCode in the Conclusion profile?
Are we sure that those same reasons don't exist in the Record profile?
Methods that return a List should never be allowed to return 'null'. I suppose there may be exceptions, but in general this is a nice principle to follow. It allows the user code to be cleaned up a bit. For instance:
Record record = new Record();
// .... Add stuff or even read in the Record from somewhere else where it never needed to define/add 'otherRelationships'
if (record.getOtherRelationships() != null) { // It would be nice to not have to have this 'if' statement.
for (OtherRelationship otherRelationship : record.getOtherRelationships()) {
// ... Do stuff
}
}
Having to surround the 'for' loop in an 'if' statement clutters user code and doesn't need to be there if the method returning the List of OtherRelationships never returned 'null'. This could be handled in the constructor of such objects to do it all in one place, or, if it is desirable to lazily instantiate the List, it could be done in each method returning a List. It would also need to be a requirement of any setter methods.
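The lazy-instantiation variant described above could be sketched as follows. The class bodies are reduced to the one list in question, and the choice to let the getter (not the setter) absorb a null is an assumption; rejecting null in the setter would work as well.

```java
import java.util.ArrayList;
import java.util.List;

/** Placeholder for the real class; only its existence matters here. */
final class OtherRelationship { }

/** Sketch of a Record whose list getter never returns null. */
final class Record {
    private List<OtherRelationship> otherRelationships;

    /** Lazily creates the list, so callers can always iterate the result. */
    List<OtherRelationship> getOtherRelationships() {
        if (otherRelationships == null) {
            otherRelationships = new ArrayList<>();
        }
        return otherRelationships;
    }

    /** A null argument is tolerated here because the getter replaces it on next access. */
    void setOtherRelationships(List<OtherRelationship> otherRelationships) {
        this.otherRelationships = otherRelationships;
    }
}
```

The calling code then reduces to a bare `for (OtherRelationship r : record.getOtherRelationships()) { ... }` with no surrounding null check.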
To aid in translating from one language to another, and to assist in correct spellings of place names and date parts, we need a place to store the language that the fields (and subclasses of Field) are in, such as 'en' for English or 'de' for Deutsch.
An example would be a DatePart of "Abr" for month, which is 'pt' for Portuguese, which is valid, but is an invalid month name when in English.
We could store it per-collection, and assume all the records and fields are always specified in that language. But, if there ever is a case where, say the record is in English, but the birthplace is in Deutsch, then that will be insufficient.
Storing it per field feels like overkill.
Any other better places to store it?
Link has a “QName rel;” member. Where is the list of known “rel” types, e.g. “isPartOf”?
Currently, there is a "getKnownRelationshipType()" method on Relationship, and "OtherRelationship" has a "getType()" method which returns a QName. However, ParentChildRelationship and CoupleRelationship have no "getType()" method. When processing Relationships generically it would be helpful to have one. The "getKnownRelationshipType()" method won't serve when you need to know the difference between "other" relationships with different QNames, because they all map to "other". It is possible to get the QName from a CoupleRelationship or ParentChildRelationship via the XmlQNameEnumUtil.toQName(getKnownRelationshipType()) method, but this forces you to write code like this:
public static QName getRelationshipType(Relationship relationship) {
if (relationship instanceof OtherRelationship) {
return ((OtherRelationship) relationship).getType();
}
else {
return XmlQNameEnumUtil.toQName(relationship.getKnownRelationshipType());
}
}
If there existed a "getType()" method on relationship, then the code would look like this:
relationship.getType();
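One way the request could be met is an abstract "getType()" on the base class that each subclass answers for itself. The sketch below is self-contained and is not the reference implementation: the namespace URI and the "couple" and "parent-child" local names are assumptions made for illustration.

```java
import javax.xml.namespace.QName;

/** Sketch of a base class that exposes the relationship type uniformly. */
abstract class Relationship {
    // Assumed namespace, for illustration only.
    static final String NAMESPACE = "http://example.org/gedcomx/types";

    abstract QName getType();
}

/** Fixed-type relationship: always reports the same QName. */
final class CoupleRelationship extends Relationship {
    @Override
    QName getType() {
        return new QName(NAMESPACE, "couple");
    }
}

/** Fixed-type relationship: always reports the same QName. */
final class ParentChildRelationship extends Relationship {
    @Override
    QName getType() {
        return new QName(NAMESPACE, "parent-child");
    }
}

/** Open-ended relationship: reports whatever QName it was created with. */
final class OtherRelationship extends Relationship {
    private final QName type;

    OtherRelationship(QName type) {
        this.type = type;
    }

    @Override
    QName getType() {
        return type;
    }
}
```

Generic code can then call `relationship.getType()` on any subclass and still distinguish two "other" relationships that carry different QNames.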
I believe the "description" in Record-profile Event should be removed. I believe it is a vestige of the old way of doing things for "other" type events.
The “QName role” in RelationshipReference is “denormalized” information, right? The role is defined in the Relationship, so storing it in the reference is redundant and opens the possibility of it not being in agreement with the relationship. If we are putting denormalized information in the RelationshipReference then we ought to be explicit about why we are doing it. Is it to help the client know which relationships to dereference? For example, suppose we want to identify all the children of a person, we would only need to dereference the RelationshipReferences where the role is “Parent”. Is this the reason for the role on the reference?
If we are putting this kind of denormalized information on a person we need to be clear how we plan to keep everything self-consistent. My suggestion would be that we don’t allow Relationships to be created without two Persons and a “type”, and that these three pieces of information are immutable. (I know this flies in the face of Rontel’s opinions on the subject). This gets into all the life-cycle questions between Person and Relationship that need to be clearly defined and documented.
Though we have not yet been very explicit about this, I believe that the current state of the model implies the following:
• When a Relationship is created or deleted, the Persons involved are “modified” with the addition/deletion of a “RelationshipReference”. While I personally agree with this, I believe it precludes the “git-like” editing model that John Sumsion et al. are wanting, since their model requires a “directed-acyclic-graph”. It may be worth exploring their ideas some more before we shut the door on them, I don’t know. We would also need to explore what it would mean to the model if Persons didn’t have any explicit knowledge of the Relationships they participate in. In other words, what if Persons were NOT modified when a Relationship is created that refers to them?
• When an Event or Characteristic is added to, or removed from, a Relationship, the Persons involved are NOT modified. (This seems rather unfortunate, from my point of view, since someone “watching” a Person would probably consider a new “marriage date” to be the kind of thing they would like to be notified of. I believe this is one of the side-effects of having Relationships as entities and now opens up the need for users to explicitly “watch” Relationships as well as Persons. To me, it would seem much more simple and natural for a user to “watch” a Person and be notified of any change to any Relationship that the Person participates in.)
• When a Person is deleted, the Relationships the Person is involved in are also deleted.
• When a Person is merged with another Person, what happens to the Relationships? (Is the “uniqueness constraint” part of the model? If so, then some merging of Relationships would seem to be implied. If not, then systems are free to do this differently. Unfortunately, this becomes an impediment to interoperability since one system may allow multiple relationships of the same type between the same two people, and another may insist upon uniqueness. My opinion is that this is the type of thing that a well-defined model is supposed to guard against and ought to be clearly specified.)
What is the general caching mechanism for our entities? Will we be supporting HTTP’s “If-Modified-Since: date” header? If so, should GedcomX web services populate the “Last-Modified” HTTP response header or should all our entities have a “modified date” attribute (or both)? Or will we be using the “Version” HTTP response header in conjunction with HTTP “HEAD” support?
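For the "If-Modified-Since" option, the server-side decision could look roughly like the sketch below. It assumes each entity can supply a modification time as a java.time.Instant; the class and method names are hypothetical and no particular web framework is implied.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

/** Sketch of conditional-GET support based on an entity's modified date. */
final class ConditionalGet {

    // HTTP dates are always expressed in GMT with one-second resolution.
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.ENGLISH)
                    .withZone(ZoneOffset.UTC);

    /** Value for the Last-Modified response header. */
    static String lastModifiedHeader(Instant modified) {
        return HTTP_DATE.format(modified);
    }

    /** True when the server may answer 304 Not Modified instead of sending the entity. */
    static boolean notModified(String ifModifiedSince, Instant modified) {
        if (ifModifiedSince == null) {
            return false;
        }
        try {
            Instant since = HTTP_DATE.parse(ifModifiedSince, Instant::from);
            // Drop sub-second precision, which the header cannot carry.
            return !modified.truncatedTo(ChronoUnit.SECONDS).isAfter(since);
        } catch (DateTimeParseException e) {
            // An unparseable date is ignored and the full entity is sent.
            return false;
        }
    }
}
```

Because the header only has one-second resolution, an entity-level "modified date" attribute would still be needed if clients must see finer-grained timestamps.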
There are two options for modeling a genealogical characteristic:
Neither option is wrong nor right; it's mostly a matter of preference. Currently, option 1 has been selected to move forward. We need to gather majority opinion across industry on the matter.
In the Record profile, OtherRelationship has a "QName type" and a "String description." In the Conclusion profile, OtherRelationship has neither of these. It seems to me that it ought to have a "QName type". Also, is the "String description" in the Record model a vestige of the old way of doing things? We used to have a description for all "other" enum values, but it seems we have discontinued this practice in favor of the "QName" approach. Is this right?
Date and place aren't used in relationship characteristics. Should we use a different class on relationship to model relationship characteristics?