familysearch / gedcom5-java
Gedcom parsers
License: Apache License 2.0
I'm developing the Family Gem app, which is strongly based on this amazing Gedcom parser.
Family Gem is currently translated into 10 languages, and the number is growing.
A core feature of Family Gem is adding events and facts to persons and families.
The "display type" of each event/fact is taken from EventFact.java (e.g. "Birth" is the display type for BIRT).
These display types are hard-coded in English, and unfortunately the Gedcom parser provides no translation system.
Of course, inside Family Gem I could intercept the English display type and replace it with a translation.
But wouldn't it be great if a translation of display types was provided inside Gedcom, available for all the users of the library?
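As a rough illustration of what such a translation layer could look like, here is a minimal sketch. The class DisplayTypeTranslator and its methods are hypothetical and not part of the library; a real implementation would more likely load the tables from resource bundles than fill them by hand.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of gedcom5-java: maps a GEDCOM tag to a localized display type.
final class DisplayTypeTranslator {
    // language code -> (tag -> display type)
    private final Map<String, Map<String, String>> tables = new HashMap<>();

    void add(String language, String tag, String displayType) {
        tables.computeIfAbsent(language, k -> new HashMap<>()).put(tag, displayType);
    }

    // Returns the translation for the locale's language, falling back to English, then to the raw tag.
    String displayType(String tag, Locale locale) {
        String translated = lookup(locale.getLanguage(), tag);
        if (translated != null) {
            return translated;
        }
        String english = lookup("en", tag);
        return english != null ? english : tag;
    }

    private String lookup(String language, String tag) {
        Map<String, String> table = tables.get(language);
        return table == null ? null : table.get(tag);
    }
}
```

With "Birth" registered for "en" and "Nascita" for "it", a lookup of BIRT with an Italian locale would give the Italian text, and any locale without a table would fall back to the English one.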
Hi,
Please find attached a patch, which adds a simple tool to create a GraphViz (dot) file from a GEDCOM model to visualize a family tree starting from a root person.
Usage:
List all persons in the model to get a root ID
mvn exec:java -Dexec.mainClass=org.folg.gedcom.tools.Gedcom2Dot -Dexec.args="-i src/test/resources/Muster_GEDCOM_UTF-8.ged --list"
Create a DOT file
mvn exec:java -Dexec.mainClass=org.folg.gedcom.tools.Gedcom2Dot -Dexec.args="-i src/test/resources/Muster_GEDCOM_UTF-8.ged -o example.dot -r I1"
Visualize using GraphViz
dot -Tpdf -o example.pdf example.dot
See attached patch and examples:
Gedcom2Dot.patch.txt
example.pdf
example.dot.txt
I think this is a useful tool and showcase. Let me know if you have feedback or if you would like me to provide a pull request.
It would be nice if I could import this jar from Maven Central or MVN Repository, without having to manually build every update, as well as being able to keep tabs on the newest version. It would also be nice to have the sources by default.
@michelesalvador's FamilyGem app has a source-less local JAR.
According to the GEDCOM 5.5.1 standard, a source can have a subordinate DATA tag storing additional information:
n @<XREF:SOUR>@ SOUR {1:1}
+1 DATA {0:1}
+2 EVEN <EVENTS_RECORDED> {0:M}
+3 DATE <DATE_PERIOD> {0:1}
+3 PLAC <SOURCE_JURISDICTION_PLACE> {0:1}
+2 AGNC <RESPONSIBLE_AGENCY> {0:1}
+2 <<NOTE_STRUCTURE>> {0:M}
...
At the moment this DATA tag and its subordinates are not included in the object model.
I'd like to submit a pull request to fix this.
I have created a SourceData class and a DataEvent class.
But now I have a doubt: in SourceData, is it better to define a single DataEvent or a List<DataEvent>?
In real-world GEDCOMs, is there usually only one EVEN under DATA?
Or is it actually common to have multiple EVEN, in accordance with the standard?
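Since the grammar quoted above gives EVEN a cardinality of {0:M}, a list seems the safer choice whatever real files tend to contain. A minimal sketch of that shape follows; the class names come from this issue, while the field and method names are assumptions rather than the library's actual API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model classes sketched from the DATA structure in GEDCOM 5.5.1.
class DataEvent {
    String value;  // EVEN <EVENTS_RECORDED>
    String date;   // DATE <DATE_PERIOD> {0:1}
    String place;  // PLAC <SOURCE_JURISDICTION_PLACE> {0:1}

    DataEvent(String value) {
        this.value = value;
    }
}

class SourceData {
    private List<DataEvent> events;  // EVEN {0:M}, so a list rather than a single field
    private String agency;           // AGNC <RESPONSIBLE_AGENCY> {0:1}

    // Never returns null, so callers can iterate without a null check.
    List<DataEvent> getEvents() {
        return events != null ? events : Collections.<DataEvent>emptyList();
    }

    void addEvent(DataEvent event) {
        if (events == null) {
            events = new ArrayList<>();
        }
        events.add(event);
    }

    String getAgency() {
        return agency;
    }

    void setAgency(String agency) {
        this.agency = agency;
    }
}
```

A file with a single EVEN simply produces a one-element list, so the list form costs little even if multiple EVEN lines turn out to be rare.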
I am working with GEDCOMs stored as S3 objects, and I want to work with content as streams or readers (i.e. S3.getObject().getObjectContent()). It would be nice to have GedcomParser.parse(InputStream) or parse(Reader), and corresponding methods on ModelParser, TreeParser, and JsonParser.
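Until such overloads exist, one possible workaround is to spool the stream into a temporary file and pass that file to the existing File-based parse methods. The sketch below shows only the spooling step; GedcomStreams and spoolToTempFile are hypothetical names, and the approach assumes the content fits comfortably on local disk.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of gedcom5-java.
final class GedcomStreams {
    private GedcomStreams() {
    }

    // Copies the whole stream into a temp file that is deleted when the JVM exits.
    // The caller remains responsible for closing the input stream.
    static File spoolToTempFile(InputStream in) throws IOException {
        File temp = File.createTempFile("gedcom-", ".ged");
        temp.deleteOnExit();
        // createTempFile already created the file, so the copy must be allowed to replace it.
        Files.copy(in, temp.toPath(), StandardCopyOption.REPLACE_EXISTING);
        return temp;
    }
}
```

This does not avoid the extra disk I/O that native InputStream or Reader support would, which is why the overloads would still be the better long-term answer.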
Hi, I'm brand new to this, and I'm looking for a way to export/import meta information or a notes field.
Is there already something like this, or how could it be added?
If I understand correctly, a GEDCOM file can contain any number of SUBM records, not just one: http://wiki-en.genealogy.net/GEDCOM/SUBM-Tag#Use_of_SUBM_Records
So this GEDCOM structure should be valid:
0 HEAD
1 SUBM @U2@
...
0 @U1@ SUBM
1 NAME Less Important Submitter
0 @U2@ SUBM
1 NAME Submitter Linked in Head
But the Gedcom parser seems to assume that only one submitter exists:
Header.getSubmitter() always returns null
Gedcom.getSubmitter() always returns the first submitter
In my opinion Header.getSubmitter() should return the submitter pointed to by the SUBM tag in HEAD, and Gedcom.getSubmitter() should be replaced by a Gedcom.getSubmitters() returning a List<Submitter>.
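The proposed behavior could be sketched roughly as follows. SubmitterLookup and its nested Submitter are hypothetical stand-ins rather than the library's classes, and the header reference is assumed to be stored as a plain ID string such as "U2".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Gedcom/Header pair, reduced to submitter handling.
class SubmitterLookup {
    static class Submitter {
        final String id;
        final String name;

        Submitter(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    private final List<Submitter> submitters = new ArrayList<>();

    void addSubmitter(Submitter submitter) {
        submitters.add(submitter);
    }

    // All SUBM records, in file order.
    List<Submitter> getSubmitters() {
        return submitters;
    }

    // Resolves the HEAD.SUBM pointer; returns null if the reference is absent or dangling.
    Submitter getHeaderSubmitter(String headerSubmitterRef) {
        for (Submitter s : submitters) {
            if (s.id.equals(headerSubmitterRef)) {
                return s;
            }
        }
        return null;
    }
}
```

With the two SUBM records from the example above, resolving "U2" would yield "Submitter Linked in Head" even though @U1@ appears first in the file.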
Hello,
I was recently testing the parsing of a large GEDCOM generated by the Legacy software. I noticed that reading the file took a long time because an info message is logged every time a tag is added as an extension in ModelParser.addGedcomTag. (The tagging feature in Legacy seems to add a GedcomTag like this: _TAG1)
private void addGedcomTag(ExtensionContainer ec, GedcomTag tag) throws SAXException {
   @SuppressWarnings("unchecked")
   List<GedcomTag> moreTags = (List<GedcomTag>)ec.getExtension(MORE_TAGS_EXTENSION_KEY);
   if (moreTags == null) {
      moreTags = new ArrayList<GedcomTag>();
      ec.putExtension(MORE_TAGS_EXTENSION_KEY, moreTags);
   }
   moreTags.add(tag);
   warning(new SAXParseException("Tag added as extension: "+joinTagStack()+" "+tag.getTag(), locator));
}
I tested locally and moved that message to a trace log so that it wouldn't appear when I didn't need it; this improved performance greatly.
I was just curious why this is logged at the info level. Would a lower log level be a better option? I don't see the information as being very helpful, given the type of data being added as an extension.
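To illustrate the kind of change I mean, here is a minimal sketch of a level-guarded message. It uses java.util.logging only to stay self-contained, which is an assumption and not necessarily the logging API the parser uses; ExtensionTagLogger is a hypothetical name.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper showing a guarded, low-level log call.
final class ExtensionTagLogger {
    private static final Logger LOG = Logger.getLogger(ExtensionTagLogger.class.getName());

    private ExtensionTagLogger() {
    }

    // Returns true if a message was actually built and logged.
    static boolean tagAdded(String tagStack, String tag) {
        // The guard skips the string concatenation entirely unless FINEST is enabled.
        if (!LOG.isLoggable(Level.FINEST)) {
            return false;
        }
        LOG.finest("Tag added as extension: " + tagStack + " " + tag);
        return true;
    }
}
```

With the default logging configuration, nothing is formatted or written for each extension tag, and the detail is still available by raising the logger to FINEST when it is needed.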
In preparation for the release of GEDCOM Version 7 we would like to use the FamilySearch/gedcom repository which is now being used for code relating to parsing GEDCOM. I have the names and emails of all the contributors so they can be notified of the change 30 days before it happens. Jimmy Z. suggested naming it GEDCOM5-java so that GEDCOM is available for a public version of the next version of GEDCOM and beyond. Please share thoughts.
I can't understand why EventFact.caus is of class EventFact instead of String.
In GEDCOM 5.5 Cause is defined as
EVENT_STRUCTURE
n <EVENT_TAG> {1:1}
...
+1 CAUS <CAUSE_OF_DEATH> {0:1}
And in GEDCOM 5.5.1
EVENT_DETAIL
...
n CAUS <CAUSE_OF_EVENT> {0:1}
Both <CAUSE_OF_DEATH> and <CAUSE_OF_EVENT> are simple values of {Size=1:90}.
There is no trace of another event nested inside.
Should EventFact.caus be of class String?
The current behavior of EventFact.getDisplayType() is to return a display type obtained from the tag of the event.
0 @F1@ FAM
1 MARR
The tag MARR produces the display type "Marriage".
So far, so good.
Things change when a TYPE is defined. Take for example:
0 @F1@ FAM
1 MARR
2 TYPE Common Law
In this case the display type of Marriage is "Other".
But the GEDCOM standard (5.5.1 more than 5.5) clearly suggests another use for the TYPE value:
{Size=1:90}
A descriptive word or phrase used to further classify the parent event or attribute tag. [...]
Using the subordinate TYPE tag classification method with any of the other defined event tags provides a further classification of the parent tag but does not change the basic meaning of the parent tag. For example, a MARR tag could be subordinated with a TYPE tag with an EVENT_DESCRIPTOR value of `Common Law.'
1 MARR
2 TYPE Common Law
This classifies the entry as a common law marriage but the event is still a marriage event.
So, as I understand it, the result of getDisplayType() should still be "Marriage", or maybe something like "Marriage (Common Law)", but not "Other".
Finally, if the value of TYPE matches one of the personal or family event/fact tags, the corresponding display type is returned:
0 @F1@ FAM
1 MARR
2 TYPE CLAW
The display type of this Marriage event will be "Common law marriage".
Although this example may appear correct, I found nothing in the GEDCOM standard suggesting that the TYPE value can be a tag that completely replaces the parent tag.
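The behavior I would expect can be sketched like this. DisplayTypes is a hypothetical class with a deliberately tiny tag table, not the library's EventFact, and the "Marriage (Common Law)" format is just one of the options mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: TYPE refines the parent tag's display type instead of replacing it.
final class DisplayTypes {
    private static final Map<String, String> BY_TAG = new HashMap<>();

    static {
        // Only two tags are listed here for illustration.
        BY_TAG.put("BIRT", "Birth");
        BY_TAG.put("MARR", "Marriage");
    }

    private DisplayTypes() {
    }

    static String displayType(String tag, String type) {
        // Unknown tags still fall back to "Other", as today.
        String base = BY_TAG.getOrDefault(tag, "Other");
        if (type == null || type.trim().isEmpty()) {
            return base;
        }
        // The TYPE value is appended as a classification and never looked up as a tag.
        return base + " (" + type.trim() + ")";
    }
}
```

Under this sketch both "Common Law" and "CLAW" would be shown verbatim in parentheses after "Marriage", so the basic meaning of the parent tag is preserved in every case.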
Gedcom.getSubmitter() always returns the first submitter, and has now been replaced by Header.getSubmitter(Gedcom), which returns the correct submitter, not necessarily the first one.
I think there is still something to do about multiple submitters:
Gedcom.getSubmitter() should be removed, or at least deprecated.
In Header there are a couple of comments recommending the use of Gedcom.getSubmitter, and they should be removed.
It would be nice to be able to grab this as a dependency from mc.
Take a GEDCOM with this individual:
0 @I1@ INDI
1 NAME Victoria /Hanover/
1 TITL Queen of England
ModelParser.handleTitl(Object) puts the TITL record within names, adding the user-defined _type key:
{
  "id": "I1",
  "names": [
    {
      "value": "Victoria /Hanover/"
    },
    {
      "_type": "TITL",
      "value": "Queen of England"
    }
  ]
}
This is strange, because in the GEDCOM 5.5 specification TITL is a standard attribute of INDI (like, for example, OCCU or PROP):
INDIVIDUAL:=
n <<NAME_STRUCTURE>> {1:M}
n TITL <INDI_TITLE> {0:M}
n ...
And in GEDCOM 5.5.1, too, TITL is located among the other standard attributes:
INDIVIDUAL_ATTRIBUTE_STRUCTURE:=
[
n ...
|
n TITL <NOBILITY_TYPE_TITLE> {1:1}
+1 <<INDIVIDUAL_EVENT_DETAIL>> {0:1}
]
Wouldn't it be more correct to parse the TITL record as a simple EventFact? Like this:
{
  "id": "I1",
  "names": [
    {
      "value": "Victoria /Hanover/"
    }
  ],
  "eventsFacts": [
    {
      "tag": "TITL",
      "value": "Queen of England"
    }
  ]
}
I tried version 1.9.0 and encountered a problem.
With this version a JSON file can be parsed only if it contains a subms key. Therefore JSON files created with version 1.8.0 can no longer be opened, because they don't have a subms key.
This is due to Gedcom.createIndexes(), which calls Gedcom.getSubmitters(), which can return a null subms object.
The same happens when parsing a GEDCOM file: Gedcom.createIndexes() works only if the GEDCOM contains at least one SUBM tag.
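One possible fix is to make the getter null-safe so that index creation can always iterate. The sketch below shows the idea on a hypothetical stand-in class; NullSafeGedcom and its members are illustrative and do not reproduce the library's actual fields or index structure.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Gedcom class; only the null handling matters here.
class NullSafeGedcom {
    static class Submitter {
        final String id;

        Submitter(String id) {
            this.id = id;
        }
    }

    // Stays null when the JSON has no "subms" key or the GEDCOM has no SUBM record.
    List<Submitter> subms;

    // Returns an empty list instead of null, so callers need no null check.
    List<Submitter> getSubmitters() {
        return subms != null ? subms : Collections.<Submitter>emptyList();
    }

    Map<String, Submitter> createIndexes() {
        Map<String, Submitter> index = new HashMap<>();
        for (Submitter s : getSubmitters()) {
            index.put(s.id, s);
        }
        return index;
    }
}
```

With this shape, files written by 1.8.0 and GEDCOMs without any SUBM record would simply produce an empty submitter index.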