familysearch / gedcom5-java



Gedcom parsers

License: Apache License 2.0

Java 100.00%

gedcom5-java's People


fkjellberg slouie-youwho matburnham pgmoir bobfields keiichiroy davidmoten mydesignbuddy rwellscs jonmcrawford rmburkhead tropology draakusa huntersj fcoffey apokalipsys michelesalvador jbonevich ktp-forked-repos dmaude bland328 daleathan icorecool fbrissi ourrootsorg clarkegj nodje suninforest mehdiksouri twcardenas domseichter raydecampo vtajzich bremet15 isabella232 mtrevisan ciri-cuervo mrlem nickolaymo sternbach-software seanpm2001

gedcom5-java's Issues

Translation of EventFact display types

I'm developing the Family Gem app, which is heavily based on this amazing Gedcom parser.
Family Gem is currently translated into 10 languages, and the number is growing.

A core feature of Family Gem is, of course, adding events and facts to persons and families.
The "display type" of each event/fact is taken from EventFact.java (e.g. "Birth" is the display type for BIRT).
These display types are hard-coded in English, and unfortunately the Gedcom parser provides no translation mechanism.

Of course, inside Family Gem I could match the English display type and replace it with a translation.
But wouldn't it be better if translations of the display types were provided inside the library itself, available to all its users?
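One possible shape for such a feature, sketched under assumptions: a lookup keyed by language and GEDCOM tag that falls back to the existing English label when no translation is registered. The class and method names are hypothetical and not part of gedcom5-java; a production version would more likely load the tables from per-locale java.util.ResourceBundle files.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of gedcom5-java: maps GEDCOM tags to
// localized display types and falls back to the English default.
final class DisplayTypeTranslator {
    // language code -> (GEDCOM tag -> localized display type)
    private final Map<String, Map<String, String>> tables = new HashMap<>();

    void register(Locale locale, String tag, String label) {
        tables.computeIfAbsent(locale.getLanguage(), k -> new HashMap<>()).put(tag, label);
    }

    // englishDefault is the label EventFact currently returns (e.g. "Birth" for BIRT)
    String displayType(String tag, String englishDefault, Locale locale) {
        Map<String, String> table = tables.get(locale.getLanguage());
        if (table == null) {
            return englishDefault;
        }
        return table.getOrDefault(tag, englishDefault);
    }

    public static void main(String[] args) {
        DisplayTypeTranslator translator = new DisplayTypeTranslator();
        translator.register(Locale.ITALIAN, "BIRT", "Nascita");
        System.out.println(translator.displayType("BIRT", "Birth", Locale.ITALIAN)); // Nascita
        System.out.println(translator.displayType("BIRT", "Birth", Locale.GERMAN));  // Birth
    }
}
```

Keying on the GEDCOM tag rather than on the English label means a translation survives any later rewording of the English text.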

PATCH: Example tool to create GraphViz (dot) graph from GEDCOM model

Hi,

Please find attached a patch that adds a simple tool to create a GraphViz (dot) file from a GEDCOM model, visualizing a family tree starting from a root person.

Usage:

List all persons in the model to get a root ID
mvn exec:java -Dexec.mainClass=org.folg.gedcom.tools.Gedcom2Dot -Dexec.args="-i src/test/resources/Muster_GEDCOM_UTF-8.ged --list"
Create a DOT file
mvn exec:java -Dexec.mainClass=org.folg.gedcom.tools.Gedcom2Dot -Dexec.args="-i src/test/resources/Muster_GEDCOM_UTF-8.ged -o example.dot -r I1"
Visualize using GraphViz
dot -Tpdf -o example.pdf example.dot

See attached patch and examples:
Gedcom2Dot.patch.txt
example.pdf
example.dot.txt

I think this is a useful tool and showcase. Let me know if you have feedback or if you would like me to open a pull request.
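For readers unfamiliar with DOT, a minimal sketch of the core idea: walk breadth-first from a root person up through the ancestors and emit one node per person and one edge per parent/child link. This is not the code in the attached patch; persons are represented as plain maps (id to name, child id to parent ids) instead of the library's model classes, and the class name is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the Gedcom2Dot code from the patch: persons are
// plain maps (id -> name, child id -> parent ids) instead of model classes.
final class DotSketch {
    static String toDot(String rootId, Map<String, String> names,
                        Map<String, List<String>> parents) {
        StringBuilder dot = new StringBuilder("digraph family {\n");
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(rootId);
        // Breadth-first walk from the root person up through the ancestors
        while (!queue.isEmpty()) {
            String id = queue.poll();
            if (!seen.add(id)) {
                continue; // each person becomes a single node
            }
            String label = names.getOrDefault(id, id).replace("\"", "\\\"");
            dot.append("  \"").append(id).append("\" [label=\"").append(label).append("\"];\n");
            for (String parent : parents.getOrDefault(id, List.of())) {
                dot.append("  \"").append(parent).append("\" -> \"").append(id).append("\";\n");
                queue.add(parent);
            }
        }
        return dot.append("}\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> names = Map.of("I1", "Child", "I2", "Father", "I3", "Mother");
        Map<String, List<String>> parents = Map.of("I1", List.of("I2", "I3"));
        System.out.print(toDot("I1", names, parents));
    }
}
```

The resulting text can be rendered with the same `dot -Tpdf` command shown in the usage steps.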

Hosting this JAR on Maven Central

It would be nice to import this JAR from Maven Central (or find it on MVN Repository) instead of manually building every update; that would also make it easy to keep track of the newest version. Publishing the sources alongside the JAR would be nice too.
For example, @michelesalvador's Family Gem app currently bundles a local JAR without sources.

Import from gramps sqlite

https://gramps-project.org/blog/

How to include source DATA in the object model

According to the GEDCOM 5.5.1 standard, a source can have a subordinate DATA tag storing additional information:

n @<XREF:SOUR>@ SOUR			{1:1}
+1 DATA					{0:1}
+2 EVEN <EVENTS_RECORDED>		{0:M}
+3 DATE <DATE_PERIOD>			{0:1}
+3 PLAC <SOURCE_JURISDICTION_PLACE>	{0:1}
+2 AGNC <RESPONSIBLE_AGENCY>		{0:1}
+2 <<NOTE_STRUCTURE>>			{0:M}
...

At the moment, this DATA tag and its subordinates are not included in the object model.
I'd like to submit a pull request to fix this.

I have created a SourceData class and a DataEvent class.
But now I have a question: in SourceData, is it better to define a single DataEvent or a List<DataEvent>?

In real-world GEDCOM files, is there usually only one EVEN under DATA?
Or is it actually common to have multiple EVEN tags, as the standard allows?
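Since the excerpt above gives EVEN a cardinality of {0:M}, a List<DataEvent> is the only option that can hold every conforming file; a single field would have to drop all but one EVEN. A sketch under that assumption follows. The class names come from the issue, but the fields and accessors are hypothetical, and NOTE_STRUCTURE is simplified to plain text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical classes modeled on the GEDCOM 5.5.1 SOUR/DATA structure; the
// class names come from the issue, the fields and accessors are assumptions.
final class DataEvent {
    private final String events; // EVEN <EVENTS_RECORDED>, e.g. "BIRT, DEAT"
    private final String date;   // +1 DATE <DATE_PERIOD>, may be null
    private final String place;  // +1 PLAC <SOURCE_JURISDICTION_PLACE>, may be null

    DataEvent(String events, String date, String place) {
        this.events = events;
        this.date = date;
        this.place = place;
    }

    String getEvents() { return events; }
    String getDate() { return date; }
    String getPlace() { return place; }
}

final class SourceData {
    // EVEN is {0:M} in the standard, so a list keeps every occurrence
    private final List<DataEvent> events = new ArrayList<>();
    // NOTE_STRUCTURE is {0:M}; simplified to plain text in this sketch
    private final List<String> notes = new ArrayList<>();
    private String agency; // AGNC <RESPONSIBLE_AGENCY> {0:1}

    void addEvent(DataEvent event) { events.add(event); }
    List<DataEvent> getEvents() { return Collections.unmodifiableList(events); }

    void addNote(String note) { notes.add(note); }
    List<String> getNotes() { return Collections.unmodifiableList(notes); }

    String getAgency() { return agency; }
    void setAgency(String agency) { this.agency = agency; }
}
```

A list also costs little when a file has a single EVEN: callers that only care about one can read the first element.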

Support InputStream and Reader parse on all parsers

I am working with GEDCOM files stored as S3 objects, and I want to process their content as streams or readers (e.g. S3.getObject().getObjectContent()). It would be nice to have GedcomParser.parse(InputStream) or parse(Reader), with corresponding methods on ModelParser, TreeParser, and JsonParser.
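Until such overloads exist, one workaround is to spool the stream to a temporary file and hand that file to whichever File-based parse method you already use. The helper below is hypothetical, not a gedcom5-java API. Bytes are copied unchanged, so no charset decoding happens at this step; an InputStream overload inside the library could use the same approach internally.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical workaround, not a gedcom5-java API: copy a stream to a
// temporary file so that File-based parse methods can be reused.
final class StreamSpooler {
    static File spoolToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("gedcom-", ".ged");
        tmp.toFile().deleteOnExit(); // removed when the JVM exits
        try (InputStream source = in) {
            // REPLACE_EXISTING is required: createTempFile already created the file
            Files.copy(source, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp.toFile();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("0 HEAD\n0 TRLR\n".getBytes(StandardCharsets.UTF_8));
        File file = spoolToTempFile(in);
        // In the S3 case, the input would be the object content stream instead
        System.out.println(Files.readString(file.toPath()));
    }
}
```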

Is there a meta or notes field?

Hi, I'm brand new to this, and I'm looking for a way to export/import meta information or a notes field.
Is there already something like this, or how could I add it?

Many Submitters or only one?

If I understand correctly, a GEDCOM file can contain multiple SUBM records, not just one: http://wiki-en.genealogy.net/GEDCOM/SUBM-Tag#Use_of_SUBM_Records

So this Gedcom structure should be valid:

0 HEAD
1 SUBM @U2@
...
0 @U1@ SUBM
1 NAME Less Important Submitter
0 @U2@ SUBM
1 NAME Submitter Linked in Head

But the Gedcom parser seems to assume that only one submitter exists:

the deprecated method Header.getSubmitter() always returns null
Gedcom.getSubmitter() always returns the first submitter

In my opinion, Header.getSubmitter() should return the submitter pointed to by the SUBM tag in HEAD,
and Gedcom.getSubmitter() should be replaced by Gedcom.getSubmitters(), returning a List<Submitter>.
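A minimal sketch of the two behaviors proposed above, using simplified stand-in classes rather than the library's real Submitter, Header, and Gedcom types (all names here are hypothetical): keep every SUBM record in a list, and resolve the header submitter by the id that HEAD points to instead of taking the first record.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-ins for the library's Submitter and Gedcom
// classes; ids are stored without the surrounding @ signs.
final class Submitter {
    final String id;
    final String name;

    Submitter(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

final class SubmitterIndex {
    private final List<Submitter> submitters = new ArrayList<>();
    private String headSubmitterRef; // from "1 SUBM @U2@" under HEAD

    void addSubmitter(Submitter submitter) { submitters.add(submitter); }
    void setHeadSubmitterRef(String ref) { headSubmitterRef = ref; }

    // Proposed getSubmitters(): every SUBM record, in file order
    List<Submitter> getSubmitters() { return Collections.unmodifiableList(submitters); }

    // Proposed header lookup: the submitter HEAD points to, not the first one
    Submitter getHeadSubmitter() {
        for (Submitter s : submitters) {
            if (s.id.equals(headSubmitterRef)) {
                return s;
            }
        }
        return null; // HEAD has no SUBM pointer, or it points to a missing record
    }
}
```

With the example file above, the lookup returns @U2@ ("Submitter Linked in Head") even though @U1@ appears first.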

Info Log when adding a GedcomTag

Hello,

I was recently testing the parsing of a large GEDCOM file generated by the Legacy software. I noticed that reading the file took a long time, because ModelParser.addGedcomTag logs an info message every time a tag is added as an extension. (The tagging feature in Legacy seems to add GedcomTags like _TAG1.)

   private void addGedcomTag(ExtensionContainer ec, GedcomTag tag) throws SAXException {
      @SuppressWarnings("unchecked")
      List<GedcomTag> moreTags = (List<GedcomTag>)ec.getExtension(MORE_TAGS_EXTENSION_KEY);
      if (moreTags == null) {
         moreTags = new ArrayList<GedcomTag>();
         ec.putExtension(MORE_TAGS_EXTENSION_KEY, moreTags);
      }
      moreTags.add(tag);
      warning(new SAXParseException("Tag added as extension: "+joinTagStack()+" "+tag.getTag(), locator));
   }

I tested this locally by moving it to a trace-level log, so that it only appears when needed, and it improved performance greatly.

I was just curious why this is logged at the info level. Would a lower log level be a better option? I don't find the information very helpful, given the type of data being added as extensions.
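One possible alternative, sketched with java.util.logging as an assumption (the library itself reports this through the warning() call shown above, not through a Logger): count extension tags as they are added, log per-tag details only when the finest level is enabled, and emit a single summary line at the end. The class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical alternative to the per-tag warning in addGedcomTag: count
// extension tags and log details only when FINEST is enabled.
final class ExtensionTagReporter {
    private static final Logger LOG = Logger.getLogger(ExtensionTagReporter.class.getName());
    private final Map<String, Integer> counts = new TreeMap<>();

    void tagAdded(String tagPath, String tag) {
        counts.merge(tag, 1, Integer::sum);
        // isLoggable avoids building the message string when FINEST is off
        if (LOG.isLoggable(Level.FINEST)) {
            LOG.finest("Tag added as extension: " + tagPath + " " + tag);
        }
    }

    // One summary line after parsing instead of one line per tag
    void logSummary() {
        if (!counts.isEmpty()) {
            LOG.info("Tags added as extensions: " + counts);
        }
    }

    Map<String, Integer> counts() {
        return Collections.unmodifiableMap(counts);
    }
}
```

For a file with thousands of _TAG1 entries, this produces one info line such as a count per tag rather than thousands of individual messages.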

Renaming this Repository

In preparation for the release of GEDCOM version 7, we would like to use the FamilySearch/gedcom repository name, which is currently used by this GEDCOM parsing code. I have the names and emails of all the contributors, so they can be notified of the change 30 days before it happens. Jimmy Z. suggested renaming this repository to GEDCOM5-java, so that the name GEDCOM stays available for a public repository for the next version of GEDCOM and beyond. Please share your thoughts.

Cause is EventFact instead of String

I don't understand why EventFact.caus is of class EventFact instead of String.

In GEDCOM 5.5 Cause is defined as

EVENT_STRUCTURE
n  <EVENT_TAG>			{1:1}
...
+1 CAUS <CAUSE_OF_DEATH>	{0:1}

And in GEDCOM 5.5.1

EVENT_DETAIL
...
n CAUS <CAUSE_OF_EVENT> 	{0:1}

Both <CAUSE_OF_DEATH> and <CAUSE_OF_EVENT> are simple values of {Size=1:90}.
No trace of another event nested inside.

Should EventFact.caus be of class String?
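A minimal sketch of the change the question implies: CAUS held as a plain String, with a lenient check against the {Size=1:90} limit from the excerpts above (the value is stored as-is, since real files may not respect the limit). The class is a hypothetical stand-in, not the library's EventFact.

```java
// Hypothetical sketch of the change proposed in the issue: CAUS held as a
// String (the spec gives {Size=1:90}) instead of a nested EventFact.
final class EventFactSketch {
    private static final int MAX_CAUSE_LENGTH = 90;

    private String cause; // CAUS <CAUSE_OF_EVENT>

    String getCause() { return cause; }

    void setCause(String cause) { this.cause = cause; }

    // Lenient: the value is stored as-is, but callers can flag sizes outside 1..90
    boolean isCauseWithinSpecSize() {
        return cause != null && !cause.isEmpty() && cause.length() <= MAX_CAUSE_LENGTH;
    }

    public static void main(String[] args) {
        EventFactSketch death = new EventFactSketch();
        death.setCause("Pneumonia"); // e.g. "2 CAUS Pneumonia" under DEAT
        System.out.println(death.getCause() + " " + death.isCauseWithinSpecSize()); // Pneumonia true
    }
}
```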

The logic behind getDisplayType()

The current behavior of EventFact.getDisplayType() is to return a display type derived from the tag of the event:

0 @F1@ FAM
1 MARR

The tag MARR produces the display type "Marriage".
So far, so good.

Things change when a TYPE is defined. Take for example:

0 @F1@ FAM
1 MARR
2 TYPE Common Law

In this case, the display type of the marriage is "Other".

But the GEDCOM standard (5.5.1 more than 5.5) clearly suggests another use for the TYPE value:

{Size=1:90}
A descriptive word or phrase used to further classify the parent event or attribute tag. [...]
Using the subordinate TYPE tag classification method with any of the other defined event tags
provides a further classification of the parent tag but does not change the basic meaning of the parent tag. For example, a MARR tag could be subordinated with a TYPE tag with an EVENT_DESCRIPTOR value of `Common Law.'
1 MARR
2 TYPE Common Law
This classifies the entry as a common law marriage but the event is still a marriage event.

So, as I understand it, getDisplayType() should still return "Marriage", or perhaps something like "Marriage (Common Law)", but not "Other".

Finally, if the TYPE value matches one of the personal or family event/fact tags, the corresponding display type is returned:

0 @F1@ FAM
1 MARR
2 TYPE CLAW

The display type of this Marriage event will be "Common law marriage".
Even if this example may look correct, I found no indication in the GEDCOM standard that the TYPE value can be a tag that completely replaces the parent tag.
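A sketch of the behavior argued for above, not the library's current getDisplayType(): the parent tag always determines the base label, and TYPE only refines it, except for the generic EVEN tag, where (per the quoted passage's distinction between EVEN and "the other defined event tags") the TYPE value names the event itself. The class name and the small tag table are hypothetical.

```java
import java.util.Map;

// Hypothetical sketch of the behavior proposed in the issue, not the library's
// current EventFact.getDisplayType(); only a few tags are mapped here.
final class DisplayTypeSketch {
    private static final Map<String, String> DISPLAY = Map.of(
            "BIRT", "Birth",
            "MARR", "Marriage",
            "EVEN", "Event");

    static String displayType(String tag, String type) {
        String base = DISPLAY.getOrDefault(tag, tag);
        if (type == null || type.isEmpty()) {
            return base;
        }
        // For a generic EVEN, TYPE names the event itself
        if ("EVEN".equals(tag)) {
            return type;
        }
        // For any other tag, TYPE refines the parent tag without replacing it
        return base + " (" + type + ")";
    }

    public static void main(String[] args) {
        System.out.println(displayType("MARR", null));         // Marriage
        System.out.println(displayType("MARR", "Common Law")); // Marriage (Common Law)
        System.out.println(displayType("EVEN", "Graduation")); // Graduation
    }
}
```

Under this rule, "2 TYPE CLAW" under MARR yields "Marriage (CLAW)" rather than replacing the marriage with a different event.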

Overcome Gedcom.getSubmitter()

Gedcom.getSubmitter() always returns the first submitter and has now been superseded by Header.getSubmitter(Gedcom), which returns the correct submitter (not necessarily the first one).

I think there is still some work to do on multiple-submitter support:

Gedcom.getSubmitter() should be removed, or at least deprecated.
In Header there are a couple of comments recommending Gedcom.getSubmitter(); they should be removed.

Please could you release this to maven central?

It would be nice to be able to pull this in as a dependency from Maven Central.

Parsing individual title

Take a GEDCOM with this individual:

0 @I1@ INDI
1 NAME Victoria /Hanover/
1 TITL Queen of England

ModelParser.handleTitl(Object) puts the TITL record into names, adding the user-defined _type key:

{
  "id": "I1",
  "names": [
    {
      "value": "Victoria /Hanover/"
    },
    {
      "_type": "TITL",
      "value": "Queen of England"
    }
  ]
}

This is strange, because in the GEDCOM 5.5 specification TITL is a standard attribute of INDI (like, for example, OCCU or PROP):

INDIVIDUAL:=
n  <<NAME_STRUCTURE>>		{1:M}
n  TITL <INDI_TITLE>		{0:M}
n  ...

And in GEDCOM 5.5.1, TITL is also listed among the other standard attributes:

INDIVIDUAL_ATTRIBUTE_STRUCTURE:=
  [
  n ...
  |
  n TITL <NOBILITY_TYPE_TITLE>		{1:1}
    +1 <<INDIVIDUAL_EVENT_DETAIL>>	{0:1}
  ]

Wouldn't it be more correct to parse the TITL record as a simple EventFact? Like this:

{
  "id": "I1",
  "names": [
    {
      "value": "Victoria /Hanover/"
    }
  ],
  "eventsFacts": [
    {
      "tag": "TITL",
      "value": "Queen of England"
    }
  ]
}
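A minimal sketch of the routing this proposal implies: NAME goes to names, while TITL is treated like the other INDI attributes and goes to eventsFacts. The class and field names are hypothetical stand-ins for the library's Person and EventFact, and the attribute set lists only the tags mentioned in this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the routing proposed in the issue: TITL goes to
// eventsFacts like other INDI attributes, not into names with a _type key.
final class IndiSketch {
    // Only the attribute tags mentioned in the issue; not exhaustive
    private static final Set<String> ATTRIBUTE_TAGS = Set.of("TITL", "OCCU", "PROP");

    final List<String> names = new ArrayList<>();
    final List<String[]> eventsFacts = new ArrayList<>(); // each entry is {tag, value}

    void addSubrecord(String tag, String value) {
        if ("NAME".equals(tag)) {
            names.add(value);
        } else if (ATTRIBUTE_TAGS.contains(tag)) {
            eventsFacts.add(new String[] {tag, value});
        }
        // other tags are ignored in this sketch
    }
}
```

Feeding it the NAME and TITL lines from the example yields one entry in names and one TITL entry in eventsFacts, matching the proposed JSON.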

getSubmitters() can return null

I tried version 1.9.0 and ran into a problem.
With this version, a JSON file can be parsed only if it contains a subms key, so JSON files created with version 1.8.0 can no longer be opened because they lack that key.
This happens because Gedcom.createIndexes() calls Gedcom.getSubmitters(), which returns null when there is no subms object.
The same happens when parsing a GEDCOM file: Gedcom.createIndexes() works only if the GEDCOM contains at least one SUBM tag.
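A minimal sketch of one possible fix: make the getter null-safe, so index creation works whether or not a subms key (or SUBM record) is present. The field name mirrors the JSON key from the issue; the class, the String ids, and the index method are hypothetical stand-ins for the library's Gedcom and Submitter types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: "subms" mirrors the JSON key from the issue; the
// library's real Gedcom class and Submitter type are not used here.
final class GedcomSubmittersSketch {
    private List<String> subms; // stays null when the file has no submitters

    // Never returns null, so callers such as an index builder can just iterate
    List<String> getSubmitters() {
        return subms != null ? subms : Collections.emptyList();
    }

    void addSubmitter(String id) {
        if (subms == null) {
            subms = new ArrayList<>();
        }
        subms.add(id);
    }

    // Stand-in for the submitter part of createIndexes()
    Map<String, String> createSubmitterIndex() {
        Map<String, String> index = new HashMap<>();
        for (String id : getSubmitters()) {
            index.put(id, id);
        }
        return index;
    }
}
```

Returning an empty list rather than null keeps JSON written by 1.8.0 readable without requiring every caller to add its own null check.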