imamuseum / linkedart


Transforming IMA objects, creators, and exhibitions data to the Linked Art data model.

Home Page: https://linked.art

XSLT 91.56% Shell 8.44%

linkedart's Issues

Acquisition date clean-up and modeling

Need to review Accession Dates (TitAccessionDate) in EMu for consistency in date formatting.

This field is the best option for assigning a date to the acquisition event (E8_Acquisition), though it is not always populated. Regardless of how the date is entered in EMu, it needs to be transformed into the date format CRM expects.
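A minimal sketch of that normalization in Python, separate from the XSLT transformation itself. The input formats handled here (year only, `YYYY-MM-DD`, `MM/DD/YYYY`) are assumptions for illustration; the actual formats in TitAccessionDate still need the review described above. The function name `accession_timespan` is hypothetical. Values it cannot parse return `None` so they can be flagged for clean-up.

```python
from datetime import datetime
from typing import Optional

# Assumed input formats; the real set depends on the EMu review.
FULL_DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def accession_timespan(raw: str) -> Optional[dict]:
    """Turn a raw accession date into a TimeSpan-style dict, or None if unparseable."""
    text = raw.strip()
    if not text:
        return None
    # Year-only values span the whole year.
    if len(text) == 4 and text.isdigit():
        return {
            "type": "TimeSpan",
            "begin_of_the_begin": f"{text}-01-01T00:00:00Z",
            "end_of_the_end": f"{text}-12-31T23:59:59Z",
        }
    for fmt in FULL_DATE_FORMATS:
        try:
            day = datetime.strptime(text, fmt).date()
        except ValueError:
            continue
        iso = day.isoformat()
        return {
            "type": "TimeSpan",
            "begin_of_the_begin": f"{iso}T00:00:00Z",
            "end_of_the_end": f"{iso}T23:59:59Z",
        }
    return None  # leave for manual clean-up
```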

Quotations in strings

Some fields contain quotations as catalogued in EMu - need to replace these to avoid invalid JSON output. Not addressed in current transformation, so needs rework.
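To illustrate the failure and one possible fix: a sketch in Python showing that a JSON serializer escapes embedded quotation marks, where naive string concatenation would produce invalid output. The sample title is made up, and the helper name `to_json_string` is hypothetical. The equivalent escaping would still need to be done in the XSLT, or in a post-processing step.

```python
import json


def to_json_string(value: str) -> str:
    """Serialize one field value as a JSON string literal, escaping embedded quotes."""
    return json.dumps(value, ensure_ascii=False)


title = 'Study for "The Bathers"'  # made-up field value containing quotations
encoded = to_json_string(title)

# Round-tripping through a JSON parser recovers the original value unchanged.
assert json.loads(encoded) == title
```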

collection.imamuseum.org IDs/URLs

Dagwood IDs are different than EMu IRNs, and are not currently cataloged in EMu. This makes it difficult to automate creation of Homepage pattern.

Need to work with Registration to determine an appropriate place in EMu to document either Dagwood IDs or full collection.imamuseum.org URLs.

Thoughts:

Link Catalogue records (BibReference_tab) to Bibliography records (BibRecordType = Web Site) representing each object URL (WebIdentifier)
For objects with images, does it make sense to also represent image URLs in this same way? Another option for capturing these in EMu could be the Image Reference table in the References tab (RefImageType_tab + RefImageReference_tab) - would need to add an Image Reference Type to the thesaurus for this use case.

"brief text" as metatype

Per Linked Art Issue #296, "brief text" should be modeled as a metatype of the linguistic object's main type.

Creation Locations

Creation Locations are catalogued in a table in EMu with the following headers:

Country
Province/State/Territory
District/County/Shire
City/Town

Data is entered into these fields very inconsistently, particularly where the information doesn't fit cleanly into these categories. Significant clean-up needs to be undertaken, in collaboration with registrars. All values in these fields are controlled by a look-up list, so proliferation of bad values continues. Clean-up will need to include clearing look-up list values.

Not modeling this information for now, but hope to undertake in the future.

After/Following creators

How should After/Followers be handled? Currently ignoring creators with such qualifiers.
EDIT: See list of all qualifiers present in IMA data two comments below.

Related to this, we don't catalog Workshops of (or related terms) as individual Parties records, but use the qualifier "Workshop of" with actual record for artist (e.g. Titian). Don't have an easy way to assign a unique identifier to these groups in absence of individual record. These are also being ignored by current mapping, due to presence of the qualifier.

Overall Dimensions vs. Dimensions of Parts

EMu contains a table for dimensions with headers:

Type, Height, Width, Depth, Diameter, Unit (Length), Weight, Unit (Weight), Dimension Notes

The Type value can be used as an indicator for whether the dimensions on that line of the table can be attributed to a part of an object (one of few areas where our data lends itself to partitioning), or if the dimensions captured on that line refer to the object as a whole. Type is a relatively controlled list, with key terms. These values need to be evaluated for whether they indicate part or whole dimensions.

Portfolio and Series Titles - partition?

Portfolio Title and Series Title are captured in single text fields on object records. Current mapping to Linked Art reflects this, transforming values to P1_is_identified_by statements with E55_Type "series titles" and "titles". Side note: there is no fitting AAT vocabulary term for "portfolio titles".

Would it make more sense to extrapolate parent record information from these title fields? i.e., create separate MMO records of type "portfolios (groups of works)" - http://vocab.getty.edu/aat/300179434 - and "series (object groupings)" - http://vocab.getty.edu/aat/300027349, with title information (and little else).

Issue: this may cause conflicts in instances where portfolios/series are catalogued as blanket records with individual works as parts. To avoid creating duplicate MMOs with conflicting URIs, I could make the transformation conditional: create the inferred MMO pattern in the JSON only if an object with a series or portfolio title is NOT flagged as a part record. (Note: since there aren't IRNs to associate with these inferred series/portfolios, the URIs would need to be something like data.discovernewfields.org/series/[lowercase-series-name].)
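A rough sketch of that conditional in Python, assuming a boolean part-record flag is available per object. The function names are hypothetical, and joining words with hyphens is an assumption about what "[lowercase-series-name]" should look like.

```python
import re
from typing import Optional

# URI pattern taken from the note above.
SERIES_BASE = "data.discovernewfields.org/series/"


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into hyphens (assumed style)."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def inferred_series_uri(series_title: str, is_part_record: bool) -> Optional[str]:
    """Return a URI for the inferred series MMO, or None when it should not be created."""
    if is_part_record or not series_title.strip():
        return None
    return SERIES_BASE + slugify(series_title)
```

Two objects with the same series title produce the same URI, which is what lets the inferred record be shared rather than duplicated.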

Need to think through this more and also ask Editorial Board about how series/portfolio titles relate to partitioning.

Parse Production Events for multiple Creators

Activities cannot be carried_out_by multiple Actors. Instead, the overarching Production event is broken into multiple sub-Production events (parts_of), each with their own Actors. Need to rework in the transformation file.
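A sketch of the target structure, written in Python rather than XSLT for brevity. The JSON key `part` for the sub-events and the example.org creator URIs are assumptions for illustration; check them against the current Linked Art context before reworking the transformation.

```python
def production_with_parts(creator_uris):
    """Build one overarching Production with one sub-Production per creator."""
    return {
        "type": "Production",
        "part": [
            {
                "type": "Production",
                # Each sub-event carries exactly one actor.
                "carried_out_by": [{"id": uri, "type": "Actor"}],
            }
            for uri in creator_uris
        ],
    }


# Placeholder URIs, not real IMA identifiers.
event = production_with_parts(
    ["https://example.org/actor/1", "https://example.org/actor/2"]
)
```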

Mark Description - AAT Mapping

In EMu, we have a field titled "Mark Description" (backend name = CrePrimaryInscriptions). It seems to be used to capture notes about marks, inscriptions, and signatures on works.

There are individual AAT terms for each of those three items, but it would be difficult to determine in which way the field is being used at a given time. How should we type this Linguistic Object?

Thoughts: there may be some consistency in the terminology used in the field that could serve as a flag while generating the JSON-LD. For example, the field sometimes starts with "Signed: " - could this indicate that the contents are ALWAYS of type signature?
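One way to test that idea is a simple prefix check, sketched here in Python. Only the "Signed: " prefix comes from the data as described; the table is meant to be extended once the field contents are reviewed, and the names `PREFIX_TYPES` and `mark_description_type` are hypothetical.

```python
# Prefix -> type label. Only "signed:" is attested above; others would be added after review.
PREFIX_TYPES = {
    "signed:": "signature",
}


def mark_description_type(text: str) -> str:
    """Guess the type of a Mark Description value from its leading keyword."""
    lowered = text.strip().lower()
    for prefix, label in PREFIX_TYPES.items():
        if lowered.startswith(prefix):
            return label
    return "unclassified"  # fall back to a generic type
```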

IMA Locations Clean-up and Transformation

Review Locations Levels 1-3 and clean as needed. Order should be:

Level 1 = IMA
Level 2 = Name of gallery (if applicable - contains "galler" or "suite"); other location info.
Level 3 = Gallery code (if gallery), otherwise more specific info.

Level 2 (code) values to represent in transformation:

On Loan
see related parts
Art Study Room (S90)
Westerley (name of room)
Efroymson Family Entrance Pavilion (F02)
The Virginia B. Fairbanks Art & Nature Park

Blank nodes

Blank nodes are allowable for attributes that are not themselves entities. This includes:

Timespans
Dimensions
Values/Monetary Amounts
etc.

Essentially, if an element would not need to be referenced by multiple sources, it does not need a dereference-able URI.

Need to update the transformation logic and tracking spreadsheet to no longer create URIs for:

Dimensions
Production Timespans
Check for other elements that don't need URIs.
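As an illustration of the intended result, this Python sketch walks a JSON-LD structure and drops the `id` from nodes whose type is in a blank-node list. The set of types mirrors the examples above and is an assumption to be confirmed, and the function name is hypothetical. In practice the change belongs in the XSLT, so that the URIs are never minted in the first place.

```python
# Types treated as blank nodes (assumed list, based on the examples above).
BLANK_NODE_TYPES = {"Dimension", "TimeSpan", "MonetaryAmount"}


def strip_blank_node_ids(node):
    """Recursively remove 'id' from nodes that do not need a dereferenceable URI."""
    if isinstance(node, list):
        return [strip_blank_node_ids(item) for item in node]
    if isinstance(node, dict):
        cleaned = {key: strip_blank_node_ids(value) for key, value in node.items()}
        if cleaned.get("type") in BLANK_NODE_TYPES:
            cleaned.pop("id", None)
        return cleaned
    return node
```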

Partitioning Medium and Support - data consistency

In EMu, Medium and Support are catalogued in two tables, which are output in XML as:

<table name="Medium">
  <tuple>
    <atom name="PhyMedium">fabric</atom>
  </tuple>
  <tuple>
    <atom name="PhyMedium">plastic</atom>
  </tuple>
</table>
<table name="Support">
  <tuple>
    <atom name="PhySupport">structural foam</atom>
  </tuple>
</table>

Unfortunately, data is inconsistent. For example:

<table name="Medium">
  <tuple>
    <atom name="PhyMedium">paper</atom>
  </tuple>
</table>

Paper should be listed as a Support, not Medium

<table name="Medium">
  <tuple>
    <atom name="PhyMedium">plastic</atom>
  </tuple>
  <tuple>
    <atom name="PhyMedium">paint</atom>
  </tuple>
</table>
<table name="Support">
  <tuple>
    <atom name="PhySupport">steel</atom>
  </tuple>
  <tuple>
    <atom name="PhySupport">plastic</atom>
  </tuple>
  <tuple>
    <atom name="PhySupport">paint</atom>
  </tuple>
</table>

plastic and paint are repeated under Support, also listed in Medium

How will incorrect data affect partitioning of Support and Medium?

If too complicated, we can simply represent Medium(s) and Support(s) with made-of syntax. If we go this route, though, we will need to avoid duplicating made-of statements when values are repeated either within a single table or across both tables (e.g., XML example directly above).
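The de-duplication step could look like the following Python sketch, which collects each PhyMedium and PhySupport value once, in first-seen order. The `<record>` wrapper element and the function name are assumptions added so the sample parses as a single document; the table and atom names come from the examples above.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<record>
  <table name="Medium">
    <tuple><atom name="PhyMedium">plastic</atom></tuple>
    <tuple><atom name="PhyMedium">paint</atom></tuple>
  </table>
  <table name="Support">
    <tuple><atom name="PhySupport">steel</atom></tuple>
    <tuple><atom name="PhySupport">plastic</atom></tuple>
    <tuple><atom name="PhySupport">paint</atom></tuple>
  </table>
</record>
"""


def unique_materials(record_xml: str) -> list:
    """Return each material once, ignoring which table it was catalogued in."""
    root = ET.fromstring(record_xml)
    seen = []
    for atom in root.iter("atom"):
        if atom.get("name") in ("PhyMedium", "PhySupport"):
            value = (atom.text or "").strip().lower()
            if value and value not in seen:
                seen.append(value)
    return seen


materials = unique_materials(SAMPLE)
```

Each value in `materials` would then yield exactly one made-of statement.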

Identifier Types

Since we publish non-accessioned items (e.g., long-term loans to the PC) online, we will need an LA JSON-LD representation for them. Current transformation applies the "Accession Number" type to all values coming from TitObjectID - need to build in the logic for nuances when the item is NOT accessioned.

Related to this, whatever identifier type we go with for this, should probably also be applied to Previous Accession Number (the field name is a bit of a misnomer).

Adjust classification mapping logic

Per the standard list of vocabulary terms to be used in LA, there are established categories of artworks and other Human-Made Objects that are recommended for use.

Current IMA mapping assigns classifications of "artwork" for all records + IMA thesaurus terms based on PhyMediaCategory_tab. This is incorrect. Only artworks should be classified as artworks (e.g., design collection and similar artifacts should not receive this type).

Solution: use TitObjectType for major categories (e.g., "Visual Works: Paintings"). Where there is not a clear mapping to our Record Types, a broad AAT category won't be available, just the IMA thesaurus term.

Long Term Loans

Long term loans to the permanent collection ARE published online, but are NOT owned by IMA. Add in logic to the owner pattern to not make the statement when the Legal Status is not "Accessioned."
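A small sketch of that condition in Python. The `current_owner` key and the `Group` type are assumptions about how the owner pattern is serialized, and the owner URI passed in is a placeholder; only the "Accessioned" status value comes from the description above.

```python
def owner_statement(legal_status: str, owner_uri: str) -> dict:
    """Return the owner pattern only for accessioned objects; otherwise nothing."""
    if legal_status.strip().lower() != "accessioned":
        return {}
    return {"current_owner": [{"id": owner_uri, "type": "Group"}]}


record = {"type": "HumanMadeObject"}
# A long-term loan gets no owner statement merged in.
record.update(owner_statement("Long Term Loan", "https://example.org/group/ima"))
```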

Add Type of Type pattern where appropriate

Example for object classifications:

object/1 -> classified_as -> Type -> classified_as -> "object type" URI
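Spelled out as a nested structure, the chain above could be built as in this Python sketch. All URIs are example.org placeholders and the function name is hypothetical; the real classification and "object type" URIs would come from AAT or the IMA thesaurus.

```python
def classified_type(type_uri: str, label: str, meta_type_uri: str) -> dict:
    """Build a Type node that is itself classified by a meta-type."""
    return {
        "id": type_uri,
        "type": "Type",
        "_label": label,
        "classified_as": [{"id": meta_type_uri, "type": "Type"}],
    }


obj = {
    "id": "https://example.org/object/1",  # placeholder
    "type": "HumanMadeObject",
    "classified_as": [
        classified_type(
            "https://example.org/type/painting",      # placeholder classification
            "Painting",
            "https://example.org/type/object-type",   # placeholder "object type" URI
        )
    ],
}
```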

Creator Types

Currently all creators (whether individuals, organizations, collaborations, or cultures) are represented as Type: Actor in the "carried_out_by" statements of object production events.

Should this be modified? For example, should cultures be typed as "Group"?

For EMu party records, I can pull the Party record type to identify Person vs. Organization. Is Collaboration a type?

Previous Accession Number(s)

Now that I have answers from Registration re: allowable values in TitPreviousAccessionNo, need to rework the transformation to address all possible values.

Notes:

Clean-up complete and ingested into EMu
Field contains non-IMA identifiers, but we will only represent IMA values in Linked Art data
Single text field, but when there are multiple previous accession numbers, they are delimited by " | "
Ignore "No TR Number"
Possible IMA identifier starters: TR (temp), U (Unknown), NON-ART (non-art), S (Study), E (Eiteljorg)
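
The notes above translate into roughly the following parsing logic, sketched in Python. The delimiter, the ignored value, and the list of starters come from the notes; the rule that a starter must not be followed by another letter is an assumption (to keep "S" from matching arbitrary words), and the sample values are made up.

```python
import re

# Starters from the notes; the "not followed by a letter" rule is an assumption.
IMA_STARTER = re.compile(r"^(TR|U|NON-ART|S|E)(?![A-Za-z])")
IGNORED = {"No TR Number"}


def ima_previous_numbers(raw: str) -> list:
    """Split TitPreviousAccessionNo on ' | ' and keep only IMA-style identifiers."""
    values = [part.strip() for part in raw.split(" | ")]
    return [
        value
        for value in values
        if value and value not in IGNORED and IMA_STARTER.match(value)
    ]
```

For example, a made-up value such as "TR1234/1 | No TR Number | LOAN-77" would keep only "TR1234/1".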

Research:

Is there a more fitting AAT classification for retired identifiers than general "identification numbers"?

Representing Department membership from Object

Per the LA model, objects associated with a specific department in a museum are represented as aggregations within a set. How would the "member_of" status be represented from each object's JSON-LD? Based on the context file, "member_of" isn't available in CIDOC-CRM?