sul-dlss-deprecated / discovery-indexer
gem: manages the core operations for discovery indexing, such as reading PURL XML, mapping it to the Solr document, and writing to the Solr core.
License: Other
https://gist.github.com/mejackreed/68755eb862f279f660b33d1028ca09f8
Per @mejackreed:
Seems that jobs are processing just fine the first time around, but will often get stuck in the mud after they fail multiple times. Most of the failures seem to be issues like DiscoveryIndexer::Errors::MissingRDF on the indexing side.
Analysis of one object in prod shows an ETD with RELS-EXT as follows:
<rdf:RDF xmlns:fedora="info:fedora/fedora-system:def/relations-external#"
         xmlns:fedora-model="info:fedora/fedora-system:def/model#"
         xmlns:hydra="http://projecthydra.org/ns/relations#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="info:fedora/druid:bg468jw7546"></rdf:Description>
</rdf:RDF>
However, in the -test environment this is missing altogether. Seems to be from a batch of bad data from 2012.
(All of this is documented in the comments of issue sul-dlss/argo/issues/910 )
The user-facing problem is that collection items released to SearchWorks display an incorrect collection title.
Per @jkeck, the SearchWorks Solr index field is collection_with_title.
Per the code path info below, the code at fault is:
    # @return [Hash] the collection data as { title: 'coll title', ckey: catkey'}
    def collection_info
      return {} unless purl_model
      @info = {}
      @info = { title: purl_model.label, ckey: purl_model.catkey } if @info.empty?
    end
So it seems to be getting the title from the label (purl_model.label in the code above), which is parsed by:
    # @return objectLabel value from the DOR identity_metadata, or nil if there is no barcode
    def parse_label
      get_value(@purlxml_ng_doc.xpath('/publicObject/identityMetadata/objectLabel'))
    end
This label is populated not from MODS but from identityMetadata, and it does NOT get updated when the collection record's MODS is updated.
So the correct thing would be to populate the title at sul-dlss/discovery-indexer/blob/master/lib/discovery-indexer/collection.rb#L22-L27 from MODS. My suggestion is to use the stanford-mods method sw_display_title (https://github.com/sul-dlss/stanford-mods/blob/master/lib/stanford-mods/searchworks.rb#L187-L195), or alternatively sw_short_title (https://github.com/sul-dlss/stanford-mods/blob/master/lib/stanford-mods/searchworks.rb#L127-L130).
The MODS is now available in the public xml on purl so no additional network call would be required to retrieve the MODS.
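As a rough sketch of the idea (not the gem's actual code), the title could be read from the MODS embedded in the public XML, falling back to objectLabel only when no MODS title is present. Everything named here is an assumption for illustration: the CollectionSketch class, the text_at helper, the sample druid and titles, and the assumption that the MODS sits in a mods element (MODS v3 namespace) directly under publicObject. It uses REXML and a plain title lookup so it runs without other gems; a real fix would more likely call sw_display_title from stanford-mods. It also memoizes with ||=, since the current method resets @info on every call.

```ruby
require 'rexml/document'

# Namespace prefix mapping used for the XPath lookups below.
MODS_NS = { 'mods' => 'http://www.loc.gov/mods/v3' }.freeze

# Hypothetical stand-in for the indexer's collection class.
class CollectionSketch
  def initialize(public_xml, catkey = nil)
    @doc = REXML::Document.new(public_xml)
    @catkey = catkey
  end

  # Prefer the MODS title; fall back to the identityMetadata objectLabel.
  # The result is memoized so the XML is only queried once.
  def collection_info
    @info ||= { title: mods_title || object_label, ckey: @catkey }
  end

  private

  def mods_title
    text_at('/publicObject/mods:mods/mods:titleInfo/mods:title')
  end

  def object_label
    text_at('/publicObject/identityMetadata/objectLabel')
  end

  # Returns the stripped text of the first matching node, or nil if the
  # node is missing or blank.
  def text_at(xpath)
    node = REXML::XPath.first(@doc, xpath, MODS_NS)
    text = node && node.text.to_s.strip
    text unless text.nil? || text.empty?
  end
end

# Made-up public XML with a stale label and a current MODS title.
PUBLIC_XML = <<~XML
  <publicObject id="druid:xx000xx0000">
    <identityMetadata>
      <objectLabel>Stale label from 2012</objectLabel>
    </identityMetadata>
    <mods xmlns="http://www.loc.gov/mods/v3">
      <titleInfo><title>Current Collection Title</title></titleInfo>
    </mods>
  </publicObject>
XML

info = CollectionSketch.new(PUBLIC_XML, '12345').collection_info
puts info[:title]
```

Keeping the objectLabel fallback means objects whose public XML has no MODS title would still index with some title rather than none; whether that fallback is wanted is a design decision for the maintainers.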
AFAICT, this potentially affects SearchWorks, Revs, and possibly Spotlight. I can't imagine the existing collection title would be preferred to this fix, which would supply the actual collection title.
comes out of sul-dlss-deprecated/discovery-indexing#11
NOTE: requires a full re-publish of all image objects first.