sul-dlss-deprecated / discovery-indexer

gem: manages the core operations for discovery indexing, such as reading PURL XML, mapping it to a Solr document, and writing to a Solr core.

License: Other

Ruby 100.00%
gem

discovery-indexer's Issues

collection_info needs to get collection's title from mods title data

(All of this is documented in the comments of issue sul-dlss/argo/issues/910.)

The user facing problem is: collection items released to SearchWorks have an incorrect collection title displayed.

Per @jkeck, the SearchWorks Solr index field is collection_with_title.

Per the code path below, the incorrect code is at

https://github.com/sul-dlss/discovery-indexer/blob/master/lib/discovery-indexer/collection.rb#L22-L27

    # @return [Hash] the collection data as { title: 'coll title', ckey: catkey'}
    def collection_info
      return {} unless purl_model
      @info = {}
      @info = { title: purl_model.label, ckey: purl_model.catkey } if @info.empty?
    end

So it seems to be getting the title from what it treats as the collection's label (purl_model.label).

https://github.com/sul-dlss/discovery-indexer/blob/master/lib/discovery-indexer/reader/purlxml_parser_strict.rb#L223-L226

    # @return objectLabel value from the DOR identity_metadata, or nil if there is no barcode
    def parse_label
      get_value(@purlxml_ng_doc.xpath('/publicObject/identityMetadata/objectLabel'))
    end

That label is populated from identityMetadata (objectLabel), not from MODS, and it does NOT get updated when the collection record's MODS is updated.
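To make the mismatch concrete, here is a minimal sketch that reads both values out of one document. The sample XML is made up for illustration (the druid, both titles, and the placement of the MODS element directly under publicObject are assumptions), and it uses REXML from the Ruby distribution rather than the Nokogiri document the gem works with.

```ruby
require 'rexml/document'

# Made-up sample shaped like purl public XML; the druid, the titles and the
# exact nesting of <mods> are illustrative assumptions, not real data.
SAMPLE_PUBLIC_XML = <<~XML
  <publicObject id="druid:aa111bb2222">
    <identityMetadata>
      <objectLabel>Old label set at registration</objectLabel>
    </identityMetadata>
    <mods xmlns="http://www.loc.gov/mods/v3">
      <titleInfo>
        <title>Current collection title</title>
      </titleInfo>
    </mods>
  </publicObject>
XML

# Prefix-to-URI mapping used only for the XPath lookup below.
MODS_NS = { 'mods' => 'http://www.loc.gov/mods/v3' }

doc = REXML::Document.new(SAMPLE_PUBLIC_XML)

# Same XPath as parse_label in the quoted code.
label_node = REXML::XPath.first(doc, '/publicObject/identityMetadata/objectLabel')
# The MODS title lives in the MODS namespace, so the lookup needs the mapping.
title_node = REXML::XPath.first(doc, '//mods:mods/mods:titleInfo/mods:title', MODS_NS)

object_label = label_node&.text
mods_title = title_node&.text

puts "objectLabel: #{object_label}"
puts "MODS title:  #{mods_title}"
```

In this sample the two values differ, which is the situation the issue describes: the label stays as it was while the MODS title has moved on.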

So the correct fix would be to populate the title at https://github.com/sul-dlss/discovery-indexer/blob/master/lib/discovery-indexer/collection.rb#L22-L27 from MODS. My suggestion is the stanford-mods method sw_display_title:

https://github.com/sul-dlss/stanford-mods/blob/master/lib/stanford-mods/searchworks.rb#L187-L195

(or sw_short_title: https://github.com/sul-dlss/stanford-mods/blob/master/lib/stanford-mods/searchworks.rb#L127-L130)

The MODS is now available in the public xml on purl so no additional network call would be required to retrieve the MODS.

AFAICT, this potentially affects SearchWorks, Revs, and possibly Spotlight. I can't imagine the existing collection title would be preferred to this fix, which would supply the actual collection title.
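A rough sketch of the shape such a fix could take, under stated assumptions: `PurlModelStub` and its `mods_title` accessor are hypothetical stand-ins (in the real gem the value would presumably come from sw_display_title or sw_short_title on the parsed MODS), and `CollectionSketch` is not the gem's class. The sketch also memoizes with `||=`, since the quoted method reassigns `@info` to `{}` on every call and so rebuilds the hash each time.

```ruby
# Hypothetical stand-in for the purl model. The real class lives in
# discovery-indexer; mods_title is an assumed accessor, not an existing one.
PurlModelStub = Struct.new(:label, :catkey, :mods_title, keyword_init: true)

# Illustrative class only; it mirrors the shape of collection_info in the issue.
class CollectionSketch
  def initialize(purl_model)
    @purl_model = purl_model
  end

  # @return [Hash] the collection data as { title: 'coll title', ckey: 'catkey' }
  def collection_info
    return {} unless @purl_model

    # Build the hash once per instance and reuse it afterwards.
    @info ||= { title: title_from_mods_or_label, ckey: @purl_model.catkey }
  end

  private

  # Prefer the MODS-derived title; fall back to the identityMetadata label
  # when MODS supplies nothing usable.
  def title_from_mods_or_label
    mods_title = @purl_model.mods_title.to_s.strip
    mods_title.empty? ? @purl_model.label : mods_title
  end
end

model = PurlModelStub.new(label: 'Old label', catkey: '123', mods_title: 'Current title')
puts CollectionSketch.new(model).collection_info.inspect
```

Falling back to the label is a design choice of this sketch, not something the issue asks for; it only keeps objects without a usable MODS title from ending up with a blank collection title.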

Analyze issues with 469 druids failing indexing

https://gist.github.com/mejackreed/68755eb862f279f660b33d1028ca09f8

Per @mejackreed:

Seems that jobs are processing just fine the first time around, but will often get stuck in the mud after they fail multiple times. Most of the failures seem to be issues like DiscoveryIndexer::Errors::MissingRDF on the indexing side.

Analysis of one object in prod shows an ETD with RELS-EXT as follows:

    <rdf:RDF xmlns:fedora="info:fedora/fedora-system:def/relations-external#"
             xmlns:fedora-model="info:fedora/fedora-system:def/model#"
             xmlns:hydra="http://projecthydra.org/ns/relations#"
             xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about="info:fedora/druid:bg468jw7546"></rdf:Description>
    </rdf:RDF>

However, in the -test environment this is missing altogether. Seems to be from a batch of bad data from 2012.
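When triaging a list of failing druids like this, it may help to separate the two situations described above: RELS-EXT absent altogether versus present with an empty rdf:Description. The following is a minimal sketch using REXML; `rels_ext_state` is a hypothetical helper, the populated sample uses a made-up collection druid, and which of these cases actually triggers DiscoveryIndexer::Errors::MissingRDF is not something the issue states.

```ruby
require 'rexml/document'

RDF_NS = { 'rdf' => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#' }

# Hypothetical triage helper, not part of discovery-indexer.
# Returns :missing when there is no XML or no rdf:Description at all,
# :empty when rdf:Description has no child elements (no relationships),
# and :populated otherwise.
def rels_ext_state(xml)
  return :missing if xml.nil? || xml.strip.empty?

  doc = REXML::Document.new(xml)
  description = REXML::XPath.first(doc, '//rdf:Description', RDF_NS)
  return :missing unless description

  description.elements.empty? ? :empty : :populated
end

# RELS-EXT as quoted in the issue (namespaces trimmed to the two used here).
EMPTY_RELS_EXT = <<~XML
  <rdf:RDF xmlns:fedora="info:fedora/fedora-system:def/relations-external#"
           xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="info:fedora/druid:bg468jw7546"></rdf:Description>
  </rdf:RDF>
XML

# Same document with one relationship added; the collection druid is made up.
POPULATED_RELS_EXT = <<~XML
  <rdf:RDF xmlns:fedora="info:fedora/fedora-system:def/relations-external#"
           xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="info:fedora/druid:bg468jw7546">
      <fedora:isMemberOfCollection rdf:resource="info:fedora/druid:aa111bb2222"/>
    </rdf:Description>
  </rdf:RDF>
XML

puts rels_ext_state(nil)
puts rels_ext_state(EMPTY_RELS_EXT)
puts rels_ext_state(POPULATED_RELS_EXT)
```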