squishysquid / solrmarc

This project forked from solrmarc/solrmarc

SolrMarc is a utility that reads in MARC records, extracts information from various fields as specified in an indexing specification, and sends that information to a specified Apache Solr index.

Overview

SolrMarc reads MARC records and extracts data from them to build an Apache Solr index. It uses the Marc4j library to read the MARC records, then applies a user-provided indexing specification that determines which fields are created in each Solr input document and where in the record their data is extracted from. Finally, it uses the SolrJ library to send the Solr input documents to the Solr index.

As of version 3.0 the program has been completely rewritten, based on code written by Oliver Obenland (see https://github.com/oobenland/SolrMarc-Indexer-Tests).
The key design improvement Oliver introduced is to compile the indexing specification once and then apply that "compiled" version to each record that needs indexing. I have taken his code and added handling of SolrMarc's basic field specifications (such as title_display = 245abnp) via a parser specification (CUP and JFlex), which makes defining and handling more complex specifications simpler.
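
The compile-once idea can be illustrated with a small, self-contained Java sketch. This is not SolrMarc's code: the class and record names (CompiledSpecSketch, CompiledSpec, MarcLikeRecord, Subfield) are hypothetical, the record model is a toy stand-in for Marc4j, and the parsing is a plain string split rather than the CUP and JFlex parser. It only assumes that a basic specification is a Solr field name, a three-character tag, and a list of subfield codes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy illustration of "compile the spec once, apply it to every record". */
public class CompiledSpecSketch {

    /** Hypothetical minimal subfield: a code such as 'a' plus its text. */
    record Subfield(char code, String value) {}

    /** Hypothetical minimal MARC-like record: tag -> subfields in record order. */
    record MarcLikeRecord(Map<String, List<Subfield>> fields) {}

    /** The "compiled" form of one spec line: parsed once, reused for every record. */
    record CompiledSpec(String solrField, String tag, String subfieldCodes) {

        /** Parses a line such as "title_display = 245abnp" (simplified syntax). */
        static CompiledSpec compile(String line) {
            String[] parts = line.split("=", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected 'field = spec': " + line);
            }
            String spec = parts[1].trim();
            if (spec.length() < 3) {
                throw new IllegalArgumentException("Expected a 3-character tag: " + line);
            }
            return new CompiledSpec(parts[0].trim(), spec.substring(0, 3), spec.substring(3));
        }

        /** Joins the values of the listed subfields with spaces; no re-parsing here. */
        String extract(MarcLikeRecord rec) {
            List<String> values = new ArrayList<>();
            for (Subfield sf : rec.fields().getOrDefault(tag, List.of())) {
                if (subfieldCodes.indexOf(sf.code()) >= 0) {
                    values.add(sf.value());
                }
            }
            return String.join(" ", values);
        }
    }

    public static void main(String[] args) {
        // The specification is parsed a single time, outside the record loop.
        CompiledSpec spec = CompiledSpec.compile("title_display = 245abnp");

        List<MarcLikeRecord> records = List.of(
            new MarcLikeRecord(Map.of("245", List.of(
                new Subfield('a', "Moby Dick"),
                new Subfield('c', "Herman Melville")))),
            new MarcLikeRecord(Map.of("245", List.of(
                new Subfield('a', "Walden,"),
                new Subfield('b', "or, Life in the woods")))));

        // Prints "title_display: Moby Dick" and "title_display: Walden, or, Life in the woods".
        for (MarcLikeRecord rec : records) {
            System.out.println(spec.solrField() + ": " + spec.extract(rec));
        }
    }
}
```

The point of the sketch is only the shape of the loop: the cost of interpreting the specification is paid once, and each record then goes through the already-parsed form.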

The goal of the design is a program that operates much the same as earlier versions of SolrMarc, including being able to process index specifications that worked with previous versions and produce substantially the same Solr records, while also running much faster and supporting a richer superset of features in the index specification language.

Included with this project is a Swing-based interactive interface that could eventually be used to develop, modify, extend and debug a set of indexing specifications, but for now it can be used to see how some of the new features will work.

A more in-depth description of the differences in this new version can be found in the Wiki, along with information on how to install the program, how to create an index specification, and how to run the program with that specification.

Additionally, the Wiki has some information about the code and design of the program for those who might be interested in contributing to the project.

Contributors

cedelis, demiankatz, eocarragain, greg-pendlebury, haschart, kocher, michaelrlevy, mtrojan-ub, ndushay, todolson
