
marc4js's Introduction


A Node.js module for handling MARC records

Installation

npm install marc4js

Features

marc4js provides the following features:

  • An easy to use API that can handle large record sets.
  • Uses Node.js stream API and pipe functions for parsing and writing ISO2709 format, MarcEdit text (mrk) format, MARC in JSON, and MARCXML.
  • Offers callback functions for parsing and writing various formats.
  • SAX-based MARCXML parsing that doesn't require in-memory storage of records while parsing, so large MARCXML files can be parsed with ease.
  • A MARC record object model for in-memory editing of MARC records, similar to the Marc4J object model.
  • Supports UTF-8 and MARC-8 encoded MARC files (MARC-8 encoded files require marc8).

Examples

Examples can be found in marc4js_examples. You can also find examples in the test directory.

Usage

var marc4js = require('marc4js');

Parsers

Parsers take various MARC formats and convert them to marc4js.marc.Record objects. Marc4js supports ISO2709, MarcEdit text (.mrk), text, MARCXML and MARC-in-JSON formats.

There are three ways to use a parser.

Callback API

marc4js.parse(data, options, function(err, records) {
});
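Because the callback follows the Node.js (err, result) convention, it can be wrapped in a Promise for use with async/await. This is a minimal sketch that relies only on the callback signature shown above: callWithPromise is a hypothetical helper, and fakeParse is a stand-in for marc4js.parse so the example runs without the library. The same wrapper should apply to marc4js.transform(records, options, callback), assuming it also calls its callback exactly once.

```javascript
'use strict';

// Wraps a function with the (data, options, callback(err, result)) shape
// in a Promise so it can be awaited.
// Assumption: the wrapped function calls its callback exactly once.
function callWithPromise(fn, data, options) {
  return new Promise((resolve, reject) => {
    fn(data, options, (err, result) => {
      if (err) {
        reject(err);
      } else {
        resolve(result);
      }
    });
  });
}

// Hypothetical stand-in for marc4js.parse so the sketch runs on its own:
// it splits input on the ISO2709 record terminator (0x1D).
function fakeParse(data, options, callback) {
  setImmediate(() => {
    if (typeof data !== 'string') {
      callback(new Error('expected a string'));
      return;
    }
    callback(null, data.split('\x1d').filter((chunk) => chunk.length > 0));
  });
}

async function main() {
  // With the real library: await callWithPromise(marc4js.parse, data, options)
  const records = await callWithPromise(fakeParse, 'rec1\x1drec2\x1d', {});
  console.log(records.length); // prints 2
}

main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
```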

Stream API

var parser = marc4js.parse(options);
parser.on('data', function(record) {
});
parser.on('end', function() {
});
parser.on('error', function(err) {
});
parser.write(data);
parser.end();

All events are based on the Node.js stream API.

Note that the parsers always work in the flowing streaming mode, so the objectMode option of the stream API cannot be configured: it is always set to true. Listening to the readable event will throw an error.
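To illustrate the event flow described above, here is a hypothetical toy parser stream built on Node's Transform class; it is not marc4js's implementation. It splits input on the ISO2709 record terminator byte (0x1D), so each data event carries one complete record even when a record spans several written chunks, and it reports a truncated final record through the error event.

```javascript
'use strict';
const { Transform } = require('stream');

const RECORD_TERMINATOR = 0x1d; // ISO2709 end-of-record byte

// Hypothetical toy parser: bytes in, one raw record string out per 'data' event.
class ToyRecordSplitter extends Transform {
  constructor() {
    // Only the readable side is in object mode; writes are bytes or strings.
    super({ readableObjectMode: true });
    this.pending = Buffer.alloc(0);
  }

  _transform(chunk, encoding, callback) {
    // Keep unfinished bytes so a record split across chunks is reassembled.
    this.pending = Buffer.concat([this.pending, chunk]);
    let end;
    while ((end = this.pending.indexOf(RECORD_TERMINATOR)) !== -1) {
      // 0x1D never occurs inside a multi-byte UTF-8 sequence, so each
      // complete record can be decoded safely on its own.
      this.push(this.pending.subarray(0, end + 1).toString('utf8'));
      this.pending = this.pending.subarray(end + 1);
    }
    callback();
  }

  _flush(callback) {
    // Leftover bytes without a terminator mean the last record is truncated.
    if (this.pending.length > 0) {
      callback(new Error('incomplete record at end of input'));
    } else {
      callback();
    }
  }
}

const splitter = new ToyRecordSplitter();
splitter.on('data', (record) => console.log('record of', record.length, 'characters'));
splitter.on('end', () => console.log('done'));
splitter.on('error', (err) => console.error('error:', err.message));
splitter.write('first\x1dsec'); // the second record arrives in two chunks
splitter.write('ond\x1d');
splitter.end();
```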

Pipe function

var fs = require('fs');
var parser = marc4js.parse(parserOptions);
var transformer = marc4js.transform(transformerOptions);
fs.createReadStream('/path/to/your/file').pipe(parser).pipe(transformer).pipe(process.stdout);
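On its own, .pipe() does not forward an error from one stage to the next, so a failure in the parser would not reach handlers on the transformer. Node.js also provides stream.pipeline, which connects the stages and reports the first error from any of them to a single callback. The self-contained sketch below assumes the parser and transformer are ordinary Node.js streams; fakeParserOutput, fakeTransformer and collector are hypothetical stand-ins for marc4js.parse(options), marc4js.transform(options) and process.stdout.

```javascript
'use strict';
const { pipeline, Readable, Transform, Writable } = require('stream');

// Hypothetical stand-in for a parser stream: emits record objects.
function fakeParserOutput() {
  return Readable.from([{ controlNumber: 'GR111154', title: 'The film sense /' }]);
}

// Hypothetical stand-in for a transformer: record object in, text out.
function fakeTransformer() {
  return new Transform({
    writableObjectMode: true,
    transform(record, encoding, callback) {
      callback(null, `${record.controlNumber}\t${record.title}\n`);
    },
  });
}

// Collects output text in memory instead of writing to process.stdout.
function collector(chunks) {
  return new Writable({
    write(chunk, encoding, callback) {
      chunks.push(chunk.toString());
      callback();
    },
  });
}

const chunks = [];
pipeline(fakeParserOutput(), fakeTransformer(), collector(chunks), (err) => {
  // Called once: with the first error from any stage, or with none on success.
  if (err) {
    console.error('pipeline failed:', err.message);
    return;
  }
  process.stdout.write(chunks.join(''));
});
```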

options

format: default iso2709, possible values iso2709, marc, text, mrk, marcxml, xml, json, mij
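The format values select a parser as described in the next subsections. A small lookup table makes the aliases explicit; it only summarizes this README (including the json and mij values documented under MijParser) and is not code from marc4js. parserFor is a hypothetical name.

```javascript
'use strict';

// Format aliases documented in this README, mapped to the parser they select.
const PARSER_BY_FORMAT = new Map([
  ['iso2709', 'Iso2709Parser'],
  ['marc', 'Iso2709Parser'],
  ['mrk', 'MrkParser'],
  ['text', 'TextParser'],
  ['marcxml', 'MarcxmlParser'],
  ['xml', 'MarcxmlParser'],
  ['json', 'MijParser'],
  ['mij', 'MijParser'],
]);

// Hypothetical helper: resolves a format value, falling back to the
// documented default (iso2709) when none is given.
function parserFor(format = 'iso2709') {
  const parser = PARSER_BY_FORMAT.get(format);
  if (parser === undefined) {
    throw new Error(`unsupported format: ${format}`);
  }
  return parser;
}

console.log(parserFor());      // Iso2709Parser
console.log(parserFor('xml')); // MarcxmlParser
```

A Map is used rather than a plain object so that inherited property names such as "toString" are not mistaken for formats.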

Different types of parsers

Iso2709Parser

Parses ISO2709 format. Used by default or when format is iso2709 or marc.
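For orientation, an ISO2709 (MARC 21) record is a 24-byte leader, a directory of 12-byte entries (3-byte tag, 4-byte field length, 5-byte starting position relative to the base address in leader positions 12-16), and then the field data. Fields end with 0x1E, the record ends with 0x1D, and each subfield starts with 0x1F. The decoder below is a simplified sketch of that layout, not marc4js's implementation. It assumes UTF-8 field data (MARC-8 needs marc8), two indicators and one-character subfield codes, and treats tags beginning with 00 as control fields; decodeIso2709 and parseSubfields are hypothetical names.

```javascript
'use strict';

const SUBFIELD_DELIMITER = 0x1f;
const FIELD_TERMINATOR = 0x1e;

// Splits a data field's bytes (after the two indicators) into subfields.
// Each subfield must begin with 0x1F followed by a one-character code.
function parseSubfields(bytes) {
  if (bytes.length > 0 && bytes[0] !== SUBFIELD_DELIMITER) {
    throw new Error("subfield doesn't start with 0x1f");
  }
  const subfields = [];
  let start = 1;
  while (start < bytes.length) {
    let end = bytes.indexOf(SUBFIELD_DELIMITER, start);
    if (end === -1) end = bytes.length;
    const text = bytes.toString('utf8', start, end);
    subfields.push({ code: text.charAt(0), value: text.slice(1) });
    start = end + 1;
  }
  return subfields;
}

// Decodes one record held in a Buffer. Lengths and offsets in the leader
// and directory count bytes, so the record is handled as bytes throughout.
function decodeIso2709(buf) {
  const leader = buf.toString('latin1', 0, 24);
  const baseAddress = parseInt(leader.slice(12, 17), 10); // leader/12-16
  const fields = [];
  // Directory entries follow the leader until a field terminator.
  for (let pos = 24; pos + 12 <= buf.length && buf[pos] !== FIELD_TERMINATOR; pos += 12) {
    const entry = buf.toString('latin1', pos, pos + 12);
    const tag = entry.slice(0, 3);
    const length = parseInt(entry.slice(3, 7), 10);
    const start = baseAddress + parseInt(entry.slice(7, 12), 10);
    // The field length includes the trailing field terminator; drop it.
    const data = buf.subarray(start, start + length - 1);
    if (tag.startsWith('00')) {
      fields.push({ tag, value: data.toString('utf8') });
    } else {
      fields.push({
        tag,
        ind1: String.fromCharCode(data[0]),
        ind2: String.fromCharCode(data[1]),
        subfields: parseSubfields(data.subarray(2)),
      });
    }
  }
  return { leader, fields };
}
```

The explicit 0x1F check in parseSubfields is analogous to the "Subfield doesn't start with 0x1f" error reported in the issues: data after the indicators that does not begin with a subfield delimiter cannot be split into subfields.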

MrkParser

Parses MarcEdit text format (.mrk files). Used when format is mrk.

Other options:

  • spaceReplace: In MarcEdit .mrk files, spaces in data field indicators and control fields are replaced by \. By default MrkParser converts \ back to a space in those places; a different replacement character can be configured with this option.
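A minimal sketch of what spaceReplace controls when reading a single .mrk line. It assumes the usual MarcEdit layout (=TAG, two spaces, then indicators and $-delimited subfields) and does not handle escaped literal $ characters in the data; parseMrkLine is a hypothetical helper, not part of marc4js.

```javascript
'use strict';

// Hypothetical helper: parses one MarcEdit .mrk line such as
// "=245  14$aThe film sense /". The replacement character is turned back
// into a space only in indicators and control fields.
function parseMrkLine(line, spaceReplace = '\\') {
  const match = /^=(\w{3}) {2}(.*)$/.exec(line);
  if (match === null) {
    throw new Error(`not an mrk field line: ${line}`);
  }
  const [, tag, body] = match;
  const restore = (text) => (spaceReplace ? text.split(spaceReplace).join(' ') : text);
  // The leader and 00X control fields have no indicators or subfields.
  if (tag === 'LDR' || tag.startsWith('00')) {
    return { tag, value: restore(body) };
  }
  const [indicators, ...parts] = body.split('$');
  return {
    tag,
    ind1: restore(indicators.charAt(0)),
    ind2: restore(indicators.charAt(1)),
    // Subfield data is left as-is, even if it contains the replacement character.
    subfields: parts.map((part) => ({ code: part.charAt(0), value: part.slice(1) })),
  };
}

console.log(parseMrkLine('=041  1\\$aeng$hrus'));
```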
TextParser

Parses a text format that is slightly different from mrk format. Used when format is text.

MarcxmlParser

Parses MARCXML format. Used when format is marcxml or xml.

The stream and pipe APIs are SAX based, so they don't require in-memory storage of the records, which makes them suitable for processing large MARCXML files. The callback API reads all records into memory and returns them in the callback function, so it is not advised for processing large MARCXML files.

Other options:

  • strict: default is false. In strict mode, the parser fails if the XML is not well-formed. For details, see the strict option in sax-js.
MijParser

Parses MARC-in-JSON format. Used when format is json or mij.

The stream and pipe APIs use a SAX-like JSON stream parser, so they don't require in-memory storage of the records and can process a large number of MARC-in-JSON records.
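For reference, a MARC-in-JSON record keeps the leader as a string and the fields as an array of single-key objects keyed by tag: control fields map to a string, and data fields map to an object with ind1, ind2 and a subfields array of single-key objects. The sketch below encodes part of the sample record quoted in the issues section in that layout and reads values back with subfieldValues, a hypothetical helper; it works on plain objects and does not use marc4js.

```javascript
'use strict';

// Part of the sample record from the issues section, in MARC-in-JSON layout.
const record = {
  leader: '00000cam a2200361 i 4500',
  fields: [
    { '001': 'GR111154' },
    { '041': { ind1: '1', ind2: ' ', subfields: [{ a: 'eng' }, { h: 'rus' }] } },
    {
      '245': {
        ind1: '1',
        ind2: '4',
        subfields: [{ a: 'The film sense /' }, { c: 'by Sergei M. Eisenstein' }],
      },
    },
  ],
};

// Hypothetical helper: collects every value of subfield `code` in fields
// tagged `tag`, keeping repeated fields and repeated subfields.
function subfieldValues(mij, tag, code) {
  const values = [];
  for (const field of mij.fields) {
    const data = field[tag];
    // Skip other tags and control fields (which are plain strings).
    if (data === undefined || typeof data === 'string') continue;
    for (const subfield of data.subfields) {
      if (Object.prototype.hasOwnProperty.call(subfield, code)) {
        values.push(subfield[code]);
      }
    }
  }
  return values;
}

console.log(subfieldValues(record, '245', 'a')); // [ 'The film sense /' ]
```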

Transformers

Transformers transform marc4js.marc.Record objects into various MARC formats. Marc4js supports ISO2709, MarcEdit text (.mrk), text, MARCXML and MARC-in-JSON formats.

Like parsers, transformers can also be used in three different ways.

Callback API

marc4js.transform(records, options, function(err, output) {
});

Stream API

var transformer = marc4js.transform(options);
transformer.on('readable', function() {
    // paused mode: the event carries no data, so pull output until read() returns null
    var output;
    while ((output = transformer.read()) !== null) {
        // handle a chunk of output
    }
});
transformer.on('end', function() {
});
transformer.on('error', function(err) {
});
transformer.write(record); // one record
// or to write an array of records
// records.forEach(function(record) {
//     transformer.write(record);
// });
transformer.end();

Note that while parsers work only in the flowing mode, transformers can use either the flowing or the paused (aka non-flowing) mode in the stream API. The example above uses the paused mode; in flowing mode, a data event handler can be used instead.
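The two consumption styles are general Node.js stream behavior. The sketch below uses a PassThrough stream as a stand-in for a transformer's text output (an assumption; it is not marc4js code) and collects it both ways: collectPaused pulls with read() inside a readable handler, and collectFlowing receives chunks through data events. Both helper names are hypothetical.

```javascript
'use strict';
const { PassThrough } = require('stream');

// Paused mode: on 'readable', pull chunks with read() until it returns null.
function collectPaused(stream, onDone) {
  const chunks = [];
  stream.on('readable', () => {
    let chunk;
    while ((chunk = stream.read()) !== null) {
      chunks.push(chunk.toString());
    }
  });
  stream.on('end', () => onDone(chunks.join('')));
}

// Flowing mode: a 'data' listener makes the stream push chunks as they arrive.
function collectFlowing(stream, onDone) {
  const chunks = [];
  stream.on('data', (chunk) => chunks.push(chunk.toString()));
  stream.on('end', () => onDone(chunks.join('')));
}

// PassThrough stands in for a transformer producing .mrk text.
const paused = new PassThrough();
collectPaused(paused, (text) => console.log('paused:', JSON.stringify(text)));
paused.end('=001  GR111154\n');

const flowing = new PassThrough();
collectFlowing(flowing, (text) => console.log('flowing:', JSON.stringify(text)));
flowing.end('=001  GR111154\n');
```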

Pipe function

var fs = require('fs');
var parser = marc4js.parse(parserOptions);
var transformer = marc4js.transform(transformerOptions);
fs.createReadStream('/path/to/your/file').pipe(parser).pipe(transformer).pipe(process.stdout);

options

format: default iso2709, possible values iso2709, marc, text, mrk, marcxml, xml, json, mij

objectMode: default false. Used to switch between the flowing and paused (aka non-flowing) mode in the stream API.

Different types of Transformers

Iso2709Transformer

Outputs ISO2709 format. Used by default or when format is iso2709 or marc.

MrkTransformer

Outputs MarcEdit text format (.mrk files). Used when format is mrk.

Other options:

  • spaceReplace: by default, spaces in data field indicators and control fields are replaced with \. A different replacement character can be configured with this option.
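As a sketch of the output side, the hypothetical helper below writes one field as a .mrk line, replacing blanks in indicators and control-field data with spaceReplace while leaving subfield data unchanged. The field shape ({ tag, value } for control fields, { tag, ind1, ind2, subfields } for data fields) is an assumption for illustration, not marc4js's record model.

```javascript
'use strict';

// Hypothetical helper (not marc4js code): writes one field as a MarcEdit
// .mrk line. Spaces in indicators and control-field data become
// spaceReplace ("\" by default); subfield data is written unchanged.
function toMrkLine(field, spaceReplace = '\\') {
  // A replacer function keeps "$" in spaceReplace from acting as a pattern.
  const encode = (text) => text.replace(/ /g, () => spaceReplace);
  if (typeof field.value === 'string') {
    return `=${field.tag}  ${encode(field.value)}`;
  }
  const subfields = field.subfields.map(({ code, value }) => `$${code}${value}`).join('');
  return `=${field.tag}  ${encode(field.ind1)}${encode(field.ind2)}${subfields}`;
}

console.log(toMrkLine({
  tag: '041',
  ind1: '1',
  ind2: ' ',
  subfields: [{ code: 'a', value: 'eng' }, { code: 'h', value: 'rus' }],
})); // =041  1\$aeng$hrus
```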
TextTransformer

Outputs text format, which is slightly different from mrk format. Used when format is text.

MarcxmlTransformer

Outputs MARCXML format. Used when format is marcxml or xml.

Other options:

  • pretty: default is true. Outputs XML in pretty format. If set to false, no indentation or line breaks are added to the output.
  • indent: default is '  ' (two spaces). Used to indent lines in pretty format.
  • newline: default is \n. Used in pretty format.
  • declaration: default is true. If set to false, the XML declaration line (<?xml version ...>) is not included in the output.
  • root: default is true. If false, the root <collection> element is not included in the output.
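To make the effect of these options concrete, the sketch below is a hypothetical minimal serializer, not marc4js's implementation, that applies pretty, indent, newline, declaration and root to the standard MARCXML elements (collection, record, leader, controlfield, datafield, subfield) in the http://www.loc.gov/MARC21/slim namespace. The record shape it accepts is an assumption for illustration.

```javascript
'use strict';

// Escapes the five XML special characters in text and attribute values.
function escapeXml(text) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;' };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Hypothetical serializer illustrating the options listed above.
function toMarcxml(records, { pretty = true, indent = '  ', newline = '\n', declaration = true, root = true } = {}) {
  const lines = [];
  const emit = (depth, text) => lines.push((pretty ? indent.repeat(depth) : '') + text);
  const base = root ? 1 : 0; // records are nested one level inside <collection>
  if (declaration) lines.push('<?xml version="1.0" encoding="UTF-8"?>');
  if (root) lines.push('<collection xmlns="http://www.loc.gov/MARC21/slim">');
  for (const record of records) {
    emit(base, '<record>');
    emit(base + 1, `<leader>${escapeXml(record.leader)}</leader>`);
    for (const field of record.fields) {
      if (typeof field.value === 'string') {
        emit(base + 1, `<controlfield tag="${field.tag}">${escapeXml(field.value)}</controlfield>`);
        continue;
      }
      emit(base + 1, `<datafield tag="${field.tag}" ind1="${escapeXml(field.ind1)}" ind2="${escapeXml(field.ind2)}">`);
      for (const { code, value } of field.subfields) {
        emit(base + 2, `<subfield code="${escapeXml(code)}">${escapeXml(value)}</subfield>`);
      }
      emit(base + 1, '</datafield>');
    }
    emit(base, '</record>');
  }
  if (root) lines.push('</collection>');
  // With pretty set to false, lines are concatenated without line breaks.
  return lines.join(pretty ? newline : '');
}

const sampleRecord = {
  leader: '00000cam a2200361 i 4500',
  fields: [
    { tag: '001', value: 'GR111154' },
    { tag: '245', ind1: '1', ind2: '4', subfields: [{ code: 'a', value: 'The film sense /' }] },
  ],
};

console.log(toMarcxml([sampleRecord]));
console.log(toMarcxml([sampleRecord], { pretty: false, declaration: false, root: false }));
```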
MijTransformer

Outputs a MARC-in-JSON string. Used when format is json or mij.

Other options:

  • asArray: default is true. By default the output is a JSON array, even if there is only one record. If this option is set to false, the enclosing brackets [ and ] are not written at the beginning and end of the output.

marc4js's People

Contributors

cleydyr, larkov


marc4js's Issues

XML transform fails - invalid character

Receiving the following error message when transforming the marc record to XML:

Invalid character (in string: AV2 fiction readalong ; at index 2

I figure it's an encoding issue. I forked this repo to troubleshoot with my own copy. I upgraded xmlbuilder to version 8.2.2 and added the allowSurrogateChars: true parameter to the .create() method in the marcxml_transformer.js file. No luck.

Worth noting that other transformers work fine.

See attached marc record as an example.

test.txt

Not in NPM registry

I'm getting an error that this is not in the npm registry. Would you mind publishing it? Thanks

Async/Await

Can this library be used via Async/Await?

Continue on Parse Error?

When encountering a parse error on a record, the parsing does not continue. Is there a way to allow the script to skip the record with the error and continue to the next?

MarcError: Subfield doesn't start with 0x1f

Parse MARC 21 records?

Hi there, most MARC records I come across look like this:

LEADER 00000cam a2200361 i 4500 
001    GR111154 
008    780609s1975    nyuaf    b    00110 eng u 
010    78102239 
020    0156309351 :|c$3.75 
035    (CaOTULAS)154869186 
035    (UtOrBLW)b10796848 
041 1  eng|hrus 
082 0  791.45/01 
090    PN1995|b.E52 1975 
100 1  Eisenstein, Sergei,|d1898-1948 
245 14 The film sense /|cby Sergei M. Eisenstein ; translated and
       edited by Jay Leyda 
250    [Rev. ed.] 
260    New York :|bHarcourt Brace Jovanovich,|cc1975 
300    x, 288 p., [2] leaves of plates :|bill. ;|c21 cm 
336    text|btxt|2rdacontent 
337    unmediated|bn|2rdamedia 
338    volume|bnc|2rdacarrier 
500    Includes index 
504    "Bibliography of Eisenstein's writings availabe in 
       English": p. 269-276 
650  0 Motion pictures|xAesthetics 
700 1  Leyda, Jay,|d1910-1988 
900    unlv|lmain 
910    rdae 
949 0  UNLM|p31147002913863 
989    unlv*PN1995 .E52 1975 
999    UNLM 

Do you have a parser that can handle this format? Thanks!

Using this in a browser

Is there any way to use this outside of Node, with just plain JavaScript (with require.js) in the browser? Or is it server-side only and dependent on Node? And is ES6 required?

Question about Transformer

Sorry if this is a newbie question. I'm trying to transform a record into "Text" and assign that text to a variable as a string so I can return it in an HTML template. However, I keep ending up with a TextTransformer object. I'm assuming this is because the transformation hasn't "finished". Any suggestions on how to fix this?

	let record_s = marc4js.transform(this.record, {toFormat: 'text'}, function(err, output) {
		return output;
	});
	console.log(record_s);
