# MARCspec - A common MARC record path language
MARCspec is the specification of a reference, encoded as a string, to a set of data from within a MARC record.
See http://marcspec.github.io/MARCspec for the latest specification.
Home Page: http://marcspec.github.io/MARCspec/marc-spec.html
@cKlee, are you still maintaining this organization and the several implementations?
And furthermore: since MARC is an implementation of ISO 2709, there should be no additional constraints; e.g. alphabetic characters in field tags should be allowed.
Typically one is interested in the last characters of the subfield preceding 245$e. A field may be viewed as a data field, but treating subfields as subordinate data fields is the wrong concept: they are simply marks inserted at certain positions of the field data...
I very much doubt that one can invent a practically useful accessor syntax for MARC that is considerably "simpler" than full XPath.
The subfield branch of the marcSpec rule confuses me as to the purpose of the subSpec clauses:
MARCspec = fieldSpec *subSpec / (subfieldSpec *subSpec *(abrSubfieldSpec *subSpec)) / indicatorSpec *subSpec
Because one or more subSpecs can occur after the subfieldSpec and after the abrSubfieldSpec, it seems that the following would be perfectly valid (it is valid with the parser I am building in Python):
"880$a{?$f}$b$c$e{$f=\q}"
But I thought the function of the abrSubfieldSpec and the subSpecs after it was to allow multiple subfields to be specified. Would the subSpec {$f=\a} be evaluated against all of the subfields? What is the purpose of the first subSpec in this case?
According to http://www.loc.gov/marc/specifications/specrecstruc.html#varitags, field tags in ANSI Z39.2 and ISO 2709 may consist of both alphabetic and numeric characters, although MARC 21 formats use only numeric tags.
The current spec accommodates this possibility with:
fieldTag = 3(alphalower / DIGIT / ".") / 3(alphaupper / DIGIT / ".")
@pkiraly suggests disallowing alphabetic characters and making LDR or LEADER an explicit field tag:
fieldTag = 3(DIGIT / ".") / "LDR" / "LEADER"
But if MARCspec as a whole is meant to cope with ANSI Z39.2 and ISO 2709, we should support alphabetic tags, provided this does not cause problems.
About the leader: LDR is already covered by the current fieldTag rule (three alphabetic, numeric or "." characters), so why make it explicit? And does LEADER actually appear in data? Supporting it might require additional parsing effort. Is it necessary?
If a subfield does not exist, allow specifying a fallback subfield, e.g.:
245$a|k
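A minimal Python sketch of the proposed fallback semantics, assuming `245$a|k` means "return $a if present, otherwise $k" and assuming a field is modelled as a list of (code, value) pairs. The function name `select_with_fallback` and the sample data are hypothetical, not part of MARCspec.

```python
def select_with_fallback(subfields, codes):
    """Return the values of the first subfield code in `codes` that occurs.

    `subfields` is one field as a list of (code, value) pairs (hypothetical
    layout); `codes` is the preference order, e.g. ["a", "k"] for 245$a|k.
    """
    for code in codes:
        values = [value for sub_code, value in subfields if sub_code == code]
        if values:
            return values
    return []  # neither the subfield nor any fallback exists


field_245 = [("k", "Letters,"), ("f", "1900-1910.")]
print(select_with_fallback(field_245, ["a", "k"]))  # ['Letters,']
```

Returning a list (rather than a single string) keeps repeatable subfields intact, which matches how the other proposals here treat references as sets of data.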
E.g. in titles, I would like to point to everything after the "/" in a 245 field.
Right now MARCspec refers to MARC 21. Does this limitation hold, or is it also suitable for other applications of ISO 2709?
For interoperability, would it be useful (and possible) to assemble a language-agnostic collection of specs and their expected stringified output for a given test MARC record? The purpose would be to make sure that the MARCspec was parsed and interpreted correctly.
I was thinking of something in Markdown or JSON like:
{
"245$a": "Finnegan's Wake /",
"245$a$c": "Finnegan's Wake / James Joyce.",
...
}
If I were to put something together would this be of interest? I guess an initial set could be derived fairly easily from TestMarcSpecTest?
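A shared fixture like this could be consumed by a small runner in each implementation. Below is a Python sketch, assuming the fixture is a JSON object mapping spec strings to expected output and that the implementation under test exposes some `evaluate(spec)` callable for the test record; `run_conformance` and the stand-in evaluator are hypothetical.

```python
import json

# Fixture in the format proposed above (the "..." entries are omitted).
FIXTURE = """
{
  "245$a": "Finnegan's Wake /",
  "245$a$c": "Finnegan's Wake / James Joyce."
}
"""


def run_conformance(fixture_json, evaluate):
    """Compare an implementation against the expected outputs.

    `evaluate(spec)` must return the stringified result for the test record.
    Returns a list of (spec, expected, actual) tuples for failing specs.
    """
    expected_by_spec = json.loads(fixture_json)
    failures = []
    for spec, expected in expected_by_spec.items():
        actual = evaluate(spec)
        if actual != expected:
            failures.append((spec, expected, actual))
    return failures


# A stand-in evaluator backed by a lookup table, only to exercise the runner.
fake_results = {"245$a": "Finnegan's Wake /", "245$a$c": "Finnegan's Wake /"}
print(run_conformance(FIXTURE, fake_results.get))
# [('245$a$c', "Finnegan's Wake / James Joyce.", "Finnegan's Wake /")]
```

Keeping the fixture as plain JSON means the same file can drive test suites in PHP, Java, Python or any other implementation without translation.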
Reference subfield "a" of field 306 if the character at position 0 of field 007 is "m", "s" or "v".
306$a{007/0=~m|s|v}
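A Python sketch of how this condition might be evaluated, assuming `=~m|s|v` means "equals any of the listed values" and using the record-level interpretation (any 007 may satisfy it). The record layout (tag mapped to a list of control-field strings or lists of (code, value) pairs), the function names and the sample values are all made up for illustration.

```python
def char_at(control_value, position):
    """Character at a 0-based position of a control field, or None if too short."""
    return control_value[position] if position < len(control_value) else None


def select_306a(record):
    """Sketch of 306$a{007/0=~m|s|v}: 306$a is returned only if some 007
    has "m", "s" or "v" at position 0."""
    allowed = {"m", "s", "v"}
    condition = any(char_at(value, 0) in allowed for value in record.get("007", []))
    if not condition:
        return []
    return [value
            for field in record.get("306", [])
            for code, value in field
            if code == "a"]


record = {
    "007": ["vd cvaizq"],
    "306": [[("a", "013000")]],
}
print(select_306a(record))  # ['013000']
```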
/ seems more common than ~, e.g. 008/0-3 instead of 008~0-3.
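Whatever the separator, a character position or range selects a slice of a control field. A Python sketch, assuming both ends of a range are inclusive (so 008/0-3 yields four characters) and ignoring "#" (last position); the function name and the 008 value are made up.

```python
def char_range(value, spec):
    """Resolve a character position ("6") or inclusive range ("0-3")."""
    if "-" in spec:
        start, end = (int(part) for part in spec.split("-", 1))
    else:
        start = end = int(spec)
    # Python slices exclude the end index, so add 1 for an inclusive range.
    return value[start:end + 1]


field_008 = "850423s1984    nyu           000 0 eng  "
print(char_range(field_008, "0-3"))  # 8504
print(char_range(field_008, "6"))    # s
```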
Locally defined fields might contain the character "X" in the tag. This might lead to interpretation problems.
Having two words (MARC spec) as the name is problematic; how about MARCspec?
I installed pandoc (v1.12.3) and made sure I had a clone of makespec in the directory above, and I see this when I run make:
pandoc: 1: openFile: does not exist (No such file or directory)
I'm new to this pandoc/makespec toolchain, so my apologies if this is a very basic question.
This was merged quite some time ago: 2b1d491
But I still see 020$s{?020$a} at http://marcspec.github.io/MARCspec/marc-spec.html
Hi. Given example data
020$cLorem$aIpsum
020$cDolor
I expected 020$c{$a} from the example in 4.7.2 to return just ['Lorem'] (thinking XPath-like 020[a]/c), but instead I got ['Lorem', 'Dolor'] (from File_MARC_Reference). Of course, I'm guilty of not having read the spec thoroughly enough, but if I understand it correctly, this is a result of point 2 in 2.3? And 020$c{$a} is just shorthand for 020$c{020$a}?
To avoid confusion, perhaps
Reference data content of subfield “c” of field “020”, if subfield “a” of field “020” exists.
could be clarified as
Reference data content of subfield “c” of any field “020”, if subfield “a” of any field “020” exists (not necessarily the same field).
or something along those lines?
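The two readings can be contrasted in a short Python sketch using the example data above. Fields are modelled as lists of (code, value) pairs; the function names are hypothetical and neither function is taken from File_MARC_Reference.

```python
# Example data from the question: two 020 fields.
fields_020 = [
    [("c", "Lorem"), ("a", "Ipsum")],
    [("c", "Dolor")],
]


def codes(field):
    return {code for code, _ in field}


def record_level(fields, target, required):
    """Return every `target` subfield if ANY field has `required`
    (the behaviour reported above)."""
    if not any(required in codes(f) for f in fields):
        return []
    return [v for f in fields for c, v in f if c == target]


def same_field(fields, target, required):
    """Return `target` subfields only from fields that also contain
    `required` (the XPath-like reading 020[a]/c)."""
    return [v for f in fields if required in codes(f) for c, v in f if c == target]


print(record_level(fields_020, "c", "a"))  # ['Lorem', 'Dolor']
print(same_field(fields_020, "c", "a"))    # ['Lorem']
```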
See the discussion in pkiraly/qa-catalogue#23.
According to the MARC 21 bibliographic standard as well as [UNIMARC 2008], a subfield code can only be alphabetic or numeric (MARC 21 specifies lower-case alphabetic). However, the grammar defines subfieldChar and subfieldCode as:
subfieldChar = %x21-3F / %x5B-7B / %x7D-7E
; ! " # $ % & ' ( ) * + , - . / 0-9 : ; < = > ? [ \ ] ^ _ \` a-z { } ~
subfieldCode = "$" subfieldChar
Is this intentional? Do non-bibliographic MARC use cases include punctuation as subfield codes?
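To see exactly what the quoted rule admits, a Python sketch can enumerate the printable ASCII range and list the characters that subfieldChar excludes. The function name is made up; the ranges are copied from the grammar above.

```python
def is_subfield_char(ch):
    """True if `ch` matches subfieldChar = %x21-3F / %x5B-7B / %x7D-7E."""
    code = ord(ch)
    return 0x21 <= code <= 0x3F or 0x5B <= code <= 0x7B or 0x7D <= code <= 0x7E


# Printable ASCII (%x21-7E) characters that the rule rejects.
excluded = "".join(chr(c) for c in range(0x21, 0x7F) if not is_subfield_char(chr(c)))
print(excluded)  # @ABCDEFGHIJKLMNOPQRSTUVWXYZ|
```

So only "@", the upper-case letters and "|" are rejected; punctuation such as "$", "{" and "}" is accepted as a subfield code.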
Locally defined field tags may contain alphabetic characters. Thus the field tag should allow them:
alphaupper = %x41-5A ; A-Z
alphalower = %x61-7A; a-z
fieldTag = 3*3(((alphalower / alphaupper) / DIGIT)) / "LDR"
Problem: how should "X" be interpreted when it is part of a locally defined field tag and not meant as a wildcard? Should another character, like "*", be used for the wildcard?
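One way to avoid the ambiguity is the approach of the current fieldTag rule quoted earlier, 3(alphalower / DIGIT / ".") / 3(alphaupper / DIGIT / "."), where "." is the wildcard and every letter, including "X", is literal. A Python sketch of validating and matching tags under that assumption; `matches` is a hypothetical helper.

```python
import re

# fieldTag = 3(alphalower / DIGIT / ".") / 3(alphaupper / DIGIT / ".")
FIELD_TAG = re.compile(r"^(?:[a-z0-9.]{3}|[A-Z0-9.]{3})$")


def matches(pattern, tag):
    """True if `tag` matches `pattern`, where "." is a single-character
    wildcard and every other character (including "X") is literal."""
    if not FIELD_TAG.match(pattern):
        raise ValueError(f"invalid fieldTag: {pattern!r}")
    return len(tag) == 3 and all(p == "." or p == t for p, t in zip(pattern, tag))


print(matches("24.", "245"))  # True
print(matches("9X1", "9X1"))  # True: "X" is a literal local tag character
print(matches("9X1", "901"))  # False
```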
Given a MARC field, one could further select parts of it. An example:
titles = getMARCspec(record, "245")
foreach titleField in titles
title = getMARCspec(titleField, "$a")
remainder = getMARCspec(titleField, "$b")
if title.endsWith(":") then
...
end
done
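The pseudocode above can be made concrete with a toy resolver that accepts either a record or a single field as its context. This Python sketch supports only two spec shapes, a bare field tag against a record and "$code" against a field; `get_marcspec` is the hypothetical function named in the pseudocode, and the record layout and data are made up.

```python
def get_marcspec(context, spec):
    """Toy resolver (hypothetical API):
    - "245" on a record (dict: tag -> list of fields) returns the fields;
    - "$a" on a field (list of (code, value) pairs) returns subfield values.
    """
    if spec.startswith("$"):
        code = spec[1:]
        return [value for sub_code, value in context if sub_code == code]
    return context.get(spec, [])


record = {"245": [[("a", "Main title :"), ("b", "subtitle /")]]}
for title_field in get_marcspec(record, "245"):
    title = get_marcspec(title_field, "$a")
    remainder = get_marcspec(title_field, "$b")
    if title and title[0].endswith(":"):
        print(title[0], remainder)
```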
I'd propose to change the core syntax to:
MARCspec = fieldSpec / characterSpec / subfieldSpec
; refer to a (set of) fields
fieldSpec = fieldTag ["_" indicators]
; refer to a character position or range
characterSpec = [ fieldTag ] "/" characterPositionOrRange
; refer to a (set of) subfields of specified or given fields
subfieldSpec = fieldTag [ "$" ] subfieldTags ["_" indicators]
/ fieldTag "_" indicators "$" subfieldTags
/ "$" subfieldTags
This would also allow selecting subfields of a given field, such as "$a". The preceding "$" is necessary so as not to confuse "123" (the field) with "$123" (three subfields). I'd also make the "$" optional, so that "100a" == "100$a", and support giving indicators before subfields ("245a_1" == "245$a_1" == "245_1$a").
Note that "$" is also a valid subfield tag, so "$" should be mandatory to refer to this subfield:
100a ; valid (subfield "a" of field 100)
100$ ; invalid
100$a ; valid (subfield "a" of field 100)
100$$ ; valid (subfield "$" of field 100)
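These four cases can be checked with a regular expression for a single-subfield subset of the proposed subfieldSpec. A Python sketch, assuming a tag of three digits or "." and a bare code limited to [a-z0-9]; indicators and multi-code subfieldTags are deliberately left out, and `parse_subfield_spec` is hypothetical.

```python
import re

# fieldTag, then either "$" plus any one code, or a bare [a-z0-9] code.
SUBFIELD_SPEC = re.compile(
    r"^(?P<tag>[0-9.]{3})(?:\$(?P<explicit>.)|(?P<bare>[a-z0-9]))$"
)


def parse_subfield_spec(spec):
    """Return (tag, code), or None if the spec is invalid under these rules."""
    m = SUBFIELD_SPEC.match(spec)
    if m is None:
        return None
    return m.group("tag"), m.group("explicit") or m.group("bare")


for spec in ["100a", "100$", "100$a", "100$$"]:
    print(spec, parse_subfield_spec(spec))
```

"100$" fails because a "$" must always be followed by a code, which is what makes "100$$" unambiguous.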
Hi there, is there a recommended way to define a combination of MARCspecs to indicate multiple applicable matches? Solrmarc and Traject both use a colon to delimit multiple specs.
Examples:
506 and 540: it would be nice to be able to do something like 506:540.
650$z:650$a:034{LDR/6=\e}:255{LDR/6=\e}
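If the colon delimiter borrowed from Solrmarc and Traject were adopted, a splitter would have to ignore colons inside subSpecs, where a comparison string could contain one. A Python sketch under that assumption; it does not handle escaped braces, and `split_specs` is hypothetical.

```python
def split_specs(combined):
    """Split a colon-delimited list of specs, ignoring colons inside {...}."""
    specs, current, depth = [], [], 0
    for ch in combined:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)
        if ch == ":" and depth == 0:
            specs.append("".join(current))
            current = []
        else:
            current.append(ch)
    specs.append("".join(current))
    return specs


print(split_specs(r"650$z:650$a:034{LDR/6=\e}:255{LDR/6=\e}"))
# ['650$z', '650$a', '034{LDR/6=\\e}', '255{LDR/6=\\e}']
```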
Point to the first item, e.g. the first author. For repeatable fields, point to the first occurrence in the list.
Possible solution: prefix the field tag with a character that does not get encoded in a URI, e.g. "-". Thus the first of all 100 fields is referenced by -100. Other possible characters are "~", "_", "/", "+" and "*".
While learning the specification and working on the implementation, I came to several conclusions I would like to share with you.
Comments on existing features:
Here is my formalized suggestion for renaming the specification
alphaupper = %x41-5A ; A-Z
alphalower = %x61-7A ; a-z
DIGIT = %x30-39 ; 0-9
VCHAR = %x21-7E ; visible (printing) characters
positiveDigit = %x31-39 ; "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9"
positiveInteger = "0" / positiveDigit [1*DIGIT]
; field
fieldTag = 3(DIGIT / ".")
/ "LDR"
/ "LEADER"
position = positiveInteger / "#"
range = position "-" position
positionOrRange = range
/ position
characterSpec = "/" positionOrRange
index = "[" positionOrRange "]"
shortField = index [characterSpec]
/ characterSpec
field = fieldTag [index] [characterSpec]
; subfield
subfieldChar = alphaupper
/ alphalower
/ DIGIT
subfieldCode = "$" subfieldChar
subfieldCodeRange = "$" ( (alphaupper "-" alphaupper)
/ (alphalower "-" alphalower)
/ (DIGIT "-" DIGIT) )
; [a-z]-[a-z] / [0-9]-[0-9]
shortSubfield = (subfieldCode / subfieldCodeRange) [index] [characterSpec]
subfield = fieldTag [index] shortSubfield
; indicator
shortIndicator = [index] "^" ("1" / "2")
indicatorPath = fieldTag shortIndicator
; condition
comparisonString = ("'" *VCHAR "'")
/ ('"' *VCHAR '"')
operator = "=" / "!=" / "~" / "!~" / "!" / "?"
; equal / unequal / includes / not includes / not exists / exists
abbreviation = shortField
/ shortSubfield
/ shortIndicator
conditionTerm = field
/ subfield
/ indicatorPath
/ comparisonString
/ abbreviation
condition = [ [conditionTerm] operator ] conditionTerm
conditionSet = "{" condition *( "|" condition ) "}"
; the whole together
marcPath = field *conditionSet
/ (subfield *conditionSet *(shortSubfield *conditionSet))
/ indicatorPath *conditionSet
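As a quick check of the proposed grammar, the `field` alternative of marcPath can be translated into a regular expression. This Python sketch covers only fieldTag, index and characterSpec; subfields, indicators and condition sets are omitted, and `parse_field` is hypothetical.

```python
import re

# position = positiveInteger / "#", where positiveInteger = "0" / positiveDigit [1*DIGIT]
POSITION = r"(?:0|[1-9][0-9]*|#)"
POSITION_OR_RANGE = rf"{POSITION}(?:-{POSITION})?"

# field = fieldTag [index] [characterSpec]
FIELD = re.compile(
    rf"^(?P<tag>[0-9.]{{3}}|LDR|LEADER)"
    rf"(?:\[(?P<index>{POSITION_OR_RANGE})\])?"
    rf"(?:/(?P<chars>{POSITION_OR_RANGE}))?$"
)


def parse_field(spec):
    """Return the tag, index and character spec as a dict, or None."""
    m = FIELD.match(spec)
    return m.groupdict() if m else None


print(parse_field("245[0]/0-3"))  # {'tag': '245', 'index': '0', 'chars': '0-3'}
print(parse_field("245[01]"))     # None: leading zeros are not allowed
```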
Besides that, the relationship between the "path" and the "condition" is not clear to me. There can be two interpretations of the conditions, and for both there are valid use cases:
008/18{LDR/6=\t}
Here the situation is clear: 008 and LDR are two different fields, so we should follow the first interpretation.
880$a{100$6~880$6/3-5}
020$c{020$a}
Suppose we have two 880 fields. Should we take both if the condition is true for either of them, or should we take only the 880 for which the condition is true? The same applies to 020 (which is a repeatable field).
I would like to see constraints in which the context is defined explicitly. We can use the following notation for the leftHandSide (or path) part:
- self or . means the current context
  - 020$c{.="something"}: get 020$c if its value is "something"
- parent or .. means the parent
  - 020$c{..?$a}: get 020$c if the same 020 field has subfield $a
- 020$c{020$a}: get 020$c if there is a 020$a anywhere in the record

I admit, "make it more clear" is a very subjective statement, as we don't have an absolute scale for semantic clarity. So this comment is more of a discussion opener than a final suggestion.
...and it'd be nice if it did. :-)