
avendesora / pythonbible


A python library for validating, parsing, normalizing scripture references and retrieving scripture texts (for open source and public domain versions)

Home Page: https://docs.python.bible

License: MIT License

Python 100.00%
bible hacktoberfest python scripture-references

pythonbible's Introduction

Hello there 👋

My name is Nathan.


pythonbible's People

Contributors

avendesora, codacy-badger, dependabot[bot], james-patton, otto-dev, pre-commit-ci[bot], william-patton

pythonbible's Issues

Compound reference fails for Job

These work as expected
bible.get_references('Ezra 1:1-Ezra 1:3')
bible.get_references('Job 1:1-3')

This doesn't
bible.get_references('Job 1:1-Job 1:3')

No bible book abbreviations

It would be very helpful to have an easy way to use Bible book abbreviations. A simple approach would be a direct dictionary, but each book has several variants (abbreviations of anywhere from 2 to 4 letters). For example:

bible_book_abbreviations = {
    "Genesis": "Ge",
    "Exodus": "Ex",
    "Leviticus": "Le",
    "Numbers": "Nu",
    "Deuteronomy": "De",
    "Joshua": "Jos",
    "Judges": "Jg",
    "Ruth": "Ru",
    "1 Samuel": "1S",
    "2 Samuel": "2S",
    "1 Kings": "1K",
    "2 Kings": "2K",
    "1 Chronicles": "1Ch",
    "2 Chronicles": "2Ch",
    "Ezra": "Ez",
    "Nehemiah": "Ne",
    "Esther": "Es",
    "Job": "Job",
    "Psalms": "Ps",
    "Proverbs": "Pr",
    "Ecclesiastes": "Ec",
    "Song of Solomon": "So",
    "Song of Songs": "So",
    "Isaiah": "Is",
    "Jeremiah": "Je",
    "Lamentations": "La",
    "Ezekiel": "Ez",
    "Daniel": "Da",
    "Hosea": "Ho",
    "Joel": "Joe",
    "Amos": "Am",
    "Obadiah": "Ob",
    "Jonah": "Jon",
    "Micah": "Mic",
    "Nahum": "Na",
    "Habakkuk": "Hab",
    "Zephaniah": "Zep",
    "Haggai": "Hag",
    "Zechariah": "Zec",
    "Malachi": "Mal",
    "Matthew": "Mt",
    "Mark": "Mk",
    "Luke": "Lk",
    "John": "Jn",
    "Acts": "Ac",
    "Romans": "Ro",
    "1 Corinthians": "1Co",
    "2 Corinthians": "2Co",
    "Galatians": "Ga",
    "Ephesians": "Eph",
    "Philippians": "Php",
    "Colossians": "Col",
    "1 Thessalonians": "1Th",
    "2 Thessalonians": "2Th",
    "1 Timothy": "1Ti",
    "2 Timothy": "2Ti",
    "Titus": "Tit",
    "Philemon": "Phm",
    "Hebrews": "Heb",
    "James": "Jas",
    "1 Peter": "1Pe",
    "2 Peter": "2Pe",
    "1 John": "1Jn",
    "2 John": "2Jn",
    "3 John": "3Jn",
    "Jude": "Jud",
    "Revelation": "Re"
}
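One wrinkle with a single title-to-abbreviation dictionary like the one above is that it cannot be inverted safely: "Ez" is listed for both Ezra and Ezekiel, and "So" for both Song of Solomon and Song of Songs. A minimal sketch, using a small excerpt of that dictionary, of how such collisions might be detected before the mapping is used for parsing (the helper name find_ambiguous_abbreviations is hypothetical and not part of pythonbible):

```python
from collections import defaultdict

# Excerpt of the dictionary above, including the entries that collide.
bible_book_abbreviations = {
    "Ezra": "Ez",
    "Esther": "Es",
    "Song of Solomon": "So",
    "Song of Songs": "So",
    "Ezekiel": "Ez",
    "Daniel": "Da",
}


def find_ambiguous_abbreviations(abbreviations: dict[str, str]) -> dict[str, list[str]]:
    """Return each abbreviation that is assigned to more than one title."""
    titles_by_abbreviation: dict[str, list[str]] = defaultdict(list)
    for title, abbreviation in abbreviations.items():
        titles_by_abbreviation[abbreviation].append(title)
    return {
        abbreviation: titles
        for abbreviation, titles in titles_by_abbreviation.items()
        if len(titles) > 1
    }


ambiguous = find_ambiguous_abbreviations(bible_book_abbreviations)
print(ambiguous)
```

Abbreviations flagged this way would need a longer form (for example a three-letter variant) before they could be used to look a book up.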

ASV data has incomplete bible

In pythonbible/bible/data/asv/verses.json only the first line of each verse exists. For example, Psalm 1:1 (19001001) only contains "Blessed is the man that walketh not in the counsel of the wicked," and not the full "Blessed is the man that walketh not in the counsel of the wicked, Nor standeth in the way of sinners, Nor sitteth in the seat of scoffers: "

I suspect this is due to the newlines in the poetic or multi-line verses. This is a pretty major issue because large chunks of the bible are missing with no indication that something failed. To be clear: this is a data storage issue, not a problem with the code itself.
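To illustrate the suspicion above (this is only a sketch of the suspected failure mode, not the project's actual data pipeline): a line-oriented step that keeps the first line of a multi-line verse would produce exactly the truncated text described, while collapsing the internal newlines first would keep the verse whole.

```python
# Psalm 1:1 as it might appear in a source with poetic line breaks.
raw_verse = (
    "Blessed is the man that walketh not in the counsel of the wicked,\n"
    "Nor standeth in the way of sinners,\n"
    "Nor sitteth in the seat of scoffers:"
)

# Keeping only the first line silently drops the rest of the verse.
truncated = raw_verse.splitlines()[0]

# Joining the lines with single spaces preserves the full text.
complete = " ".join(line.strip() for line in raw_verse.splitlines())

print(truncated)
print(complete)
```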

Feature idea: Add flag to return formatted references as abbreviated

Or, add an optional dict input of a "short title" dictionary. It would be handy when trying to get the output references into a specific format.

I gave it a quick try, but hit an unhashable-type error:

import pythonbible as bible
from pythonbible.books import Book
text ='Jeremiah 10:11-12;'

titles = {
    Book.JEREMIAH: "Jer", # instead of Jeremiah
}

references = bible.get_references(text)
formatted = bible.format_scripture_references(references, short_titles=titles)

print(formatted)

I modified two functions. The way you pass through kwargs is pretty nice! I haven't used that before.

# formatter.py:
def _get_book_title(book: Book, include_books: bool = True, **kwargs: Any) -> str:
    if not include_books:
        return ""

    version: Version = kwargs.get("version", DEFAULT_VERSION)
    full_title: bool = kwargs.get("full_title", False)
    version_book_titles: BookTitles = get_book_titles(book, version, **kwargs)

    return (
        version_book_titles.long_title
        if full_title
        else version_book_titles.short_title
    )

# formatter.py:
@lru_cache()
def get_book_titles(book: Book, version: Version, **kwargs: Any) -> BookTitles:
    """Return the book titles for the given Book and optional Version.

    :param book: a book of the Bible
    :type book: Book
    :param version: a version of the Bible, defaults to American Standard
    :type version: Version
    :return: the long and short titles of the given book and version
    :rtype: BookTitles
    :raises MissingBookFileError: if the book file for the given book and version does
                                  not exist
    """
    short_titles, long_titles = _get_version_book_titles(version or DEFAULT_VERSION)

    if short_titles in kwargs:
        short_titles = kwargs.short_titles

    if long_titles in kwargs:
        long_titles = kwargs.long_titles

    short_title = short_titles.get(book, book.title)
    long_title = long_titles.get(book, book.title)

    return BookTitles(long_title, short_title)

but get this error:

    version_book_titles: BookTitles = get_book_titles(book, version, **kwargs)
TypeError: unhashable type: 'list'

If you've handled this error before, I'd appreciate a tip :) Otherwise I'll probably give it another shot sometime.

Thanks!
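A likely source of the TypeError is the @lru_cache() decorator: it hashes every argument to build its cache key, so forwarding a dict (or list) through **kwargs into a cached function fails before the body runs. Separately, the membership checks in the modified function would probably need string keys ("short_titles" in kwargs) and item access (kwargs["short_titles"]), since kwargs is a plain dict. A minimal sketch of one workaround, freezing the overrides into a hashable value before the cached call; the function names here are hypothetical, and plain strings stand in for Book members:

```python
from functools import lru_cache
from typing import Optional


@lru_cache()
def _cached_short_title(book: str, overrides: frozenset = frozenset()) -> str:
    """Cached lookup; every argument must be hashable."""
    titles = dict(overrides)
    return titles.get(book, book)


def get_short_title(book: str, short_titles: Optional[dict] = None) -> str:
    """Uncached wrapper that freezes the optional dict of overrides."""
    frozen = frozenset((short_titles or {}).items())
    return _cached_short_title(book, frozen)


# Passing the dict directly to the cached function reproduces the failure.
direct_call_error = ""
try:
    _cached_short_title("Jeremiah", {"Jeremiah": "Jer"})
except TypeError as error:
    direct_call_error = str(error)

print(direct_call_error)
print(get_short_title("Jeremiah", {"Jeremiah": "Jer"}))
```

A frozenset of items is used instead of a sorted tuple so that the keys do not have to be orderable, which matters if they are Enum members.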

Convert docs to use Sphinx

Using Sphinx instead of Docusaurus will simplify things by keeping everything within the Python ecosystem. It should also greatly reduce the number of dependencies, and with them the number of Dependabot alerts.

OSIS Parser is ignoring the seg and divineName tags

The KJV OSIS XML file uses the divineName tag for LORD, but the OSISParser is currently not handling that tag or its parent seg tag, so the text within is not included in the output.

For example:

import pythonbible as bible

# 1 Chronicles 16:8
bible.get_verse_text(13016008)

The output is:

'Give thanks unto the call upon his name, make known his deeds among the people.'

But it should be:

'Give thanks unto the LORD, call upon his name, make known his deeds among the people.'

This is the input XML for that verse:

<verse osisID="1Chr.16.8" sID="1Chr.16.8.seID.11183" n="8" />
<w lemma="strong:H3034" morph="strongMorph:TH8685">Give thanks</w> 
unto the 
<seg>
    <divineName>
        <w lemma="strong:H3068">LORD</w>
    </divineName>
</seg>,
<w lemma="strong:H7121" morph="strongMorph:TH8798">call</w>
<w lemma="strong:H8034">upon his name</w>,
<w lemma="strong:H3045" morph="strongMorph:TH8685">make known</w>
<w lemma="strong:H5949">his deeds</w>
<w lemma="strong:H5971">among the people</w>.

This issue was reported via email by Nathan Wood.
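As a rough illustration of the desired behaviour (not the OSISParser's actual implementation), the standard library's ElementTree can gather text at any nesting depth with itertext(), which picks up the word inside seg and divineName along with everything else. The p wrapper element below is an assumption added only so the fragment parses on its own:

```python
import xml.etree.ElementTree as ET

# The verse fragment from the report, wrapped in a hypothetical <p> root.
VERSE_XML = """<p><verse osisID="1Chr.16.8" sID="1Chr.16.8.seID.11183" n="8" />
<w lemma="strong:H3034" morph="strongMorph:TH8685">Give thanks</w>
unto the
<seg>
    <divineName>
        <w lemma="strong:H3068">LORD</w>
    </divineName>
</seg>,
<w lemma="strong:H7121" morph="strongMorph:TH8798">call</w>
<w lemma="strong:H8034">upon his name</w>,
<w lemma="strong:H3045" morph="strongMorph:TH8685">make known</w>
<w lemma="strong:H5949">his deeds</w>
<w lemma="strong:H5971">among the people</w>.</p>"""


def collect_text(element: ET.Element) -> str:
    """Join all text beneath element, at any depth, with single spaces."""
    words = " ".join("".join(element.itertext()).split())
    # Whitespace between tags leaves a space before punctuation; remove it.
    return words.replace(" ,", ",").replace(" .", ".")


verse_text = collect_text(ET.fromstring(VERSE_XML))
print(verse_text)
```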

Mark 9:43 is blank in KJV

>>> import pythonbible as bible
>>> kjv_parser = bible.get_parser(version=bible.Version.KING_JAMES)
>>> kjv_verse_text = kjv_parser.get_verse_text(41009043)
>>> kjv_verse_text
'43.'

If I do the same thing in ASV, I get:

>>> import pythonbible as bible
>>> asv_parser = bible.get_parser(version=bible.Version.AMERICAN_STANDARD)
>>> asv_verse_text = asv_parser.get_verse_text(41009043)
>>> asv_verse_text
'43. And if thy hand cause thee to stumble, cut it off: it is good for thee to enter into life maimed, rather than having thy two hands to go into hell, into the unquenchable fire.'

Add support for Portuguese

Hi,

Could you add support for Portuguese book names and abbreviations?

https://avinuapp.com/bible-books.html

Or:

Portuguese | Spanish | Abbreviation (Portuguese) | Abbreviation (Spanish)
Gênesis | Génesis | Gn | Gen
Êxodo | Éxodo | Êx | Ex
Levítico | Levítico | Lv | Lev
Números | Números | Nm | Núm
Deuteronômio | Deuteronomio | Dt | Deut
Josué | Josué | Js | Jos
Juízes | Jueces | Jz | Jue
Rute | Rut | Rt | Rut
1 Samuel | 1 Samuel | 1 Sm | 1 Sam
2 Samuel | 2 Samuel | 2 Sm | 2 Sam
1 Reis | 1 Reyes | 1 Rs | 1 Rey
2 Reis | 2 Reyes | 2 Rs | 2 Rey
1 Crônicas | 1 Crónicas | 1 Cr | 1 Cro
2 Crônicas | 2 Crónicas | 2 Cr | 2 Cro
Esdras | Esdras | Ed | Esd
Neemias | Nehemías | Ne | Neh
Ester | Ester | Et | Est
Job | Job | (not given) | (not given)
Salmos | Salmos | Sl | Sal
Provérbios | Proverbios | Pr | Prov
Eclesiastes | Eclesiastés | Ec | Ecl
Cantares de Salomão | Cantar de los Cantares | Ct | Cant
Isaías | Isaías | Is | Is
Jeremias | Jeremías | Jr | Jer
Lamentações de Jeremias | Lamentaciones de Jeremías | Lm | Lam
Ezequiel | Ezequiel | Ez | Ez
Daniel | Daniel | Dn | Dan
Oseias | Oseas | Os | Os
Joel | Joel | Jl | Jl
Amós | Amós | Am | Am
Obadias | Abdías | Ob | Abd
Jonas | Jonás | Jn | Jon
Miqueias | Miqueas | Mi | Miq
Naum | Nahúm | Na | Nah
Habacuque | Habacuc | Hab | Hab
Sofonias | Sofonías | Sof | Sof
Ageu | Hageo | Ag | Hag
Zacarias | Zacarías | Zc | Zac
Malaquias | Malaquías | Ml | Mal
Mateus | Mateo | Mt | Mat
Marcos | Marcos | Mc | Mar
Lucas | Lucas | Lc | Luc
João | Juan | Jo | Jn
Atos dos Apóstolos | Hechos de los Apóstoles | At | Hch
Romanos | Romanos | Ro | Rom
1 Coríntios | 1 Corintios | 1 Co | 1 Cor
2 Coríntios | 2 Corintios | 2 Co | 2 Cor
Gálatas | Gálatas | Gl | Gál
Efésios | Efesios | Ef | Ef
Filipenses | Filipenses | Fp | Fil
Colossenses | Colosenses | Cl | Col
1 Tessalonicenses | 1 Tesalonicenses | 1 Ts | 1 Tes
2 Tessalonicenses | 2 Tesalonicenses | 2 Ts | 2 Tes
1 Timóteo | 1 Timoteo | 1 Tm | 1 Tim
2 Timóteo | 2 Timoteo | 2 Tm | 2 Tim
Tito | Tito | Tt | Tit
Filemom | Filemón | Fm | Flm
Hebreus | Hebreos | Hb | Heb
Tiago | Santiago | Tg | Stg
1 Pedro | 1 Pedro | 1 Pe | 1 Ped
2 Pedro | 2 Pedro | 2 Pe | 2 Ped
1 João | 1 Juan | 1 Jo | 1 Jn
2 João | 2 Juan | 2 Jo | 2 Jn
3 João | 3 Juan | 3 Jo | 3 Jn
Judas | Judas | Jd | Jud
Apocalipse | Apocalipsis | Ap | Ap

Missing books from the Apocrypha and Septuagint

The following books from the Apocrypha and Septuagint are completely missing from this library:

  • Baruch
  • Additions to Daniel
  • Prayer of Azariah
  • Bel and the Dragon
  • Song of the Three Young Men
  • Susanna
  • 2 Esdras
  • Additions to Esther
  • Epistle of Jeremiah
  • Judith
  • 3 Maccabees
  • 4 Maccabees
  • Prayer of Manasseh

We also need regular expressions, valid verses, and max verse number by chapter for all of the books from the Apocrypha and Septuagint, so the above books, plus:

  • 1 Esdras
  • 1 Maccabees
  • 2 Maccabees
  • Sirach/Ecclesiasticus
  • Tobit
  • Wisdom of Solomon

Philemon is getting parsed as Philippians and is causing an error during normalization of the reference

The following is throwing an error even though it is a valid reference.

import pythonbible as bible
text = "Philemon 1:9"
references = bible.get_references(text)

This happens because the regular expression for the book of Philippians matches the abbreviation "Phil" at the start of "Philemon". We still need to support that abbreviation in the Philippians regular expression, but only when it is not immediately followed by "emon".

Words in text are interpreted as a reference

Thanks for this :)

When extracting references from text, a word that matches a Bible book name is sometimes returned as a reference to the whole book. For example, the word "mark" below gives me an extra reference. Could get_references take a flag that requires a chapter number for a match?

text = """This one thing I do, forgetting those things which are behind.... I press toward the mark
    for the prize of the high calling of God in Christ Jesus. Phil 3:13, 14"""
references = bible.get_references(text)

print(bible.format_scripture_references(references))

#Mark;Philippians 3:13-14
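A rough sketch of what such a flag could do, using a deliberately tiny book pattern rather than the library's real expressions; the function find_books and its require_chapter parameter are hypothetical and do not exist in pythonbible:

```python
import re

# Tiny stand-in for the full book-name expression.
BOOK = r"(?:Mark|Phil(?:ippians)?)"

# Loose: a bare book name counts as a reference.
LOOSE = re.compile(rf"\b{BOOK}\b", re.IGNORECASE)
# Strict: the book name must be followed by a chapter (and optional verse).
STRICT = re.compile(rf"\b{BOOK}\.?\s+\d+(?::\d+)?", re.IGNORECASE)


def find_books(text: str, require_chapter: bool = False) -> list[str]:
    """Return the matched spans, optionally requiring a chapter number."""
    pattern = STRICT if require_chapter else LOOSE
    return [match.group() for match in pattern.finditer(text)]


text = """This one thing I do, forgetting those things which are behind.... I press toward the mark
    for the prize of the high calling of God in Christ Jesus. Phil 3:13, 14"""

print(find_books(text))
print(find_books(text, require_chapter=True))
```

With the chapter requirement switched on, the common noun "mark" no longer qualifies because no number follows it.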

NEW_AMERICAN_STANDARD seems to be broken

Hello,

Thank you for your work on this project! This is quite nice... I don't know if this is expected, but for some reason it just does not work...

import pythonbible as bible

reference = bible.NormalizedReference(bible.Book.GENESIS, 1, 1, 1, 31)
verse_ids = bible.convert_reference_to_verse_ids(reference)
for verse_id in verse_ids:
    verse = bible.get_verse_text(verse_id, version=bible.Version.NEW_AMERICAN_STANDARD)
    print(verse)

That throws the following error...

$ python bible.py
Traceback (most recent call last):
  File "/opt/python/virtualenv/py311_test/lib/python3.11/site-packages/pythonbible/bible/bibles.py", line 54, in get_bible
    return BIBLES[version][bible_type]
           ~~~~~~^^^^^^^^^
KeyError: <Version.NEW_AMERICAN_STANDARD: 'NASB'>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/mpenning/bible.py", line 7, in <module>
    verse = bible.get_verse_text(verse_id, version=bible.Version.NEW_AMERICAN_STANDARD)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python/virtualenv/py311_test/lib/python3.11/site-packages/pythonbible/formatter.py", line 425, in get_verse_text
    bible = get_bible(version, "plain_text_readers")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python/virtualenv/py311_test/lib/python3.11/site-packages/pythonbible/bible/bibles.py", line 56, in get_bible
    raise MissingVerseFileError from error
pythonbible.errors.MissingVerseFileError
(py311_test) mpenning@mudslide:~$

If there is some reason that the text for that version cannot be retrieved, it would be better to make that clear (e.g. "You must do foo to support bible.Version.NEW_AMERICAN_STANDARD"). FWIW, you can retrieve quite a bit of the Bible for free via BibleGateway.
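One way the request above might look in practice is to attach a message when the lookup fails. The registry and exception below are stand-ins that only mirror the names visible in the traceback; they are not the library's code:

```python
# Stand-in registry: only versions with bundled text appear as keys.
BIBLES = {"KJV": {"plain_text_readers": "<reader>"}}


class MissingVerseFileError(Exception):
    """Stand-in for the library's exception of the same name."""


def get_bible(version: str, bible_type: str) -> str:
    """Return the reader for a version, or fail with an explanatory message."""
    try:
        return BIBLES[version][bible_type]
    except KeyError as error:
        available = ", ".join(sorted(BIBLES))
        raise MissingVerseFileError(
            f"No {bible_type} data is bundled for version {version!r}; "
            f"available versions: {available}."
        ) from error


try:
    get_bible("NASB", "plain_text_readers")
except MissingVerseFileError as error:
    message = str(error)

print(message)
```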

Problems recognizing some abbreviations

Hi,

I made a new version of the books module in a fork to support Portuguese and Spanish.

I tested the books one by one, and only five of them have a problem:

- The search for Philippians does not work (fp 1)
- The search for Leviticus does not work (lv 1)
- The search for Obadiah does not work (ob 1)
- The search for Luke does not work (lc 1)
- The search for Colossians does not work (cl 1)

For example, it understands "lev 1" but not "lv 1", and the same goes for the others. I also tested with LEVITICUS = 3, "Leviticus", r"(Lev|Lv)", ("Lev", "Lv"), but that doesn't work either.

I don't know what I'm doing wrong. Can you point out what is wrong in the regular expressions for these five cases?

from __future__ import annotations

from enum import Enum
from typing import Any
from typing import Type


def _build_book_regular_expression(
    book: str,
    prefix: str | None = None,
    suffix: str | None = None,
) -> str:
    return _add_suffix(_add_prefix(book, prefix), suffix)


def _add_prefix(regex: str, prefix: str | None = None) -> str:
    return regex if prefix is None else rf"(?:{prefix})(?:\s)?{regex}"


def _add_suffix(regex: str, suffix: str | None = None) -> str:
    return regex if suffix is None else rf"{regex}(?:\s*{suffix})?"


_SAMUEL_REGULAR_EXPRESSION = r"(Samuel|Sam\.*|Sa\.*|Sm\.*)"
_KINGS_REGULAR_EXPRESSION = r"(Kings|Kgs\.*|Kin\.*|Ki\.*|Reyes|Reis|Re\.*|Rs\.*)"
_CHRONICLES_REGULAR_EXPRESSION = r"(Chronicles|Chron\.*|Chro\.*|Chr\.*|Crónicas|Crônicas|Cr\.*)"
_JOHN_REGULAR_EXPRESSION = r"(John|Joh\.*|Jhn\.*|Jo\.*(?!shua|b|nah|el)|Jn\.*|Juan|João|Jn\.*)"
_CORINTHIANS_REGULAR_EXPRESSION = r"(Corinthians|Corintios|Coríntios|Co\.*)"
_THESSALONIANS_REGULAR_EXPRESSION = r"(Thessalonians|Tesalonicenses|Tessalonicenses|Th\.*|Ts\.*)"
_TIMOTHY_REGULAR_EXPRESSION = r"(Timothy|Timoteo|Timóteo|Ti\.*|Tm\.*)"
_PETER_REGULAR_EXPRESSION = r"(Peter|Pedro|Pe\.*|Pt\.*)"

_MACCABEES_REGULAR_EXPRESSION = r"(Maccabees|Macabeos|Macabeus|Ma\.*|M\.*)"

_FIRST = r"1|I\s+|1st\s+|First\s+|Primero\s+|Primeiro\s+|1\s+"
_SECOND = r"2|II|2nd\s+|Second\s+|Segundo\s+|2\s+"
_THIRD = r"3|III|3rd\s+|Third\s+|Tercero\s+|Terceiro\s+|3\s+"

_FIRST_BOOK = rf"{_FIRST}|(First\s+Book\s+of(?:\s+the)?)"
_SECOND_BOOK = rf"{_SECOND}|(Second\s+Book\s+of(?:\s+the)?)"

_EPISTLE_OF_PAUL_TO = r"Epistle\s+of\s+Paul\s+(?:the\s+Apostle\s+)?to(?:\s+the)?"
_GENERAL_EPISTLE_OF = r"(?:General\s+)?Epistle\s+(?:General\s+)?of"

_FIRST_PAUL_EPISTLE = rf"{_FIRST}|(First\s+{_EPISTLE_OF_PAUL_TO})"
_SECOND_PAUL_EPISTLE = rf"{_SECOND}|(Second\s+{_EPISTLE_OF_PAUL_TO})"

_FIRST_GENERAL_EPISTLE = rf"{_FIRST}|(First\s+{_GENERAL_EPISTLE_OF})"
_SECOND_GENERAL_EPISTLE = rf"{_SECOND}|(Second\s+{_GENERAL_EPISTLE_OF})"
_THIRD_GENERAL_EPISTLE = rf"{_THIRD}|(Third\s+{_GENERAL_EPISTLE_OF})"


class Book(Enum):
    """Book is an Enum that contains all the books of the Bible.

    :param name: the unique text identifier of the book
    :type name: str
    :param value: the unique numerical identifier of the book
    :type value: int
    :param title: the common English name of the book
    :type title: str
    :param regular_expression: the regular expression for the book
    :type regular_expression: str
    :param abbreviations: the allowed title abbreviations for the book
    :type abbreviations: tuple[str, ...]
    """

    def __new__(
        cls: Type[Book],
        *args: dict[str, Any],
        **kwargs: dict[str, Any],
    ) -> Book:
        obj: Book = object.__new__(cls)
        obj._value_ = args[0]
        return obj

    def __init__(
        self: Book,
        _: int,
        title: str,
        regular_expression: str,
        abbreviations: tuple[str, ...],
    ) -> None:
        """Set the title and regular_expression properties."""
        self._title_ = title
        self._regular_expression_ = regular_expression
        self._abbreviations_ = abbreviations

    @property
    def title(self: Book) -> str:
        return self._title_

    @property
    def regular_expression(self: Book) -> str:
        return self._regular_expression_

    @property
    def abbreviations(self: Book) -> tuple[str, ...]:
        return self._abbreviations_

    GENESIS = 1, "Genesis", r"(Gen\.*(?:esis)?|Gén\.*(?:esis)?|Gên\.*(?:esis)?|Gn\.*)", ("Gen", "Gn")
    EXODUS = 2, "Exodus", r"(Exo\.*(?:d\.*)?(?:us)?|Éxo\.*(?:do)?|Êxo\.*(?:do)?|Éx|Êx|Ex\.*)", ("Exo", "Exod", "Éx", "Êx", "Ex")
    LEVITICUS = 3, "Leviticus", r"(Lev\.*(?:iticus)?|Lev\.*(?:ítico)?|Lv\.*)", ("Lev", "Lv")
    NUMBERS = 4, "Numbers", r"(Num\.*(?:bers)?|Num\.*(?:eros)?|Núm\.*|Nm\.*)", ("Num", "Núm", "Nm")
    DEUTERONOMY = 5, "Deuteronomy", r"(Deu\.*(?:t\.*)?(?:eronomy)?|Deu\.*(?:teronomio)?|Dt\.*)", ("Deu", "Deut", "Dt")
    JOSHUA = 6, "Joshua", r"(Joshua|Josh\.*|Jos\.*|Jsh\.*|Josué|Js\.*)", ("Jos", "Jsh", "Josh", "Js")
    JUDGES = 7, "Judges", r"(Judges|Judg\.*|Jdgs\.*|Jdg\.*|Jueces|Jue\.*|Jz\.*)", ("Jdg", "Jdgs", "Judg", "Jue", "Jz")
    RUTH = 8, "Ruth", r"(Ruth|Rut\.*|Rth\.*|Rt\.*)", ("Rth", "Rut", "Rt")
    SAMUEL_1 = (
        9,
        "1 Samuel",
        _build_book_regular_expression(
            _SAMUEL_REGULAR_EXPRESSION,
            prefix=_FIRST_BOOK,
            suffix=r"Otherwise\s+Called\s+The\s+First\s+Book\s+of\s+the\s+Kings",
        ),
        ("Sa", "Sam", "Sm", "1Sm"),
    )
    SAMUEL_2 = (
        10,
        "2 Samuel",
        _build_book_regular_expression(
            _SAMUEL_REGULAR_EXPRESSION,
            prefix=_SECOND_BOOK,
            suffix=r"Otherwise\s+Called\s+The\s+Second\s+Book\s+of\s+the\s+Kings",
        ),
        ("Sa", "Sam", "Sm", "2Sm"),
    )
    KINGS_1 = (
        11,
        "1 Kings",
        _build_book_regular_expression(
            _KINGS_REGULAR_EXPRESSION,
            prefix=_FIRST_BOOK,
            suffix=r"\,\s+Commonly\s+Called\s+the\s+Third\s+Book\s+of\s+the\s+Kings",
        ),
        ("Re", "Rey", "Reis", "Reyes", "Kgs", "Kin", "Ki", "1Rs"),
    )
    KINGS_2 = (
        12,
        "2 Kings",
        _build_book_regular_expression(
            _KINGS_REGULAR_EXPRESSION,
            prefix=_SECOND_BOOK,
            suffix=r"\,\s+Commonly\s+Called\s+the\s+Fourth\s+Book\s+of\s+the\s+Kings",
        ),
        ("Re", "Rey", "Reis", "Reyes", "Kgs", "Kin", "Ki", "2Rs"),
    )
    CHRONICLES_1 = (
        13,
        "1 Chronicles",
        _build_book_regular_expression(
            _CHRONICLES_REGULAR_EXPRESSION,
            prefix=_FIRST_BOOK,
        ),
        ("Cr", "Crón", "Crôn", "Chron", "Chro", "Chr", "1Cr"),
    )
    CHRONICLES_2 = (
        14,
        "2 Chronicles",
        _build_book_regular_expression(
            _CHRONICLES_REGULAR_EXPRESSION,
            prefix=_SECOND_BOOK,
        ),
        ("Cr", "Crón", "Crôn", "Chron", "Chro", "Chr", "2Cr"),
    )
    EZRA = 15, "Ezra", r"(Ezr\.*(?:a)?|Esd\.*|Esdras|Ed\.*)", ("Ezr", "Esd", "Ed")
    NEHEMIAH = 16, "Nehemiah", r"(Neh\.*(?:emiah)?|Ne\.*|Neemias|Ne\.*)", ("Neh", "Ne")
    ESTHER = 17, "Esther", r"(Est\.*(?:h\.*)?(?:er)?|Est\.*|Ester|Et\.*)", ("Est", "Esth", "Et")
    JOB = 18, "Job", r"(Job|Jb\.*|Jó\.*)", ("Job", "Jb", "Jó")
    PSALMS = (
        19,
        "Psalms",
        r"(Psalms|Psalm|Pslm\.*|Psa\.*|Psm\.*|Pss\.*|Ps\.*|Salmos|Sal\.*|Sl\.*)",
        ("Ps", "Psa", "Pslm", "Psm", "Pss", "Sal", "Sl"),
    )
    PROVERBS = (
        20,
        "Proverbs",
        r"(Proverbs|Prov\.*|Pro\.*|Prv\.*|Proverbios|Prov\.*|Provérbios|Pv\.*)",
        ("Pro", "Prov", "Prv", "Pv"),
    )
    ECCLESIASTES = (
        21,
        "Ecclesiastes",
        r"(Ecclesiastes(?:\s+or\,\s+the\s+Preacher)?|Eclesiastés|Eclesiastes"
        r"|Eccles\.*(?!iasticus?)|Ecles\.*"
        r"|Eccle\.*(?!siasticus?)|Ecle\.*"
        r"|Eccl\.*(?!esiasticus?)(?!us?)|Ecl\.*"
        r"|Ecc\.*(?!lesiasticus?)(?!lus?)|Ec\.*|Ecl\.*|Qoh\.*)",
        ("Ec", "Ecc", "Eccl", "Eccle", "Eccles", "Ecl", "Ecle", "Ecles", "Qoh"),
    )
    SONG_OF_SONGS = (
        22,
        "Song of Songs",
        r"(Song(?: of (Solomon|Songs|Sol\.*))?|Cantar de los Cantares|Cânticos|Cantares|Ct\.*)"
        r"|Canticles|(Canticle(?: of Canticles)?)|SOS|Cant",
        ("Cant", "Canticle", "Canticles", "Song", "Song of Sol", "SOS", "Ct"),
    )
    ISAIAH = 23, "Isaiah", r"(Isa\.*(?:iah)?|Isaias|Isa\.*|Is\.*)", ("Isa", "Is")
    JEREMIAH = 24, "Jeremiah", r"(Jer\.*(?:emiah)?|Jeremias|Jer\.*|Je\.*|Jr\.*)", ("Jer", "Je", "Jr")
    LAMENTATIONS = (
        25,
        "Lamentations",
        _build_book_regular_expression(
            r"(Lam\.*(?:entations)?|Lamentaciones|Lamentações|Lam\.*|Lm\.*|Lá\.*)",
            suffix=r"of\s+Jeremiah",
        ),
        ("Lam", "Lm", "Lá"),
    )
    EZEKIEL = 26, "Ezekiel", r"(Ezekiel|Ezequiel|Eze\.*|Ezq\.*|Ezk\.*|Ez\.*)", ("Eze", "Ezq", "Ezk", "Ez")
    DANIEL = 27, "Daniel", r"(Dan\.*(?:iel)?|Dan\.*|Dn\.*)", ("Dan", "Dn")
    HOSEA = 28, "Hosea", r"(Hos\.*(?:ea)?|Oseas|Os\.*|O\.*)", ("Hos", "Os", "O")
    JOEL = 29, "Joel", r"(Joe\.*(?:l)?|Joel|Jl\.*)", ("Joe", "Jl")
    AMOS = 30, "Amos", r"(Amo\.*(?:s)?|Amós|Am\.*)", ("Amo", "Am")
    OBADIAH = 31, "Obadiah", r"(Oba\.*(?:d\.*(?:iah)?)?|Abdías|Obd\.*|Abd\.*|Ob\.*|Ab\.*)", ("Oba", "Obd", "Abd", "Ob", "Ab")
    JONAH = 32, "Jonah", r"(Jonah|Jon\.*|Jnh\.*|Jonás|Jn\.*|Jnh\.*)", ("Jnh", "Jon", "Jn")
    MICAH = 33, "Micah", r"(Mic\.*(?:ah)?|Miqueas|Mi\.*|Mq\.*)", ("Mic", "Mi", "Mq")
    NAHUM = 34, "Nahum", r"(?<!Jo)(Nah\.*(?:um)?|Nahúm|Na\.*)", ("Nah", "Na")
    HABAKKUK = 35, "Habakkuk", r"(Hab\.*(?:akkuk)?|Habacuc|Hab\.*|Hb\.*|Hc\.*)", ("Hab", "Hb", "Hc")
    ZEPHANIAH = 36, "Zephaniah", r"(Zep\.*(?:h\.*(?:aniah)?)?|Sofonías|Zefanias|Sof\.*|Zef\.*|Sf\.*|Zp\.*)", ("Zep", "Sof", "Zef", "Sf", "Zp")
    HAGGAI = 37, "Haggai", r"(Hag\.*(?:gai)?|Ageo|Ag\.*|Hg\.*)", ("Hag", "Ag", "Hg")
    ZECHARIAH = 38, "Zechariah", r"(Zec\.*(?:h\.*(?:ariah)?)?|Zacarías|Zacarias|Zac\.*|Zc\.*)", ("Zec", "Zac", "Zc")
    MALACHI = 39, "Malachi", r"(Mal\.*(?:achi)?|Malaquías|Malaquias|Mal\.*|Ml\.*)", ("Mal", "Ml")
    MATTHEW = 40, "Matthew", r"(Mat\.*(?:t\.*(?:hew)?)?|Mateo|Mat\.*|Mt\.*)", ("Mat", "Matt", "Mt")
    MARK = 41, "Mark", r"(Mark|Mar\.*|Mrk\.*|Marcos|Mr\.*|Mc\.*)", ("Mar", "Mrk", "Mr", "Mc")
    LUKE = 42, "Luke", r"(Luk\.*(?:e)?|Lucas|Luc\.*|Lc\.*)", ("Luk", "Luc", "Lc")
    JOHN = (
        43,
        "John",
        rf"(?<!(?:1|2|3|I)\s)(?<!(?:1|2|3|I)){_JOHN_REGULAR_EXPRESSION}",
        ("Jhn", "Jn", "Jo", "Joh"),
    )
    ACTS = (
        44,
        "Acts",
        _build_book_regular_expression(
            r"(Act\.*(?:s)?|Hechos|Atos|Act\.*|He\.*|At\.*)",
            suffix="of the Apostles",
        ),
        ("Act", "He", "At"),
    )
    ROMANS = 45, "Romans", r"(Rom\.*(?:ans)?|Romanos|Rom\.*|Rm\.*)", ("Rom", "Rm")
    CORINTHIANS_1 = (
        46,
        "1 Corinthians",
        _build_book_regular_expression(
            _CORINTHIANS_REGULAR_EXPRESSION,
            prefix=_FIRST_PAUL_EPISTLE,
        ),
        ("Co", "Cor", "1Co"),
    )
    CORINTHIANS_2 = (
        47,
        "2 Corinthians",
        _build_book_regular_expression(
            _CORINTHIANS_REGULAR_EXPRESSION,
            prefix=_SECOND_PAUL_EPISTLE,
        ),
        ("Co", "Cor", "2Co"),
    )
    GALATIANS = 48, "Galatians", r"(Gal\.*(?:atians)?|Gálatas|Gal\.*|Gl\.*)", ("Gal", "Gl")
    EPHESIANS = 49, "Ephesians", r"(?<!Z)(Eph\.*(?:es\.*(?:ians)?)?|Efesios|Efésios|Efe\.*|Ef\.*)", ("Eph", "Ephes", "Efe", "Ef")
    PHILIPPIANS = (
        50,
        "Philippians",
        r"(Ph(?:(p\.*)|(?:il\.*(?!e\.*(?:m\.*(?:on)?)?)(?:ippians)?))|Filipenses|Flp\.*|Fp\.*)",
        ("Php", "Phil", "Flp", "Fp"),
    )
    COLOSSIANS = 51, "Colossians", r"(Col\.*(?:ossians)?|Colosenses|Colossenses|Col\.*|Cl\.*)", ("Col", "Cl")
    THESSALONIANS_1 = (
        52,
        "1 Thessalonians",
        _build_book_regular_expression(
            _THESSALONIANS_REGULAR_EXPRESSION,
            prefix=_FIRST_PAUL_EPISTLE,
        ),
        ("Th", "Thes", "Thess", "Ths", "1Ts"),
    )
    THESSALONIANS_2 = (
        53,
        "2 Thessalonians",
        _build_book_regular_expression(
            _THESSALONIANS_REGULAR_EXPRESSION,
            prefix=_SECOND_PAUL_EPISTLE,
        ),
        ("Th", "Thes", "Thess", "Ths", "2Ts"),
    )
    TIMOTHY_1 = (
        54,
        "1 Timothy",
        _build_book_regular_expression(
            _TIMOTHY_REGULAR_EXPRESSION,
            prefix=_FIRST_PAUL_EPISTLE,
        ),
        ("Ti", "Tim", "1Tm"),
    )
    TIMOTHY_2 = (
        55,
        "2 Timothy",
        _build_book_regular_expression(
            _TIMOTHY_REGULAR_EXPRESSION,
            prefix=_SECOND_PAUL_EPISTLE,
        ),
        ("Ti", "Tim", "2Tm"),
    )
    TITUS = 56, "Titus", r"(Tit\.*(?:us)?|Tito|Tit\.*|Tt\.*)", ("Tit", "Tt")
    PHILEMON = (
        57,
        "Philemon",
        r"(Philemon|Philem\.*|Phile\.*|Phlm\.*|Phi\.*(?!l)|Phm\.*|Filemón|Filemon|Flm\.*|Fm\.*)",
        ("Phi", "Phile", "Philem", "Phlm", "Phm", "Flm", "Fm"),
    )
    HEBREWS = 58, "Hebrews", r"(Heb\.*(?:rews)?|Hebreos|Hebreus|Heb\.*|Hb\.*)", ("Heb", "Hb")
    JAMES = 59, "James", r"(Ja(?:me)?s\.*|Santiago|Tiago|San\.*|Stg\.*|Tg\.*)", ("Jas", "San", "Stg", "Tg")
    PETER_1 = (
        60,
        "1 Peter",
        _build_book_regular_expression(
            _PETER_REGULAR_EXPRESSION,
            prefix=_FIRST_GENERAL_EPISTLE,
        ),
        ("Pe", "Pet", "Pt", "1Pe"),
    )
    PETER_2 = (
        61,
        "2 Peter",
        _build_book_regular_expression(
            _PETER_REGULAR_EXPRESSION,
            prefix=_SECOND_GENERAL_EPISTLE,
        ),
        ("Pe", "Pet", "Pt", "2Pe"),
    )
    JOHN_1 = (
        62,
        "1 John",
        _build_book_regular_expression(
            _JOHN_REGULAR_EXPRESSION,
            prefix=_FIRST_GENERAL_EPISTLE,
        ),
        ("Jhn", "Jn", "Jo", "Joh", "1Jo"),
    )
    JOHN_2 = (
        63,
        "2 John",
        _build_book_regular_expression(
            _JOHN_REGULAR_EXPRESSION,
            prefix=_SECOND_GENERAL_EPISTLE,
        ),
        ("Jhn", "Jn", "Jo", "Joh", "2Jo"),
    )
    JOHN_3 = (
        64,
        "3 John",
        _build_book_regular_expression(
            _JOHN_REGULAR_EXPRESSION,
            prefix=_THIRD_GENERAL_EPISTLE,
        ),
        ("Jhn", "Jn", "Jo", "Joh", "3Jo"),
    )
    JUDE = 65, "Jude", r"(Jud\.*(:?e)?(?!ges)|Judas|Jd\.*)", ("Jud", "Jd")
    REVELATION = (
        66,
        "Revelation",
        _build_book_regular_expression(
            r"(Rev\.*(?:elation)?|Apocalipsis|Apocalipse|Rev\.*|Ap\.*)",
            suffix="of ((Jesus Christ)|John|(St. John the Divine))",
        ),
        ("Rev", "Ap"),
    )
    ESDRAS_1 = (
        67,
        "1 Esdras",
        _build_book_regular_expression(
            r"(Esdras|Esdr\.*|Esd\.*|Es\.*)",
            _FIRST,
        ),
        ("Es", "Esd", "Esdr"),
    )
    TOBIT = 68, "Tobit", r"(Tobit|Tob\.*|Tb\.*|Tobías|Tobias|Tb\.*)", ("Tb", "Tob")
    WISDOM_OF_SOLOMON = (
        69,
        "Wisdom of Solomon",
        r"(Wisdom of Solomon|Wisdom|Sabiduría|Sabedoria|Wisd\.* of Sol\.*|Wis\.*|(?<!Hebre)Ws\.*)",
        ("Wis", "Wisd of Sol", "Ws", "Sab", "Sb"),
    )
    ECCLESIASTICUS = (
        70,
        "Ecclesiasticus",
        r"(Sirach|Sir\.*|Eclesiástico|Eclesiástico|Ecclesiasticus|Ecclus\.*)",
        ("Ecclus", "Sir", "Eclo", "Ecl"),
    )
    MACCABEES_1 = (
        71,
        "1 Maccabees",
        _build_book_regular_expression(
            _MACCABEES_REGULAR_EXPRESSION,
            _FIRST,
        ),
        ("M", "Ma", "Mac", "Macc"),
    )
    MACCABEES_2 = (
        72,
        "2 Maccabees",
        _build_book_regular_expression(
            _MACCABEES_REGULAR_EXPRESSION,
            _SECOND,
        ),
        ("M", "Ma", "Mac", "Macc"),
    )

Thanks.
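One way to narrow this down is to test each of the five expressions on its own, outside the library. The sketch below copies the patterns from the Book definitions above and assumes case-insensitive matching; if every abbreviation matches in isolation, the cause is more likely in how the expressions are combined, or in which copy of the module is imported, than in these patterns themselves.

```python
import re

# Failing input -> the regular expression of the book it should match,
# copied from the definitions above.
CASES = {
    "lv": r"(Lev\.*(?:iticus)?|Lev\.*(?:ítico)?|Lv\.*)",
    "ob": r"(Oba\.*(?:d\.*(?:iah)?)?|Abdías|Obd\.*|Abd\.*|Ob\.*|Ab\.*)",
    "lc": r"(Luk\.*(?:e)?|Lucas|Luc\.*|Lc\.*)",
    "fp": r"(Ph(?:(p\.*)|(?:il\.*(?!e\.*(?:m\.*(?:on)?)?)(?:ippians)?))|Filipenses|Flp\.*|Fp\.*)",
    "cl": r"(Col\.*(?:ossians)?|Colosenses|Colossenses|Col\.*|Cl\.*)",
}

# True means the abbreviation is matched in full by its book's pattern.
results = {
    abbreviation: re.fullmatch(pattern, abbreviation, re.IGNORECASE) is not None
    for abbreviation, pattern in CASES.items()
}
print(results)
```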

Problem in compound verses with prefix

These work:

bible.get_references('Ezra 1:1-Ezra 2:1')
bible.get_references('1 Kings 1:1-3')

But these return wrong references or fail to return references

bible.get_references("1 Kings 1:1-Kings 1:10")
bible.get_references("1 Chronicles 1:5-1 Chronicles 1:7")

My hunch is that the second "1" in "1 Chronicles" is being picked up as the end of a verse range, as if I had searched for "1 Chronicles 1:5-1".
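The hunch can be reproduced with a deliberately naive range pattern; this is an illustration of the suspected behaviour, not the library's actual expression. A guarded variant refuses to treat a number as the end of the range when a book name follows it:

```python
import re

# Naive: chapter:verse, optionally followed by "-verse".
NAIVE_RANGE = re.compile(r"(\d+):(\d+)(?:-(\d+))?")
# Guarded: the end verse must not be followed by more digits or by letters,
# which would indicate a numbered book name such as "1 Chronicles".
GUARDED_RANGE = re.compile(r"(\d+):(\d+)(?:-(\d+)(?!\d|\s*[A-Za-z]))?")

text = "1 Chronicles 1:5-1 Chronicles 1:7"

naive = NAIVE_RANGE.search(text).group()
guarded = GUARDED_RANGE.search(text).group()
simple = GUARDED_RANGE.search("1 Kings 1:1-3").group()

print(naive)
print(guarded)
print(simple)
```

The guarded form only stops the prefix from being swallowed; a full fix would still need to parse the second book name as the end of the range.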

A few potential test fails?

Thanks again for this :)

I have a couple strings that produce errors/unexpected results. Are any of these valid as is?

import pythonbible as bible


tests = [
'Or, Micah. 2Ch. 34:20', # probably fails because of "Micah."
'Or, of. Psalm. 46, title.', # unsure
'Or, A psalm for Asaph to give instruction. Psalms. 74, title.', # unsure
'Or, anathema. 1 Corinthians 1Co. 16:22', # misses 2nd ref, because it thinks the 1 in "1Co." goes with the first ref? maybe ok?
'Or, loving to the brethren. 1Peter 1Pe. 1:22', # same as above
'1Peter. 1:22' # no ref found here. if there is a dot after a full book name, we don't get matches.
]

for text in tests:
    try:
        references = bible.get_references(text)

        print(references)
    except Exception as e:
        print(e)

Error when getting KJV verse text for 41009038 (Mark 9:38)

Attempting to get the KJV verse text for Mark 9:38 (verse_id = 41009038) raises an UnboundLocalError:

import pythonbible as bible
parser = bible.get_parser(version=bible.Version.KING_JAMES)
verse_text = parser.get_verse_text(41009038)
Traceback (most recent call last):
  File "<input>", line 3, in <module>
  File "C:\projects-git\python-bible\pythonbible\bible\osis\parser.py", line 101, in get_verse_text
    paragraphs = _get_paragraphs(self.tree, self.namespaces, [verse_id], **kwargs)
  File "C:\projects-git\python-bible\pythonbible\bible\osis\parser.py", line 132, in _get_paragraphs
    paragraph_element, verse_ids, current_verse_id, **kwargs
  File "C:\projects-git\python-bible\pythonbible\bible\osis\parser.py", line 184, in _get_paragraph_from_element
    **kwargs,
  File "C:\projects-git\python-bible\pythonbible\bible\osis\parser.py", line 242, in _handle_child_element
    return paragraph, skip_till_next_verse, new_current_verse_id
UnboundLocalError: local variable 'new_current_verse_id' referenced before assignment

A way to use the library without loading the Bibles

Hi,

I want to use only the parser part of this library, but importing it loads all of the Bible data, and my server is crashing because it runs out of memory:

avinuteologia-1   |     import pythonbible as pb
avinuteologia-1   |   File "/usr/local/lib/python3.10/site-packages/pythonbible/__init__.py", line 30, in <module>
avinuteologia-1   |     from .formatter import format_scripture_references
avinuteologia-1   |   File "/usr/local/lib/python3.10/site-packages/pythonbible/formatter.py", line 8, in <module>
avinuteologia-1   |     from pythonbible.bible.bibles import get_bible
avinuteologia-1   |   File "/usr/local/lib/python3.10/site-packages/pythonbible/bible/bibles.py", line 5, in <module>
avinuteologia-1   |     import pythonbible.bible.asv.html as asv_html
avinuteologia-1   |   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
avinuteologia-1   |   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
avinuteologia-1   |   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
avinuteologia-1   |   File "<frozen importlib._bootstrap_external>", line 879, in exec_module
avinuteologia-1   |   File "<frozen importlib._bootstrap_external>", line 975, in get_code
avinuteologia-1   |   File "<frozen importlib._bootstrap_external>", line 1074, in get_data
avinuteologia-1   | MemoryError

Good job :)

BUG: get_verse_text(67001009) gives wrong value

1 Esdras 1:9 has verse ID 67001009.

Calling get_verse_text(67001009) returns the entire text of the Hebrew Bible and New Testament instead of the verse in question. This appears to be true regardless of the value of the version argument.

It also appears to be true for subsequent verses, e.g. 67001010, etc.

I am using pythonbible==0.11.0.
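For anyone reproducing this, here is a minimal sketch of how the integer verse IDs in these reports appear to be laid out. The layout (book number, then a three-digit chapter, then a three-digit verse) is an assumption inferred from the two IDs quoted in these issues (41009038 for Mark 9:38 and 67001009 for 1 Esdras 1:9), and split_verse_id is a hypothetical helper, not part of pythonbible.

```python
from typing import Tuple


def split_verse_id(verse_id: int) -> Tuple[int, int, int]:
    """Split an integer verse ID into (book, chapter, verse).

    Assumes the layout BBCCCVVV inferred from the IDs quoted in the reports.
    """
    book, remainder = divmod(verse_id, 1_000_000)
    chapter, verse = divmod(remainder, 1_000)
    return book, chapter, verse


print(split_verse_id(41009038))  # (41, 9, 38)
print(split_verse_id(67001009))  # (67, 1, 9)
```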

When getting the references for a string containing multiple references separated by commas, if the reference has a prefix, the prefix could be mistaken for a chapter/verse.

For example, the string:

"1 Corinthians 1:1, 2 Corinthians 1:1"

Should return the following two references when getting the references:

[
    NormalizedReference(
        book=<Book.CORINTHIANS_1: 46>,
        start_chapter=1,
        start_verse=1,
        end_chapter=1,
        end_verse=1,
        end_book=None
    ),
    NormalizedReference(
        book=<Book.CORINTHIANS_2: 47>,
        start_chapter=1,
        start_verse=1,
        end_chapter=1,
        end_verse=1,
        end_book=None
    )
]

However, the "2" in 2 Corinthians is interpreted as a verse rather than part of the book title, and the following references are returned:

[
    NormalizedReference(
        book=<Book.CORINTHIANS_1: 46>,
        start_chapter=1,
        start_verse=1,
        end_chapter=1,
        end_verse=1,
        end_book=None
    ),
    NormalizedReference(
        book=<Book.CORINTHIANS_1: 46>,
        start_chapter=1,
        start_verse=2,
        end_chapter=1,
        end_verse=2,
        end_book=None
    )
]
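One possible way to avoid this, sketched below, is to accept a number after a comma as a continuation of the previous reference only when it is not immediately followed by a word. This is not the library's actual regular expression; CONTINUATION and continuation_numbers are hypothetical names used for illustration.

```python
import re

# A continuation item is ", <number>" that is NOT followed by a word.
# If a word follows, the number is more likely a book prefix ("2 Corinthians").
# The (?!\d) guard stops the engine from backtracking into part of a number.
CONTINUATION = re.compile(r",\s*(\d+)(?!\d)(?!\s*[A-Za-z])")


def continuation_numbers(text: str) -> list:
    """Return the numbers that continue the preceding reference."""
    return [int(number) for number in CONTINUATION.findall(text)]


print(continuation_numbers("1 Corinthians 1:1, 2 Corinthians 1:1"))  # []
print(continuation_numbers("1 Corinthians 1:1, 2"))  # [2]
```

The trade-off is that a genuine verse number followed by ordinary prose (for example "1:1, 2 and following") would also be skipped, so a real fix would probably need to check the following word against the known book names.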

Potential issue with duplicated verses in references

Thanks again for this, I'm getting a lot of use :)

I found another potential issue. When a piece of text contains repeated references, they are grouped by book, but apparently not by chapter or verse, so the formatted output contains duplicate verses. Here's an example: notice the duplicate verses in the refs for John and Hebrews, and the duplicate chapter for Genesis. I added a duplicate-removal step on the verse IDs to get the output I was expecting:

import pythonbible as bible

text ='Jeremiah 10:11-12;John 1:1;Hebrews 1:8-12;Genesis 1:1,2:4,2:7;Malachi 3:18;John 1:1;Psalms 33:6,9,136:5;John 1:1-3;Colossians 1:16-17;Hebrews 1:8-10,11:3'

references = bible.get_references(text)
formatted = bible.format_scripture_references(references)

print(formatted)

# list and set to remove dups
verse_ids = list(set(bible.convert_references_to_verse_ids(references)))

new_references = bible.convert_verse_ids_to_references(verse_ids)
formatted_2 = bible.format_scripture_references(new_references)

print(formatted_2)

output:

# initial output
Genesis 1:1,2:4,2:7;Psalms 33:6,9,136:5;Jeremiah 10:11-12;Malachi 3:18;John 1:1,1,1-3;Colossians 1:16-17;Hebrews 1:8,8-9,9-10,10-12,11:3

# with dupes manually removed list(set(...)) on verse_ids
Genesis 1:1,2:4,2:7;Psalms 33:6,9,136:5;Jeremiah 10:11-12;Malachi 3:18;John 1:1-3;Colossians 1:16-17;Hebrews 1:8-12,11:3
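A small note on the workaround: a set has no defined order, so list(set(...)) may return the verse IDs in any order. Sorting the unique IDs keeps the result deterministic. The sketch below uses made-up sample IDs and a hypothetical helper name, and it assumes that numeric order of verse IDs matches canonical order.

```python
def unique_sorted_verse_ids(verse_ids):
    """Drop duplicate verse IDs and return them in ascending order."""
    return sorted(set(verse_ids))


# Made-up sample containing repeats, similar to the overlapping references above.
sample = [43001001, 58001008, 43001001, 43001002, 43001003, 58001008]
print(unique_sorted_verse_ids(sample))
# [43001001, 43001002, 43001003, 58001008]
```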

Missing the first few words of some verses in ASV

When getting the verse text for Exodus 20:3 in the ASV version, the first two words are missing.

import pythonbible as bible
references = bible.get_references("Exodus 20:3")
verse_ids = bible.convert_references_to_verse_ids(references)
kjv_parser = bible.get_parser(version=bible.Version.KING_JAMES)
kjv_verse_text = kjv_parser.get_verse_text(verse_ids[0])

The verse text for KJV looks right:

'3. Thou shalt have no other gods before me.'

But, when I get the verse text for ASV:

asv_parser = bible.get_parser(version=bible.Version.AMERICAN_STANDARD)
asv_verse_text = asv_parser.get_verse_text(verse_ids[0])

The ASV verse text is missing the "Thou shalt", but it is in the XML file.

'3. have no other gods before me.'

Support for <note><rdg></rdg></note> tags (optional verses? especially in the ASV version)?

For example:

<verse osisID="Matt.17.21" sID="Matt.17.21.seID.24668" n="21" />
<note type="translation" osisRef="Matt.17.21" osisID="Matt.17.21!note.1" placement="foot">
<reference type="source" osisRef="Matt.17.21">17:21
</reference>Many authorities, some ancient, insert v. 21.
<rdg>But this kind goeth not out save by prayer and fasting.
</rdg>See Mrk 9:29.</note>
<verse eID="Matt.17.21.seID.24668" /></p><p>

Can't get references for an entire book of the Bible

I can't get the normalized references for an entire book of the bible. For example:

import pythonbible as bible
references = bible.get_references("Genesis")

Raises the following error:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\projects-git\python-bible\pythonbible\parser.py", line 23, in get_references
    references.extend(normalize_reference(match[0]))
  File "C:\projects-git\python-bible\pythonbible\parser.py", line 49, in normalize_reference
    sub_reference, book, start_chapter
  File "C:\projects-git\python-bible\pythonbible\parser.py", line 82, in _process_sub_reference
    start_chapter = int(min_chapter_and_verse[0].strip())
ValueError: invalid literal for int() with base 10: ''

I would expect it to return:

[(<Book.GENESIS: 1>, 1, 1, 50, 26)]

Error when getting Genesis 1 with one verse per paragraph

The following code raises a ValueError:

import pythonbible as bible
parser = bible.get_parser()
references = bible.get_references("Genesis 1")
verse_ids = bible.convert_references_to_verse_ids(references)
passage = parser.get_scripture_passage_text(verse_ids)
passage2 = parser.get_scripture_passage_text(verse_ids, one_verse_per_paragraph=True)

This is the error I'm getting:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\projects-git\python-bible\pythonbible\bible\osis\parser.py", line 84, in get_scripture_passage_text
    paragraphs = _get_paragraphs(self.tree, self.namespaces, verse_ids, **kwargs)
  File "C:\projects-git\python-bible\pythonbible\bible\osis\parser.py", line 139, in _get_paragraphs
    tree, namespaces, verse_ids[current_verse_index:], **kwargs
  File "C:\projects-git\python-bible\pythonbible\bible\osis\parser.py", line 132, in _get_paragraphs
    paragraph_element, verse_ids, current_verse_id, **kwargs
  File "C:\projects-git\python-bible\pythonbible\bible\osis\parser.py", line 170, in _get_paragraph_from_element
    child_element, verse_ids, new_current_verse_id
  File "C:\projects-git\python-bible\pythonbible\bible\osis\parser.py", line 295, in _is_next_verse
    book_id, chapter, verse = child_element.get("osisID").split(".")
ValueError: too many values to unpack (expected 3)

get_references() is failing to detect references

I'm working on a project and have been having issues with the function not finding the scripture reference.

For example, "Exodus 5:26" will not return a NormalizedReference:

import pythonbible as bible
bible.get_references('Exodus 5:26')  # -> []
bible.get_references('Exodus')  # -> [NormalizedReference(book=<Book.EXODUS: 2>, start_chapter=1, start_verse=1, end_chapter=40, end_verse=38, end_book=None)]

fuzzy searches, get_references for messy ASR

Machine-generated ASR output from programs like OpenAI's Whisper is on the rise, and it tends to format scripture references messily: integers, ordinals, and number words are used inconsistently for book, chapter, and verse numbers and spans, capitalization varies, and so on.

Here are a handful of example lines from WebVTT/SRT outputs from a batch I've run recently:

  • Second Timothy chapter two verses three and four says endure hardship
  • If you read Ephesians four 17 through 32 all the ammunition
  • remember that powerful message of Paul in first Corinthians nine
  • in Jesus's first sermonic presentation on planet earth in Matthew five through seven,
  • Jesus said over in Matthew chapter six, verse number 12,
  • Genesis four, 25.
  • and forth between Haggai two and Ezra three.
  • and go and report to John one-fifteen and thirty.
  • I want to focus on here is Colossians chapter three, 22 through verses through chapter four, verse one.
  • In 1 Corinthians 9.22, you see Paul saying
  • says in Mark 16 10 that the disciples were
  • through that fire, 1 Kings 18.24-38, 1 Chronicles 21.26, 2 Chronicles 7.1-3.
  • open their Bibles to first Corinthians 14, 34, 35 and say, look
  • Genesis 1, 26, 2, 7, and 21, 22.
  • look in Revelations 21, 1 through 7, you can start reading all about
  • Psalms 103.12 says
  • for one another Galatians 6 1 & 2 clearly gives us

Nearly anyone using these tools seriously will need a post-processing step to clean up this sort of data. Feeding the input into an LLM or NLP toolkit may make sense, but it would be swell if a library like this one could do some of the heavy lifting to normalize scripture references in a string. Tall order/deep rabbit hole, I understand, but worth a shot.

I suggest a reformat_fuzzy_references function that attempts to return the input string reformatted so that at least a subset of the most common speech patterns is converted into a normalized form. Bonus points if the user has some configuration control over output styles, e.g. omit "chapter" or use "v./vv."

Assumed gotchas:

  • Strings may contain other semi-formatted numbers that a simple regex search may false flag upon:
    • I was just in class at 8.30 with my friend Wilson
    • We're going to talk at 3.30 this afternoon about the discipline of grace and there is
    • So in Acts chapter 2, 3,000 were saved.
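To make the idea concrete, here is a rough sketch covering only three of the patterns above: ordinal words before numbered books, "chapter N, verse M" with digits, and dotted "N.M" directly after a capitalized word. It is not a proposal for the real implementation. normalize_spoken_references and the constants are hypothetical names, the book list is deliberately partial, and number words such as "six" are left untouched.

```python
import re

ORDINALS = {"first": "1", "second": "2", "third": "3"}
# Partial list of books that take a numeric prefix (illustrative only).
NUMBERED_BOOKS = ("Samuel", "Kings", "Chronicles", "Corinthians",
                  "Thessalonians", "Timothy", "Peter", "John")

ORDINAL_RE = re.compile(
    r"\b(first|second|third)\s+(" + "|".join(NUMBERED_BOOKS) + r")\b",
    re.IGNORECASE,
)
# "chapter 6, verse number 12" or "chapter 6 verse 12" -> "6:12"
CHAPTER_VERSE_RE = re.compile(
    r"\bchapter\s+(\d+),?\s+verses?\s+(?:number\s+)?(\d+)", re.IGNORECASE
)
# "Psalms 103.12" -> "Psalms 103:12"; requires a capitalized word right before
# the number, so "at 8.30" is left alone (but "Monday 8.30" would not be).
DOTTED_RE = re.compile(r"\b([A-Z][a-z]+)\s+(\d+)\.(\d+)\b")


def normalize_spoken_references(text: str) -> str:
    """Rewrite a few common spoken reference patterns into 'Book C:V' form."""
    text = ORDINAL_RE.sub(
        lambda m: f"{ORDINALS[m.group(1).lower()]} {m.group(2)}", text
    )
    text = CHAPTER_VERSE_RE.sub(r"\1:\2", text)
    return DOTTED_RE.sub(r"\1 \2:\3", text)


print(normalize_spoken_references("In 1 Corinthians 9.22, you see Paul saying"))
print(normalize_spoken_references("I was just in class at 8.30 with my friend Wilson"))
```

The capitalized-word check is only a cheap heuristic for the "8.30" gotcha. Matching against the full list of book names would be more reliable, at the cost of a longer pattern.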

Roman Numeral Chapter Numbers

I was looking through a book of sermons and noticed that all of the chapter numbers in the scripture references were Roman numerals. In the interest of being able to easily parse scanned text for scripture references, we probably ought to update our regular expressions to find references with chapter numbers in Roman numeral form and update the normalize function to convert them into the appropriate integer values.

These references also do not contain the colon (probably since it is unnecessary with the chapter numbers and verse numbers being in a different format) but rather a period.

For example:

Matthew xvii. 19-21
Isa. iii. 10, 11
Jeremiah xlviii. 11, 12
1 John v. 10
2 Kings vii. 2
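As a starting point, the conversion could look something like the sketch below. It is a pre-processing pass over plain strings, not a change to the library's regular expressions. roman_to_int, ROMAN_REF, and normalize_roman_reference are hypothetical names, and the pattern assumes lowercase numerals up to "c" followed by a period, as in the examples above.

```python
import re

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'xlviii' to an integer."""
    total = 0
    previous = 0
    # Read right to left: a smaller value before a larger one is subtracted.
    for char in reversed(numeral.lower()):
        value = ROMAN_VALUES[char]
        if value < previous:
            total -= value
        else:
            total += value
            previous = value
    return total


# Book (optionally prefixed "1 ", "2 ", "3 "), Roman chapter + ".", then verses.
ROMAN_REF = re.compile(
    r"(?P<book>(?:[1-3]\s)?[A-Z][a-z]+\.?)\s+"
    r"(?P<chapter>[ivxlc]+)\.\s+"
    r"(?P<verses>\d+(?:\s*[-,]\s*\d+)*)"
)


def normalize_roman_reference(text: str) -> str:
    """Rewrite 'Book xvii. 19-21' as 'Book 17:19-21'."""
    def _replace(match):
        chapter = roman_to_int(match.group("chapter"))
        return f"{match.group('book')} {chapter}:{match.group('verses')}"
    return ROMAN_REF.sub(_replace, text)


for example in ("Matthew xvii. 19-21", "Jeremiah xlviii. 11, 12", "1 John v. 10"):
    print(normalize_roman_reference(example))
```

The numeral conversion does not validate its input, so a malformed numeral such as "iiii" would still produce a number. A real implementation might want to reject those.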
