avendesora / pythonbible

A Python library for validating, parsing, and normalizing scripture references and retrieving scripture texts (for open source and public domain versions)

Home Page: https://docs.python.bible

License: MIT License

Python 100.00%

bible hacktoberfest python scripture-references

pythonbible's Introduction

Hello there 👋

My name is Nathan.

😄 Pronouns: he/him
🔭 I’m currently working on ...
🌱 I’m currently learning ...
- Flutter/Dart
- React
- Rust

pythonbible's People

Contributors

Stargazers

Watchers

Forkers

sulembutproton rtbs-dev ha-jos elivesay data-steve jackman337 otto-dev bryokim aceday smarkoco paulocoutinhox

pythonbible's Issues

Move parser functionality to separate repository/library

The pythonbible library should just contain the output of the parser for select open-source or public domain versions of the Bible. The actual parser functionality should be in a separate library.

Compound reference fails for Job

These work as expected
bible.get_references('Ezra 1:1-Ezra 1:3')
bible.get_references('Job 1:1-3')

This doesn't
bible.get_references('Job 1:1-Job 1:3')

No bible book abbreviations

It would be very helpful if there was an easy way to utilize the bible book abbreviations. A simplified method would just have a direct dictionary, but there are several variants on each book (anywhere from 2-4 letter abbreviations). For example:

bible_book_abbreviations = {
    "Genesis": "Ge",
    "Exodus": "Ex",
    "Leviticus": "Le",
    "Numbers": "Nu",
    "Deuteronomy": "De",
    "Joshua": "Jos",
    "Judges": "Jg",
    "Ruth": "Ru",
    "1 Samuel": "1S",
    "2 Samuel": "2S",
    "1 Kings": "1K",
    "2 Kings": "2K",
    "1 Chronicles": "1Ch",
    "2 Chronicles": "2Ch",
    "Ezra": "Ez",
    "Nehemiah": "Ne",
    "Esther": "Es",
    "Job": "Job",
    "Psalms": "Ps",
    "Proverbs": "Pr",
    "Ecclesiastes": "Ec",
    "Song of Solomon": "So",
    "Song of Songs": "So",
    "Isaiah": "Is",
    "Jeremiah": "Je",
    "Lamentations": "La",
    "Ezekiel": "Ez",
    "Daniel": "Da",
    "Hosea": "Ho",
    "Joel": "Joe",
    "Amos": "Am",
    "Obadiah": "Ob",
    "Jonah": "Jon",
    "Micah": "Mic",
    "Nahum": "Na",
    "Habakkuk": "Hab",
    "Zephaniah": "Zep",
    "Haggai": "Hag",
    "Zechariah": "Zec",
    "Malachi": "Mal",
    "Matthew": "Mt",
    "Mark": "Mk",
    "Luke": "Lk",
    "John": "Jn",
    "Acts": "Ac",
    "Romans": "Ro",
    "1 Corinthians": "1Co",
    "2 Corinthians": "2Co",
    "Galatians": "Ga",
    "Ephesians": "Eph",
    "Philippians": "Php",
    "Colossians": "Col",
    "1 Thessalonians": "1Th",
    "2 Thessalonians": "2Th",
    "1 Timothy": "1Ti",
    "2 Timothy": "2Ti",
    "Titus": "Tit",
    "Philemon": "Phm",
    "Hebrews": "Heb",
    "James": "Jas",
    "1 Peter": "1Pe",
    "2 Peter": "2Pe",
    "1 John": "1Jn",
    "2 John": "2Jn",
    "3 John": "3Jn",
    "Jude": "Jud",
    "Revelation": "Re"
}
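
One complication with a direct dictionary: with a single abbreviation per book, the reverse lookup (abbreviation to book) can be ambiguous. In the mapping above, "Ez" is used for both Ezra and Ezekiel. A minimal sketch, using only a subset of the mapping and a hypothetical helper named `invert_abbreviations` (not part of pythonbible), that surfaces such collisions:

```python
from __future__ import annotations

from collections import defaultdict

# Subset of the mapping proposed above, for illustration only.
bible_book_abbreviations = {
    "Ezra": "Ez",
    "Ezekiel": "Ez",
    "Song of Solomon": "So",
    "Song of Songs": "So",
    "Job": "Job",
    "John": "Jn",
}


def invert_abbreviations(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Group book titles by abbreviation (hypothetical helper)."""
    inverted: dict[str, list[str]] = defaultdict(list)
    for title, abbreviation in mapping.items():
        inverted[abbreviation].append(title)
    return dict(inverted)


by_abbreviation = invert_abbreviations(bible_book_abbreviations)
# Keep only abbreviations that point at more than one title.
ambiguous = {abbr: titles for abbr, titles in by_abbreviation.items() if len(titles) > 1}
print(ambiguous)
```

The "So" collision is harmless because both titles name the same book; "Ez" is the one that would need a longer abbreviation on at least one side.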

ASV data has incomplete bible

In pythonbible/bible/data/asv/verses.json only the first line of each verse exists. For example, Psalm 1:1 (19001001) only contains "Blessed is the man that walketh not in the counsel of the wicked," and not the full "Blessed is the man that walketh not in the counsel of the wicked, Nor standeth in the way of sinners, Nor sitteth in the seat of scoffers: "

I suspect this is due to the newlines in the poetic or multi-line verses. This is a pretty major issue because large chunks of the bible are missing with no indication that something failed. To be clear: this is a data storage issue, not a problem with the code itself.
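
The suspected failure mode can be illustrated without the library. This is only a sketch of the hypothesis, with a hypothetical raw string standing in for the source data; it is not the actual conversion code:

```python
# Hypothetical raw source text for Psalm 1:1, with poetic line breaks.
raw_verse = (
    "Blessed is the man that walketh not in the counsel of the wicked,\n"
    "Nor standeth in the way of sinners,\n"
    "Nor sitteth in the seat of scoffers:"
)

# Keeping only the first line silently drops the rest of the verse.
truncated = raw_verse.splitlines()[0]

# Joining all lines preserves the full verse on a single line.
complete = " ".join(line.strip() for line in raw_verse.splitlines())

print(truncated)
print(complete)
```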

Feature idea: Add flag to return formatted references as abbreviated

Or, add an optional dict input of a "short title" dictionary. It would be handy when trying to get the output references into a specific format.

I gave it a quick try, but had a hashable dict error:

import pythonbible as bible
from pythonbible.books import Book
text ='Jeremiah 10:11-12;'

titles = {
    Book.JEREMIAH: "Jer", # instead of Jeremiah
}

references = bible.get_references(text)
formatted = bible.format_scripture_references(references, short_titles=titles)

print(formatted)

I modified two functions. The way you pass through kwargs is pretty nice! I haven't used that before.

# formatter.py:
def _get_book_title(book: Book, include_books: bool = True, **kwargs: Any) -> str:
    if not include_books:
        return ""

    version: Version = kwargs.get("version", DEFAULT_VERSION)
    full_title: bool = kwargs.get("full_title", False)
    version_book_titles: BookTitles = get_book_titles(book, version, **kwargs)

    return (
        version_book_titles.long_title
        if full_title
        else version_book_titles.short_title
    )

# formatter.py:
@lru_cache()
def get_book_titles(book: Book, version: Version, **kwargs: Any) -> BookTitles:
    """Return the book titles for the given Book and optional Version.

    :param book: a book of the Bible
    :type book: Book
    :param version: a version of the Bible, defaults to American Standard
    :type version: Version
    :return: the long and short titles of the given book and version
    :rtype: BookTitles
    :raises MissingBookFileError: if the book file for the given book and version does
                                  not exist
    """
    short_titles, long_titles = _get_version_book_titles(version or DEFAULT_VERSION)

    if "short_titles" in kwargs:
        short_titles = kwargs["short_titles"]

    if "long_titles" in kwargs:
        long_titles = kwargs["long_titles"]

    short_title = short_titles.get(book, book.title)
    long_title = long_titles.get(book, book.title)

    return BookTitles(long_title, short_title)

but get this error:

    version_book_titles: BookTitles = get_book_titles(book, version, **kwargs)
TypeError: unhashable type: 'list'

If you've handled this error before, I'd appreciate a tip :) Otherwise I'll probably give it another shot sometime.

Thanks!
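
A likely cause, offered as an assumption rather than a confirmed diagnosis: `functools.lru_cache` builds its cache key by hashing every positional and keyword argument, so passing a dict (or list) through `**kwargs` into a cached function raises `TypeError` before the function body runs. The sketch below reproduces that with a stand-in function and shows one possible workaround, converting the mapping to a tuple of items first. `cached_title` and `lookup_title` are hypothetical names, not pythonbible functions.

```python
from functools import lru_cache


@lru_cache()
def cached_title(book: str, short_titles: tuple = ()) -> str:
    """Return a custom short title if one was supplied, else the book name."""
    return dict(short_titles).get(book, book)


def lookup_title(book: str, short_titles=None) -> str:
    """Convert the unhashable dict into a hashable tuple before the cached call."""
    return cached_title(book, tuple((short_titles or {}).items()))


try:
    # A dict argument cannot be hashed, so the cache lookup itself fails.
    cached_title("Jeremiah", short_titles={"Jeremiah": "Jer"})
except TypeError as error:
    print(f"TypeError: {error}")

print(lookup_title("Jeremiah", {"Jeremiah": "Jer"}))
```

Another option would be to leave the cached function's signature alone and apply the custom titles in an uncached wrapper.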

Convert docs to use Sphinx

Using Sphinx instead of Docusaurus will simplify things by keeping it all within the Python ecosystem. It should also greatly reduce the number of Dependabot alerts (and the number of dependencies).

OSIS Parser is ignoring the seg and divineName tags

The KJV OSIS XML file uses the divineName tag for LORD, but the OSISParser is currently not handling that tag or its parent seg tag, so the text within is not included in the output.

For example:

import pythonbible as bible

# 1 Chronicles 16:8
bible.get_verse_text(13016008)

The output is:

'Give thanks unto the call upon his name, make known his deeds among the people.'

But it should be:

'Give thanks unto the LORD, call upon his name, make known his deeds among the people.'

This is the input XML for that verse:

<verse osisID="1Chr.16.8" sID="1Chr.16.8.seID.11183" n="8" />
<w lemma="strong:H3034" morph="strongMorph:TH8685">Give thanks</w> 
unto the 
<seg>
    <divineName>
        <w lemma="strong:H3068">LORD</w>
    </divineName>
</seg>,
<w lemma="strong:H7121" morph="strongMorph:TH8798">call</w>
<w lemma="strong:H8034">upon his name</w>,
<w lemma="strong:H3045" morph="strongMorph:TH8685">make known</w>
<w lemma="strong:H5949">his deeds</w>
<w lemma="strong:H5971">among the people</w>.

This issue was reported via email by Nathan Wood.
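
One way to avoid losing nested text is to walk every descendant instead of handling a fixed set of tags. The sketch below uses the standard library's `ElementTree.itertext()` on the fragment above, wrapped in a root element so it parses on its own. It is not the OSISParser's implementation, and it ignores the XML namespace that a real OSIS file would declare.

```python
import re
import xml.etree.ElementTree as ET

# The verse fragment from above, wrapped in a root element so it parses on its own.
FRAGMENT = """<root>
<verse osisID="1Chr.16.8" sID="1Chr.16.8.seID.11183" n="8" />
<w lemma="strong:H3034" morph="strongMorph:TH8685">Give thanks</w>
unto the
<seg>
    <divineName>
        <w lemma="strong:H3068">LORD</w>
    </divineName>
</seg>,
<w lemma="strong:H7121" morph="strongMorph:TH8798">call</w>
<w lemma="strong:H8034">upon his name</w>,
<w lemma="strong:H3045" morph="strongMorph:TH8685">make known</w>
<w lemma="strong:H5949">his deeds</w>
<w lemma="strong:H5971">among the people</w>.
</root>"""


def extract_text(xml_fragment: str) -> str:
    """Collect text from every element, however deeply nested, then tidy whitespace."""
    root = ET.fromstring(xml_fragment)
    text = " ".join("".join(root.itertext()).split())
    # Line breaks around nested tags leave a space before punctuation; remove it.
    return re.sub(r"\s+([,.;:])", r"\1", text)


print(extract_text(FRAGMENT))
```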

Mark 9:43 is blank in KJV

>>> import pythonbible as bible
>>> kjv_parser = bible.get_parser(version=bible.Version.KING_JAMES)
>>> kjv_verse_text = kjv_parser.get_verse_text(41009043)
>>> kjv_verse_text
'43.'

If I do the same thing in ASV, I get:

>>> import pythonbible as bible
>>> asv_parser = bible.get_parser(version=bible.Version.AMERICAN_STANDARD)
>>> asv_verse_text = asv_parser.get_verse_text(41009043)
>>> asv_verse_text
'43. And if thy hand cause thee to stumble, cut it off: it is good for thee to enter into life maimed, rather than having thy two hands to go into hell, into the unquenchable fire.'

Add support for Portuguese

Hi,

Can you add support for Portuguese book names and abbreviations?

https://avinuapp.com/bible-books.html

Or:

Portuguese	Spanish	Abbreviation (Portuguese)	Abbreviation (Spanish)
Gênesis	Génesis	Gn	Gen
Êxodo	Éxodo	Êx	Ex
Levítico	Levítico	Lv	Lev
Números	Números	Nm	Núm
Deuteronômio	Deuteronomio	Dt	Deut
Josué	Josué	Js	Jos
Juízes	Jueces	Jz	Jue
Rute	Rut	Rt	Rut
1 Samuel	1 Samuel	1 Sm	1 Sam
2 Samuel	2 Samuel	2 Sm	2 Sam
1 Reis	1 Reyes	1 Rs	1 Rey
2 Reis	2 Reyes	2 Rs	2 Rey
1 Crônicas	1 Crónicas	1 Cr	1 Cro
2 Crônicas	2 Crónicas	2 Cr	2 Cro
Esdras	Esdras	Ed	Esd
Neemias	Nehemías	Ne	Neh
Ester	Ester	Et	Est
Jó	Job	Jó	Job
Salmos	Salmos	Sl	Sal
Provérbios	Proverbios	Pr	Prov
Eclesiastes	Eclesiastés	Ec	Ecl
Cantares de Salomão	Cantar de los Cantares	Ct	Cant
Isaías	Isaías	Is	Is
Jeremias	Jeremías	Jr	Jer
Lamentações de Jeremias	Lamentaciones de Jeremías	Lm	Lam
Ezequiel	Ezequiel	Ez	Ez
Daniel	Daniel	Dn	Dan
Oseias	Oseas	Os	Os
Joel	Joel	Jl	Jl
Amós	Amós	Am	Am
Obadias	Abdías	Ob	Abd
Jonas	Jonás	Jn	Jon
Miqueias	Miqueas	Mi	Miq
Naum	Nahúm	Na	Nah
Habacuque	Habacuc	Hab	Hab
Sofonias	Sofonías	Sof	Sof
Ageu	Hageo	Ag	Hag
Zacarias	Zacarías	Zc	Zac
Malaquias	Malaquías	Ml	Mal
Mateus	Mateo	Mt	Mat
Marcos	Marcos	Mc	Mar
Lucas	Lucas	Lc	Luc
João	Juan	Jo	Jn
Atos dos Apóstolos	Hechos de los Apóstoles	At	Hch
Romanos	Romanos	Ro	Rom
1 Coríntios	1 Corintios	1 Co	1 Cor
2 Coríntios	2 Corintios	2 Co	2 Cor
Gálatas	Gálatas	Gl	Gál
Efésios	Efesios	Ef	Ef
Filipenses	Filipenses	Fp	Fil
Colossenses	Colosenses	Cl	Col
1 Tessalonicenses	1 Tesalonicenses	1 Ts	1 Tes
2 Tessalonicenses	2 Tesalonicenses	2 Ts	2 Tes
1 Timóteo	1 Timoteo	1 Tm	1 Tim
2 Timóteo	2 Timoteo	2 Tm	2 Tim
Tito	Tito	Tt	Tit
Filemom	Filemón	Fm	Flm
Hebreus	Hebreos	Hb	Heb
Tiago	Santiago	Tg	Stg
1 Pedro	1 Pedro	1 Pe	1 Ped
2 Pedro	2 Pedro	2 Pe	2 Ped
1 João	1 Juan	1 Jo	1 Jn
2 João	2 Juan	2 Jo	2 Jn
3 João	3 Juan	3 Jo	3 Jn
Judas	Judas	Jd	Jud
Apocalipse	Apocalipsis	Ap	Ap

Missing books from the Apocrypha and Septuagint

The following books from the Apocrypha and Septuagint are completely missing from this library:

Baruch
Additions to Daniel
Prayer of Azariah
Bel and the Dragon
Song of the Three Young Men
Susanna
2 Esdras
Additions to Esther
Epistle of Jeremiah
Judith
3 Maccabees
4 Maccabees
Prayer of Manasseh

We also need regular expressions, valid verses, and max verse number by chapter for all of the books from the Apocrypha and Septuagint, so the above books, plus:

1 Esdras
1 Maccabees
2 Maccabees
Sirach/Ecclesiasticus
Tobit
Wisdom of Solomon

Cannot use many common versions: ESV, NASB, NKJV

I receive the following error when using many common English versions.

MissingBookFileError: <Version.ENGLISH_STANDARD: 'ESV'>

Philemon is getting parsed as Philippians and is causing an error during normalization of the reference

The following is throwing an error even though it is a valid reference.

import pythonbible as bible
text = "Philemon 1:9"
references = bible.get_references(text)

This happens because the regular expression for the book of Philippians finds the abbreviation "Phil" and returns a match. We still need to support that abbreviation in the Philippians regular expression, but only when it is not immediately followed by "emon".
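
A negative lookahead expresses that condition directly. The pattern below is a simplified illustration, not the library's actual Philippians expression:

```python
import re

# Illustrative pattern only: accept "Phil", "Phil." and "Philippians",
# but reject "Phil" when it is immediately followed by "emon".
PHILIPPIANS = re.compile(r"\bPhil(?!emon)(?:ippians|\.)?", re.IGNORECASE)

for text in ("Phil 3:13", "Phil. 3:13", "Philippians 3:13", "Philemon 1:9"):
    match = PHILIPPIANS.search(text)
    print(f"{text!r}: {match.group(0) if match else None}")
```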

Words in text are interpreted as a reference

Thanks for this :)

When getting references from text strings, a word that matches a Bible book name sometimes returns a full-book reference. For example, the word "mark" in the text below gives me an extra reference. Could we add a flag to the get_references function that says whether a chapter number is required for a match?

text = """This one thing I do, forgetting those things which are behind.... I press toward the mark
    for the prize of the high calling of God in Christ Jesus. Phil 3:13, 14"""
references = bible.get_references(text)

print(bible.format_scripture_references(references))

#Mark;Philippians 3:13-14
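
To make the request concrete, here is a rough sketch of what "chapter number required" could mean at the regular-expression level. The two-book pattern is a tiny stand-in for the library's real book expressions, and nothing here reflects how get_references is actually implemented.

```python
import re

text = """This one thing I do, forgetting those things which are behind.... I press toward the mark
    for the prize of the high calling of God in Christ Jesus. Phil 3:13, 14"""

BOOKS = r"(?:Mark|Phil)"  # tiny stand-in for the full set of book patterns

# A book name alone is enough to match.
book_only = re.compile(rf"\b{BOOKS}\b", re.IGNORECASE)
# The book name must be followed by whitespace and a chapter number.
chapter_required = re.compile(rf"\b{BOOKS}\.?\s+\d+", re.IGNORECASE)

print(book_only.findall(text))
print(chapter_required.findall(text))
```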

NEW_AMERICAN_STANDARD seems to be broken

Hello,

Thank you for your work on this project! This is quite nice... I don't know if this is expected, but for some reason it just does not work...

import pythonbible as bible

reference = bible.NormalizedReference(bible.Book.GENESIS, 1, 1, 1, 31)
verse_ids = bible.convert_reference_to_verse_ids(reference)
for verse_id in verse_ids:
    verse = bible.get_verse_text(verse_id, version=bible.Version.NEW_AMERICAN_STANDARD)
    print(verse)

That throws the following error...

$ python bible.py
Traceback (most recent call last):
  File "/opt/python/virtualenv/py311_test/lib/python3.11/site-packages/pythonbible/bible/bibles.py", line 54, in get_bible
    return BIBLES[version][bible_type]
           ~~~~~~^^^^^^^^^
KeyError: <Version.NEW_AMERICAN_STANDARD: 'NASB'>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/mpenning/bible.py", line 7, in <module>
    verse = bible.get_verse_text(verse_id, version=bible.Version.NEW_AMERICAN_STANDARD)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python/virtualenv/py311_test/lib/python3.11/site-packages/pythonbible/formatter.py", line 425, in get_verse_text
    bible = get_bible(version, "plain_text_readers")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python/virtualenv/py311_test/lib/python3.11/site-packages/pythonbible/bible/bibles.py", line 56, in get_bible
    raise MissingVerseFileError from error
pythonbible.errors.MissingVerseFileError
(py311_test) mpenning@mudslide:~$

If there is some reason that the verses for that version cannot be retrieved, it would be better to make that clear (e.g. "You must do foo to support bible.Version.NEW_AMERICAN_STANDARD"). FWIW, you can retrieve quite a bit of the Bible for free via BibleGateway.
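
As a sketch of the kind of message meant here: the registry, the version strings, and the wording below are all hypothetical stand-ins, and the exception class only mirrors the name seen in the traceback above.

```python
class MissingVerseFileError(Exception):
    """Stand-in for the library's exception of the same name."""


# Hypothetical registry: only versions with bundled text appear here.
BIBLES = {"ASV": "American Standard text", "KJV": "King James text"}


def get_bible(version: str) -> str:
    """Return the bundled text for a version, or explain why it is unavailable."""
    try:
        return BIBLES[version]
    except KeyError as error:
        available = ", ".join(sorted(BIBLES))
        raise MissingVerseFileError(
            f"No verse text is bundled for version {version!r}. "
            f"Versions with text available: {available}."
        ) from error


try:
    get_bible("NASB")
except MissingVerseFileError as error:
    print(error)
```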

Problems understanding some abbreviations

Hi,

I made a new version of the books in a fork to support Portuguese/Spanish.

I tested them one by one, and only 5 books are a problem:

- The search for Philippians does not work (fp 1)
- The search for Leviticus does not work (lv 1)
- The search for Obadiah does not work (ob 1)
- The search for Luke does not work (lc 1)
- The search for Colossians does not work (cl 1)

For example, it understands "lev 1" but not "lv 1", and the same goes for the others. I also tested with LEVITICUS = 3, "Leviticus", r"(Lev|Lv)", ("Lev", "Lv"), but that doesn't work either.

I don't know what I'm doing wrong. Can you point out what is wrong in the regular expressions for these five cases?

from __future__ import annotations

from enum import Enum
from typing import Any
from typing import Type


def _build_book_regular_expression(
    book: str,
    prefix: str | None = None,
    suffix: str | None = None,
) -> str:
    return _add_suffix(_add_prefix(book, prefix), suffix)


def _add_prefix(regex: str, prefix: str | None = None) -> str:
    return regex if prefix is None else rf"(?:{prefix})(?:\s)?{regex}"


def _add_suffix(regex: str, suffix: str | None = None) -> str:
    return regex if suffix is None else rf"{regex}(?:\s*{suffix})?"


_SAMUEL_REGULAR_EXPRESSION = r"(Samuel|Sam\.*|Sa\.*|Sm\.*)"
_KINGS_REGULAR_EXPRESSION = r"(Kings|Kgs\.*|Kin\.*|Ki\.*|Reyes|Reis|Re\.*|Rs\.*)"
_CHRONICLES_REGULAR_EXPRESSION = r"(Chronicles|Chron\.*|Chro\.*|Chr\.*|Crónicas|Crônicas|Cr\.*)"
_JOHN_REGULAR_EXPRESSION = r"(John|Joh\.*|Jhn\.*|Jo\.*(?!shua|b|nah|el)|Jn\.*|Juan|João|Jn\.*)"
_CORINTHIANS_REGULAR_EXPRESSION = r"(Corinthians|Corintios|Coríntios|Co\.*)"
_THESSALONIANS_REGULAR_EXPRESSION = r"(Thessalonians|Tesalonicenses|Tessalonicenses|Th\.*|Ts\.*)"
_TIMOTHY_REGULAR_EXPRESSION = r"(Timothy|Timoteo|Timóteo|Ti\.*|Tm\.*)"
_PETER_REGULAR_EXPRESSION = r"(Peter|Pedro|Pe\.*|Pt\.*)"

_MACCABEES_REGULAR_EXPRESSION = r"(Maccabees|Macabeos|Macabeus|Ma\.*|M\.*)"

_FIRST = r"1|I\s+|1st\s+|First\s+|Primero\s+|Primeiro\s+|1\s+"
_SECOND = r"2|II|2nd\s+|Second\s+|Segundo\s+|2\s+"
_THIRD = r"3|III|3rd\s+|Third\s+|Tercero\s+|Terceiro\s+|3\s+"

_FIRST_BOOK = rf"{_FIRST}|(First\s+Book\s+of(?:\s+the)?)"
_SECOND_BOOK = rf"{_SECOND}|(Second\s+Book\s+of(?:\s+the)?)"

_EPISTLE_OF_PAUL_TO = r"Epistle\s+of\s+Paul\s+(?:the\s+Apostle\s+)?to(?:\s+the)?"
_GENERAL_EPISTLE_OF = r"(?:General\s+)?Epistle\s+(?:General\s+)?of"

_FIRST_PAUL_EPISTLE = rf"{_FIRST}|(First\s+{_EPISTLE_OF_PAUL_TO})"
_SECOND_PAUL_EPISTLE = rf"{_SECOND}|(Second\s+{_EPISTLE_OF_PAUL_TO})"

_FIRST_GENERAL_EPISTLE = rf"{_FIRST}|(First\s+{_GENERAL_EPISTLE_OF})"
_SECOND_GENERAL_EPISTLE = rf"{_SECOND}|(Second\s+{_GENERAL_EPISTLE_OF})"
_THIRD_GENERAL_EPISTLE = rf"{_THIRD}|(Third\s+{_GENERAL_EPISTLE_OF})"


class Book(Enum):
    """Book is an Enum that contains all the books of the Bible.

    :param name: the unique text identifier of the book
    :type name: str
    :param value: the unique numerical identifier of the book
    :type value: int
    :param title: the common English name of the book
    :type title: str
    :param regular_expression: the regular expression for the book
    :type regular_expression: str
    :param abbreviations: the allowed title abbreviations for the book
    :type abbreviations: tuple[str, ...]
    """

    def __new__(
        cls: Type[Book],
        *args: dict[str, Any],
        **kwargs: dict[str, Any],
    ) -> Book:
        obj: Book = object.__new__(cls)
        obj._value_ = args[0]
        return obj

    def __init__(
        self: Book,
        _: int,
        title: str,
        regular_expression: str,
        abbreviations: tuple[str, ...],
    ) -> None:
        """Set the title and regular_expression properties."""
        self._title_ = title
        self._regular_expression_ = regular_expression
        self._abbreviations_ = abbreviations

    @property
    def title(self: Book) -> str:
        return self._title_

    @property
    def regular_expression(self: Book) -> str:
        return self._regular_expression_

    @property
    def abbreviations(self: Book) -> tuple[str, ...]:
        return self._abbreviations_

    GENESIS = 1, "Genesis", r"(Gen\.*(?:esis)?|Gén\.*(?:esis)?|Gên\.*(?:esis)?|Gn\.*)", ("Gen", "Gn")
    EXODUS = 2, "Exodus", r"(Exo\.*(?:d\.*)?(?:us)?|Éxo\.*(?:do)?|Êxo\.*(?:do)?|Éx|Êx|Ex\.*)", ("Exo", "Exod", "Éx", "Êx", "Ex")
    LEVITICUS = 3, "Leviticus", r"(Lev\.*(?:iticus)?|Lev\.*(?:ítico)?|Lv\.*)", ("Lev", "Lv")
    NUMBERS = 4, "Numbers", r"(Num\.*(?:bers)?|Num\.*(?:eros)?|Núm\.*|Nm\.*)", ("Num", "Núm", "Nm")
    DEUTERONOMY = 5, "Deuteronomy", r"(Deu\.*(?:t\.*)?(?:eronomy)?|Deu\.*(?:teronomio)?|Dt\.*)", ("Deu", "Deut", "Dt")
    JOSHUA = 6, "Joshua", r"(Joshua|Josh\.*|Jos\.*|Jsh\.*|Josué|Js\.*)", ("Jos", "Jsh", "Josh", "Js")
    JUDGES = 7, "Judges", r"(Judges|Judg\.*|Jdgs\.*|Jdg\.*|Jueces|Jue\.*|Jz\.*)", ("Jdg", "Jdgs", "Judg", "Jue", "Jz")
    RUTH = 8, "Ruth", r"(Ruth|Rut\.*|Rth\.*|Rt\.*)", ("Rth", "Rut", "Rt")
    SAMUEL_1 = (
        9,
        "1 Samuel",
        _build_book_regular_expression(
            _SAMUEL_REGULAR_EXPRESSION,
            prefix=_FIRST_BOOK,
            suffix=r"Otherwise\s+Called\s+The\s+First\s+Book\s+of\s+the\s+Kings",
        ),
        ("Sa", "Sam", "Sm", "1Sm"),
    )
    SAMUEL_2 = (
        10,
        "2 Samuel",
        _build_book_regular_expression(
            _SAMUEL_REGULAR_EXPRESSION,
            prefix=_SECOND_BOOK,
            suffix=r"Otherwise\s+Called\s+The\s+Second\s+Book\s+of\s+the\s+Kings",
        ),
        ("Sa", "Sam", "Sm", "2Sm"),
    )
    KINGS_1 = (
        11,
        "1 Kings",
        _build_book_regular_expression(
            _KINGS_REGULAR_EXPRESSION,
            prefix=_FIRST_BOOK,
            suffix=r"\,\s+Commonly\s+Called\s+the\s+Third\s+Book\s+of\s+the\s+Kings",
        ),
        ("Re", "Rey", "Reis", "Reyes", "Kgs", "Kin", "Ki", "1Rs"),
    )
    KINGS_2 = (
        12,
        "2 Kings",
        _build_book_regular_expression(
            _KINGS_REGULAR_EXPRESSION,
            prefix=_SECOND_BOOK,
            suffix=r"\,\s+Commonly\s+Called\s+the\s+Fourth\s+Book\s+of\s+the\s+Kings",
        ),
        ("Re", "Rey", "Reis", "Reyes", "Kgs", "Kin", "Ki", "2Rs"),
    )
    CHRONICLES_1 = (
        13,
        "1 Chronicles",
        _build_book_regular_expression(
            _CHRONICLES_REGULAR_EXPRESSION,
            prefix=_FIRST_BOOK,
        ),
        ("Cr", "Crón", "Crôn", "Chron", "Chro", "Chr", "1Cr"),
    )
    CHRONICLES_2 = (
        14,
        "2 Chronicles",
        _build_book_regular_expression(
            _CHRONICLES_REGULAR_EXPRESSION,
            prefix=_SECOND_BOOK,
        ),
        ("Cr", "Crón", "Crôn", "Chron", "Chro", "Chr", "2Cr"),
    )
    EZRA = 15, "Ezra", r"(Ezr\.*(?:a)?|Esd\.*|Esdras|Ed\.*)", ("Ezr", "Esd", "Ed")
    NEHEMIAH = 16, "Nehemiah", r"(Neh\.*(?:emiah)?|Ne\.*|Neemias|Ne\.*)", ("Neh", "Ne")
    ESTHER = 17, "Esther", r"(Est\.*(?:h\.*)?(?:er)?|Est\.*|Ester|Et\.*)", ("Est", "Esth", "Et")
    JOB = 18, "Job", r"(Job|Jb\.*|Jó\.*)", ("Job", "Jb", "Jó")
    PSALMS = (
        19,
        "Psalms",
        r"(Psalms|Psalm|Pslm\.*|Psa\.*|Psm\.*|Pss\.*|Ps\.*|Salmos|Sal\.*|Sl\.*)",
        ("Ps", "Psa", "Pslm", "Psm", "Pss", "Sal", "Sl"),
    )
    PROVERBS = (
        20,
        "Proverbs",
        r"(Proverbs|Prov\.*|Pro\.*|Prv\.*|Proverbios|Prov\.*|Provérbios|Pv\.*)",
        ("Pro", "Prov", "Prv", "Pv"),
    )
    ECCLESIASTES = (
        21,
        "Ecclesiastes",
        r"(Ecclesiastes(?:\s+or\,\s+the\s+Preacher)?|Eclesiastés|Eclesiastes"
        r"|Eccles\.*(?!iasticus?)|Ecles\.*"
        r"|Eccle\.*(?!siasticus?)|Ecle\.*"
        r"|Eccl\.*(?!esiasticus?)(?!us?)|Ecl\.*"
        r"|Ecc\.*(?!lesiasticus?)(?!lus?)|Ec\.*|Ecl\.*|Qoh\.*)",
        ("Ec", "Ecc", "Eccl", "Eccle", "Eccles", "Ecl", "Ecle", "Ecles", "Qoh"),
    )
    SONG_OF_SONGS = (
        22,
        "Song of Songs",
        r"(Song(?: of (Solomon|Songs|Sol\.*))?|Cantar de los Cantares|Cânticos|Cantares|Ct\.*)"
        r"|Canticles|(Canticle(?: of Canticles)?)|SOS|Cant",
        ("Cant", "Canticle", "Canticles", "Song", "Song of Sol", "SOS", "Ct"),
    )
    ISAIAH = 23, "Isaiah", r"(Isa\.*(?:iah)?|Isaias|Isa\.*|Is\.*)", ("Isa", "Is")
    JEREMIAH = 24, "Jeremiah", r"(Jer\.*(?:emiah)?|Jeremias|Jer\.*|Je\.*|Jr\.*)", ("Jer", "Je", "Jr")
    LAMENTATIONS = (
        25,
        "Lamentations",
        _build_book_regular_expression(
            r"(Lam\.*(?:entations)?|Lamentaciones|Lamentações|Lam\.*|Lm\.*|Lá\.*)",
            suffix=r"of\s+Jeremiah",
        ),
        ("Lam", "Lm", "Lá"),
    )
    EZEKIEL = 26, "Ezekiel", r"(Ezekiel|Ezequiel|Eze\.*|Ezq\.*|Ezk\.*|Ez\.*)", ("Eze", "Ezq", "Ezk", "Ez")
    DANIEL = 27, "Daniel", r"(Dan\.*(?:iel)?|Dan\.*|Dn\.*)", ("Dan", "Dn")
    HOSEA = 28, "Hosea", r"(Hos\.*(?:ea)?|Oseas|Os\.*|O\.*)", ("Hos", "Os", "O")
    JOEL = 29, "Joel", r"(Joe\.*(?:l)?|Joel|Jl\.*)", ("Joe", "Jl")
    AMOS = 30, "Amos", r"(Amo\.*(?:s)?|Amós|Am\.*)", ("Amo", "Am")
    OBADIAH = 31, "Obadiah", r"(Oba\.*(?:d\.*(?:iah)?)?|Abdías|Obd\.*|Abd\.*|Ob\.*|Ab\.*)", ("Oba", "Obd", "Abd", "Ob", "Ab")
    JONAH = 32, "Jonah", r"(Jonah|Jon\.*|Jnh\.*|Jonás|Jn\.*|Jnh\.*)", ("Jnh", "Jon", "Jn")
    MICAH = 33, "Micah", r"(Mic\.*(?:ah)?|Miqueas|Mi\.*|Mq\.*)", ("Mic", "Mi", "Mq")
    NAHUM = 34, "Nahum", r"(?<!Jo)(Nah\.*(?:um)?|Nahúm|Na\.*)", ("Nah", "Na")
    HABAKKUK = 35, "Habakkuk", r"(Hab\.*(?:akkuk)?|Habacuc|Hab\.*|Hb\.*|Hc\.*)", ("Hab", "Hb", "Hc")
    ZEPHANIAH = 36, "Zephaniah", r"(Zep\.*(?:h\.*(?:aniah)?)?|Sofonías|Zefanias|Sof\.*|Zef\.*|Sf\.*|Zp\.*)", ("Zep", "Sof", "Zef", "Sf", "Zp")
    HAGGAI = 37, "Haggai", r"(Hag\.*(?:gai)?|Ageo|Ag\.*|Hg\.*)", ("Hag", "Ag", "Hg")
    ZECHARIAH = 38, "Zechariah", r"(Zec\.*(?:h\.*(?:ariah)?)?|Zacarías|Zacarias|Zac\.*|Zc\.*)", ("Zec", "Zac", "Zc")
    MALACHI = 39, "Malachi", r"(Mal\.*(?:achi)?|Malaquías|Malaquias|Mal\.*|Ml\.*)", ("Mal", "Ml")
    MATTHEW = 40, "Matthew", r"(Mat\.*(?:t\.*(?:hew)?)?|Mateo|Mat\.*|Mt\.*)", ("Mat", "Matt", "Mt")
    MARK = 41, "Mark", r"(Mark|Mar\.*|Mrk\.*|Marcos|Mr\.*|Mc\.*)", ("Mar", "Mrk", "Mr", "Mc")
    LUKE = 42, "Luke", r"(Luk\.*(?:e)?|Lucas|Luc\.*|Lc\.*)", ("Luk", "Luc", "Lc")
    JOHN = (
        43,
        "John",
        rf"(?<!(?:1|2|3|I)\s)(?<!(?:1|2|3|I)){_JOHN_REGULAR_EXPRESSION}",
        ("Jhn", "Jn", "Jo", "Joh"),
    )
    ACTS = (
        44,
        "Acts",
        _build_book_regular_expression(
            r"(Act\.*(?:s)?|Hechos|Atos|Act\.*|He\.*|At\.*)",
            suffix="of the Apostles",
        ),
        ("Act", "He", "At"),
    )
    ROMANS = 45, "Romans", r"(Rom\.*(?:ans)?|Romanos|Rom\.*|Rm\.*)", ("Rom", "Rm")
    CORINTHIANS_1 = (
        46,
        "1 Corinthians",
        _build_book_regular_expression(
            _CORINTHIANS_REGULAR_EXPRESSION,
            prefix=_FIRST_PAUL_EPISTLE,
        ),
        ("Co", "Cor", "1Co"),
    )
    CORINTHIANS_2 = (
        47,
        "2 Corinthians",
        _build_book_regular_expression(
            _CORINTHIANS_REGULAR_EXPRESSION,
            prefix=_SECOND_PAUL_EPISTLE,
        ),
        ("Co", "Cor", "2Co"),
    )
    GALATIANS = 48, "Galatians", r"(Gal\.*(?:atians)?|Gálatas|Gal\.*|Gl\.*)", ("Gal", "Gl")
    EPHESIANS = 49, "Ephesians", r"(?<!Z)(Eph\.*(?:es\.*(?:ians)?)?|Efesios|Efésios|Efe\.*|Ef\.*)", ("Eph", "Ephes", "Efe", "Ef")
    PHILIPPIANS = (
        50,
        "Philippians",
        r"(Ph(?:(p\.*)|(?:il\.*(?!e\.*(?:m\.*(?:on)?)?)(?:ippians)?))|Filipenses|Flp\.*|Fp\.*)",
        ("Php", "Phil", "Flp", "Fp"),
    )
    COLOSSIANS = 51, "Colossians", r"(Col\.*(?:ossians)?|Colosenses|Colossenses|Col\.*|Cl\.*)", ("Col", "Cl")
    THESSALONIANS_1 = (
        52,
        "1 Thessalonians",
        _build_book_regular_expression(
            _THESSALONIANS_REGULAR_EXPRESSION,
            prefix=_FIRST_PAUL_EPISTLE,
        ),
        ("Th", "Thes", "Thess", "Ths", "1Ts"),
    )
    THESSALONIANS_2 = (
        53,
        "2 Thessalonians",
        _build_book_regular_expression(
            _THESSALONIANS_REGULAR_EXPRESSION,
            prefix=_SECOND_PAUL_EPISTLE,
        ),
        ("Th", "Thes", "Thess", "Ths", "2Ts"),
    )
    TIMOTHY_1 = (
        54,
        "1 Timothy",
        _build_book_regular_expression(
            _TIMOTHY_REGULAR_EXPRESSION,
            prefix=_FIRST_PAUL_EPISTLE,
        ),
        ("Ti", "Tim", "1Tm"),
    )
    TIMOTHY_2 = (
        55,
        "2 Timothy",
        _build_book_regular_expression(
            _TIMOTHY_REGULAR_EXPRESSION,
            prefix=_SECOND_PAUL_EPISTLE,
        ),
        ("Ti", "Tim", "2Tm"),
    )
    TITUS = 56, "Titus", r"(Tit\.*(?:us)?|Tito|Tit\.*|Tt\.*)", ("Tit", "Tt")
    PHILEMON = (
        57,
        "Philemon",
        r"(Philemon|Philem\.*|Phile\.*|Phlm\.*|Phi\.*(?!l)|Phm\.*|Filemón|Filemon|Flm\.*|Fm\.*)",
        ("Phi", "Phile", "Philem", "Phlm", "Phm", "Flm", "Fm"),
    )
    HEBREWS = 58, "Hebrews", r"(Heb\.*(?:rews)?|Hebreos|Hebreus|Heb\.*|Hb\.*)", ("Heb", "Hb")
    JAMES = 59, "James", r"(Ja(?:me)?s\.*|Santiago|Tiago|San\.*|Stg\.*|Tg\.*)", ("Jas", "San", "Stg", "Tg")
    PETER_1 = (
        60,
        "1 Peter",
        _build_book_regular_expression(
            _PETER_REGULAR_EXPRESSION,
            prefix=_FIRST_GENERAL_EPISTLE,
        ),
        ("Pe", "Pet", "Pt", "1Pe"),
    )
    PETER_2 = (
        61,
        "2 Peter",
        _build_book_regular_expression(
            _PETER_REGULAR_EXPRESSION,
            prefix=_SECOND_GENERAL_EPISTLE,
        ),
        ("Pe", "Pet", "Pt", "2Pe"),
    )
    JOHN_1 = (
        62,
        "1 John",
        _build_book_regular_expression(
            _JOHN_REGULAR_EXPRESSION,
            prefix=_FIRST_GENERAL_EPISTLE,
        ),
        ("Jhn", "Jn", "Jo", "Joh", "1Jo"),
    )
    JOHN_2 = (
        63,
        "2 John",
        _build_book_regular_expression(
            _JOHN_REGULAR_EXPRESSION,
            prefix=_SECOND_GENERAL_EPISTLE,
        ),
        ("Jhn", "Jn", "Jo", "Joh", "2Jo"),
    )
    JOHN_3 = (
        64,
        "3 John",
        _build_book_regular_expression(
            _JOHN_REGULAR_EXPRESSION,
            prefix=_THIRD_GENERAL_EPISTLE,
        ),
        ("Jhn", "Jn", "Jo", "Joh", "3Jo"),
    )
    JUDE = 65, "Jude", r"(Jud\.*(?:e)?(?!ges)|Judas|Jd\.*)", ("Jud", "Jd")
    REVELATION = (
        66,
        "Revelation",
        _build_book_regular_expression(
            r"(Rev\.*(?:elation)?|Apocalipsis|Apocalipse|Rev\.*|Ap\.*)",
            suffix="of ((Jesus Christ)|John|(St. John the Divine))",
        ),
        ("Rev", "Ap"),
    )
    ESDRAS_1 = (
        67,
        "1 Esdras",
        _build_book_regular_expression(
            r"(Esdras|Esdr\.*|Esd\.*|Es\.*)",
            _FIRST,
        ),
        ("Es", "Esd", "Esdr"),
    )
    TOBIT = 68, "Tobit", r"(Tobit|Tob\.*|Tb\.*|Tobías|Tobias|Tb\.*)", ("Tb", "Tob")
    WISDOM_OF_SOLOMON = (
        69,
        "Wisdom of Solomon",
        r"(Wisdom of Solomon|Wisdom|Sabiduría|Sabedoria|Wisd\.* of Sol\.*|Wis\.*|(?<!Hebre)Ws\.*)",
        ("Wis", "Wisd of Sol", "Ws", "Sab", "Sb"),
    )
    ECCLESIASTICUS = (
        70,
        "Ecclesiasticus",
        r"(Sirach|Sir\.*|Eclesiástico|Eclesiástico|Ecclesiasticus|Ecclus\.*)",
        ("Ecclus", "Sir", "Eclo", "Ecl"),
    )
    MACCABEES_1 = (
        71,
        "1 Maccabees",
        _build_book_regular_expression(
            _MACCABEES_REGULAR_EXPRESSION,
            _FIRST,
        ),
        ("M", "Ma", "Mac", "Macc"),
    )
    MACCABEES_2 = (
        72,
        "2 Maccabees",
        _build_book_regular_expression(
            _MACCABEES_REGULAR_EXPRESSION,
            _SECOND,
        ),
        ("M", "Ma", "Mac", "Macc"),
    )

Thanks.
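
One way to narrow this down is to test each of the five patterns on its own, outside the library. The sketch below copies the patterns from the Book definitions above and assumes case-insensitive matching (an assumption about how the library applies them). If a pattern matches in isolation, the cause is more likely in how the patterns are combined or in other code, which this check cannot show.

```python
import re

# Patterns copied from the Book definitions above, keyed by the failing abbreviation.
PATTERNS = {
    "lv": r"(Lev\.*(?:iticus)?|Lev\.*(?:ítico)?|Lv\.*)",
    "lc": r"(Luk\.*(?:e)?|Lucas|Luc\.*|Lc\.*)",
    "ob": r"(Oba\.*(?:d\.*(?:iah)?)?|Abdías|Obd\.*|Abd\.*|Ob\.*|Ab\.*)",
    "cl": r"(Col\.*(?:ossians)?|Colosenses|Colossenses|Col\.*|Cl\.*)",
    "fp": r"(Ph(?:(p\.*)|(?:il\.*(?!e\.*(?:m\.*(?:on)?)?)(?:ippians)?))|Filipenses|Flp\.*|Fp\.*)",
}

# True means the pattern accepts the abbreviation when tested by itself.
results = {
    abbreviation: bool(re.fullmatch(pattern, abbreviation, re.IGNORECASE))
    for abbreviation, pattern in PATTERNS.items()
}
print(results)
```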

Problem in compound verses with prefix

These work

bible.get_references('Ezra 1:1-Ezra 2:1')
bible.get_references('1 Kings 1:1-3')

But these return wrong references or fail to return references

bible.get_references("1 Kings 1:1-Kings 1:10")
bible.get_references("1 Chronicles 1:5-1 Chronicles 1:7")

My hunch is the 2nd "1" in "1 Chronicles" is getting picked up like I had tried to find "1 Chronicles 1:5-1"
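
That hunch can be reproduced with a deliberately naive range pattern. Both patterns below are hypothetical and much simpler than whatever the library uses; they only show how a trailing "1" can be consumed as the end verse, and how a lookahead could refuse it when a book name follows.

```python
import re

text = "1 Chronicles 1:5-1 Chronicles 1:7"

# Naive "chapter:verse-verse" pattern (hypothetical, not the library's own).
naive_range = re.compile(r"(\d+):(\d+)-(\d+)")
# Same pattern, but the closing number must not be followed by a book name.
guarded_range = re.compile(r"(\d+):(\d+)-(\d+)(?!\s+[A-Za-z])")

print(naive_range.search(text).group(0))    # the book prefix is taken as the end verse
print(guarded_range.search(text))           # no same-chapter range is found here
print(guarded_range.search("1 Kings 1:1-3").group(0))
```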

A few potential test failures?

Thanks again for this :)

I have a couple strings that produce errors/unexpected results. Are any of these valid as is?

import pythonbible as bible


tests = [
'Or, Micah. 2Ch. 34:20', # probably fails because of "Micah."
'Or, of. Psalm. 46, title.', # unsure
'Or, A psalm for Asaph to give instruction. Psalms. 74, title.', # unsure
'Or, anathema. 1 Corinthians 1Co. 16:22', # misses 2nd ref, because it thinks the 1 on "1Co." goes with the first ref? maybe ok?
'Or, loving to the brethren. 1Peter 1Pe. 1:22', # same as above
'1Peter. 1:22' # no ref found here. if there is a dot after a full book name, we don't get matches.
]

for text in tests:
    try:
        references = bible.get_references(text)

        print(references)
    except Exception as e:
        print(e)

How to return a range of verses instead of a single verse?

Hi,

When I parse "Obadiah 1", it returns "31001001" and not the list of all the verses in the chapter.

Is there an option to return the full list?

Thanks.
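
For context, the verse IDs in these reports appear to pack book, chapter, and verse into one integer (31001001 is Obadiah 1:1, 41009043 is Mark 9:43). Under that assumption, expanding a chapter into its verse IDs is simple arithmetic once the verse count is known. The helper names below are hypothetical; the library's own convert_reference_to_verse_ids, used in another report on this page, may already cover this.

```python
from __future__ import annotations


def verse_id(book: int, chapter: int, verse: int) -> int:
    """Pack book, chapter, and verse into one integer, e.g. 31001001."""
    return book * 1_000_000 + chapter * 1_000 + verse


def chapter_verse_ids(book: int, chapter: int, verse_count: int) -> list[int]:
    """List every verse ID in a chapter, given how many verses it has."""
    return [verse_id(book, chapter, verse) for verse in range(1, verse_count + 1)]


# Obadiah is book 31 and its single chapter has 21 verses.
obadiah_1 = chapter_verse_ids(31, 1, 21)
print(obadiah_1[0], obadiah_1[-1], len(obadiah_1))
```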

Error when getting KJV verse text for 41009038 (Mark 9:38)

Attempting to get the KJV verse text for Mark 9:38 (verse_id = 41009038) raises an error.

import pythonbible as bible
parser = bible.get_parser(version=bible.Version.KING_JAMES)
verse_text = parser.get_verse_text(41009038)

Traceback (most recent call last):
  File "<input>", line 3, in <module>
  File "C:\projects-git\python-bible\pythonbible\bible\osis\parser.py", line 101, in get_verse_text
    paragraphs = _get_paragraphs(self.tree, self.namespaces, [verse_id], **kwargs)
  File "C:\projects-git\python-bible\pythonbible\bible\osis\parser.py", line 132, in _get_paragraphs
    paragraph_element, verse_ids, current_verse_id, **kwargs
  File "C:\projects-git\python-bible\pythonbible\bible\osis\parser.py", line 184, in _get_paragraph_from_element
    **kwargs,
  File "C:\projects-git\python-bible\pythonbible\bible\osis\parser.py", line 242, in _handle_child_element
    return paragraph, skip_till_next_verse, new_current_verse_id
UnboundLocalError: local variable 'new_current_verse_id' referenced before assignment
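
This kind of error usually means a local variable is assigned in only some branches and then returned unconditionally. The functions below are hypothetical and are not the parser's code; they only show the general failure mode and one common fix.

```python
def handle_child_broken(tag: str, current_verse_id: int) -> int:
    """Assigns the result in one branch only, so other tags hit UnboundLocalError."""
    if tag == "verse":
        new_current_verse_id = current_verse_id + 1
    return new_current_verse_id


def handle_child_fixed(tag: str, current_verse_id: int) -> int:
    """Starts from a default so every path returns a defined value."""
    new_current_verse_id = current_verse_id
    if tag == "verse":
        new_current_verse_id = current_verse_id + 1
    return new_current_verse_id


try:
    handle_child_broken("note", 41009038)
except UnboundLocalError as error:
    print(f"UnboundLocalError: {error}")

print(handle_child_fixed("note", 41009038))
```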

A way to use the library without loading the Bibles

Hi,

I want to use only the parser part of this library, but instead it loads all the Bible data and my server crashes because it runs out of memory:

avinuteologia-1   |     import pythonbible as pb
avinuteologia-1   |   File "/usr/local/lib/python3.10/site-packages/pythonbible/__init__.py", line 30, in <module>
avinuteologia-1   |     from .formatter import format_scripture_references
avinuteologia-1   |   File "/usr/local/lib/python3.10/site-packages/pythonbible/formatter.py", line 8, in <module>
avinuteologia-1   |     from pythonbible.bible.bibles import get_bible
avinuteologia-1   |   File "/usr/local/lib/python3.10/site-packages/pythonbible/bible/bibles.py", line 5, in <module>
avinuteologia-1   |     import pythonbible.bible.asv.html as asv_html
avinuteologia-1   |   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
avinuteologia-1   |   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
avinuteologia-1   |   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
avinuteologia-1   |   File "<frozen importlib._bootstrap_external>", line 879, in exec_module
avinuteologia-1   |   File "<frozen importlib._bootstrap_external>", line 975, in get_code
avinuteologia-1   |   File "<frozen importlib._bootstrap_external>", line 1074, in get_data
avinuteologia-1   | MemoryError
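
The traceback shows the text data being imported at package import time. One common way to avoid that cost is to defer the import until the data is first requested. The following is only a sketch of that general technique, not pythonbible's implementation; `load_text_module` is a hypothetical name, and a standard-library module stands in for a large data module:

```python
import importlib
from functools import lru_cache


@lru_cache(maxsize=None)
def load_text_module(module_name: str):
    """Import a (possibly large) data module on first use and cache it.

    Nothing is loaded until this function is called, so code that only
    needs reference parsing never pays the memory cost of the texts.
    """
    return importlib.import_module(module_name)


# Stand-in demonstration: "json" plays the role of a large data module.
json_module = load_text_module("json")
print(json_module.dumps({"loaded": True}))
```

Repeated calls with the same name return the cached module object, so the import work happens at most once per module.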

Good job :)

BUG: get_verse_text(67001009) gives wrong value

1 Esdras 1:9 has verse ID 67001009.

Calling get_verse_text(67001009) gives the entire text of the Hebrew Bible and New Testament, instead of the verse in question. This appears to be true regardless of the value of the version argument.

It also appears to be true for subsequent verses, e.g. 67001010, etc.

I am using pythonbible==0.11.0.

When getting the references for a string containing multiple references separated by commas, if a book name has a numeric prefix, the prefix could be mistaken for a chapter/verse.

For example, the string:

"1 Corinthians 1:1, 2 Corinthians 1:1"

Should return the following two references when getting the references:

[
    NormalizedReference(
        book=<Book.CORINTHIANS_1: 46>,
        start_chapter=1,
        start_verse=1,
        end_chapter=1,
        end_verse=1,
        end_book=None
    ),
    NormalizedReference(
        book=<Book.CORINTHIANS_2: 47>,
        start_chapter=1,
        start_verse=1,
        end_chapter=1,
        end_verse=1,
        end_book=None
    )
]

However, the "2" in 2 Corinthians is interpreted as a verse rather than part of the book title, and the following references are returned:

[
    NormalizedReference(
        book=<Book.CORINTHIANS_1: 46>,
        start_chapter=1,
        start_verse=1,
        end_chapter=1,
        end_verse=1,
        end_book=None
    ),
    NormalizedReference(
        book=<Book.CORINTHIANS_1: 46>,
        start_chapter=1,
        start_verse=2,
        end_chapter=1,
        end_verse=2,
        end_book=None
    )
]
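
One way to picture the problem is with a deliberately simplified pair of regular expressions. These are hypothetical patterns written for illustration, not pythonbible's actual expressions: the naive one lets a ", 2" continuation swallow the book prefix, while the guarded one uses a negative lookahead to refuse a number that is directly followed by a word.

```python
import re

# Naive: after "chapter:verse", accept any number of ", N" or ", N:N" parts.
NAIVE = re.compile(
    r"(?P<book>(?:[123] )?[A-Z][a-z]+) "
    r"(?P<rest>\d+:\d+(?:, ?\d+(?::\d+)?)*)"
)

# Guarded: same, but a continuation number must not be followed by more
# digits or by a word, which would suggest it is a book prefix.
GUARDED = re.compile(
    r"(?P<book>(?:[123] )?[A-Z][a-z]+) "
    r"(?P<rest>\d+:\d+(?:, ?\d+(?::\d+)?(?!\d| ?[A-Za-z]))*)"
)


def find_refs(pattern, text):
    """Return (book, chapter/verse part) pairs for every match."""
    return [(m["book"], m["rest"]) for m in pattern.finditer(text)]


text = "1 Corinthians 1:1, 2 Corinthians 1:1"
print(find_refs(NAIVE, text))
print(find_refs(GUARDED, text))
```

The guard has its own blind spot: a genuine verse number followed by ordinary prose (for example "John 1:1, 2 and following") would also be rejected, so a real fix would likely check the following word against known book names.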

Potential issue with duplicated verses in references

Thanks again for this, I'm getting a lot of use :)

I found another potential issue. When a piece of text has repeating references, they are grouped by book, but it seems not by chapter or verse, which means the formatted output has duplicate verses. Here's an example: notice the refs for John and Hebrews for dupe verses, and Genesis for dupe chapter. I added a dupe removal on the verse IDs to get the output I was expecting:

import pythonbible as bible

text = 'Jeremiah 10:11-12;John 1:1;Hebrews 1:8-12;Genesis 1:1,2:4,2:7;Malachi 3:18;John 1:1;Psalms 33:6,9,136:5;John 1:1-3;Colossians 1:16-17;Hebrews 1:8-10,11:3'

references = bible.get_references(text)
formatted = bible.format_scripture_references(references)

print(formatted)

# list and set to remove dups
verse_ids = list(set(bible.convert_references_to_verse_ids(references)))

new_references = bible.convert_verse_ids_to_references(verse_ids)
formatted_2 = bible.format_scripture_references(new_references)

print(formatted_2)

output:

# initial output
Genesis 1:1,2:4,2:7;Psalms 33:6,9,136:5;Jeremiah 10:11-12;Malachi 3:18;John 1:1,1,1-3;Colossians 1:16-17;Hebrews 1:8,8-9,9-10,10-12,11:3

# with dupes manually removed list(set(...)) on verse_ids
Genesis 1:1,2:4,2:7;Psalms 33:6,9,136:5;Jeremiah 10:11-12;Malachi 3:18;John 1:1-3;Colossians 1:16-17;Hebrews 1:8-12,11:3
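
A small caveat on the workaround above: list(set(...)) removes duplicates but does not guarantee any particular order. If ordering matters downstream (an assumption; the report does not say whether it does), sorting the unique IDs is a cheap safeguard, since the integer IDs appear to sort in book, chapter, verse order. The helper name below is hypothetical:

```python
def unique_sorted_verse_ids(verse_ids):
    """Drop duplicate verse IDs and return them in ascending order."""
    return sorted(set(verse_ids))


# Illustrative IDs only: two copies of one verse plus two other verses.
ids = [43001001, 58001008, 43001001, 43001002]
print(unique_sorted_verse_ids(ids))  # [43001001, 43001002, 58001008]
```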

Missing the first few words of some verses in ASV

When getting the verse text for Exodus 20:3 in the ASV version, the first two words are missing.

import pythonbible as bible
references = bible.get_references("Exodus 20:3")
verse_ids = bible.convert_references_to_verse_ids(references)
kjv_parser = bible.get_parser(version=bible.Version.KING_JAMES)
kjv_verse_text = kjv_parser.get_verse_text(verse_ids[0])

The verse text for KJV looks right:

'3. Thou shalt have no other gods before me.'

But, when I get the verse text for ASV:

asv_parser = bible.get_parser(version=bible.Version.AMERICAN_STANDARD)
asv_verse_text = asv_parser.get_verse_text(verse_ids[0])

The ASV verse text is missing the "Thou shalt", but it is in the XML file.

'3. have no other gods before me.'

Support for <note><rdg></rdg></note> tags (optional verses? especially in the ASV version)?

For example:

<verse osisID="Matt.17.21" sID="Matt.17.21.seID.24668" n="21" />
<note type="translation" osisRef="Matt.17.21" osisID="Matt.17.21!note.1" placement="foot">
<reference type="source" osisRef="Matt.17.21">17:21
</reference>Many authorities, some ancient, insert v. 21.
<rdg>But this kind goeth not out save by prayer and fasting.
</rdg>See Mrk 9:29.</note>
<verse eID="Matt.17.21.seID.24668" /></p><p>

Can't get references for an entire book of the Bible

I can't get the normalized references for an entire book of the bible. For example:

import pythonbible as bible
references = bible.get_references("Genesis")

Raises the following error:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\projects-git\python-bible\pythonbible\parser.py", line 23, in get_references
    references.extend(normalize_reference(match[0]))
  File "C:\projects-git\python-bible\pythonbible\parser.py", line 49, in normalize_reference
    sub_reference, book, start_chapter
  File "C:\projects-git\python-bible\pythonbible\parser.py", line 82, in _process_sub_reference
    start_chapter = int(min_chapter_and_verse[0].strip())
ValueError: invalid literal for int() with base 10: ''

I would expect it to return:

[(<Book.GENESIS: 1>, 1, 1, 50, 26)]

pythonbible package exceeds memory usage on Heroku server (Process running mem=660M(129.0%))

Process running mem=660M(129.0%) when using this package on Heroku

Error when getting Genesis 1 with one verse per paragraph

The following code raises a ValueError:

import pythonbible as bible
parser = bible.get_parser()
references = bible.get_references("Genesis 1")
verse_ids = bible.convert_references_to_verse_ids(references)
passage = parser.get_scripture_passage_text(verse_ids)
passage2 = parser.get_scripture_passage_text(verse_ids, one_verse_per_paragraph=True)

This is the error I'm getting:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\projects-git\python-bible\pythonbible\bible\osis\parser.py", line 84, in get_scripture_passage_text
    paragraphs = _get_paragraphs(self.tree, self.namespaces, verse_ids, **kwargs)
  File "C:\projects-git\python-bible\pythonbible\bible\osis\parser.py", line 139, in _get_paragraphs
    tree, namespaces, verse_ids[current_verse_index:], **kwargs
  File "C:\projects-git\python-bible\pythonbible\bible\osis\parser.py", line 132, in _get_paragraphs
    paragraph_element, verse_ids, current_verse_id, **kwargs
  File "C:\projects-git\python-bible\pythonbible\bible\osis\parser.py", line 170, in _get_paragraph_from_element
    child_element, verse_ids, new_current_verse_id
  File "C:\projects-git\python-bible\pythonbible\bible\osis\parser.py", line 295, in _is_next_verse
    book_id, chapter, verse = child_element.get("osisID").split(".")
ValueError: too many values to unpack (expected 3)
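
The traceback does not show which osisID value caused the failure. Two plausible shapes, both assumptions, are an ID with a "!" suffix (like the "Matt.17.21!note.1" value quoted in the note-tag issue above) and an attribute listing several space-separated IDs. A defensive parser along these lines would tolerate both; `parse_osis_id` is a hypothetical helper, not the library's code:

```python
def parse_osis_id(osis_id: str) -> tuple[str, int, int]:
    """Parse an osisID such as "Gen.1.1" into (book_id, chapter, verse).

    Tolerates, by assumption, several space-separated IDs (only the first
    is used) and a "!..." suffix such as "!note.1".
    """
    first = osis_id.split()[0]        # keep only the first listed ID
    first = first.split("!", 1)[0]    # drop any "!note.1"-style suffix
    book_id, chapter, verse = first.split(".")[:3]
    return book_id, int(chapter), int(verse)


print(parse_osis_id("Matt.17.21"))
print(parse_osis_id("Matt.17.21!note.1"))
print(parse_osis_id("Gen.1.1 Gen.1.2"))
```

An ID with fewer than three dot-separated parts still raises ValueError, which seems preferable to silently guessing a chapter or verse.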

get_references() is failing to detect references

I'm working on a project and have been having issues with the function properly detecting scripture references.

For example, "Exodus 5:26" will not return a NormalizedReference:

import pythonbible as bible
bible.get_references('Exodus 5:26')  # -> []
bible.get_references('Exodus')  # -> [NormalizedReference(book=<Book.EXODUS: 2>, start_chapter=1, start_verse=1, end_chapter=40, end_verse=38, end_book=None)]

fuzzy searches, get_references for messy ASR

Machine-generated ASR output from programs like OpenAI's Whisper is on the rise and tends to format scripture references messily: book, chapter, and verse numbers appear inconsistently as integers, ordinals, or words; spans are expressed in varying ways; capitalization varies; and so on.

Here are a handful of examples lines from webvtt/srt outputs from a batch I've run recently:

Second Timothy chapter two verses three and four says endure hardship
If you read Ephesians four 17 through 32 all the ammunition
remember that powerful message of Paul in first Corinthians nine
in Jesus's first sermonic presentation on planet earth in Matthew five through seven,
Jesus said over in Matthew chapter six, verse number 12,
Genesis four, 25.
and forth between Haggai two and Ezra three.
and go and report to John one-fifteen and thirty.
I want to focus on here is Colossians chapter three, 22 through verses through chapter four, verse one.
In 1 Corinthians 9.22, you see Paul saying
says in Mark 16 10 that the disciples were
through that fire, 1 Kings 18.24-38, 1 Chronicles 21.26, 2 Chronicles 7.1-3.
open their Bibles to first Corinthians 14, 34, 35 and say, look
Genesis 1, 26, 2, 7, and 21, 22.
look in Revelations 21, 1 through 7, you can start reading all about
Psalms 103.12 says
for one another Galatians 6 1 & 2 clearly gives us

Nearly anyone using these tools seriously will need a post-processing step to clean up this sort of data. While feeding the inputs into an LLM or NLP toolkit may make sense, it would be swell if a library like this one could do some of the heavy lifting to normalize scripture references in a string. Tall order/deep rabbit hole, I understand, but worth a shot.

I suggest a reformat_fuzzy_references function that attempts to return a reformatted input_string, converting even a subset of the most common speech patterns into a normalized form. Bonus points if the user has some configuration control over output styles, e.g. omitting "chapter" or using "v./vv."

Assumed gotchas:

Strings may contain other semi-formatted numbers that a simple regex search may false flag upon:
- I was just in class at 8.30 with my friend Wilson
- We're going to talk at 3.30 this afternoon about the discipline of grace and there is
- So in Acts chapter 2, 3,000 were saved.
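
To make the request concrete, here is a very small sketch of what a first pass might look like. It handles only two of the patterns above (an ordinal word before a capitalized book name, and "chapter X verse Y" phrasing) and leaves everything else alone. All names, including `normalize_spoken_reference`, are hypothetical, and the number-word table is intentionally tiny:

```python
import re

NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}
ORDINALS = {"first": "1", "second": "2", "third": "3"}

# Ordinal word (any case) followed by a capitalized word, e.g. "first Corinthians".
_ORDINAL_BOOK = re.compile(r"\b(?i:(first|second|third))\s+(?=[A-Z])")
# "chapter six, verse number 12" or "chapter two verses three".
_CHAPTER_VERSE = re.compile(
    r"\bchapter\s+(\w+),?\s+verses?\s+(?:number\s+)?(\w+)", re.IGNORECASE
)


def _to_digits(token):
    """Return the token as a digit string, or None if it is not a number."""
    token = token.lower()
    if token.isdigit():
        return token
    value = NUMBER_WORDS.get(token)
    return str(value) if value is not None else None


def normalize_spoken_reference(text):
    """Rewrite a couple of common spoken patterns into 'Book C:V' form."""
    text = _ORDINAL_BOOK.sub(lambda m: ORDINALS[m.group(1).lower()] + " ", text)

    def _chapter_verse(match):
        chapter = _to_digits(match.group(1))
        verse = _to_digits(match.group(2))
        if chapter is None or verse is None:
            return match.group(0)  # not numeric: leave the text untouched
        return f"{chapter}:{verse}"

    return _CHAPTER_VERSE.sub(_chapter_verse, text)


print(normalize_spoken_reference(
    "Jesus said over in Matthew chapter six, verse number 12,"
))
```

Because nothing is rewritten unless an ordinal precedes a capitalized word or the word "chapter" is followed by a verse marker, the three gotcha sentences pass through unchanged. The ordinal rule would still misfire on a phrase such as "First Baptist Church", so a real implementation would probably check the following word against known book names.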

Roman Numeral Chapter Numbers

I was looking through a book of sermons and noticed that all of the chapter numbers in the scripture references were Roman numerals. In the interest of being able to easily parse scanned text for scripture references, we probably ought to update our regular expressions to find references with chapter numbers in Roman numeral form and update the normalize function to convert them into the appropriate integer values.

These references also do not contain the colon (probably since it is unnecessary with the chapter numbers and verse numbers being in a different format) but rather a period.

For example:

Matthew xvii. 19-21
Isa. iii. 10, 11
Jeremiah xlviii. 11, 12
1 John v. 10
2 Kings vii. 2
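
As a rough illustration of the conversion step, the sketch below turns a lowercase Roman numeral chapter followed by a period into "chapter:verse" form. It is a standalone example, not a proposed patch to the library's regular expressions, and the function names are hypothetical:

```python
import re

_ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100}

# A lowercase Roman numeral, a period, whitespace, then the first verse number.
_ROMAN_REF = re.compile(r"\b([ivxlc]+)\.\s+(\d+)")


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral (i, v, x, l, c only) to an integer."""
    total = 0
    largest_seen = 0
    # Read right to left: a symbol smaller than one already seen is subtracted.
    for char in reversed(numeral.lower()):
        value = _ROMAN_VALUES[char]
        if value < largest_seen:
            total -= value
        else:
            total += value
            largest_seen = value
    return total


def convert_roman_chapters(text: str) -> str:
    """Rewrite 'xvii. 19' style chapter/verse pairs as '17:19'."""
    return _ROMAN_REF.sub(
        lambda m: f"{roman_to_int(m.group(1))}:{m.group(2)}", text
    )


for sample in ["Matthew xvii. 19-21", "Isa. iii. 10, 11", "1 John v. 10"]:
    print(convert_roman_chapters(sample))
```

The pattern is case-sensitive and would also fire on an ordinary word made up only of those letters followed by a period and a number (for example "civil. 5"), so in practice it would need to be tied to a preceding book name.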
