sfm-utils

Utilities to parse book translations in SFM text files (.txt, .rtf, .sfm) into JSON objects, and then write the books out as SFM suitable for Paratext, or as .tsv. When directories are processed, each input book is written to its own .sfm file.

Assumptions:

  • Each text file is for a single chapter of a book
  • Each directory of files is for a single book
  • Text filenames consist of a bookname separated by space/underscore with the chapter number (e.g. Judges_19...txt)
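The filename convention above can be illustrated with a minimal sketch. The function name `parseChapterFilename` and the `ChapterFile` type are hypothetical and are not taken from the sfm-utils source; the regular expression is only one plausible reading of "bookname separated by space/underscore with the chapter number".

```typescript
// Hypothetical helper for illustration; not the parser sfm-utils actually uses.
interface ChapterFile {
  bookName: string;
  chapter: number;
}

function parseChapterFilename(filename: string): ChapterFile | null {
  // Book name (shortest match), then one or more spaces/underscores,
  // then the chapter digits. Anything after the digits is ignored.
  const match = /^(.+?)[ _]+(\d+)/.exec(filename);
  if (!match) {
    return null; // Filename does not follow the assumed convention.
  }
  return { bookName: match[1], chapter: parseInt(match[2], 10) };
}

const parsed = parseChapterFilename("Judges_19.txt");
console.log(parsed); // book name and chapter, or null if the name does not fit
```

Returning `null` instead of throwing lets a directory scan skip files that do not match the convention.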

Usage

Note for developers: Replace sfm-utils.exe references with node dist/index.js.

Command-line

Usage: sfm-utils.exe -p p_arg [-f f_arg | -t t_arg | -d d_arg | -j j_arg | -s s_arg]

Parameters

    Required
    -p [Paratext project name (can be 3-character abbreviation)]

    Optional for processing txt or sfm files - one of:
    -f [A single SFM file (can be an entire book)]
    -t [A single Toolbox text file (one chapter of a book)]
    -d [Directory of Toolbox text files for a single book (one chapter per file)]
    -j [JSON file representing a single book - used for testing conversion to SFM]
    -s [Directory of directories (each subdirectory is a separate book)]

    Optional for processing rich text (rtf) files - one of:
    -b  [A single rtf text file (one chapter of a book)]
    -bd [Directory of rtf text files for a single book (one chapter per file)]
    -bs [Directory of directories (each subdirectory is a separate book)]

Help

For additional help:

sfm-utils.exe -h

Developer Setup

These utilities require Git, Node.js, and TypeScript (installed locally). Processing back translations in .rtf files also requires UnRTF to convert the Rich Text Format; this currently works only on Linux.

Install Git

Download and install Git

https://git-scm.com/downloads

Install Node.js and Dependencies

Download and install the latest current version of Node.js (>=18.12.0)

https://nodejs.org/en/download/current/

After installing Node.js, reboot your PC, open Git Bash in this directory, and install this project's dependencies:

npm install

This installs TypeScript locally, which can then be run with

npx tsc

Install UnRTF for .rtf Files

UnRTF is needed if the source files are in Rich Text Format (.rtf), and it currently works only on Linux. Download it from https://www.gnu.org/software/unrtf/#downloading

or install it from the command line:

sudo apt install unrtf

Compiling sfm-utils

This compiles the TypeScript source files in src/ into JavaScript (dist/).

To rebuild the project

npm run build

To watch the project and recompile automatically

npm run watch

Note: Our .vscode > tasks.json sets the "runOptions" of npm run watch to "folderOpen". By default, Visual Studio Code therefore runs npm run watch whenever the sfm-utils project is opened, so developers don't have to compile manually.

Debugging with Visual Studio Code

Open Folder as a VS Code Project

Edit the applicable parameters in launch.json. If using Windows paths, you'll need to escape the backslashes (e.g. -j "C:\\somewhere\\to\\text-or-json-files")

Publishing sfm-utils.exe

This optional step creates a standalone Windows executable sfm-utils.exe so it can be run without Node.js. Published artifacts will be in the deploy/ directory.

npm run publish

Unit Tests (TODO)

Unit tests are run with the AVA test runner. Remember to build sfm-utils before running tests. Terminal output is best viewed in VS Code.

npm run test

License

Copyright (c) 2022-2023 SIL International. All rights reserved. Licensed under the MIT license.

sfm-utils's Issues

bug: certain marker transitions

Just noting some of the bugs seen in the source text with the following marker combinations.
At this point, it's faster to fix the source text than to debug/update this project.

Case 1

Verse bridges are currently expected to look something like \vs 14-15a.
But some texts split the bridge across several markers:

\tx
\vs 14b
\tx
\vs 14c-15a
\tx
\vs 15b-16a

so the current regex can't bridge verses 14, 15, and 16.

Case 2

A section header splits a verse:

\tx
\vs 16a
\tx
\vs (section title)
\tx
\vs 16b

The current state machine starts \vs 16, then makes a section title (\s2), and then doesn't handle \vs 16b correctly.

Case 3

Verse bridges are expected to be marked in one of these forms:

\vs (7-8)a
\vs [7-8]a

so the regex currently doesn't handle:

\vs 7-8 (a)
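As a rough illustration only, a single pattern could accept all three spellings in this case. This is not the project's current regex: the name `parseVerseBridge`, the `VerseBridge` type, and the pattern itself are assumptions, and the sketch does not address Cases 1 or 2.

```typescript
// Hypothetical sketch; the real sfm-utils regex is not shown in this issue.
interface VerseBridge {
  start: number;
  end: number;
  part: string; // "" when no sub-verse letter is present
}

function parseVerseBridge(line: string): VerseBridge | null {
  // Optional ( or [ around the range, then an optional sub-verse letter that
  // may itself be wrapped in parentheses. The pattern is deliberately loose
  // and does not check that opening and closing brackets match.
  const pattern = /^\\vs\s+[(\[]?(\d+)-(\d+)[)\]]?\s*\(?([a-z])?\)?\s*$/;
  const match = pattern.exec(line);
  if (!match) {
    return null;
  }
  return {
    start: parseInt(match[1], 10),
    end: parseInt(match[2], 10),
    part: match[3] || "",
  };
}

for (const line of ["\\vs (7-8)a", "\\vs [7-8]a", "\\vs 7-8 (a)"]) {
  console.log(line, parseVerseBridge(line));
}
```

A line without a hyphenated range, such as `\vs 16a`, returns `null` here and would still need separate handling.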

Likely won't fix these...

feat: Request to write output log as an "Extra Book"

As we iterate over the latest set of txt files, we've been copying the console log to an output file. It notes when a chapter has an unexpected number of verses.

@sdysart requests:

I want to include the output file as an 'extra book' so the content isn't lost. I've edited the output txt file so that it will work with Paratext, and then I imported it into Paratext.

Basically I

  • added \id XXA - [Project Name] as the first line. This tells Paratext to make an extra book 'A'. Paratext will ignore all text above the \id line so this goes first
  • added \rem to the beginning of every other line.
    The end user would have to know that they shouldn't edit this 'Extra Book' as any added information will risk getting replaced should there be another input.

This probably involves editing books.ts to account for the extra book, but then have special handling since it's not writing "verses".
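The manual edits described in the request could be sketched roughly as follows. The function name `toExtraBook` is hypothetical, and this standalone sketch does not reflect how books.ts is structured; it only mirrors the two steps listed above (an \id XXA line first, then \rem on every other line).

```typescript
// Hypothetical sketch of the requested conversion; not part of sfm-utils.
function toExtraBook(logText: string, projectName: string): string {
  // Drop one trailing newline so the log does not end with an empty \rem line.
  const lines = logText.replace(/\r?\n$/, "").split(/\r?\n/);

  // Prefix every log line with \rem; strip trailing whitespace so that
  // empty log lines become a bare "\rem".
  const remLines = lines.map((line) => ("\\rem " + line).replace(/\s+$/, ""));

  // The \id line goes first because Paratext ignores text above it.
  const header = "\\id XXA - " + projectName;
  return [header].concat(remLines).join("\n") + "\n";
}

console.log(toExtraBook("Judges 19: unexpected number of verses\n", "ABC"));
```

Because the whole book is regenerated from the log on each run, any manual edits made in Paratext would be overwritten, which matches the caveat in the request.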
