
MIRROR of https://codeberg.org/catseye/T-Rext : A command-line tool that attempts to rectify punctuation and spacing in (generated) text files

Home Page: https://catseye.tc/node/T-Rext

License: The Unlicense


T-Rext

T-Rext is a command-line filter that attempts to clean up spacing, punctuation, and capitalization in a text file. It exists so that, when you are writing a text generator such as a Markov processor, you need not worry too much about its output format; just run its output through T-Rext when you're done to make it more presentable.

The current version of T-Rext is 0.3, which runs under either Python 2.7 or Python 3.x. Docker images based on appropriate versions of CPython for each version are available on Docker Hub.

Usage

Usage from the Command Line

bin/t-rext raw_output.txt > cleaned_output.txt

This will take lines that look like this:

" Well , " said the king , , " no . "

and reformat them to look like this:

“Well,” said the king, “no.”

To use T-Rext from any working directory, add the bin directory in this repository to your PATH. For example, you might add this line to your .bashrc:

export PATH=/path/to/this/repo/bin:$PATH

An easy way to accomplish the above is to install shelf, then dock T-Rext using

shelf_dockgh catseye/T-Rext
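If the text generator is itself a Python script, the same command-line invocation can be driven with the standard `subprocess` module. This is only a sketch: the helper name `run_filter` is hypothetical, and the location of the `t-rext` script is an assumption that the caller supplies.

```python
import subprocess


def run_filter(command, in_path, out_path):
    """Run `command in_path` and write its standard output to out_path.

    `command` is a list such as ["bin/t-rext"]; the actual location of the
    script is an assumption and depends on where the repository lives.
    """
    with open(out_path, "w", encoding="utf-8") as out:
        # check=True raises CalledProcessError if the filter exits non-zero.
        subprocess.run(list(command) + [in_path], stdout=out, check=True)


# Hypothetical usage, equivalent to the shell command shown earlier:
# run_filter(["bin/t-rext"], "raw_output.txt", "cleaned_output.txt")
```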

Usage from Python

T-Rext is built on an over-engineered library of pipeline processors, which you can use directly. (Note that its interface is not stable and is liable to change.) To use the T-Rext Python modules in other Python programs, make sure the src directory of this repository is on your PYTHONPATH. For example, you might add this line to your .bashrc:

export PYTHONPATH=/path/to/this/repo/src:$PYTHONPATH

Then you can add imports like this to the top of your script:

from t_rext.processors import TrailingWhitespaceProcessor
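The interface of the real processors is not described here (and is unstable), so the following is not T-Rext's API. It is a minimal, self-contained sketch of the general "pipeline of processors" idea, in which each stage wraps an iterable of lines and yields transformed lines. All class and function names below are hypothetical.

```python
import re


class Processor:
    """Hypothetical base class: wraps an iterable of lines."""

    def __init__(self, source):
        self.source = source

    def __iter__(self):
        for line in self.source:
            yield self.process(line)

    def process(self, line):
        return line


class StripTrailingWhitespace(Processor):
    def process(self, line):
        return line.rstrip()


class TidyCommas(Processor):
    def process(self, line):
        # Remove whitespace that appears directly before a comma.
        return re.sub(r"\s+,", ",", line)


def run_pipeline(lines, stages):
    """Chain the stages so each one consumes the previous one's output."""
    stream = lines
    for stage in stages:
        stream = stage(stream)
    return list(stream)
```

Because each stage only needs an iterable, stages can be reordered or dropped without changing the others.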

Tests

This is a test suite, written in Falderal format, for the t-rext utility. It also serves as documentation for said utility.

-> Tests for functionality "Clean up punctuation and spaces"

Spaces before commas and periods are elided.

| Well , that is good .
= Well, that is good.

Multiple commas are collapsed into a single comma.

| Well , , that is good .
= Well, that is good.

Multiple periods are not collapsed into a single period.

| Well . . . that is good.
= Well... that is good.
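The three rules above can be approximated with two regular-expression substitutions. This is an illustrative sketch, not T-Rext's implementation; the function name `tidy_commas_and_periods` is hypothetical.

```python
import re


def tidy_commas_and_periods(text):
    # Drop spaces or tabs that appear directly before a comma or period.
    text = re.sub(r"[ \t]+([,.])", r"\1", text)
    # Collapse a run of commas into one; runs of periods are left alone.
    text = re.sub(r",{2,}", ",", text)
    return text
```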

Quotes are oriented.

| "Yes," he said.
= “Yes,” he said.

Single spaces after opening quotes and before closing quotes are elided.

| " Yes , " he said.
= “Yes,” he said.

But not the other way 'round.

| Muttering "Yes," he turned around.
= Muttering “Yes,” he turned around.

Multiple spaces after opening quotes and before closing quotes are elided.

| "   Yes ,   " he said.
= “Yes,” he said.

But not the other way 'round.

| Muttering   "Yes,"    he turned around.
= Muttering   “Yes,”    he turned around.

Quotes do not match across paragraphs.

| Turbid "Waters" that "leak.
| 
| You "don't" have a clue.
= Turbid “Waters” that “leak.
= 
= You “don't” have a clue.
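One way to get the quote behaviour described above is to walk each paragraph character by character, alternating between opening and closing quotes and resetting that state at every paragraph break. The sketch below assumes paragraphs are separated by a blank line; it is not T-Rext's implementation, and the function names are hypothetical.

```python
OPEN_QUOTE = "\u201c"   # left double quotation mark
CLOSE_QUOTE = "\u201d"  # right double quotation mark


def orient_quotes(paragraph):
    out = []
    inside = False    # True between an opening quote and its closing quote
    skipping = False  # True while dropping spaces right after an opening quote
    for ch in paragraph:
        if ch == '"':
            if inside:
                # Closing quote: drop spaces accumulated just before it.
                while out and out[-1] == " ":
                    out.pop()
                out.append(CLOSE_QUOTE)
            else:
                out.append(OPEN_QUOTE)
            skipping = not inside
            inside = not inside
        elif ch == " " and skipping:
            continue
        else:
            skipping = False
            out.append(ch)
    return "".join(out)


def orient_quotes_in_text(text):
    # Each paragraph starts with fresh state, so an unclosed quote
    # in one paragraph does not affect the next.
    return "\n\n".join(orient_quotes(p) for p in text.split("\n\n"))
```

Spaces outside the quotation (before an opening quote or after a closing one) are never touched, which matches the "not the other way 'round" cases.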

Single spaces before apostrophes are elided in some situations.

| It wasn 't Arthur 's car.
= It wasn't Arthur's car.
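The test does not say which situations qualify. A plausible sketch, assuming the rule targets common English contraction and possessive suffixes, is shown below; the suffix list and the function name `join_apostrophes` are assumptions, not taken from T-Rext.

```python
import re

# Assumed suffix list; T-Rext's actual criteria may differ.
SUFFIXES = r"(s|t|d|m|re|ve|ll)"


def join_apostrophes(text):
    # Remove a single space between a word character and an apostrophe
    # when the apostrophe is followed by one of the assumed suffixes.
    return re.sub(r"(\w) '" + SUFFIXES + r"\b", r"\1'\2", text)
```

Requiring a word boundary after the suffix leaves an opening single quote such as the one in `said 'stop'` untouched.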

Punctuation at the beginning of a line is elided in some cases.

| , where he said so.
= Where he said so.

Capitalization is applied at the beginning of a line, and the beginning of a sentence.

| , where. he said so.
= Where. He said so.

| Really?    that was... so
= Really?    That was... so
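The examples above suggest that a sentence boundary is a single `.`, `?`, or `!` followed by whitespace, and that an ellipsis does not start a new sentence. A sketch under that reading follows; the function name is hypothetical, and treating only commas and whitespace as strippable leading punctuation is an assumption.

```python
import re


def capitalize_sentences(line):
    # Assumption: only leading commas and whitespace are dropped.
    line = re.sub(r"^[\s,]+", "", line)
    # Capitalize the first character of the line.
    line = line[:1].upper() + line[1:]

    def upper_first(match):
        return match.group(1) + match.group(2).upper()

    # Capitalize a lowercase letter after sentence-ending punctuation,
    # unless that punctuation is preceded by a period (i.e. an ellipsis).
    return re.sub(r"((?<!\.)[.?!]\s+)([a-z])", upper_first, line)
```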

Two full stops become an ellipsis. A full stop followed by a comma becomes just a comma.

| It was.. the nice., thing.
= It was... the nice, thing.
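Both substitutions can be sketched with regular expressions. The lookarounds keep an existing three-dot ellipsis from being lengthened. This is an illustration rather than T-Rext's own code, and the function name `normalize_stops` is hypothetical.

```python
import re


def normalize_stops(text):
    # A full stop immediately followed by a comma becomes just the comma.
    text = re.sub(r"\.,", ",", text)
    # Exactly two full stops (not part of a longer run) become three.
    text = re.sub(r"(?<!\.)\.\.(?!\.)", "...", text)
    return text
```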

Contributors

cpressey
