word2num's Introduction

word2num 💬 → 🔢

word2num is a Python package for converting numbers expressed in natural language to numerical values. It supports:

  • Fractions
  • Decimals
  • Negative values
  • Large numbers into the quintillions
  • Digit sequences
  • Fuzzy string matching

🛠️ Installation

Install word2num with pip by running the following command in your terminal:

pip install word2num

💻 Usage

Once installed, you can use word2num to convert numbers expressed in natural language to numerical values. To parse a single string, use the word2num convenience function:

from word2num import word2num

word2num("fifty-seven thousand four hundred and twenty-one")  # 57421

If you need to parse multiple strings, you can create your own instance of Word2Num and call its parse method:

from word2num import Word2Num

w2n = Word2Num()
w2n.parse("one hundred and one")     # 101
w2n.parse("seventeen billion")       # 17000000000
w2n.parse("negative eight")          # -8
w2n.parse("half")                    # 0.5
w2n.parse("one and three quarters")  # 1.75
w2n.parse("one three three seven")   # 1337

Note that both the word2num function and the parse method return None if the input can't be interpreted as a valid numerical value.
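Because a failed parse yields None rather than raising, callers need to check the result before doing arithmetic on it. The following is a minimal sketch of one way to do that. The helper parse_or_default and the stand_in_parser are hypothetical names introduced here for illustration and are not part of word2num; in real use you would pass w2n.parse or word2num as the parser argument.

```python
from typing import Callable, Optional, Union

Number = Union[int, float]


def parse_or_default(
    parser: Callable[[str], Optional[Number]],
    text: str,
    default: Number = 0,
) -> Number:
    """Return the parsed number, or `default` when the parser returns None."""
    result = parser(text)
    # Compare against None explicitly: a valid result of 0 is falsy.
    return default if result is None else result


def stand_in_parser(text: str) -> Optional[Number]:
    """Illustrative stand-in for Word2Num().parse; it knows only three words."""
    known = {"zero": 0, "one": 1, "two": 2}
    return known.get(text.strip().lower())


print(parse_or_default(stand_in_parser, "two"))         # 2
print(parse_or_default(stand_in_parser, "zero", -1))    # 0
print(parse_or_default(stand_in_parser, "banana", -1))  # -1
```

The explicit `is None` check matters because a plain truthiness test would treat a legitimate result of 0 as a failure.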

🐻 Fuzzy String Matching

word2num uses fuzzy string matching to help parse misspelled number words.

Default Fuzzy Threshold

By default, word2num uses a fuzzy threshold of 80, which means that it will match a word to a number if the fuzzy score is 80 or higher.
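To make the idea of a threshold concrete, here is a rough sketch of threshold-based matching. It is not word2num's implementation: the scoring function below uses difflib.SequenceMatcher from the standard library as a stand-in, so the scores it produces will not necessarily equal the ones word2num computes. The names NUMBER_WORDS, fuzzy_score, and best_match are assumptions made for this example.

```python
from difflib import SequenceMatcher
from typing import Optional

# A tiny illustrative vocabulary; a real parser would know many more words.
NUMBER_WORDS = {"six": 6, "sixteen": 16, "sixty": 60, "seven": 7, "seventeen": 17}


def fuzzy_score(a: str, b: str) -> float:
    """Similarity on a 0 to 100 scale (difflib ratio scaled by 100)."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def best_match(word: str, threshold: float = 80) -> Optional[int]:
    """Return the value of the closest known word, or None below the threshold."""
    best_word = max(NUMBER_WORDS, key=lambda known: fuzzy_score(word, known))
    if fuzzy_score(word, best_word) >= threshold:
        return NUMBER_WORDS[best_word]
    return None


print(best_match("sixteen", threshold=100))   # 16 (exact match scores 100)
print(best_match("sixteeen"))                 # 16 (close enough at threshold 80)
print(best_match("sixteeen", threshold=100))  # None (misspelling scores below 100)
```

Lowering the threshold accepts more misspellings at the cost of more false matches; a threshold of 100 accepts exact matches only.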

Custom Fuzzy Threshold

You can change the fuzzy threshold by passing a fuzzy_threshold parameter to the word2num function or to the Word2Num class constructor:

from word2num import word2num, Word2Num

# Using the word2num function
word2num("soxteeeen", fuzzy_threshold=60)  # [16]

# Using the Word2Num class
w2n = Word2Num(fuzzy_threshold=60)
w2n.parse("twoo hunrdered and twienty-too")  # [222]

Disable Fuzzy Matching

To disable fuzzy matching (exact matching only), you can set the fuzzy_threshold to 100:

from word2num import Word2Num

w2n = Word2Num(fuzzy_threshold=100)
w2n.parse("two hundered and twinty-two")  # None

Language Support

  • English
  • Spanish

We'd love to add support for other languages. Contributions are more than welcome, so if you're interested in contributing, see the "Contributing" section below!

🤝 Contributing

Contributions to word2num are more than welcome! If you'd like to contribute, please follow these guidelines:

  • Make sure the tests pass by running pytest in the root directory of the repository.
  • If appropriate, add new tests to cover your changes.
  • Follow the existing code style and conventions.
  • Create a pull request with your changes.

📃 License

word2num is released under the MIT License. See LICENSE.txt for more information.

word2num's People

Contributors

angeldeejay, doppio

word2num's Issues

The following tests fail

I am trying a few tests and the following fail:

    ("zero five zero four 2002", "05042002"),  # returns None
    ("nineteen seventy-six", "1976"),          # returns '95'. Why?

Complex number sequences are not supported

As discussed in #2, this package could support complex number sequences:

  • "nineteen seventy-six" → 1976
  • "fifty-five thirty-three nine" → 55339

Currently, the values get added together ("nineteen seventy-six" → 19 + 70 + 6 = 95). Simple digit sequences are already supported ("one five three" → 153), but sequences of numbers that are not simple digits do not get parsed correctly.
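The difference between the current additive behavior and the requested one can be sketched as follows. This is a hypothetical heuristic written for illustration, not code from word2num or a proposed patch: it assumes the words have already been converted to integer values, merges a multiple of ten (20 to 90) with a following unit (1 to 9), and concatenates the resulting groups as digit strings. The function names additive_total and concatenate_groups are invented for this example.

```python
from typing import List


def additive_total(values: List[int]) -> int:
    """The behavior the issue describes: every value is summed."""
    return sum(values)


def concatenate_groups(values: List[int]) -> str:
    """Merge tens+units pairs, then join the groups as digit strings."""
    groups: List[int] = []
    for value in values:
        last_is_tens = bool(groups) and groups[-1] in range(20, 100, 10)
        if last_is_tens and 1 <= value <= 9:
            groups[-1] += value   # e.g. 70 followed by 6 becomes 76
        else:
            groups.append(value)  # anything else starts a new group
    return "".join(str(group) for group in groups)


print(additive_total([19, 70, 6]))              # 95
print(concatenate_groups([19, 70, 6]))          # 1976
print(concatenate_groups([50, 5, 30, 3, 9]))    # 55339
print(concatenate_groups([1, 5, 3]))            # 153
```

The sketch returns a string so that leading zeros survive (for example, the values 0, 5, 0, 4 give "0504"). It does not handle multipliers such as "hundred" or "thousand", which a real fix would need to reconcile with sequence parsing.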

Pull requests on this issue are welcome. 🙂

The best so far, please keep it up.

I tried text_to_number and word2number; this one is the best!
I tried some test cases that fail; maybe you can support more. I will report them in the next issue.

Unexpected behavior when passed "and" or "negative"

Hi, thanks for this neat little package!

I'm aware that in your implementation you used spaCy or some other model to identify number words before passing them to word2num, so you may not have come across these problems.

This is obviously not the intended use case in that situation, but:

  • When word2num is given only the string "and", it returns 0.
  • When word2num is given only the string "negative", it throws a "list index out of range" error as it checks whether a denominator is present.
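Until the package handles these inputs itself, a caller could guard against them with a thin wrapper. The sketch below is an assumption-laden illustration, not part of word2num: safe_parse and FILLER_WORDS are hypothetical names, the filler list is a guess at which words modify a number without being one, and reported_behaviour is a stand-in that merely mimics the two results described above so the snippet runs on its own.

```python
from typing import Callable, Optional, Union

Number = Union[int, float]

# Hypothetical list of words that modify a number but are not numbers themselves.
FILLER_WORDS = {"and", "negative", "minus"}


def safe_parse(
    parser: Callable[[str], Optional[Number]], text: str
) -> Optional[Number]:
    """Return None for filler-only input or when the parser raises IndexError."""
    tokens = text.lower().replace("-", " ").split()
    if not tokens or all(token in FILLER_WORDS for token in tokens):
        return None
    try:
        return parser(text)
    except IndexError:
        return None


def reported_behaviour(text: str) -> Optional[Number]:
    """Stand-in that mimics the two reported results; not word2num itself."""
    if text == "and":
        return 0
    if text == "negative":
        raise IndexError("list index out of range")
    return {"negative eight": -8}.get(text)


print(safe_parse(reported_behaviour, "and"))             # None
print(safe_parse(reported_behaviour, "negative"))        # None
print(safe_parse(reported_behaviour, "negative eight"))  # -8
```

In real use the first argument would be word2num or a Word2Num instance's parse method.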
