
Comments (3)

albarrentine commented on August 20, 2024

While it's not ideal, it's fairly common for Python packages to have system-wide requirements. For instance, many of the Python GIS packages (Shapely, Rtree, etc.) depend on system libraries that are not bundled and which the user may already have installed for Postgres or something else. Same is true for some of the C-based database bindings which may require libmysqlclient, etc.

For libraries that are intended to be used only with Python, it's more common to bundle the C lib, but that's not the case for libpostal. We also have bindings to Go, Node, Ruby, Java, PHP, R, etc. There's even a Postgres extension.

Libpostal is a bit different from most packages because it ships a production-grade, trained machine learning model that currently takes up about 1.8GB, which is a lot more than people are accustomed to downloading when installing a package. Because of that heavier-than-usual space requirement, and because many people run this on AWS, in containers, on VMs, etc., there's not necessarily a sensible default for where the datadir should go (on AWS machines, the Autotools default, "/usr/local/share", might take up valuable space on a root volume). Making the Python library fully pip-installable would, I think, involve producing wheels for the various platforms with the compiled libpostal binaries and the models.

The datadir in libpostal is currently set at configure/compile time. However, all of libpostal's setup functions (called once at import time) have *_setup_datadir variants which allow passing in a directory at runtime. As such, it should be possible to add wheel distributions which bundle libpostal and the libpostal_data script (which downloads the data files and gets installed in e.g. /usr/local/bin by default), configure a datadir on the Python side, and then use the configured datadir at import time. At minimum though, the default behavior would need to check for an existing libpostal installation so the user doesn't inadvertently download the model twice if they already have libpostal or one of the other bindings installed.
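To make the "check for an existing installation" step concrete, here is a minimal sketch of how the Python side might look for models that are already on disk. The candidate paths and the subdirectory names used as markers are assumptions for illustration, not something confirmed in this thread, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumption: a complete data directory contains these subdirectories.
EXPECTED_SUBDIRS = ("address_parser", "language_classifier")

# Assumption: the Autotools default datadir plus a "libpostal" subdirectory,
# and one other common prefix. Real installs may live elsewhere.
SYSTEM_CANDIDATES = (
    Path("/usr/local/share/libpostal"),
    Path("/usr/share/libpostal"),
)


def looks_installed(datadir):
    """Return True if datadir contains every expected model subdirectory."""
    root = Path(datadir)
    return all((root / name).is_dir() for name in EXPECTED_SUBDIRS)


def find_existing_datadir(candidates=SYSTEM_CANDIDATES):
    """Return the first candidate that already holds the models, else None."""
    for candidate in candidates:
        if looks_installed(candidate):
            return Path(candidate)
    return None
```

If `find_existing_datadir()` returns a path, the bindings could pass it to the `*_setup_datadir` functions and skip the download entirely.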

Happy to accept pull requests as long as they take into account our various requirements.

from pypostal.

adriangb commented on August 20, 2024

Maybe you can take the approach that packages like TensorFlow use to load pretrained models? Have a function that initiates the download or something like that. I do think it would be nice to distribute prebuilt binaries.
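A rough sketch of what such an explicit download function could look like, so that nothing is fetched at install time. `ensure_models` and `run_libpostal_data` are hypothetical names, the `address_parser` marker directory is an assumption, and the `libpostal_data download all <dir>` invocation is assumed rather than confirmed here.

```python
import subprocess
from pathlib import Path


def run_libpostal_data(datadir):
    """Fetch the models with the libpostal_data script.

    Assumption: the script is on PATH and accepts "download all <dir>".
    """
    subprocess.run(
        ["libpostal_data", "download", "all", str(datadir)], check=True
    )


def ensure_models(datadir, downloader=run_libpostal_data, marker="address_parser"):
    """Download the models into datadir unless they are already present.

    Returns True if a download was triggered, False if it was skipped.
    """
    datadir = Path(datadir)
    if (datadir / marker).is_dir():
        return False  # models already on disk, do not download again
    datadir.mkdir(parents=True, exist_ok=True)
    downloader(datadir)
    return True
```

Keeping the downloader injectable means the skip logic can be exercised without pulling down the 1.8GB of data.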


ynouri commented on August 20, 2024

How would the datadir be configured on the Python side? If we take the parser for example, should we make a call to libpostal_setup_parser_datadir(char *datadir) in the init_parser function, with datadir read from an environment variable?

https://github.com/openvenues/pypostal/blob/master/postal/pyparser.c#L175
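To illustrate the idea, here is a Python-level sketch of the environment-variable approach. It is not the actual change to pyparser.c. The variable name `LIBPOSTAL_DATA_DIR` is hypothetical, `lib` is assumed to be a handle to the shared library (for example one loaded with `ctypes.CDLL`), and `libpostal_setup_parser` is assumed to be the no-argument counterpart of `libpostal_setup_parser_datadir`, with both returning a boolean.

```python
import ctypes
import os

# Hypothetical variable name; nothing in this thread fixes it.
ENV_VAR = "LIBPOSTAL_DATA_DIR"


def setup_parser(lib, environ=os.environ):
    """Initialise the parser, honouring a datadir from the environment.

    Returns the datadir that was used, or None for the compiled-in default.
    """
    datadir = environ.get(ENV_VAR)
    if datadir:
        fn = lib.libpostal_setup_parser_datadir
        fn.argtypes = [ctypes.c_char_p]
        fn.restype = ctypes.c_bool
        ok = fn(os.fsencode(datadir))  # C side expects a char *
    else:
        fn = lib.libpostal_setup_parser
        fn.restype = ctypes.c_bool
        ok = fn()
    if not ok:
        raise RuntimeError("libpostal parser setup failed")
    return datadir or None
```

In the C extension the equivalent would presumably be a `getenv` call inside init_parser, falling back to the compile-time datadir when the variable is unset.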

