billsim's Issues

Issues running billsim on Python 3.8 and Python 3.9

I tried running billsim with Python 3.8 and Python 3.9 and got a different error with each.

With 3.8

___________ ERROR collecting tests/constants_test.py ___________
tests/constants_test.py:3: in <module>
    from billsim.pymodels import BillPath
src/billsim/pymodels.py:36: in <module>
    class Section(SectionMeta):
src/billsim/pymodels.py:37: in Section
    similar_sections: list[SimilarSection]
E   TypeError: 'type' object is not subscriptable
_____________ ERROR collecting tests/utils_test.py _____________
tests/utils_test.py:9: in <module>
    from billsim.pymodels import BillPath
src/billsim/pymodels.py:36: in <module>
    class Section(SectionMeta):
src/billsim/pymodels.py:37: in Section
    similar_sections: list[SimilarSection]
E   TypeError: 'type' object is not subscriptable
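The 3.8 failure comes from subscripting the builtin `list` in a class annotation: `list[SimilarSection]` is only supported on Python 3.9 and later (PEP 585). Below is a minimal sketch of a 3.8-compatible version that uses `typing.List` instead. The fields other than `similar_sections` are hypothetical, and plain dataclasses stand in for whatever base class `pymodels.py` actually uses. Adding `from __future__ import annotations` would also defer evaluation, but libraries that evaluate annotations at runtime (such as pydantic) may still fail on 3.8, so `typing.List` is the safer choice.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimilarSection:
    # Hypothetical fields, for illustration only
    section_id: str
    score: float


@dataclass
class SectionMeta:
    # Hypothetical field standing in for the real SectionMeta attributes
    section_id: str


@dataclass
class Section(SectionMeta):
    # typing.List[...] works on Python 3.7+, whereas list[...] needs 3.9+
    similar_sections: List[SimilarSection] = field(default_factory=list)


section = Section(section_id="sec-1", similar_sections=[SimilarSection("sec-7", 0.92)])
print(len(section.similar_sections))
```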

With 3.9

from lxml import etree
ImportError: dlopen(/opt/homebrew/lib/python3.9/site-packages/lxml/etree.cpython-39-darwin.so, 2): no suitable image found.  Did find:
        /opt/homebrew/lib/python3.9/site-packages/lxml/etree.cpython-39-darwin.so: mach-o, but wrong architecture
        /opt/homebrew/lib/python3.9/site-packages/lxml/etree.cpython-39-darwin.so: mach-o, but wrong architecture

The latter appears to be an issue with M1 Macs, since the error complains about the architecture of the compiled lxml module.
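One way to confirm the mismatch is to compare the interpreter's architecture (`platform.machine()`) with the CPU type recorded in the Mach-O header of the `.so` file named in the error. This is a rough diagnostic sketch: it only recognizes thin 64-bit x86_64/arm64 images and universal binaries, and the macOS `file` command reports the same information.

```python
import platform
import struct

# CPU type constants from <mach/machine.h>
CPU_TYPES = {0x01000007: "x86_64", 0x0100000C: "arm64"}
MH_MAGIC_64 = 0xFEEDFACF  # thin 64-bit Mach-O, stored little-endian on these CPUs
FAT_MAGIC = 0xCAFEBABE    # universal ("fat") binary, header stored big-endian


def macho_arch(path: str) -> str:
    """Return the CPU architecture recorded in a Mach-O file's header."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8:
        return "unknown"
    if struct.unpack(">I", header[:4])[0] == FAT_MAGIC:
        return "universal"
    magic, cputype = struct.unpack("<Ii", header)
    if magic != MH_MAGIC_64:
        return "unknown"
    return CPU_TYPES.get(cputype, hex(cputype))


if __name__ == "__main__":
    import sys

    # Usage: python check_arch.py /path/to/etree.cpython-39-darwin.so
    print("interpreter:", platform.machine())
    for so_path in sys.argv[1:]:
        print(so_path, "->", macho_arch(so_path))
```

If the two disagree, reinstalling lxml with the same interpreter that runs billsim (for example `python3.9 -m pip install --force-reinstall lxml`) is a likely fix, though it has not been verified against this setup.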

Improve batch saving

See https://github.com/aih/billsim/blob/main/src/billsim/utils_db.py#L382

We currently use SQLAlchemy for this batch save operation. In some cases, however, it fails with:

Traceback (most recent call last):
  File "/home/ubuntu/.pyenv/versions/3.9.1/envs/py391/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1799, in _execute_context
    self.dialect.do_execute(
  File "/home/ubuntu/.pyenv/versions/3.9.1/envs/py391/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 717, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.StatementTooComplex: stack depth limit exceeded
HINT:  Increase the configuration parameter "max_stack_depth" (currently 2048kB), after ensuring the platform's stack depth limit is adequate.

We could increase max_stack_depth, but the generated statements also appear to be unnecessarily large and recursive.

Can we take advantage of psycopg3 improvements and use it directly for saves?
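A minimal sketch of saving rows in bounded batches through a plain DB-API 2.0 cursor, so that no single statement grows with the size of the save. It assumes the rows are already tuples matching a hypothetical `section_similarity` table; the function names are illustrative, not part of billsim. With psycopg 3 the same function should work using `%s` placeholders, and psycopg 3's `cursor.copy("COPY ... FROM STDIN")` with `write_row()` is a further option for bulk loads. Whether this avoids the stack-depth error for billsim's actual statements is untested.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List, Sequence


def chunked(rows: Iterable[Sequence[Any]], size: int) -> Iterator[List[Sequence[Any]]]:
    """Yield successive lists of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def save_in_batches(conn, sql: str, rows: Iterable[Sequence[Any]], batch_size: int = 1000) -> int:
    """Run one parameterized statement per row, committing once per batch.

    `conn` is any DB-API 2.0 connection (psycopg 3, psycopg2, sqlite3, ...);
    `sql` must use that driver's placeholder style (%s for psycopg, ? for sqlite3).
    """
    saved = 0
    for batch in chunked(rows, batch_size):
        cur = conn.cursor()  # sqlite3 cursors are not context managers, so close explicitly
        try:
            cur.executemany(sql, batch)
        finally:
            cur.close()
        conn.commit()
        saved += len(batch)
    return saved


if __name__ == "__main__":
    import sqlite3

    # Demo against an in-memory SQLite database with a hypothetical table
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE section_similarity (section_a TEXT, section_b TEXT, score REAL)")
    rows = [(f"a{i}", f"b{i}", i / 10) for i in range(2500)]
    print(save_in_batches(conn, "INSERT INTO section_similarity VALUES (?, ?, ?)", rows))
```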

Run tests on PR

This issue is to create a GitHub Action that runs the tests on each pull request.

Create generic array similarity functions

These functions would build on the functions in this repository and the Go functions in aih/bills.

Assumptions:

  • Each 'document' consists of an array of strings. The document has a unique id and each item in the array is also uniquely identified (either by an id or its ordinal position in the array).
  • The length of each document array may vary

The generic similarity functions would:

  1. Calculate a vocabulary of n-grams from the total corpus of documents (an array of documents).
  2. Vectorize the documents so that each document can be stored as a (sparse) array whose length equals the size of the vocabulary
  3. Store the vectorized matrix of all documents (the MOD: matrix of all documents) in a pickle file (or, eventually, in PostgreSQL)
  4. Calculate the similarity between each item of each array and all other items in the MOD
  5. Apply an item threshold to find similar items for each item in a document
  6. Apply a document threshold to find similar documents
  7. Return the results of steps 5 and 6 in a model form that can be stored in a database (item-to-item and document-to-document similarity)
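A minimal pure-Python sketch of steps 1 to 7, under several assumptions the issue does not specify: word bigrams as the n-grams, L2-normalized counts compared with cosine similarity, items compared only against items in other documents, and a document score defined as the share of a document's items that have at least one match in the other document. All names are hypothetical, and the existing billsim and aih/bills code may work differently. At scale, scikit-learn's `CountVectorizer` (with `ngram_range`) and SciPy sparse matrices would replace the dict-based vectors.

```python
import math
import pickle
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

SparseVec = Dict[int, float]


@dataclass
class ItemSimilarity:
    doc_id: str
    item: int
    other_doc_id: str
    other_item: int
    score: float


@dataclass
class DocumentSimilarity:
    doc_id: str
    other_doc_id: str
    score: float  # assumed: share of doc_id's items that match something in other_doc_id


def ngrams(text: str, n: int) -> List[str]:
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_vocabulary(docs: Dict[str, List[str]], n: int) -> Dict[str, int]:
    # Step 1: every n-gram in the corpus gets a column index
    vocab: Dict[str, int] = {}
    for items in docs.values():
        for text in items:
            for gram in ngrams(text, n):
                vocab.setdefault(gram, len(vocab))
    return vocab


def vectorize(text: str, vocab: Dict[str, int], n: int) -> SparseVec:
    # Step 2: sparse, L2-normalized n-gram counts keyed by vocabulary index
    counts = Counter(vocab[g] for g in ngrams(text, n) if g in vocab)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {k: c / norm for k, c in counts.items()} if norm else {}


def cosine(a: SparseVec, b: SparseVec) -> float:
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def similarity(docs: Dict[str, List[str]], n: int = 2, item_threshold: float = 0.5,
               doc_threshold: float = 0.5, mod_path: Optional[str] = None
               ) -> Tuple[List[ItemSimilarity], List[DocumentSimilarity]]:
    vocab = build_vocabulary(docs, n)
    # Step 3: the MOD, one (doc_id, item index, vector) row per item
    mod = [(d, i, vectorize(t, vocab, n)) for d, items in docs.items() for i, t in enumerate(items)]
    if mod_path:
        with open(mod_path, "wb") as f:
            pickle.dump((vocab, mod), f)
    # Steps 4-5: compare each item with the items of every other document
    item_sims: List[ItemSimilarity] = []
    for d, i, v in mod:
        for od, oi, ov in mod:
            if od == d:
                continue
            score = cosine(v, ov)
            if score >= item_threshold:
                item_sims.append(ItemSimilarity(d, i, od, oi, score))
    # Step 6: document score = fraction of a document's items with a match in the other
    matched: Dict[Tuple[str, str], Set[int]] = {}
    for s in item_sims:
        matched.setdefault((s.doc_id, s.other_doc_id), set()).add(s.item)
    doc_sims = [DocumentSimilarity(d, od, len(items) / len(docs[d]))
                for (d, od), items in matched.items()
                if len(items) / len(docs[d]) >= doc_threshold]
    # Step 7: plain records that could be mapped onto database rows
    return item_sims, doc_sims
```

For example, `similarity({"A": ["the quick brown fox jumps", "lorem ipsum dolor"], "B": ["the quick brown fox jumps", "other text here"]})` reports the two identical items as an item-to-item match in each direction, and links A and B with a document score of 0.5.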

Need to pass in environment variables to BillSim

For a Library of Congress project, we need to connect to Postgres on port 5433, not port 5432.

The billsim code doesn't seem to use the values defined in the .env file at the top level of the repo. I thought it did use those .env values, but I may have hardcoded port 5433 into the billsim package, forgotten about it, and then wiped that change away when I reinstalled billsim.

What is the best way to pass the Postgres port and host into the billsim package? Should we pass them at runtime rather than as environment variables? Or is there some way to have the billsim package read the .env file at the top level of the repo? This would be an issue for any user of the billsim package, since you would generally want to pass in the username, password, etc. rather than rely on defaults.
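One option is to let callers pass connection settings explicitly at runtime and fall back to environment variables only when they don't. The sketch below rests on assumptions: the `PostgresSettings` class, the `POSTGRES_*` variable names and the defaults are hypothetical, not billsim's current configuration. If reading the repository's top-level `.env` file is preferred, the calling application could run python-dotenv's `load_dotenv(dotenv_path=...)` before building the settings, so the package itself never needs to know where that file lives.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import quote_plus


@dataclass(frozen=True)
class PostgresSettings:
    # Hypothetical defaults, used only when neither an argument nor a variable is set
    host: str = "localhost"
    port: int = 5432
    user: str = "postgres"
    password: str = ""
    dbname: str = "billsim"

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None, **overrides: object) -> "PostgresSettings":
        """Read hypothetical POSTGRES_* variables; explicit keyword arguments win."""
        env = os.environ if env is None else env
        values = {
            "host": env.get("POSTGRES_HOST", cls.host),
            "port": int(env.get("POSTGRES_PORT", cls.port)),
            "user": env.get("POSTGRES_USER", cls.user),
            "password": env.get("POSTGRES_PASSWORD", cls.password),
            "dbname": env.get("POSTGRES_DB", cls.dbname),
        }
        values.update({k: v for k, v in overrides.items() if v is not None})
        return cls(**values)

    def url(self) -> str:
        # Quote credentials so special characters don't break the URL
        return (f"postgresql://{quote_plus(self.user)}:{quote_plus(self.password)}"
                f"@{self.host}:{self.port}/{self.dbname}")
```

A caller that needs the Library of Congress setup could then use `PostgresSettings.from_env(port=5433).url()` without editing the installed package.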
