speeches's People

Contributors

hp0404

speeches's Issues

GET - sort logic

The current implementation sorts the table by the 'created_at' column:

speeches/app/main.py

Lines 32 to 37 in 0d97e4e

statement = (
    select(Speeches)
    .order_by(Speeches.created_at.desc())
    .offset(offset)
    .limit(limit)
)

I think we should sort by 'date' DESC first, then by 'created_at' ASC.
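
The proposed two-key ordering can be tried in isolation. Below is a minimal sketch using the standard-library sqlite3 module rather than the project's PostgreSQL setup; the table layout and sample rows are assumptions made up for illustration. With SQLAlchemy/SQLModel the same idea would presumably be expressed by passing both columns to order_by, e.g. .order_by(Speeches.date.desc(), Speeches.created_at.asc()), assuming the model has a 'date' column.

```python
import sqlite3

# In-memory table with only the columns needed for the ordering (hypothetical schema)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE speeches (id INTEGER PRIMARY KEY, date TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO speeches (id, date, created_at) VALUES (?, ?, ?)",
    [
        (1, "2022-01-01", "2022-03-01T10:00:00"),
        (2, "2022-01-02", "2022-03-01T10:00:05"),
        (3, "2022-01-02", "2022-03-01T10:00:01"),
    ],
)


def fetch_page(conn, offset=0, limit=10):
    """Return speech ids, newest 'date' first; ties broken by earliest 'created_at'."""
    rows = conn.execute(
        "SELECT id FROM speeches "
        "ORDER BY date DESC, created_at ASC "
        "LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    return [row[0] for row in rows]


print(fetch_page(conn))  # [3, 2, 1]
```

Speeches 2 and 3 share a date, so the secondary key decides their order: the one inserted first comes first.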

Refine Matcher's patterns

I've noticed a few issues so far:

  • incorrect matching: special tokens (dates, dashes, etc.) are treated as parts of speech and, combined with genuine PoS tokens, produce 'false' phrases
  • duplication: the same chunk of text may be matched by several patterns, giving more than one match; this might be fine as long as we have a clear approach to 'counting' unique phrases

This issue will be updated
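
One possible way to handle both points is a post-processing step over the raw matches. The sketch below is plain Python and does not depend on the Matcher itself; it assumes matches arrive as (pattern_name, start, end) tuples with an exclusive end index, and the names is_wordlike and unique_phrases are hypothetical. It drops spans containing a token without letters, then keeps the longest non-overlapping spans. If the Matcher in question is spaCy's, spacy.util.filter_spans applies a similar longest-first rule to Span objects.

```python
def is_wordlike(token):
    """True if the token has at least one letter; dates, dashes and punctuation fail."""
    return any(ch.isalpha() for ch in token)


def unique_phrases(tokens, matches):
    """Filter raw matches down to clean, non-overlapping spans.

    tokens:  list of token strings
    matches: iterable of (pattern_name, start, end), end exclusive
    """
    # 1. Discard spans that include a non-word token
    clean = [m for m in matches if all(is_wordlike(t) for t in tokens[m[1]:m[2]])]
    # 2. Prefer longer spans; among equal lengths, prefer the earlier one
    clean.sort(key=lambda m: (-(m[2] - m[1]), m[1]))
    taken = set()  # token indices already covered by a kept span
    kept = []
    for name, start, end in clean:
        if any(i in taken for i in range(start, end)):
            continue  # overlaps a span we already kept
        taken.update(range(start, end))
        kept.append((name, start, end))
    # 3. Return spans in document order
    return sorted(kept, key=lambda m: m[1])


tokens = ["economic", "growth", "-", "2021", "strong", "economic", "growth"]
matches = [
    ("ADJ_NOUN", 0, 2),
    ("NOUN", 1, 2),
    ("NOUN_X", 1, 3),        # includes the dash, so it is dropped
    ("ADJ_ADJ_NOUN", 4, 7),
    ("ADJ_NOUN", 5, 7),      # contained in the longer span above
]
print(unique_phrases(tokens, matches))
# [('ADJ_NOUN', 0, 2), ('ADJ_ADJ_NOUN', 4, 7)]
```

Longest-first is only one counting policy; counting every pattern hit separately would be equally valid as long as it is applied consistently.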

separate tables?

I noticed that filtering the full dataset takes a lot of time. This led me to think I might try moving 'texts' into a separate table, which would allow fast filtering on metadata and then joining only the relevant texts from the other table.

Not sure if it helps but worth trying
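
A rough sketch of the split, using the standard-library sqlite3 module so it runs anywhere; the real project uses PostgreSQL, and every table and column name here is an assumption. Metadata lives in a narrow table that can be indexed and filtered cheaply, and the large text bodies are joined in only for the rows that survive the filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Narrow metadata table: cheap to scan and index
    CREATE TABLE speeches (id INTEGER PRIMARY KEY, date TEXT, title TEXT);
    CREATE INDEX idx_speeches_date ON speeches (date);
    -- Wide table holding only the text bodies, one row per speech
    CREATE TABLE texts (
        speech_id INTEGER PRIMARY KEY REFERENCES speeches (id),
        body TEXT NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO speeches VALUES (?, ?, ?)",
    [
        (1, "2021-12-31", "Year-end address"),
        (2, "2022-01-15", "Budget speech"),
        (3, "2022-02-01", "Press briefing"),
    ],
)
conn.executemany(
    "INSERT INTO texts VALUES (?, ?)",
    [
        (1, "Full text of the year-end address"),
        (2, "Full text of the budget speech"),
        (3, "Full text of the press briefing"),
    ],
)


def texts_for_period(conn, start, end):
    """Filter on metadata first, then join the matching text bodies."""
    return conn.execute(
        """
        SELECT s.id, s.title, t.body
        FROM speeches AS s
        JOIN texts AS t ON t.speech_id = s.id
        WHERE s.date BETWEEN ? AND ?
        ORDER BY s.date
        """,
        (start, end),
    ).fetchall()


print(texts_for_period(conn, "2022-01-01", "2022-01-31"))
# [(2, 'Budget speech', 'Full text of the budget speech')]
```

Whether this helps in practice depends on how PostgreSQL already stores large text values, so it is worth measuring before and after.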

validate config on assignment

The base settings class validates DATABASE_URI only once, at initialization:

POSTGRES_SERVER: str
POSTGRES_USER: str
POSTGRES_PASSWORD: str
POSTGRES_DB: str
DATABASE_URI: Optional[Union[str, PostgresDsn]] = None

@validator("DATABASE_URI", pre=True)
def assemble_db_connection(cls, v: Optional[str], values: Dict[str, Any]) -> Any:
    if isinstance(v, str):
        return v
    return PostgresDsn.build(
        scheme="postgresql",
        user=values.get("POSTGRES_USER"),
        password=values.get("POSTGRES_PASSWORD"),
        host=values.get("POSTGRES_SERVER"),
        path=f"/{values.get('POSTGRES_DB') or ''}",
    )

If we change some of the fields after the settings class has been initialized, the validators won't run again. In regular usage it's fine as we don't expect settings to change unless we're directly testing different setups.

I'd like to have this fixed so that we could inject some overriding settings as a fixture shared across all tests and then maybe change fields again within the scope of some specific test functions.

We could set a different POSTGRES_DB value for pytest, or tweak SMTP_USER / SMTP_PASSWORD while testing email notifications.
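
One way to sidestep the problem, sketched here with a plain dataclass rather than the project's pydantic settings class, is to derive the URI on every access instead of storing it. This is not the current implementation: the Settings class below is a hypothetical stand-in, and it gives up the option of passing DATABASE_URI explicitly. pydantic also has a validate_assignment config option, though whether it would re-run a validator on a field other than the one being assigned should be checked against the pydantic documentation.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class Settings:
    """Hypothetical stand-in for the settings class, without pydantic."""

    POSTGRES_SERVER: str
    POSTGRES_USER: str
    POSTGRES_PASSWORD: str
    POSTGRES_DB: str

    @property
    def DATABASE_URI(self) -> str:
        # Rebuilt on each access, so later changes to any field are reflected
        user = quote(self.POSTGRES_USER, safe="")
        password = quote(self.POSTGRES_PASSWORD, safe="")
        return f"postgresql://{user}:{password}@{self.POSTGRES_SERVER}/{self.POSTGRES_DB}"


settings = Settings("localhost", "app", "s3cret", "speeches")
print(settings.DATABASE_URI)  # postgresql://app:s3cret@localhost/speeches

# e.g. inside a pytest fixture that points the app at a test database
settings.POSTGRES_DB = "speeches_test"
print(settings.DATABASE_URI)  # postgresql://app:s3cret@localhost/speeches_test
```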

move endpoints from main.py to separate routers

This issue should be treated as a roadmap:

SPEECHES:

  • /speeches/
    • POST: insert new document + extract features on the fly
    • GET: query metadata from a database
  • /speeches/{id}
    • GET: get document (probably metadata + joined text + features (optionally))
    • DELETE: delete {id} speech from all tables

FEATURES:

  • /features/
    • POST: send text and receive extracted features, without writing to a database
  • /features/{document_id}
    • GET: query all features associated with document_id (optionally filtering using feature_type)
    • DELETE: delete {document_id} features
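
To make the intended behaviour of each endpoint concrete, here is a framework-agnostic sketch of the operations the routers would eventually call. It uses in-memory dicts in place of the database tables; SpeechStore, its method names, and the toy extract_features function are all hypothetical and only mirror the roadmap above.

```python
def extract_features(text):
    """POST /features/: compute features without writing anything."""
    return [
        {"feature_type": "token_count", "value": len(text.split())},
        {"feature_type": "char_count", "value": len(text)},
    ]


class SpeechStore:
    """In-memory stand-in for the metadata, texts and features tables."""

    def __init__(self):
        self.metadata = {}
        self.texts = {}
        self.features = {}
        self._next_id = 1

    def create_speech(self, metadata, text):
        """POST /speeches/: insert a document and extract features on the fly."""
        speech_id = self._next_id
        self._next_id += 1
        self.metadata[speech_id] = dict(metadata)
        self.texts[speech_id] = text
        self.features[speech_id] = extract_features(text)
        return speech_id

    def list_speeches(self):
        """GET /speeches/: metadata only, no text bodies."""
        return [{"id": i, **m} for i, m in sorted(self.metadata.items())]

    def get_speech(self, speech_id, with_features=False):
        """GET /speeches/{id}: metadata + joined text, optionally features."""
        doc = {"id": speech_id, **self.metadata[speech_id], "text": self.texts[speech_id]}
        if with_features:
            doc["features"] = self.get_features(speech_id)
        return doc

    def get_features(self, document_id, feature_type=None):
        """GET /features/{document_id}: optionally filtered by feature_type."""
        found = self.features.get(document_id, [])
        if feature_type is None:
            return list(found)
        return [f for f in found if f["feature_type"] == feature_type]

    def delete_features(self, document_id):
        """DELETE /features/{document_id}: remove features only."""
        self.features.pop(document_id, None)

    def delete_speech(self, speech_id):
        """DELETE /speeches/{id}: remove the speech from all tables."""
        for table in (self.metadata, self.texts, self.features):
            table.pop(speech_id, None)
```

Keeping this logic separate from the route functions would let each router stay a thin layer over it, whichever structure is chosen.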

auth

Add auth for POST requests, to be sure that only a single user (the cronjob) has write access.

TODOs:

  • try the simplest approach (without fastapi.security - as query)
  • set up proper OAuth2 using fastapi docs (as bearer token)
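
Both TODOs come down to extracting a secret from the request and comparing it with the expected value. The sketch below shows that core check with the standard library only, independent of FastAPI; the query parameter name api_key and the helper names are assumptions, and a real OAuth2 setup would validate an issued token rather than compare against a static secret.

```python
import hmac


def key_from_query(params):
    """Simplest approach: read the key from a query parameter (name is hypothetical)."""
    return params.get("api_key")


def token_from_header(headers):
    """Bearer approach: parse 'Authorization: Bearer <token>'; None if malformed."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def is_authorized(presented, expected):
    """Compare secrets in constant time to avoid leaking information via timing."""
    if not presented:
        return False
    return hmac.compare_digest(presented.encode(), expected.encode())


SECRET = "cron-secret"  # placeholder; would come from settings in practice

print(is_authorized(key_from_query({"api_key": "cron-secret"}), SECRET))  # True
print(is_authorized(token_from_header({"Authorization": "Bearer wrong"}), SECRET))  # False
```

Keys passed in the query string tend to end up in server logs, which is one reason to treat the first approach as a stepping stone towards the bearer token.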

change structure

I'm trying to think ahead (what's the easiest structure to maintain & build upon?)

  • (maybe) move routes-specific code from main.py to separate routers/ module
  • (maybe) move auth.py to core/ (purely because I've seen other people do it, but have no idea if it's the 'best practice')
  • (maybe) merge database.py and core/config, since database.py simply imports config's PostgreSQL connection string and creates the sqlmodel engine
