speeches's Issues
tests
move endpoints from main.py to separate routers
This issue should be treated as a roadmap:
SPEECHES:
- /speeches/
- POST: insert new document + extract features on the fly
- GET: query metadata from a database
- /speeches/{id}
- GET: get document (probably metadata + joined text + optionally features)
- DELETE: delete {id} speech from all tables
FEATURES:
- /features/
- POST: send text and receive extracted features, without writing to a database
- /features/{document_id}
- GET: query all features associated with document_id (optionally filtering using feature_type)
- DELETE: delete {document_id} features
separate tables?
I noticed that filtering the full dataset takes a lot of time. This led me to think I might try moving 'texts' into a separate table, allowing fast filtering on metadata and then joining only the relevant texts from the other table.
Not sure if it helps, but it's worth trying.
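A minimal sketch of the split, using the standard-library sqlite3 module purely for illustration (the project itself targets PostgreSQL). The table names `speeches` and `texts`, the columns, and the helper `texts_for_period` are all assumptions, not the existing schema. Metadata is filtered first and only the matching rows are joined to their text.

```python
import sqlite3

# In-memory database for illustration only; schema and names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE speeches (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        date TEXT NOT NULL
    );
    CREATE TABLE texts (
        speech_id INTEGER PRIMARY KEY REFERENCES speeches(id),
        body TEXT NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO speeches (id, title, date) VALUES (?, ?, ?)",
    [(1, "A", "2020-05-01"), (2, "B", "2021-03-10"), (3, "C", "2021-11-02")],
)
conn.executemany(
    "INSERT INTO texts (speech_id, body) VALUES (?, ?)",
    [(1, "long text A"), (2, "long text B"), (3, "long text C")],
)


def texts_for_period(conn, start, end):
    """Filter on the small metadata table, then join only the matching texts."""
    query = """
        SELECT s.id, s.title, t.body
        FROM speeches AS s
        JOIN texts AS t ON t.speech_id = s.id
        WHERE s.date BETWEEN ? AND ?
        ORDER BY s.id
    """
    return conn.execute(query, (start, end)).fetchall()


rows = texts_for_period(conn, "2021-01-01", "2021-12-31")
```

Whether this is actually faster on the real dataset would need measuring; the sketch only shows the shape of the query.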
validate config on assignment
Base settings class validates DATABASE_URI only once:
Lines 26 to 42 in c599bd7
If we change some of the fields after the settings class has been initialized, the validators won't run again. In regular usage it's fine as we don't expect settings to change unless we're directly testing different setups.
I'd like to have this fixed so that we could inject some overriding settings as a fixture shared across all tests and then maybe change fields again within the scope of some specific test functions.
We could set a different POSTGRES_DB value for pytest or tweak SMTP_USER / SMTP_PASSWORD while testing email notifications.
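One way to get the desired behaviour, sketched with a plain dataclass rather than the project's actual settings class: validate on every assignment and derive DATABASE_URI from the current field values instead of storing it once. The fields POSTGRES_USER, POSTGRES_PASSWORD and POSTGRES_SERVER, their defaults, and the URI layout are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    # Hypothetical fields and defaults; only POSTGRES_DB is named in the issue.
    POSTGRES_USER: str = "postgres"
    POSTGRES_PASSWORD: str = "postgres"
    POSTGRES_SERVER: str = "localhost"
    POSTGRES_DB: str = "app"

    def __setattr__(self, name, value):
        # Runs on every assignment, including those made by __init__.
        if name.startswith("POSTGRES_") and (not isinstance(value, str) or not value):
            raise ValueError(f"{name} must be a non-empty string")
        super().__setattr__(name, value)

    @property
    def DATABASE_URI(self) -> str:
        # Computed on access, so it always reflects the latest field values.
        return (
            f"postgresql://{self.POSTGRES_USER}:{self.POSTGRES_PASSWORD}"
            f"@{self.POSTGRES_SERVER}/{self.POSTGRES_DB}"
        )


settings = Settings()
settings.POSTGRES_DB = "test_db"  # e.g. overridden inside a pytest fixture
```

If the settings class is a pydantic model, its `validate_assignment` config option may be the more direct route, though a URI assembled from other fields would still need to be recomputed after they change.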
change structure
I'm trying to think ahead (what's the easiest structure to maintain & build upon?)
- (maybe) move routes-specific code from main.py to separate routers/ module
- (maybe) move auth.py to core/ (purely because I've seen other people do it, but have no idea if it's the 'best practice')
- (maybe) merge database.py and core/config, as database.py simply imports config's PostgreSQL connection string and creates the SQLModel engine
rename attributes
auth
Add auth for POST requests to make sure that only a single user (the cronjob) has write rights.
TODOs:
- try the simplest approach (without fastapi.security, as a query parameter)
- set up proper OAuth2 using fastapi docs (as bearer token)
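For the first TODO, the core check could look roughly like this. It is framework-independent: the function name `is_write_allowed` and the idea of a single shared key are assumptions, and wiring it into a POST route is left out.

```python
import hmac
from typing import Optional


def is_write_allowed(provided_key: Optional[str], expected_key: str) -> bool:
    """Return True only if the key sent with the request matches the expected one."""
    if not provided_key:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(provided_key.encode(), expected_key.encode())
```

A key passed in the query string tends to end up in access logs, which is one reason to treat this only as a stepping stone towards the bearer-token setup.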
alembic - turn off default indexing of all columns
When writing 'larger' texts to a PostgreSQL database, SQLModel might throw sqlalchemy.exc.OperationalError: (psycopg2.errors.ProgramLimitExceeded)
- Values larger than 1/3 of a buffer page cannot be indexed.
This issue has been documented (tiangolo/sqlmodel#9)
store embeddings
GET - sort logic
Current implementation sorts table by 'created_at' column:
Lines 32 to 37 in 0d97e4e
I think we should first sort by 'date' DESC, then 'created_at' ASC
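The proposed ordering, sketched on plain dictionaries rather than on the actual query (the sample rows and the function name `sort_speeches` are made up). It relies on Python's sort being stable: sorting by the secondary key first and the primary key second gives 'date' DESC with ties broken by 'created_at' ASC.

```python
from datetime import date, datetime
from operator import itemgetter

# Hypothetical rows; only the 'date' and 'created_at' columns come from the issue.
rows = [
    {"id": 1, "date": date(2021, 5, 1), "created_at": datetime(2021, 6, 1, 12, 0)},
    {"id": 2, "date": date(2021, 5, 3), "created_at": datetime(2021, 6, 1, 12, 5)},
    {"id": 3, "date": date(2021, 5, 3), "created_at": datetime(2021, 6, 1, 11, 0)},
]


def sort_speeches(rows):
    # Secondary key first: oldest 'created_at' comes first.
    by_created = sorted(rows, key=itemgetter("created_at"))
    # Primary key second: newest 'date' first; stability keeps the tie order.
    return sorted(by_created, key=itemgetter("date"), reverse=True)


ordered_ids = [row["id"] for row in sort_speeches(rows)]
```

In SQL terms this corresponds to `ORDER BY date DESC, created_at ASC`.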
Refine Matcher's patterns
I've noticed a few issues so far:
- incorrect matching, e.g. special characters (dates, dashes, etc.) are treated as PoS and combined with proper PoS give 'false' phrases
- duplication, e.g. the same chunk of text might get matched with multiple patterns giving > 1 match - which might be fine as long as we have a clear approach to 'counting' unique phrases
This issue will be updated
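One possible approach to the duplication problem, as a sketch: when several patterns hit overlapping tokens, keep only the longest match and drop the rest. It assumes matches are available as (pattern_name, start, end) tuples with token offsets and an exclusive end; the pattern names below are invented.

```python
def filter_overlapping(matches):
    """Keep the longest match wherever matches overlap; ties go to the earlier start."""
    ordered = sorted(matches, key=lambda m: (-(m[2] - m[1]), m[1]))
    kept = []
    taken = set()  # token positions already covered by a kept match
    for name, start, end in ordered:
        if any(pos in taken for pos in range(start, end)):
            continue
        kept.append((name, start, end))
        taken.update(range(start, end))
    # Return matches in document order.
    return sorted(kept, key=lambda m: m[1])


matches = [
    ("ADJ_NOUN", 0, 2),
    ("NOUN", 1, 2),
    ("ADJ_NOUN_NOUN", 0, 3),
    ("VERB_NOUN", 4, 6),
]
unique = filter_overlapping(matches)
```

Preferring the longest match is only one counting rule; counting every pattern hit separately would be equally valid as long as it is applied consistently.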
badly formed hexadecimal UUID string
Basically encountered this issue (tiangolo/sqlmodel#25) and went for int primary key instead.
TODO:
- go back to UUID as PK
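For reference when returning to UUID keys: the message in the title is the ValueError that Python's standard `uuid.UUID` constructor raises for a string that is not 32 hexadecimal digits. The sketch below does not reproduce the linked SQLModel issue; it only shows where the message comes from, plus a hypothetical helper `parse_uuid` for handling untrusted ids.

```python
import uuid
from typing import Optional


def parse_uuid(value: str) -> Optional[uuid.UUID]:
    """Return a UUID for a well-formed string, or None instead of raising."""
    try:
        return uuid.UUID(value)
    except ValueError:
        return None


# A string with too few hex digits triggers the error from the issue title.
try:
    uuid.UUID("1234")
except ValueError as exc:
    message = str(exc)
```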