joostrothweiler / politags
Politags project built as part of the thesis project at Delft University of Technology
Currently we save X entries when CDA is found X times in an article. It might be better to save it only once per article, with a count.
https://pypi.python.org/pypi/nameparser could be used to parse the names and process parts such as the title (Dhr, Mw). This could be great for matching. We would, however, have to add constants such as Dutch titles (Dhr) and possibly suffixes.
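A minimal sketch of the "once per article, with a count" idea. The `(article_id, entity)` key and the row layout are assumptions for illustration, not the project's actual schema.

```python
from collections import Counter


def count_mentions(mentions):
    """Collapse repeated (article_id, entity) mentions into one row each, with a count."""
    counts = Counter(mentions)
    return [
        {"article_id": article_id, "entity": entity, "count": n}
        for (article_id, entity), n in counts.items()
    ]


# Hypothetical input: CDA is found twice in article 1.
rows = count_mentions([(1, "CDA"), (1, "CDA"), (1, "VVD"), (2, "CDA")])
```

This keeps the table size proportional to the number of distinct entities per article instead of the number of mentions.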
In the database we want to keep the top X linkings. However, if a certainty is very low, we do not want to return it in the API/Poliflow.
Create a function that generates a question for an article using only its article_id.
Measure how the database grows and how long it takes to process X articles.
It is important that we are able to update the database on/after deployment without having to reseed the database. For this, we need migrations which are already implemented but not in use.
We only want to insert the most certain linking into Poliflow, so the API should return only the most certain one per entity (mention). For the human computation part, however, we may want to be able to retrieve all of them.
So:
API -> return one linking per mention with cutoff
Internal -> keep linkings even if very uncertain
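The split between what is stored and what the API returns could look roughly like this. The `Linking` class, its field names, and the cutoff value of 0.5 are all hypothetical. The stored list is left untouched, and only the API view is filtered.

```python
from dataclasses import dataclass


@dataclass
class Linking:
    mention_id: int
    candidate: str
    certainty: float


def best_linkings(linkings, cutoff=0.5):
    """Return the most certain linking per mention, dropping any below the cutoff."""
    best = {}
    for link in linkings:
        current = best.get(link.mention_id)
        if current is None or link.certainty > current.certainty:
            best[link.mention_id] = link
    return [link for link in best.values() if link.certainty >= cutoff]


# Hypothetical stored linkings: all of them stay available internally.
stored = [
    Linking(1, "Politician A", 0.9),
    Linking(1, "Politician B", 0.3),
    Linking(2, "Politician C", 0.2),
]
api_result = best_linkings(stored)
```

Mention 2 has no linking above the cutoff, so the API returns nothing for it, while the human computation part can still read it from `stored`.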
Implement a basic NER module based on spaCy.
NER performance is lower because of HTML tags in the description. We should build something that generates raw text from the description and run NER on that.
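One possible way to produce raw text, using only Python's standard library `html.parser`. The function name `html_to_text` and the set of tags treated as block-level are assumptions for this sketch. A dedicated HTML library may handle malformed markup better.

```python
from html.parser import HTMLParser

# Tags after which a word break is assumed; extend as needed.
BLOCK_TAGS = {"p", "br", "div", "li", "h1", "h2", "h3", "tr"}


class _TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Strip tags, decode entities, and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


text = html_to_text("<p>De <b>CDA</b>-fractie stemde&nbsp;tegen.</p>")
```

Inline tags such as `<b>` are dropped without adding a space, so "CDA-fractie" stays a single token for the NER step.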
Such that poliflow can make use of these.
Politags:
Poliflow:
We want to be able to retrieve documents from Poliflow. We also want a few articles in the database to test on.
To process articles already present in our database, we need to fetch the body data from Poliflow. To do so, we must be able to retrieve a single document from the API by its id.
Get weights based on perceptron classifier and test P/R
Find 10/20 new documents and classify by hand.
Right now we can only use init_db, whereas we would like to use:
We have migrations set up, as well as a database model (design). Now we have to implement migrations that actually create the database tables.
Use the XML provided in the archive to seed the database with politicians and parties.
Right now we still store locations, misc, etc. We may want to store everything to improve NER/NED, but it could increase the database size too much.
Right now we still return 'standard' NER output, whereas we would like to return a list of parties and politicians.
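As a first step toward returning only parties and politicians, the NER output could be filtered by label. The label names `PER` and `ORG` are an assumption here, since the actual labels depend on the spaCy model in use, and the `(text, label)` tuple format is hypothetical.

```python
# Assumed label names; check the labels of the model actually loaded.
RELEVANT_LABELS = {"PER", "ORG"}


def filter_entities(entities, labels=RELEVANT_LABELS):
    """Keep only (text, label) pairs that may denote a politician or a party."""
    return [(text, label) for text, label in entities if label in labels]


# Hypothetical NER output for one article.
ner_output = [
    ("Den Haag", "LOC"),
    ("CDA", "ORG"),
    ("Politician A", "PER"),
    ("Europees", "MISC"),
]
kept = filter_entities(ner_output)
```

Filtering by label only narrows the candidates. Deciding whether an `ORG` is a known party or a `PER` is a known politician still requires matching against the seeded database.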
We want to be able to test different setups according to certain metrics. These need to be defined and created.
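Precision, recall, and F1 over hand-classified documents are one candidate set of metrics. The sketch below assumes links are represented as `(mention_id, entity_id)` pairs, which is not taken from the project code.

```python
def precision_recall_f1(predicted, gold):
    """Compare predicted links against hand-labelled gold links."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (mention_id, entity_id) pairs.
gold = {(1, "A"), (1, "B"), (2, "C"), (2, "D")}
predicted = {(1, "A"), (2, "C"), (2, "E")}
p, r, f1 = precision_recall_f1(predicted, gold)
```

Here two of the three predicted links are correct and two of the four gold links are found, so different setups can be compared on the same labelled set.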
Check with the Poliflow team whether we also want to process, for instance, the title and the party from which the document was crawled, and insert this information as enrichment in the database.
Add a way to link a question to other database objects.