lblend / mann-eller-kvinne

🤵 A website that uses machine learning to guess whether you are a man or a woman based on what you write 💃

Home Page: https://mannellerkvinne.lblend.moe

License: GNU General Public License v3.0

Shell 2.28% Python 35.03% Dockerfile 0.60% PureBasic 62.09%

norsk norwegian machine-learning naive-bayes-classifier reccurent-neural-network norge gender-classification gender

mann-eller-kvinne's Issues

Front end crashes on any input

The site produces the following error:

Steps to reproduce

Type any character in the text field on the front page. This happens on both localhost and https://mannellerkvinne.lblend.moe/

Add frontend switch to swap between classifiers

Add a frontend switch to swap between classifiers. The switch should change the JSON field "clf" when calling the API.
Something along the lines of:

{
    clf: switchOn ? 'rnn' : 'bayes',
    ...
}
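
Seen from the API side, the request body that the switch produces could look roughly like the sketch below. Only the "clf" field and its two values come from this issue; the "text" field name and the `build_payload` helper are assumptions made up for illustration.

```python
import json


def build_payload(text: str, use_rnn: bool) -> dict:
    """Build the JSON body for a classification request.

    The "clf" field and its values come from the issue; the "text"
    field name is a hypothetical placeholder.
    """
    return {
        "clf": "rnn" if use_rnn else "bayes",
        "text": text,
    }


if __name__ == "__main__":
    # Show the serialized body for the RNN setting of the switch.
    print(json.dumps(build_payload("Hei på deg", use_rnn=True), ensure_ascii=False))
```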

Add automated frontend code style linting

Similar to the Python linter for the back end, there should be an automated style checker for the front-end JavaScript code.

Specify lib versions for requirements.txt

This reduces the risk of incompatible installations. We already did this with numpy due to some bug,
but ideally we should do it for all of them.
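
To see which entries still need pinning, a small check along these lines could be run over requirements.txt. It is a rough sketch: the `unpinned_requirements` helper is hypothetical, and it only treats an exact `==` pin as "specified".

```python
import re

# Matches "name==version", optionally with extras such as "pkg[extra]==1.0".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?==[^=\s]+")


def unpinned_requirements(lines):
    """Return requirement lines that lack an exact '==' version pin."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

For example, passing the lines of the file (`open("requirements.txt").read().splitlines()`) returns every entry that is unversioned or only range-constrained.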

Make rnn (or other) model to beat naïve bayes baseline

It turned out that the naive Bayes model still outperforms the RNN. This is unacceptable, so we must tune hyperparameters and experiment until the RNN beats the baseline.

Build Flask app in Docker image

Add Black formatting git hook / github action

Automatically format files with Black.

Add automatic "awaiting approval" labeling using some bot

Add automatic "awaiting approval" labeling when someone submits an "enhancement" issue. Also, let anyone submit "enhancement" issues.

Resolve warnings produced by tensorflow when loading RNN

A bunch of warnings occur when loading the Keras model. Not sure why, but they should be either resolved or suppressed.
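
If we go for suppression, something like the sketch below might be enough. It only hides messages rather than fixing their cause, and whether it covers these particular warnings is an assumption. The `quiet_tensorflow` helper is hypothetical.

```python
import logging
import os


def quiet_tensorflow() -> None:
    """Reduce TensorFlow log noise. Call before importing tensorflow."""
    # "2" hides INFO and WARNING messages from the C++ backend.
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
    # TensorFlow's Python-side messages go through the "tensorflow" logger.
    logging.getLogger("tensorflow").setLevel(logging.ERROR)


quiet_tensorflow()
# import tensorflow as tf  # import only after the call above
```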

Validate classifiers

Estimate the classifiers' performance on the development and training sets using various metrics.
This can be done e.g. in a Jupyter notebook. Results can be presented in the README.
Suggestions about how it should be done or presented are very welcome.
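
As one suggestion, the usual metrics could be computed as in the sketch below. It assumes the classifiers output the labels "M" and "F" and treats one of them as the positive class; the `classification_report` helper is hypothetical, and the choice of metrics is open for discussion.

```python
from collections import Counter


def classification_report(y_true, y_pred, positive="F"):
    """Compute accuracy, precision, recall and F1 for one positive label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Running it once per classifier and per data split would give a small table of numbers for the README.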

API won't run due to Tensorflow error - NotImplementedError: Cannot convert a symbolic Tensor to a numpy array.

What is wrong?

When trying to run the API, Tensorflow raises a NotImplementedError: Cannot convert a symbolic Tensor (bidirectional/forward_lstm/strided_slice:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported

System info

OS: Ubuntu 20.04, x86
Release: 2.1.0

Consider using git submodules for corpus

Site crashes on load when using Safari on iOS

See vercel/next.js#8347 for a potential solution

Translate README into English

Offer an English version of the README along with the Norwegian version in order to make the code more accessible to people.

This should apply to all documents in the repo. At the time of writing, this means that the contribution guide and the backend README need to be translated as well.

Auto-publish prebuilt docker images to dockerhub/github

Replace all NLTK with scikit-learn

NLTK is old-fashioned and awkward (imo). I think we should opt for a more modern classifier API. scikit-learn is more modern and elegant, and I think we should use it for the naive Bayes (and possibly other) classifiers.
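
For reference, a naive Bayes text classifier in scikit-learn can be as short as the sketch below. The training texts are meaningless placeholder tokens, and the bag-of-words features are an assumption, not necessarily what the current NLTK model uses.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Placeholder training data, for illustration only.
texts = ["foo bar", "baz qux", "foo foo", "qux baz"]
labels = ["M", "F", "M", "F"]

# Word counts feed a multinomial naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

prediction = model.predict(["foo"])[0]
```

Swapping `MultinomialNB()` for another scikit-learn estimator keeps the rest of the pipeline unchanged, which is part of the appeal.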

Move front end/backend into separate repos?

Perhaps it's cleaner, given that the programs can be run independently.
Wdyt?

Make config setup for hyperparameter tuning of deep model

We should use config files to manage hyperparameter tuning. YAML is a pretty decent and readable format.
I'll be working on it as part of my mission to beat the Bayes model.
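
A rough sketch of how a parsed config could be validated is shown below. The `raw` mapping is what PyYAML's `yaml.safe_load` would return for a config file. The `RnnConfig` class, its field names, and its default values are all hypothetical placeholders, not the model's actual hyperparameters.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RnnConfig:
    """Hypothetical hyperparameters; names and defaults are placeholders."""

    embedding_dim: int = 64
    lstm_units: int = 32
    dropout: float = 0.2
    learning_rate: float = 1e-3
    epochs: int = 10


def load_config(raw: dict) -> RnnConfig:
    """Build a config from a parsed mapping, rejecting unknown keys."""
    known = {f.name for f in fields(RnnConfig)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"Unknown hyperparameters: {sorted(unknown)}")
    return RnnConfig(**raw)
```

Rejecting unknown keys means a typo in a config file fails loudly instead of silently falling back to a default.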

Make About page logo image responsive

The logo on the about page is currently set to a fixed size, its full width and height. This should be responsive.

Implement a logistic regression classifier

Take inspiration from the paper that introduced the dataset we use:
https://aclanthology.org/2020.gebnlp-1.11.pdf

Use more semantic variable names and field names

Change names like "clf", "M", and "F" to more semantic and easy-to-understand names. It's not obvious to either people reading the code or consumers of the API what the values mean.

If we go through with this, I want to keep this change on hold until we've fully rewritten the backend. That way the finished rewrite will be the 3.0 release and this change the 4.0 release, which keeps the version numbers of the frontend and the backend consistent. It's not a requirement, but it's good to keep it this way for now.

Add option for environment variables in docker compose

Fix text margins on About page

Margins are inconsistent

Find a proper way to set dev api-url

The current solution requires changing a line of code each time you want to switch it. We should have a .env config or something similar to set React environment variables when switching between dev and build. It might require installing additional dependencies (e.g. webpack or dotenv).

This article describes a couple ways of doing it:
https://trekinbami.medium.com/using-environment-variables-in-react-6b0a99d83cf5
