kescardoso / datasetbucket

A dataset bucket with a machine learning bias auditor. Built with Python-Flask, MaterializeCSS and the Kaggle API.

Home Page: https://datasetbucket.herokuapp.com

License: MIT License

Python 53.37% CSS 1.92% JavaScript 0.87% HTML 43.84%

python python3 flask flask-application mongodb mongodb-database machine-learning unbiased

datasetbucket's Issues

Fix PDF title

Sometimes the PDF title is incorrect. We need to check that the correct string is being passed to it. This may be fixed once we fix the issue of the old datasets not being downloaded.

Improve PDF analytics

Break down number of samples for each value
Plot histograms/diagrams representing distributions for each feature
Other analytics improvements
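
As a rough illustration of the per-value breakdown, here is a minimal sketch using only the standard library. The `value_counts` helper, the `rows` list, and the feature names are hypothetical and not taken from the project's code.

```python
from collections import Counter


def value_counts(rows, feature):
    """Count how many samples carry each value of `feature`.

    `rows` is assumed to be a list of dicts, one per sample.
    Samples that lack the feature are counted under "missing".
    """
    return Counter(row.get(feature, "missing") for row in rows)


# Hypothetical sample data, for illustration only.
rows = [
    {"gender": "female", "age_group": "18-30"},
    {"gender": "male", "age_group": "18-30"},
    {"gender": "female", "age_group": "31-50"},
]
counts = value_counts(rows, "gender")
```

The resulting counter could feed both the report table and a histogram for each feature.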

Add charts/graphs

Add charts/graphs to the report to show the analysis visually

Add dictionary of country demographics

Add population demographics of each country to be used in the analysis of the data.
We want to compare the demographics in the data to the demographics in a given population, to check if the data is representative of the whole.
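
One possible way to express that comparison is sketched below. The `representation_gap` function is hypothetical, and the population shares are made-up placeholder numbers, not real demographics.

```python
def representation_gap(sample_counts, population_shares):
    """Return sample share minus population share for each group.

    A negative gap means the group is under-represented in the data
    relative to the reference population.
    """
    total = sum(sample_counts.values())
    gaps = {}
    for group, population_share in population_shares.items():
        sample_share = sample_counts.get(group, 0) / total if total else 0.0
        gaps[group] = sample_share - population_share
    return gaps


# Placeholder values, for illustration only.
population_shares = {"female": 0.5, "male": 0.5}
sample_counts = {"female": 20, "male": 80}
gaps = representation_gap(sample_counts, population_shares)
```

In this sketch the country dictionary would supply `population_shares`, and the dataset analysis would supply `sample_counts`.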

Setup File Uploading with Cloud Storage

On add_dataset, the file upload only succeeds in local deployment.
After deploying our app to Heroku, a fix is needed: either store uploaded files in MongoDB as base64 (b64), or use a cloud storage service such as S3 to handle user uploads in the deployed version (ultimately necessary).
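
For the base64 option, a minimal sketch of the encoding step might look like the following. The function names and document fields are hypothetical, and the database insert itself is left out. The size check reflects MongoDB's 16 MB limit on a single document, which makes this route suitable only for small files.

```python
import base64

MAX_DOC_BYTES = 16 * 1024 * 1024  # MongoDB's per-document size limit


def build_upload_document(filename, data):
    """Wrap raw file bytes in a dict that could be stored as a document."""
    encoded = base64.b64encode(data).decode("ascii")
    if len(encoded) >= MAX_DOC_BYTES:
        raise ValueError("file too large to embed in a single document")
    # Field names are assumptions, not the project's schema.
    return {"filename": filename, "content_b64": encoded}


def read_upload_document(doc):
    """Recover the original bytes from a stored document."""
    return base64.b64decode(doc["content_b64"])


doc = build_upload_document("sample.csv", b"x,y\n1,2\n")
```

Base64 inflates the payload by roughly a third, so larger datasets would still need the cloud storage route.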

README and Devpost

Edit and update the README and Devpost descriptions.

Fix image analysis

Image analysis files need their paths updated. Images in a dataset are not currently getting analyzed or reported.

Delete .vscode file

@kescardoso I want to delete the .vscode folder and .DS_Store in main, but I don't want to break the deployment. Can we delete it safely?

Replace time.sleep()

Need to rewrite time.sleep() statements into async statements to deal with multithreading.
time.sleep() is used in multiple files.

Maybe try:
https://realpython.com/async-io-python/
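
A minimal sketch of the idea, assuming the waiting code can be run inside an asyncio event loop: `wait_for_download` and its parameters are hypothetical names, and `is_ready` stands in for whatever condition the current `time.sleep()` calls are waiting on.

```python
import asyncio


async def wait_for_download(is_ready, poll_seconds=0.01, max_polls=100):
    """Poll `is_ready()` without blocking the event loop.

    Unlike time.sleep(), asyncio.sleep() yields control so other
    coroutines can run while this one waits.
    """
    for _ in range(max_polls):
        if is_ready():
            return True
        await asyncio.sleep(poll_seconds)
    return False


# Example: a condition that never becomes true gives up after max_polls.
timed_out = asyncio.run(wait_for_download(lambda: False, poll_seconds=0, max_polls=2))
```

Bounding the number of polls avoids waiting forever if the download never completes.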

Delete old datasets from deployed heroku app

When a dataset is analyzed, the old zip + unzipped datasets need to be deleted.
Or, we need to delete the zip after we have sent the report
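
A rough sketch of the cleanup step, using only the standard library. The `cleanup_dataset` helper and the paths in the demo are hypothetical.

```python
import os
import shutil
import tempfile


def cleanup_dataset(zip_path, extracted_dir):
    """Delete a dataset's zip file and its unzipped folder, if present."""
    if os.path.isfile(zip_path):
        os.remove(zip_path)
    # ignore_errors avoids failing when the folder is already gone.
    shutil.rmtree(extracted_dir, ignore_errors=True)


# Demo in a throwaway directory.
work = tempfile.mkdtemp()
zip_path = os.path.join(work, "dataset.zip")
extracted = os.path.join(work, "dataset")
os.makedirs(extracted)
open(zip_path, "wb").close()
cleanup_dataset(zip_path, extracted)
shutil.rmtree(work)
```

Calling this after the report has been sent would cover both variants described above.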

Generate a PDF of the report

The results from the analysis will be available to download as a PDF, in addition to being displayed in the HTML.

Deploy webApp on Dfinity

Research how to deploy on Dfinity
Once app is created, deploy

Make download progress spinner stop after folder is downloaded

This fix can be achieved with jQuery; there is a good example in this Stack Overflow question: https://stackoverflow.com/questions/5757801/javascript-how-to-show-spinner-until-file-downloads

Fix Category Bug

Categories:
After installing the location select functionality, bugs appeared in category selection:

previous multi selection not returning from db on edit view
string and list not rendering properly (again!)

Dynamic Search + Pagination

To be Implemented:
On datasets.html

Dynamic Search
Pagination
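
The server-side part of both features could be sketched as follows. The `search` and `paginate` helpers are hypothetical, as is the `name` field on each dataset. How the results reach datasets.html is left open.

```python
import math


def search(datasets, query):
    """Case-insensitive substring match on each dataset's name."""
    q = query.strip().lower()
    return [d for d in datasets if q in d["name"].lower()]


def paginate(items, page, per_page=10):
    """Return the items on `page` (1-based) and the total page count."""
    total_pages = max(1, math.ceil(len(items) / per_page))
    page = min(max(page, 1), total_pages)  # clamp out-of-range pages
    start = (page - 1) * per_page
    return items[start:start + per_page], total_pages


datasets = [{"name": f"Dataset {i}"} for i in range(25)]
page_items, total_pages = paginate(search(datasets, "dataset"), page=3)
```

Clamping the page number keeps a stale or hand-edited page parameter from producing an empty view.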

Build initial python-flask app

Build the initial app with flask.
Ensure that we can parse files well with flask

Refactor variable names and comments

Variable names are in the incorrect format.
Code needs more comments.

Fix .jpeg and .png extraction and os.walk calls
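
As a starting point, here is a minimal sketch of collecting image files with `os.walk`. The `find_images` helper is hypothetical, and treating `.jpg` as an accepted extension alongside `.jpeg` and `.png` is an assumption.

```python
import os
import shutil
import tempfile

# .jpg is an assumption; the issue only names .jpeg and .png.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_images(root):
    """Walk `root` recursively and return the full paths of image files."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare lowercased extensions so "photo.PNG" also matches.
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                found.append(os.path.join(dirpath, name))
    return sorted(found)


# Demo in a throwaway directory with a nested folder.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "train", "cats"))
for name in ("a.PNG", "b.jpeg", "notes.txt"):
    open(os.path.join(root, "train", "cats", name), "wb").close()
images = find_images(root)
image_names = [os.path.basename(p) for p in images]
shutil.rmtree(root)
```

Joining each filename with `dirpath` rather than `root` keeps paths correct for images in nested folders.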

Better UX/UI/Design rules

Add:

Footer
About/instructions page
Credits

Improve user-friendly and aesthetic features:

color key
buttons
typography

More info/functionalities for user profile

Develop user profile with more functionalities (not a priority, but cool if we have time)

hyperlink author on datasets.html
NavBar with active user session tag
User info (name, position, photo, links, etc.)
User tasks, activities, contributions, stats....

Some things left to figure out and learn 🤔

Add 2 more acceptable formats to JSON files

We want to support 3 different formats:
{'root': {dict}} ,
{'content': {}, 'annotation':[{'labels':{}},{'points':{}}], 'extras':{}} , currently the only accepted format
{'keyword':{}, 'keyword':{}, 'keyword':{}}
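
One way to tell the three shapes apart after parsing is sketched below. The `detect_format` function and its return labels are hypothetical, and the checks are only a guess at what distinguishes the formats listed above.

```python
def detect_format(data):
    """Classify parsed JSON into one of the three shapes, or "unknown"."""
    if not isinstance(data, dict):
        return "unknown"
    keys = set(data)
    # Check "root" first: it would otherwise also match the keyword shape.
    if keys == {"root"} and isinstance(data["root"], dict):
        return "root"
    if {"content", "annotation"} <= keys:
        return "annotation"
    if data and all(isinstance(value, dict) for value in data.values()):
        return "keyword"
    return "unknown"


fmt = detect_format({"cat": {}, "dog": {}})
```

Once the shape is known, each format can be routed to its own parser.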

Location selection and query

Create Location functionality for datasets with dynamic query and db binding

Link field with Jinja tags (for PDF report retrieval)

Feature to be implemented as an alternative:

Figure out a way to add a link input to Add New Dataset, to include PDF reports from a cloud drop box.
Especially in case the JSON file handling doesn't integrate well into the project.