rottingresearch / rottingresearch

A project devoted to helping academics and researchers provide robust citations and mitigate link rot.

Home Page: https://rottingresearch.org

License: GNU General Public License v3.0

Python 8.95% CSS 16.31% HTML 67.56% Procfile 0.06% JavaScript 6.78% Dockerfile 0.34%
hacktoberfest flask-application python linkrot flask academia content drift link research

rottingresearch's Introduction

Introduction

A project devoted to helping academics and researchers provide robust citations and mitigate link rot. Visit rottingresearch.org to see it in action.

Mission

Link rot is an established phenomenon that affects everyone who uses the internet. Researchers looking at individual subjects have recently addressed the extent of link rot’s influence on scholarly publications. One recent study found that 36% of all links in research articles were broken. 37% of DOIs, once seen as a tool to prevent link rot, were broken (Miller, 2022).

Rotting Research allows academics and researchers to upload their work and check the reliability of their citations. It extracts all of the links from the document and then checks whether each link is accessible to the public.

Check out our website at rottingresearch.org.

The status of our services can be observed at status.rottingresearch.org/status/rr.

Installation

Requirements

  • Python3 (3.10+)
  • Pip3
  • Redis

Docker Instructions

Local Development

  • Set the APP_SECRET_KEY environment variable, for example APP_SECRET_KEY="RANDOM_SECRET_KEY"
  • Run the docker container using docker-compose up --build. You can use the -d flag to run the containers in 'detached' mode.
  • Open 127.0.0.1:8080 in your browser.

Because a Docker volume is used, any changes made are reflected immediately. To view the container logs, use docker logs -f rottingresearch. The -f flag follows the logs as they are written.

Building Image

  • Build the Docker image: docker build --tag rottingresearch .
  • Run the image: docker run -d -p 8080:8080 rottingresearch

Linux/Mac

  • Clone Repository: git clone https://github.com/rottingresearch/rottingresearch
  • Change directory to rottingresearch - cd rottingresearch
  • Run source setup.sh - the script will automatically install the packages and set up the environment variables

Windows

  • Clone Repository: git clone https://github.com/rottingresearch/rottingresearch
  • Change directory to rottingresearch - cd rottingresearch
  • Install Python Packages: pip3 install -r requirements.txt
  • Edit app.py and set app.config['UPLOAD_FOLDER'] to a valid temporary folder.
  • Set the APP_SECRET_KEY environment variable: setx APP_SECRET_KEY "random"
  • Set the ENV environment variable: setx ENV "DEV"
  • Run Redis: redis-server
  • Set the REDIS_URL environment variable: setx REDIS_URL "redis://localhost:6379"
  • Run the app: python3 app.py
  • Run the Celery worker: celery -A app:celery_app worker -B
  • Open 127.0.0.1:8080 in your browser.
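
The Windows steps set three environment variables by hand, and a missing one is easy to overlook (setx only takes effect in command windows opened afterwards). A minimal sketch of a pre-flight check, assuming only the variable names listed in those steps; the helper name missing_settings is hypothetical and not part of the project:

```python
import os

# Variable names taken from the installation steps in this README.
REQUIRED_VARS = ("APP_SECRET_KEY", "ENV", "REDIS_URL")


def missing_settings(environ=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Running this in a fresh command window before python3 app.py would show whether the setx values are visible to new processes.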

Code of Conduct

For our code of conduct, please visit our Code of Conduct page.

License

This program is licensed with a GPLv3 License.

rottingresearch's People

Contributors

aditirao7, ahnaf-codes, anmolag10, anshikjain18, blncmusa, dependabot[bot], jayeclark, joaovictor3g, m-faheem-khan, mailtodanish, marshalmiller, rajdeep1311, timcrob, vladimirsosnitskiy

rottingresearch's Issues

Create Docker Image

Creating a Docker Image would allow us to test and deploy this app much faster. I'd prefer to use Docker over other containerization solutions, but am open to ideas.

Flask sessions and cookies for multiple users

Enable proper usage for multiple users using flask sessions and cookies.
Currently, 2 different users analysing the same pdf and downloading references raises an error because files already exist in the downloads folder.
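
One way to avoid the file collision is to give each session its own download folder. The following is only a sketch under assumptions: the function name session_download_dir and the "download_id" key are hypothetical, and a plain dict stands in for the session object (in a Flask app this could be flask.session).

```python
import tempfile
import uuid
from pathlib import Path


def session_download_dir(session, base_dir):
    """Return a download folder unique to this session, creating it if needed.

    `session` is any dict-like object; using flask.session here is an
    assumption, not the project's actual code.
    """
    if "download_id" not in session:
        # A random identifier keeps two users' files apart,
        # even when both analyse the same PDF.
        session["download_id"] = uuid.uuid4().hex
    folder = Path(base_dir) / session["download_id"]
    folder.mkdir(parents=True, exist_ok=True)
    return folder


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    first_user, second_user = {}, {}
    print(session_download_dir(first_user, base))
    print(session_download_dir(second_user, base))
```

Repeated calls with the same session return the same folder, so a user's downloads stay together across requests.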

Add Link Archiving

I'd like to add a feature that takes all links that are verified to be active and adds them to the Internet Archive Wayback Machine to preserve them in time. Ideally, this would be added to the parent project, Linkrot.

The basic concept is that if you navigate to https://web.archive.org/save/{url} the service automatically archives that page. So after verifying that it returns a valid code, we would just connect to all of those sites, and it would create a snapshot. I'd love for this to be an option on the results page. So after all the links are checked, you have the option to archive the valid ones only. This way it is optional, and we don't take more resources than we need.
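
The idea in the paragraph above can be sketched roughly as follows. This is not Linkrot's implementation: the function names are hypothetical, the results are assumed to be a mapping from URL to status code, and "valid" is assumed to mean HTTP 200.

```python
import urllib.request

# Save endpoint as described in this issue.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def build_save_url(url):
    """Build the Wayback Machine save URL for a link."""
    return SAVE_ENDPOINT + url


def archivable(results):
    """Keep only links whose check returned HTTP 200.

    `results` maps each URL to its status code (None when the check
    failed); this shape is assumed for the example.
    """
    return [url for url, status in results.items() if status == 200]


def archive(url, timeout=30):
    """Request a snapshot and return the HTTP status of the save request."""
    with urllib.request.urlopen(build_save_url(url), timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    checked = {"https://example.org/": 200, "https://example.org/gone": 404}
    for link in archivable(checked):
        print(build_save_url(link))
```

Keeping archive() separate from the filtering step fits the opt-in design described above: the results page can list the archivable links first and only call archive() if the user asks for it.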

Another option would be to use ArchiveNow. This repository isn't updated regularly, so I'd prefer using Linkrot, but if this is the easiest way to achieve the desired outcome, I am OK with that.

Anyone able to complete this task, please take a stab at it.

Add Redis

A lot of new features are being added, like link archiving and multiple file upload. It has become apparent that we will need to scale and Redis seems like the logical next step. Open to other suggestions.

Revise DOI and ARXIV sanitation URLS

Right now the DOI and arXiv checks pull any URL containing doi.org or arxiv.org. The concern is that this gives a wrong result for Internet Archive URLs, which contain the archived URL inside them. I will address this ASAP.
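
One possible fix is to classify by hostname instead of by substring, so that an archive link that merely contains doi.org in its path is not counted as a DOI. A minimal sketch, assuming the function name classify and the three labels, none of which come from the project; the example DOI is made up:

```python
from urllib.parse import urlparse


def classify(url):
    """Return 'doi', 'arxiv', or 'url' based on the hostname only."""
    # URLs without a scheme have no parsed hostname and fall through to 'url'.
    host = (urlparse(url).hostname or "").lower()
    if host == "doi.org" or host.endswith(".doi.org"):
        return "doi"
    if host == "arxiv.org" or host.endswith(".arxiv.org"):
        return "arxiv"
    return "url"


if __name__ == "__main__":
    for link in (
        "https://doi.org/10.1234/example",
        "https://arxiv.org/abs/2206.00785",
        "https://web.archive.org/web/2022/https://doi.org/10.1234/example",
    ):
        print(classify(link), link)
```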

Project Banner

I need a Project Banner so that it shows up when you share this repo. Will eventually use it on pip and App as well.

Deployment Issue

I'm trying to set up continuous integration for the permanent home for the app, but I seem to be having issues with the secret. I'm sure it's me not having a full understanding of how it works in Flask. I'm trying to use the GitHub secrets function. I made one called Heroku_Secret and put it in the code, but the app still does not appear to recognize that it has a value. @aditirao7 Any ideas? You've deployed it on your test site. What am I missing?

Few links have incorrect status

Describe the bug
A few of the links show status N/A although their redirect URLs lead to 200 status pages.

To Reproduce
Steps to reproduce the behavior:

  1. Go to rottingresearch.org
  2. Upload 2206.00785.pdf
  3. Number of arxiv and DOI references shows 0

Expected behavior
Should show 200 status for the last 2 arxiv links.

Desktop (please complete the following information):

  • OS: Linux (Ubuntu 20.04)
  • Browser: Chrome
  • Version: 117.0.5938.92

Refactor Code & Adopt Standard Code Linter

The HTML code stinks, mainly upload.html, because there is no code consistency and the JavaScript is all over the place. I believe refactoring it would improve readability and code hygiene.

Solution: Standard Code Lint configuration (Prettier)

Inconsistent Indentation & Closing of Tags
https://github.com/marshalmiller/rottingresearch/blob/cf3181de677964a656c41debaf366599f2a7c137/templates/upload.html#L86-L126

JavaScript Optimization - JS code can be moved to the bottom (order of priority)
https://github.com/marshalmiller/rottingresearch/blob/cf3181de677964a656c41debaf366599f2a7c137/templates/upload.html#L29-L50

Create App Logo

I am looking for a distinct logo to use for this app. Something that shows the theme of the app and is attractive.

Incorrectly parsing mailto URI as http://mailto:[email protected]

Describe the bug
While testing the application I noticed that the application is parsing mailto URIs incorrectly. The link is supposed to be mailto://[email protected] but is being shown as http://mailto:[email protected].

PDF: Machine learning and the physical sciences

The issue exists because the sort_ref(ref) function in app.py appends all references of type URL to the URLs array. A simple check to see if the URI is a mailto URI can fix this.
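
A rough sketch of such a check follows. The project's sort_ref code is not reproduced here, so the names is_mailto and split_refs are hypothetical, references are assumed to be plain strings, and the addresses are placeholders.

```python
def is_mailto(uri):
    """Return True for mailto URIs, including ones already mis-prefixed with http://."""
    cleaned = uri.strip().lower()
    for prefix in ("http://", "https://"):
        if cleaned.startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    return cleaned.startswith("mailto:")


def split_refs(uris):
    """Separate mail references from web URLs, preserving input order."""
    urls, emails = [], []
    for uri in uris:
        (emails if is_mailto(uri) else urls).append(uri)
    return urls, emails


if __name__ == "__main__":
    refs = [
        "https://example.org/paper",
        "mailto:someone@example.org",
        "http://mailto:someone@example.org",
    ]
    print(split_refs(refs))
```

Returning the mail references as a separate list leaves both options from the expected behavior open: they can be dropped, or shown in their own section.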

Code: app.py/sort_refs

To Reproduce
Steps to reproduce the behavior:

  1. Start the Application
  2. Make sure the APP_SECRET_KEY environment variable is set
  3. Upload the Machine learning and the physical sciences PDF
  4. See URL References Section on the right

Expected behavior
Either the mailto URI should not be shown at all OR be moved under Linkrot Summary

Screenshots
Screenshot of the mailto URI being parsed as HTTP URI

Possibly move mailing references to its own section under the Linkrot Summary section

Desktop (please complete the following information):

  • OS: Ubuntu 20.04 LTS x64
  • Browser: chromium
  • Version: 101.0.4951.64 (Official Build) snap (64-bit)

Create Database to store results

I think it would be extremely helpful if we were to store the results from each analysis in a database. I do not want to store user data. But things like the date written (pulled from metadata), and data submitted to us. Overall links, DOI links, Error Codes, and such. This would allow us to keep long-term stats for further research. Even how the same documents change over time. We could then create a dashboard to share those results.
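
To make the proposal concrete, here is one possible shape for such a store, using SQLite from the Python standard library. Everything in it is an assumption for illustration: the table names, the column names, the choice of SQLite, and the sample values. No user data is stored, only dates and link counts.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS analysis (
    id INTEGER PRIMARY KEY,
    analyzed_on TEXT NOT NULL,      -- date the analysis ran
    document_date TEXT,             -- date pulled from document metadata
    total_links INTEGER NOT NULL,
    doi_links INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS status_count (
    analysis_id INTEGER NOT NULL REFERENCES analysis(id),
    status_code TEXT NOT NULL,      -- e.g. '200', '404', 'N/A'
    link_count INTEGER NOT NULL
);
"""


def store_result(conn, analyzed_on, document_date, total_links, doi_links, status_counts):
    """Insert one analysis and its per-status link counts; return the new row id."""
    cur = conn.execute(
        "INSERT INTO analysis (analyzed_on, document_date, total_links, doi_links) "
        "VALUES (?, ?, ?, ?)",
        (analyzed_on, document_date, total_links, doi_links),
    )
    analysis_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO status_count (analysis_id, status_code, link_count) VALUES (?, ?, ?)",
        [(analysis_id, code, count) for code, count in status_counts.items()],
    )
    conn.commit()
    return analysis_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    store_result(conn, "2023-10-01", "2022-06-01", 40, 5, {"200": 30, "404": 10})
    print(conn.execute("SELECT status_code, link_count FROM status_count").fetchall())
```

Storing one row per status code keeps the long-term statistics easy to aggregate for a dashboard.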

Update CSS to add a few pixels of space between buttons

On the analysis page, the 2 download buttons have no space between them which can be noticed on hovering over them. I think adding some space in between would make the page look much better. This could be a good first issue to pick up.

Number of arxiv and DOI references not updating

Describe the bug
The numbers of arXiv and DOI references do not seem to be reflected at the top of the report.

To Reproduce
Steps to reproduce the behavior:

  1. Go to rottingresearch.org
  2. Upload 2206.00785.pdf
  3. Number of arxiv and DOI references shows 0

Expected behavior
Should show 3 and 2 respectively.

Desktop (please complete the following information):

  • OS: Linux (Ubuntu 20.04)
  • Browser: Chrome
  • Version: 117.0.5938.92

Upload Multiple Files at Once

A great feature would be if we could upload more than one file at a time. I know this is a major step, but I believe that this would be of great use to many.

Timeout Issues

Sometimes, with larger files, the process can time out and result in a 500 server error.

Remove Paste a Link Feature

We should remove the ability to paste a link and limit it to just uploading a file. The reason is copyright. Publishers tend to be highly litigious. With the link feature, the website downloads a copy and analyzes it, whereas when someone uploads a file, they are bound to the user agreement and release the site from any responsibility. I know this is pretty cynical, but I think it's an unfortunate reality. If someone wants to run it on remote files, they can use the Python app, Linkrot.

Make better-looking javascript alerts

Alerts are generated when the user submits without uploading a file/url or when the file/url is not a pdf.

The alerts are generated using the standard window.alert() and don't look that great. Improve alert design.

Agreement Checkbox

We should add some sort of agreement checkbox. We could add a tooltip stating that the uploader agrees that they have the rights to upload the document, just to cover certain liabilities.

Add Summary to Analysis

Can we add the summary to the report that it generates? The original script generates something like this:
(Image of the summary output not preserved.)

Merge Celery Container with Main Container

Right now, the app can spin up three containers—one for the app, one for the celery workers, and one for Redis. Since you can always manually add Redis with the environments and official image, it seems strange to maintain one ourselves. So now we have two. We can't do a docker pull because it will never have the celery image. If possible, I would like to find a way to incorporate the celery worker into the main image. I see many advantages to this, which I can expand on if you'd like to.

(Update) CodeQL Github action to run only when change in Python

I think we should update the CodeQL/Analyze (python) action to run only when changes have been made to Python code. Right now, CodeQL for Python still runs whenever any non-.py file is changed.

We can also remove the Autobuild section from the action as there is no code compilation being done.
https://github.com/marshalmiller/rottingresearch/blob/cf3181de677964a656c41debaf366599f2a7c137/.github/workflows/codeql-analysis.yml#L56-L57

Incomplete Analysis

Describe the bug
The status codes for many of the links do not get updated on the analysis page. I only noticed this bug on my mobile; desktop seems fine. It looks like the link checking is aborted halfway for some reason.

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://rottingresearch.org/
  2. Check link https://arxiv.org/pdf/2206.00785.pdf
  3. Many links are N/A status code even if they work

Expected behavior
All links should have a valid status code.

Smartphone (please complete the following information):

  • Device: Pixel 3a
  • OS: Android
  • Browser: Chrome
  • Version: 102.0.5005.125

Create Simulated Tests with Playwright

In order to make sure every build works correctly and connects with both the celery and Redis servers, we need to create a test that spins up a live server and tests the functions using simulation. In my research, it appears that Playwright would be the best solution. It works with Pytest and Flask to accomplish this. Link below:

https://playwright.dev/python/

Fix mobile view

The mobile view needs to be made responsive, restructured, and redesigned. (Three screenshots were attached to the original issue.)
