
arxiv-vanity / arxiv-vanity


Renders papers from arXiv as responsive web pages so you don't have to squint at a PDF.

Home Page: https://www.arxiv-vanity.com

License: Apache License 2.0

Languages: Python 78.08%, HTML 14.82%, Shell 0.21%, CSS 2.74%, JavaScript 3.83%, Dockerfile 0.32%
Topics: latex, academic-publishing, science, arxiv

arxiv-vanity's Introduction

arXiv Vanity

arXiv Vanity renders papers from arXiv as responsive web pages so you don't have to squint at a PDF.

It turns this sort of thing:

Into this:

This is the web interface for viewing papers. The actual LaTeX to HTML conversion (the interesting bit) is done by Engrafo.

Running in development

Install Docker for Mac or Windows.

Do the initial database migration and set up a user:

$ script/manage migrate
$ script/manage createsuperuser

Pull the Engrafo Docker image, which is needed for rendering papers:

$ docker pull arxivvanity/engrafo

Then to run the app:

$ docker-compose up --build

Your app is now available at http://localhost:8000. The admin interface is at http://localhost:8000/admin/.

You can scrape the latest papers from arXiv by running:

$ script/manage scrape_papers

It'll probably fetch quite a lot, so hit Ctrl-C when you've got enough.

Running tests

$ script/test

Using a development version of Engrafo

Engrafo is the LaTeX to HTML converter. If you are working on Engrafo, you might want to use the version you are working on locally.

To do that, run script/docker-build in your local Engrafo directory. This will create an image called engrafo-dev.

Then, in the arXiv Vanity directory (the same one this readme is in), create a file called .env to tell arXiv Vanity to use that image to render papers:

ENGRAFO_IMAGE=engrafo-dev

VS Code development environment

This project is configured with a dev container so that you get completions and other editor features inside VS Code. When VS Code opens, click "Reopen in Container" in the popup, and it will run the development environment inside the same container used by docker-compose.

Sponsors

Thanks to our generous sponsors for supporting the development of arXiv Vanity! Sponsor us to get your logo here.

YLD

arxiv-vanity's People

Contributors

bfirsh, dependabot-preview[bot], dependabot-support, dependabot[bot], imgbotapp, jai-deepsource, jkukul, ryota-mo


arxiv-vanity's Issues

Fix up bug report UI

  • Use bootstrap modal and forms
  • Replace alert() with bootstrap modal/alerts
  • Make lip more subtle, with sans-serif
  • Put lip at bottom on mobile, without screenshot UI

Periodically re-render papers

When Engrafo gets updates, the existing renders on arXiv Vanity won't be updated.

Two options:

  • We keep all renders and periodically re-render them in bulk. A paper could be skipped if Engrafo hasn't been updated since its last render.
  • Renders are just caches and expire after a while. This approach is cheaper to run, but users will sometimes have to sit and wait for a paper to render, and search engines won't be able to index papers whose renders have expired.
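The first option's "skip if Engrafo hasn't been updated" check could be sketched roughly as follows. This is only an illustration: the `Render` class, its `engrafo_version` field, and the idea of tagging each render with the Engrafo version that produced it are assumptions, not part of the existing code, and the paper IDs in any usage are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Render:
    # Hypothetical record of one rendered paper.
    paper_id: str
    # Hypothetical field: the Engrafo version that produced this render,
    # or None if the version was never recorded.
    engrafo_version: Optional[str]


def renders_to_refresh(renders: List[Render], current_version: str) -> List[Render]:
    """Return renders made by any Engrafo version other than the current one."""
    return [r for r in renders if r.engrafo_version != current_version]
```

A periodic job would then only queue the renders this function returns, so a run with no Engrafo update does no work.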

Random access papers

It should be possible to view any paper on arXiv. This should probably be some kind of search box that converts an arXiv URL into a web page, Sci-Hub style. The URL /papers/[any arXiv ID] should also work.

For the backend, if a paper is accessed that doesn't exist yet, it should fetch the metadata from arXiv and then spin up a rendering job. The user could be shown a spinner while the page polls and refreshes once the paper has rendered.
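As a rough sketch of the URL conversion the search box would need, the function below maps an arXiv abstract or PDF URL to a /papers/[ID] path. The function name and the regular expressions are illustrative assumptions; they cover the common new-style (1708.03312) and old-style (hep-th/9901001) identifiers and may not match every historical form.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# New-style IDs, e.g. 1708.03312 (assumed pattern).
NEW_STYLE = r"\d{4}\.\d{4,5}"
# Old-style IDs, e.g. hep-th/9901001 or math.GT/0309136 (assumed pattern).
OLD_STYLE = r"[a-z\-]+(?:\.[A-Z]{2})?/\d{7}"
ARXIV_PATH = re.compile(
    r"^/(?:abs|pdf)/(?P<id>(?:" + NEW_STYLE + "|" + OLD_STYLE + r")(?:v\d+)?)"
    r"(?:\.pdf)?/?$"
)


def arxiv_url_to_path(url: str) -> Optional[str]:
    """Map an arXiv URL to a /papers/<id> path, or None if it isn't one."""
    # Accept input pasted without a scheme, such as "arxiv.org/abs/...".
    parsed = urlparse(url if "://" in url else "https://" + url)
    if parsed.hostname not in ("arxiv.org", "www.arxiv.org"):
        return None
    match = ARXIV_PATH.match(parsed.path)
    if match is None:
        return None
    return "/papers/" + match.group("id")
```

Returning None for anything unrecognised leaves the caller free to fall back to a normal search or an error message.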

Move repositories to organisation & open source

Display metadata on paper

We should display things like:

  • Link to Arxiv page
  • Link to original PDF
  • Tags
  • Links to Arxiv author pages
  • Incoming citations (link to google scholar or similar?)
  • Links to arxiv sanity?

Download bulk sources from Arxiv

So we aren't downloading from the web each time somebody renders an old paper.

https://arxiv.org/help/bulk_data_s3

Rough plan for implementation:

  • Every week, a Django management command downloads the manifest of sources from S3
  • For each tarball in the manifest, if the tarball is new or its checksum has changed, download it (tarball checksums should be stored in the database)
  • For each source that is new or has changed, upload the source file to a directory on S3

The next step of this is to actually use the bulk sources for rendering, which depends on #123.
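The "new or checksum has changed" comparison in the plan above might look something like this. It is a minimal sketch that assumes the manifest has already been parsed into a plain mapping of tarball name to checksum; the function name and both dictionaries are hypothetical, and parsing the real manifest is left out.

```python
from typing import Dict, List


def tarballs_to_download(manifest: Dict[str, str], stored: Dict[str, str]) -> List[str]:
    """Return names of tarballs that are new or whose checksum has changed.

    `manifest` maps tarball name to the checksum listed in the manifest
    (assumed to be parsed elsewhere); `stored` maps tarball name to the
    checksum saved in the database after the last successful download.
    """
    return sorted(
        name
        for name, checksum in manifest.items()
        if stored.get(name) != checksum
    )
```

After each tarball is downloaded and processed, its checksum would be written back to the database so the next weekly run skips it.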

Make feedback form pretty

  • Use bootstrap modal and forms
  • Replace alert() with bootstrap modal/alerts
  • Make lip more subtle, with sans-serif
  • Put lip at bottom on mobile, without screenshot UI #48
  • Reset form / create new form each time (you can't report multiple bugs without refreshing)
  • Taking screenshots doesn't seem to work in development

Make a Chrome extension

It should add a "View as HTML or Arxiv Vanity" link to any arxiv.org/abs/... link. I wonder if we can overlay on PDF pages, too.

Perhaps we could make a bookmarklet, too.

Clearer warning & beta message

Warn that papers are likely to be broken: link to the PDF version of the paper, and also point out the "report a problem" feature and the GitHub repo.

Engrafo job should report completion immediately

Currently, render job state is synced with Hyper in batch using ./manage.py update_render_state. This could be done immediately if the render job called a URL on completion.

In theory, this would mean we could pass --rm to the container and not have to clean up containers in batch, but we might not want to do that in case there is an error that means the completion handler isn't called. We'd then lose the logs and never know why it failed.

  • add endpoint to update render state
  • call endpoint when engrafo job ends
  • periodically update render state of dead jobs and remove containers that have finished
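The three tasks above could fit together along these lines. This is a framework-free sketch under stated assumptions: `RenderJob`, its state names, and both functions are hypothetical, and the actual HTTP endpoint and container cleanup are not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RenderJob:
    # Hypothetical record of one render job.
    container_id: str
    state: str = "running"  # hypothetical states: running, success, failure


def mark_finished(jobs: Dict[str, RenderJob], container_id: str, exit_code: int) -> bool:
    """What the completion endpoint would do: record the final state of a job.

    Returns False when the container ID is unknown.
    """
    job = jobs.get(container_id)
    if job is None:
        return False
    job.state = "success" if exit_code == 0 else "failure"
    return True


def unreported_jobs(jobs: Dict[str, RenderJob]) -> List[RenderJob]:
    """Jobs still marked as running, for the periodic sweep to double-check.

    This covers jobs whose completion handler was never called.
    """
    return [job for job in jobs.values() if job.state == "running"]
```

Keeping the periodic sweep alongside the endpoint means a job that dies without calling back is still noticed, and its logs can be inspected before the container is removed.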

Launch checklist

  • remove basic auth
  • rerender pages that are on random loop on home page
  • post to HN
  • tweet

Display warning if paper looks broken

"Hey this looks broken. Give it a try if you want, but you probably want to look at the PDF."

It should only show if there is a bug that actually makes the paper unreadable. Just bad styling is not the end of the world.

Warnings it should display:

  • Missing citations. (.engrafo-missing-cite)
  • Missing figures. (.engrafo-missing-ref)
  • Broken math. (.mjx-merror)
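One way to detect these cases is to scan the rendered HTML for the three marker classes listed above. The sketch below uses only Python's standard `html.parser`; the function and class names are hypothetical, and it assumes the classes are already present in the HTML being checked. If math is rendered in the browser, the `.mjx-merror` check would have to run client-side instead.

```python
from html.parser import HTMLParser
from typing import List

# Marker classes from the list above, mapped to the warning to show.
WARNING_CLASSES = {
    "engrafo-missing-cite": "Missing citations",
    "engrafo-missing-ref": "Missing figures",
    "mjx-merror": "Broken math",
}


class _ClassCollector(HTMLParser):
    """Collects every CSS class name that appears on any start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def paper_warnings(html: str) -> List[str]:
    """Return the warnings that apply to a rendered paper's HTML."""
    collector = _ClassCollector()
    collector.feed(html)
    collector.close()
    return [
        message
        for css_class, message in WARNING_CLASSES.items()
        if css_class in collector.classes
    ]
```

An empty result means no banner is shown, which matches the requirement that merely bad styling should not trigger the warning.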
