Renders papers from arXiv as responsive web pages so you don't have to squint at a PDF.

Home Page: https://www.arxiv-vanity.com

License: Apache License 2.0

Python 78.08% HTML 14.82% Shell 0.21% CSS 2.74% JavaScript 3.83% Dockerfile 0.32%

latex academic-publishing science arxiv

arxiv-vanity's Introduction

arXiv Vanity

arXiv Vanity renders papers from arXiv as responsive web pages so you don't have to squint at a PDF.

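It turns a paper that looks like this as a PDF into a responsive web page.

[Figure: screenshots of a paper as a PDF and as rendered by arXiv Vanity]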

This is the web interface for viewing papers. The actual LaTeX to HTML conversion (the interesting bit) is done by Engrafo.

Running in development

Install Docker for Mac or Windows.

Do the initial database migration and set up a user:

$ script/manage migrate
$ script/manage createsuperuser

Pull the Engrafo Docker image, which is needed for rendering papers:

$ docker pull arxivvanity/engrafo

Then to run the app:

$ docker-compose up --build

Your app is now available at http://localhost:8000. The admin interface is at http://localhost:8000/admin/.

You can scrape the latest papers from arXiv by running:

$ script/manage scrape_papers

It'll probably fetch quite a lot, so hit ctrl-C when you've got enough.

Running tests

$ script/test

Using a development version of Engrafo

Engrafo is the LaTeX to HTML converter. If you are working on Engrafo, you might want to use the version you are working on locally.

To do that, run script/docker-build in your local Engrafo directory. This will create an image called engrafo-dev.

Then, in the arXiv Vanity directory (the same one this readme is in), create a file called .env to tell arXiv Vanity to use that image to render papers:

ENGRAFO_IMAGE=engrafo-dev

VS Code development environment

This project is configured with a dev container so that completions and other editor features work inside VS Code. When VS Code opens the project, click "Reopen in Container" in the popup and it will run the development environment inside the same container used by docker-compose.

Sponsors

Thanks to our generous sponsors for supporting the development of arXiv Vanity! Sponsor us to get your logo here.

arxiv-vanity's Issues

Paginate / infinite scroll home page

Better tag colors

Colors are randomly generated, and some are a bit naff. Follow-up of #5.

Favicon

Fix up bug report UI

Use bootstrap modal and forms
Replace alert() with bootstrap modal/alerts
Make lip more subtle, with sans-serif
Put lip at bottom on mobile, without screenshot UI

Google Analytics

Convert plain Arxiv IDs

Or things which look like "arXiv:1502.04623" in the convert box
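
A minimal sketch of how the convert box could recognise plain IDs, assuming new-style identifiers (four digits, a dot, four or five digits, optional version) and an optional "arXiv:" prefix. The name `parse_arxiv_id` and the regular expression are illustrative and not taken from the codebase; old-style IDs such as `hep-th/9901001` are not handled.

```python
import re

# Hypothetical pattern: optional "arXiv:" prefix, new-style ID, optional version.
ARXIV_ID_RE = re.compile(r"^(?:arxiv:)?(\d{4}\.\d{4,5})(v\d+)?$", re.IGNORECASE)


def parse_arxiv_id(text):
    """Return the arXiv ID (with version, if given), or None if text is not an ID."""
    match = ARXIV_ID_RE.match(text.strip())
    if match is None:
        return None
    return match.group(1) + (match.group(2) or "")
```

Anything that returns `None` here would fall through to whatever URL handling the convert box already does.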

Write integration test

Browser test, if possible. Pass through add_docker in Codeship to make it work.

Move "report a problem" from Engrafo

Strip ".pdf" suffix from links

Putting https://arxiv.org/pdf/1512.03385.pdf in the box doesn't work
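
One possible fix, sketched with the standard library only: take the path of the URL, accept both `/abs/` and `/pdf/` prefixes, and drop a trailing ".pdf". The function name `arxiv_id_from_url` is hypothetical, and the sketch does not check the host name.

```python
from urllib.parse import urlparse


def arxiv_id_from_url(url):
    """Extract the ID from an /abs/ or /pdf/ URL, dropping a ".pdf" suffix."""
    path = urlparse(url).path
    for prefix in ("/abs/", "/pdf/"):
        if path.startswith(prefix):
            arxiv_id = path[len(prefix):]
            if arxiv_id.endswith(".pdf"):
                arxiv_id = arxiv_id[: -len(".pdf")]
            return arxiv_id or None
    return None
```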

Only list machine learning papers on home page

Now that any paper can be rendered, non-ML papers will turn up on the home page.

About page

Write basic test for displaying papers

At least check it works.

Exception logging

Probably with Heroku's Sentry addon or something.

Handle papers that don't have source available

e.g. https://arxiv.org/abs/1705.04924v3

Periodically re-render papers

When Engrafo gets updates, the renders on arXiv Vanity won't be updated.

Two options:

We keep all renders and re-render periodically in bulk. It could skip re-rendering if Engrafo hasn't been updated.
Renders are just caches and expire after a bit of time. The upside of this approach is that it is cheaper to run, but it means users will sometimes have to sit and wait for a paper, and search engines won't be able to index papers whose renders have expired.
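
The two options boil down to two different staleness checks. The sketch below is illustrative only: the `Render` record, its fields, and the 30-day default are assumptions, not part of the existing models.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Render:
    engrafo_version: str   # hypothetical: Engrafo version that produced this render
    rendered_at: datetime  # hypothetical: when the render finished


def needs_bulk_rerender(render, current_engrafo_version):
    """Option 1: re-render only if Engrafo has changed since this render."""
    return render.engrafo_version != current_engrafo_version


def is_expired(render, now, max_age=timedelta(days=30)):
    """Option 2: treat the render as a cache entry that expires after max_age."""
    return now - render.rendered_at > max_age
```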

List papers which fail to render

With some way of showing that it is not ready yet and you are being linked to the PDF.

Random access papers

It should be possible to view any paper on Arxiv. This should probably be some kind of search box which converts an Arxiv URL into a web page, Scihub style. The URL /papers/[any arxiv ID] should also work.

For the backend, if a paper is accessed that doesn't exist, it should fetch metadata from arxiv then spin up a rendering job. The user could probably be shown a spinner and it could poll to refresh when it has rendered.
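
A rough sketch of that backend flow, with a plain dict standing in for the database. `fetch_metadata` and `start_render` are hypothetical callables supplied by the caller, and the state name "rendering" is an assumption.

```python
def get_or_start_render(arxiv_id, papers, fetch_metadata, start_render):
    """Return the paper's record, creating it and starting a render if unseen."""
    paper = papers.get(arxiv_id)
    if paper is None:
        # First access: fetch metadata from arXiv, then kick off a render job.
        paper = {"metadata": fetch_metadata(arxiv_id), "state": "rendering"}
        papers[arxiv_id] = paper
        start_render(arxiv_id)
    return paper
```

While the returned state is "rendering", the page would show the spinner and poll until the state changes.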

Images on paper list

Pick a suitable image from the article, somehow.

Move repositories to organisation & open source

Human-friendly names & colours for tags

Fix pages being too short

The paper rendering page has a white bar at the bottom because the footer isn't long enough.

Display metadata on paper

We should display things like:

Link to Arxiv page
Link to original PDF
Tags
Links to Arxiv author pages
Incoming citations (link to google scholar or similar?)
Links to arxiv sanity?

Download bulk sources from Arxiv

So we aren't downloading from the web each time somebody renders an old paper.

https://arxiv.org/help/bulk_data_s3

Rough plan for implementation:

Every week Django management command downloads the manifest of sources from S3
For each tarball in manifest, if tarball is new or checksum has changed, download tarball (tarball checksums should be stored in database)
For each source that is new/has changed, upload source file to directory on S3
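
The comparison in the second step could look roughly like this. It assumes the manifest has already been parsed into a mapping of tarball filename to checksum; the parsing itself and the database access are left out, and the function name is hypothetical.

```python
def changed_tarballs(manifest, stored_checksums):
    """Return the tarball names that are new or whose checksum has changed.

    manifest maps tarball filename -> checksum from the latest manifest;
    stored_checksums maps filename -> checksum recorded in the database.
    """
    return sorted(
        name
        for name, checksum in manifest.items()
        if stored_checksums.get(name) != checksum
    )
```

Only the returned tarballs need downloading; after each one is processed, its checksum would be written back to the database.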

The next step of this is to actually use the bulk sources for rendering, which depends on #123.

Error when submitting bug report

/submit-feedback returns 403 with CSRF token error

Example link goes to different paper

Regardless of the displayed string, the link always goes to https://www.arxiv-vanity.com/papers/1705.06031v2/

Waiting for paper to render shouldn't keep connection open

That ain't gonna scale.

Style rendering progress/error pages

Automatically update Engrafo version and clean up old images

At the moment I'm just manually running hyper pull bfirsh/engrafo to deploy a new version.

Handle source which is just a .tex file

e.g. 1708.05118v1

related to #8

Document architecture

To help contributors. We can copy a lot of this from the google doc.

Make report issue work on mobile

It should just be a link at the bottom, without screenshot UI.

See also #14

Make feedback form pretty

Use bootstrap modal and forms
Replace alert() with bootstrap modal/alerts
Make lip more subtle, with sans-serif
~~Put lip at bottom on mobile, without screenshot UI~~ #48
Reset form / create new form each time (you can't report multiple bugs without refreshing)
Taking screenshots doesn't seem to work in development

Track different versions of papers

v1, v2, etc. At the moment we naively assume each version is a new paper.
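
A first step might be to split the version off the ID so that v1, v2, etc. can be grouped under one paper. This is a sketch; `split_version` is a hypothetical helper, and it assumes the version is always a trailing "v" followed by digits.

```python
import re


def split_version(arxiv_id):
    """Split "1705.04924v3" into ("1705.04924", 3); version is None if absent."""
    match = re.fullmatch(r"(.+?)(?:v(\d+))?", arxiv_id)
    if match is None:
        raise ValueError(f"not an arXiv ID: {arxiv_id!r}")
    base, version = match.groups()
    return base, (int(version) if version else None)
```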

Make a Chrome extension

It should add a "View as HTML or Arxiv Vanity" link to any arxiv.org/abs/... link. I wonder if we can overlay on PDF pages, too.

Perhaps we could make a bookmarklet, too.

Display math in title and abstract

Clearer warning & beta message

Warn that papers are likely to be broken - link to PDF version of paper, and also point out report a problem and github repo.

Sweep up renders where the container doesn't exist

This means the render failed for some unknown reason and should be re-run.

Make footer not fixed on mobile

Engrafo job should report completion immediately

Currently, render job state is synced with Hyper in batch using ./manage.py update_render_state. This could be done immediately if the render job called a URL on completion.

In theory, this would mean we could pass --rm to the container and not have to clean up containers in batch, but we might not want to do that in case there is an error that means the completion handler isn't called. We'd then lose the logs and never know why it failed.

add endpoint to update render state
call endpoint when engrafo job ends
periodically update render state of dead jobs and remove containers that have finished
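
The three steps above can be sketched without any web framework. Here `renders` is a dict mapping a render ID to its state, the state names are assumptions, and the sketch assumes render IDs double as container IDs; the real endpoint and the Hyper calls are not shown.

```python
def mark_finished(renders, render_id, exit_code):
    """What the completion endpoint would do: record the result immediately."""
    renders[render_id] = "success" if exit_code == 0 else "failure"


def sweep_dead_jobs(renders, live_container_ids):
    """Periodic fallback for jobs whose completion call never arrived.

    Any render still marked "running" whose container no longer exists is
    flagged as a failure. Returns the IDs that were flagged.
    """
    dead = [
        render_id
        for render_id, state in renders.items()
        if state == "running" and render_id not in live_container_ids
    ]
    for render_id in dead:
        renders[render_id] = "failure"
    return dead
```

Keeping the sweep means a crashed job is still noticed even if the container never calls back.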

Add paper to random list

https://www.arxiv-vanity.com/papers/1710.07035/

Set caching headers correctly so Cloudflare caches stuff

Things to think about:

CSRF tokens (we can probably do away with most of these)
Session/login cookies (ignore? don't cache if you're logged in?)
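
One hypothetical policy, written as a plain function so it is independent of the framework: cache anonymous GET and HEAD responses in shared caches, and keep logged-in traffic out of them. The one-hour `s-maxage` is an arbitrary placeholder, and `sessionid` is Django's default session cookie name.

```python
def cache_control_for(method, cookies, session_cookie_name="sessionid"):
    """Pick a Cache-Control value for a response (an illustrative policy)."""
    if method not in ("GET", "HEAD"):
        return "no-store"
    if session_cookie_name in cookies:
        # Logged-in users may see personalised pages; keep them out of shared caches.
        return "private, no-cache"
    # Anonymous page views can be cached by a CDN; s-maxage applies to shared caches.
    return "public, s-maxage=3600"
```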

Add link to list of all papers somewhere

https://www.arxiv-vanity.com/papers/

At the very least so search engines can index it.

Some containers aren't being deleted on hyper.sh

Investigate why...

Add a user agent to all arxiv.org requests

To be nice.
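
With the standard library this is a one-line change per request. The User-Agent string below is a made-up example, not the value the project uses, and the helper name is hypothetical.

```python
from urllib.request import Request

# Hypothetical value: identify the client and give a contact URL.
USER_AGENT = "arxiv-vanity-example/0.1 (+https://www.arxiv-vanity.com)"


def arxiv_request(url):
    """Build a request to arxiv.org that identifies itself."""
    return Request(url, headers={"User-Agent": USER_AGENT})
```

The returned object can be passed to `urllib.request.urlopen` as usual.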

Fix styling of rendering/error pages

This broke when changing home page.

Make text in bar at the top smaller on mobile

Same size as metadata

Launch checklist

remove basic auth
rerender pages that are on random loop on home page
post to HN
tweet

Credits & link to GitHub repository

... somewhere. Us and Distill.

Display warning if paper looks broken

"Hey this looks broken. Give it a try if you want, but you probably want to look at the PDF."

It should only show if there is a bug that actually makes the paper unreadable. Just bad styling is not the end of the world.

Warnings it should display:

Missing citations. (.engrafo-missing-cite)
Missing figures. (.engrafo-missing-ref)
Broken math. (.mjx-merror)
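
A sketch of how these checks could be run against the rendered HTML using only the standard library. The class names come from the list above; the parser class, the function name, and the warning labels are illustrative.

```python
from html.parser import HTMLParser

WARNING_CLASSES = {
    "engrafo-missing-cite": "Missing citations",
    "engrafo-missing-ref": "Missing figures",
    "mjx-merror": "Broken math",
}


class _ClassCollector(HTMLParser):
    """Collect every CSS class name that appears on any start tag."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.found.update(value.split())


def render_warnings(html):
    """Return the warning labels whose marker class appears in the HTML."""
    collector = _ClassCollector()
    collector.feed(html)
    return [label for cls, label in WARNING_CLASSES.items() if cls in collector.found]
```

An empty list would mean none of the three markers was found, so no warning is shown.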

Check index.html exists on S3 before marking paper as rendered successfully

Sometimes index.html doesn't exist even though Engrafo has exit code 0.
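
The check itself is small. In this sketch `output_keys` stands for the names of the files found under the render's output location on S3; listing them is left to the caller, and the function name is hypothetical.

```python
def render_succeeded(exit_code, output_keys):
    """Treat a render as successful only if Engrafo exited 0 and index.html exists."""
    return exit_code == 0 and "index.html" in output_keys
```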
