arxiv.py

Python wrapper for the arXiv API: http://arxiv.org/help/api/index

About arXiv

arXiv is a project by the Cornell University Library that provides open access to 1,000,000+ articles in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance, and Statistics.

arXiv offers an API that serves simple database queries as ATOM feeds. Unfortunately, handling these ATOM requests can be clumsy (especially given inconsistencies in the data between different result objects, even within the same query). This is where arxiv.py comes in: it constructs requests for arXiv, fetches the ATOM feeds via a small handful of methods, and parses the results into an intuitive format.
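To give a rough idea of what parsing an ATOM feed into dicts involves, here is a minimal sketch using only the Python standard library. It is not arxiv.py's actual parsing code: the function parse_feed, the chosen output keys, and the hand-written sample feed (including its placeholder id) are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by ATOM feeds.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hand-written sample feed for illustration; not real arXiv output.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>An Example Title</title>
    <summary>An example abstract.</summary>
    <author><name>A. Author</name></author>
    <author><name>B. Author</name></author>
  </entry>
</feed>
"""


def parse_feed(xml_text):
    """Hypothetical helper: turn each ATOM entry into a plain dict."""
    root = ET.fromstring(xml_text)
    results = []
    for entry in root.findall("atom:entry", ATOM_NS):
        results.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM_NS),
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "summary": entry.findtext("atom:summary", default="", namespaces=ATOM_NS),
            # An entry may list several authors; collect all of their names.
            "authors": [
                author.findtext("atom:name", default="", namespaces=ATOM_NS)
                for author in entry.findall("atom:author", ATOM_NS)
            ],
        })
    return results


for article in parse_feed(SAMPLE_FEED):
    print(article["title"], article["authors"])
```

Real feeds carry many more fields per entry, which is part of why a wrapper is convenient.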

Cool demos hopefully coming soon!

Docs

To get the arxiv package, simply run pip install arxiv at the command line.

At the beginning of your Python script, include the line import arxiv.

Query

arxiv.query(s, prune=True, start=0, max_results=10)

Sends arXiv a simple query and returns a list of results, each of which is a dict representing an article that matches the query. The articles are ordered by relevance, as ranked by arXiv.

  • When bool prune is True (default), a number of artifacts of the ATOM-to-dict conversion are removed from each result to isolate the useful fields. When prune is False, prune_query_result is not called and those key/value pairs are not removed.
  • Integer start identifies the 0-indexed position where the query results begin. For example, query('term', start=4) will only request and return results indexed 4-13 (ten results, given the default max_results=10).
  • Integer max_results identifies the number of results to be returned (thus, query will return results at positions start through start + max_results - 1). There are some upper limits involved; if you want to pull >60,000 results at a time, you should look at the arXiv API documentation.
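The way start and max_results map onto a request can be sketched as follows. This is an illustration, not arxiv.py's own code: the helper names build_query_url and result_positions are hypothetical. The endpoint and the search_query, start, and max_results parameters are those of the public arXiv API, but whether arxiv.py builds its request in exactly this way is an assumption.

```python
from urllib.parse import urlencode

# Query endpoint of the public arXiv API.
API_ROOT = "http://export.arxiv.org/api/query"


def build_query_url(s, start=0, max_results=10):
    """Hypothetical helper: build the request URL for a simple query."""
    params = {"search_query": s, "start": start, "max_results": max_results}
    return API_ROOT + "?" + urlencode(params)


def result_positions(start=0, max_results=10):
    """Hypothetical helper: the 0-indexed positions a query would cover."""
    return range(start, start + max_results)


print(build_query_url("term", start=4))
print(list(result_positions(start=4)))
```

With start=4 and the default max_results=10, the request covers ten positions beginning at index 4.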

Clean query results

arxiv.mod_query_result(result)

Takes a query result dict representing an article and modifies some keys and values to be more user-readable. See code for specifics.

arxiv.prune_query_result(result)

Takes a query result dict representing an article and removes some keys that are redundant or useless. See code for specifics.
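As a rough sketch of what pruning a result dict can look like, the snippet below drops a fixed set of keys. The key names in REDUNDANT_KEYS and the function prune are hypothetical and chosen only for illustration; the keys arxiv.py actually removes are defined in its source code. This sketch returns a pruned copy, which is also an assumption rather than the library's documented behavior.

```python
# Hypothetical key names, for illustration only; see the arxiv.py source
# for the keys that prune_query_result really removes.
REDUNDANT_KEYS = ("title_detail", "summary_detail", "guidislink")


def prune(result, keys=REDUNDANT_KEYS):
    """Return a copy of result without the listed keys."""
    return {k: v for k, v in result.items() if k not in keys}


raw = {
    "title": "An Example Title",
    "title_detail": {"type": "text/plain"},
    "guidislink": True,
}
print(prune(raw))
```

Returning a copy leaves the original dict intact, so the unpruned data stays available if it is needed later.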

Download PDF

arxiv.download(obj)

Looks up the keys pdf_url and title on dict obj. Downloads the PDF from pdf_url and saves it as {title}.pdf in the current working directory.
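The behavior described above can be approximated with the standard library alone. This is a sketch under stated assumptions, not arxiv.py's implementation: the helper names pdf_filename and download_pdf and the dirname parameter are hypothetical.

```python
import os
from urllib.request import urlretrieve


def pdf_filename(obj):
    """Hypothetical helper: derive the target file name from the title."""
    return obj["title"] + ".pdf"


def download_pdf(obj, dirname="."):
    """Hypothetical helper: fetch obj['pdf_url'] and save it as {title}.pdf."""
    path = os.path.join(dirname, pdf_filename(obj))
    urlretrieve(obj["pdf_url"], path)  # performs the network request
    return path


print(pdf_filename({"title": "An Example Title", "pdf_url": "http://example.org/paper.pdf"}))
```

A title that contains a path separator such as / would need sanitizing before being used as a file name; this sketch does not handle that case.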

Contributors

jacquerie, japoneris, lukasschwab, mdamien, msoelch, natfarleydev
