Arxivpy

A Python wrapper for the arXiv API. Related libraries and repositories include arxiv.py, python_arXiv_parsing_example.py and arxiv-sanity-preserver. arXiv is an open-access e-print archive with 1M+ e-prints in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics.

Example

Here is an example of how to use arxivpy.

import arxivpy
articles = arxivpy.query(search_query=['cs.CV', 'cs.LG', 'cs.CL', 'cs.NE', 'stat.ML'],
                         start_index=0, max_index=200, results_per_iteration=100,
                         wait_time=5.0, sort_by='lastUpdatedDate') # grab 200 articles

The search_query input can be a list of categories or a string in arXiv query format. The output is a list of dictionaries parsed from the XML that arXiv returns. This example retrieves the 200 most recently updated papers (indices 0 to 200), 100 at a time, with a wait time of about 5 seconds (see the note below if you are scraping many papers).
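To make the batching in the example concrete, here is a minimal sketch of how a range of indices could be split into slices. It is not taken from arxivpy's source: the helper name page_slices is hypothetical, and treating the range as start_index up to (but not including) max_index is an assumption.

```python
def page_slices(start_index, max_index, results_per_iteration):
    """Yield (start, size) pairs covering start_index up to max_index (exclusive).

    Hypothetical helper for illustration; arxivpy may batch differently.
    """
    for start in range(start_index, max_index, results_per_iteration):
        # The last slice may be shorter than results_per_iteration.
        size = min(results_per_iteration, max_index - start)
        yield start, size


# Same numbers as the example above: 200 papers, 100 per request.
slices = list(page_slices(start_index=0, max_index=200, results_per_iteration=100))
print(slices)  # [(0, 100), (100, 100)]
```

With these settings the example above would need two requests, which is where the wait time comes into play.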

Queries

You can use other search queries, for example:

search_query=['cs.DB', 'cs.IR']
search_query='cs.DB' # select only Databases papers
search_query='au:kording' # author name includes Kording
search_query='ti:deep+AND+ti:learning' # title with `deep` and `learning`
search_query='abs:%22deep+learning%22' # deep learning as a phrase
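The last two queries are URL-encoded by hand (+ for a space, %22 for a double quote). If you prefer to write the query in plain form, the standard library can do the encoding. This is only a sketch: encode_query is a hypothetical helper, not part of arxivpy.

```python
from urllib.parse import quote_plus


def encode_query(raw):
    """URL-encode a plain arXiv query string (hypothetical helper)."""
    # Keep ':' literal so field prefixes such as 'ti:' stay readable.
    return quote_plus(raw, safe=':')


title_query = encode_query('ti:deep AND ti:learning')
phrase_query = encode_query('abs:"deep learning"')
print(title_query)   # ti:deep+AND+ti:learning
print(phrase_query)  # abs:%22deep+learning%22
```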

Or you can build a simple search query using arxivpy.generate_query:

search_query = arxivpy.generate_query(terms=['cs.CV', 'cs.LG', 'cs.CL', 'cs.NE', 'stat.ML'],
                                      prefix='category', boolean='OR')
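For intuition, a query like this boils down to joining prefixed terms with a boolean operator. The sketch below is not arxivpy's implementation: build_query and the FIELD_PREFIXES mapping are hypothetical, and only the field prefixes themselves (cat, ti, au, abs, all) come from the arXiv query format.

```python
# Hypothetical mapping from readable names to arXiv field prefixes.
FIELD_PREFIXES = {
    'category': 'cat',
    'title': 'ti',
    'author': 'au',
    'abstract': 'abs',
    'all': 'all',
}


def build_query(terms, prefix='category', boolean='OR'):
    """Join terms into one arXiv query string (illustrative only)."""
    field = FIELD_PREFIXES[prefix]
    joiner = '+{}+'.format(boolean)  # e.g. '+OR+'
    return joiner.join('{}:{}'.format(field, term) for term in terms)


category_query = build_query(['cs.DB', 'cs.IR'])
print(category_query)  # cat:cs.DB+OR+cat:cs.IR
```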

Or convert plain text to an arXiv query using arxivpy.generate_query_from_text:

query = arxivpy.generate_query_from_text("author k kording & author achakulvisut & title science & abstract recommendation") # awesome paper
articles = arxivpy.query(search_query=query)

The wiki page lists more search query prefixes, booleans and categories. More example queries can be found in the arXiv user manual.

Download PDF

You can also use arxivpy.download to download the articles to a given directory. Here is a snippet that does that.

arxivpy.download(articles, path='arxiv_pdf')

Note from API

  • The maximum number of results returned from a single call (max_index) is limited to 30000 in slices of at most 2000 at a time.
  • When the API needs to be called multiple times in a row, we encourage you to play nice and incorporate a 3-second delay in your code.
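If you write your own scraping loop, the two limits above can be enforced in one place. The following is a minimal sketch under stated assumptions: fetch_all, fetch_page and fake_fetch are hypothetical names, and the fetcher is a stand-in that makes no network request, so swap in your own call to the API.

```python
import time

MAX_TOTAL_RESULTS = 30000  # overall cap, per the note above
MAX_SLICE_SIZE = 2000      # per-call cap, per the note above


def fetch_all(fetch_page, total, slice_size=100, wait_time=3.0, sleep=time.sleep):
    """Call fetch_page(start, size) repeatedly, pausing between calls."""
    total = min(total, MAX_TOTAL_RESULTS)
    slice_size = min(slice_size, MAX_SLICE_SIZE)
    results = []
    for start in range(0, total, slice_size):
        if start > 0:
            sleep(wait_time)  # pause between consecutive calls, not before the first
        size = min(slice_size, total - start)
        results.extend(fetch_page(start, size))
    return results


def fake_fetch(start, size):
    """Stand-in fetcher: returns the requested indices instead of papers."""
    return list(range(start, start + size))


# A requested slice of 2500 is clamped to 2000, giving three calls.
papers = fetch_all(fake_fetch, total=5000, slice_size=2500, wait_time=0.0)
print(len(papers))  # 5000
```

Passing the sleep function as a parameter keeps the loop easy to test without actually waiting.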

Installation

The easiest way is to use pip.

pip install git+https://github.com/titipata/arxivpy

You can also install it manually by cloning the repository and running setup.py.

git clone https://github.com/titipata/arxivpy
cd arxivpy
python setup.py install

Dependencies
