olivettigroup / article-downloader

Uses publisher APIs to programmatically retrieve scientific journal articles for text mining.

Home Page: https://pypi.python.org/pypi/articledownloader

License: MIT License

Python 100.00%
api-wrapper python

article-downloader's Introduction

article-downloader

Uses publisher-approved APIs to programmatically retrieve large numbers of scientific journal articles for text mining. Exposes a top-level ArticleDownloader class, which provides methods for retrieving lists of DOIs (unique article IDs) from text search queries, downloading HTML and PDF articles given DOIs, and programmatically sweeping through search parameters for large-scale downloading.

Important Note: This package is only intended to be used for publisher-approved text-mining activities! The code in this repository only provides an interface to existing publisher APIs and web routes; you need your own set of API keys / permissions to download articles from any source that isn't open-access.

Full API Documentation

You can read the documentation for this repository here.

Installation

Use pip install articledownloader. If you don't have pip installed, you could also download the ZIP containing all the files in this repo and manually import the ArticleDownloader class into your own Python code.
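If an import later fails with a "module not found" error, one common cause is that pip installed the package into a different Python interpreter than the one running your script. The sketch below is one way to check this from inside Python 3; the helper name `is_importable` is hypothetical and not part of articledownloader.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if the running interpreter can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # sys.executable is the interpreter running this script; pip must target the same one.
    print("Interpreter:", sys.executable)
    print("articledownloader importable:", is_importable("articledownloader"))
```

If the second line reports False, installing with that same interpreter (for example `python -m pip install articledownloader`) is usually worth trying first.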

Usage

Use the ArticleDownloader class to download articles. You'll need an API key, and please respect each publisher's terms of use.

It's usually best to add your API key to your environment variables with something like export API_KEY=xxxxx.
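As a minimal sketch of the environment-variable approach, the key can be read with the standard `os.environ` mapping and passed to the constructor. The helper `load_api_key` is hypothetical (it is not part of articledownloader), and the variable name `API_KEY` simply follows the export example above.

```python
import os


def load_api_key(name="API_KEY", env=None):
    """Read an API key from the environment, failing loudly if it is unset."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            "Environment variable %s is not set; try `export %s=xxxxx` first." % (name, name)
        )
    return key


# Intended use (requires articledownloader to be installed):
# from articledownloader.articledownloader import ArticleDownloader
# downloader = ArticleDownloader(els_api_key=load_api_key())
```

Keeping the key out of the source file also keeps it out of version control.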

You can find DOIs from a CSV file whose first column contains search queries; each query is used to find articles and retrieve their DOIs.
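A query file in that shape can be produced with Python's standard `csv` module. This is only a sketch: the function names `write_query_csv` and `read_first_column` are hypothetical helpers, not part of articledownloader, and the second function exists only to check what was written.

```python
import csv


def write_query_csv(path, queries):
    """Write one search query per row, in the first column."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for query in queries:
            # csv.writer quotes queries that themselves contain commas.
            writer.writerow([query])


def read_first_column(path):
    """Read back the first column, skipping blank rows, to check the file."""
    with open(path, newline="") as handle:
        return [row[0] for row in csv.reader(handle) if row and row[0].strip()]
```

Using the `csv` module rather than joining strings by hand means a query that contains a comma still ends up in a single column.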

Examples

Downloading a single PDF article

from articledownloader.articledownloader import ArticleDownloader
downloader = ArticleDownloader(els_api_key='your_elsevier_API_key')
my_file = open('my_path/something.pdf', 'wb')  # PDFs are binary, so open the file in 'wb' mode

downloader.get_pdf_from_doi('my_doi', my_file, 'crossref')
my_file.close()

Downloading a single HTML article

from articledownloader.articledownloader import ArticleDownloader
downloader = ArticleDownloader(els_api_key='your_elsevier_API_key')
my_file = open('my_path/something.html', 'w')

downloader.get_html_from_doi('my_doi', my_file, 'elsevier')
my_file.close()

Getting metadata

from articledownloader.articledownloader import ArticleDownloader
downloader = ArticleDownloader(els_api_key='your_elsevier_API_key')

#Get 500 DOIs from articles published after the year 2000 from a single journal
downloader.get_dois_from_journal_issn('journal_issn', rows=500, pub_after=2000)

#Get the title for a single article (only works with CrossRef for now)
downloader.get_title_from_doi('my_doi', 'crossref')

#Get the abstract for a single article (only works with Elsevier for now)
downloader.get_abstract_from_doi('my_doi', 'elsevier')

Using search queries to find DOIs

CSV file:

search query 001,
search query 002,
search query 003,
.
.
.

Python:

from articledownloader.articledownloader import ArticleDownloader
downloader = ArticleDownloader('your_API_key')

#load the search queries from the first column of the CSV
queries = downloader.load_queries_from_csv(open('path_to_csv_file', 'r'))

#assuming each search returns a list of DOIs, extend (not append) keeps the list flat
dois = []
for query in queries:
    dois.extend(downloader.get_dois_from_search(query))

for i, doi in enumerate(dois):
    my_file = open(str(i) + '.pdf', 'wb') #PDFs are binary
    downloader.get_pdf_from_doi(doi, my_file, 'crossref') #or 'elsevier'
    my_file.close()
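For larger sweeps, a loop like the one above can be wrapped so that each file is closed even when a download fails and one bad DOI does not stop the run. The sketch below assumes nothing about articledownloader's internals: `download_all` is a hypothetical helper, and `fetch` is any callable that takes a DOI and an open binary file.

```python
import os


def download_all(dois, fetch, out_dir, extension=".pdf"):
    """Call fetch(doi, file_obj) once per DOI and return the DOIs that failed.

    Files are named by index because DOIs contain '/' and are not safe file names.
    """
    os.makedirs(out_dir, exist_ok=True)
    failed = []
    for i, doi in enumerate(dois):
        path = os.path.join(out_dir, str(i) + extension)
        try:
            # 'wb' because PDF content is binary; `with` closes the file on errors too.
            with open(path, "wb") as handle:
                fetch(doi, handle)
        except Exception as error:
            # Record the failure and keep going with the remaining DOIs.
            failed.append((doi, str(error)))
            # Remove the partial file so it is not mistaken for a finished download.
            if os.path.exists(path):
                os.remove(path)
    return failed
```

With the package installed, `fetch` could be something like `lambda doi, f: downloader.get_pdf_from_doi(doi, f, 'crossref')`, and the returned list shows which DOIs to retry.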

article-downloader's Issues

Using in psychopy

I'm trying to use articledownloader through psychopy but this line of code:
from articledownloader.articledownloader import ArticleDownloader
produces this error:
File "C:\Users\legg0028\Dropbox\flinders_university\owen_R_classes\download_journal\0.1_article_downlod\articledownloader\articledownloader.py", line 5, in <module>
    from autologging import logged, traced
ImportError: No module named autologging

Is this because I have not imported the requirements (require, etc.)? If so, do you know how I can import these through psychopy?

Query Construction

This looks interesting, but I'm not sure how to construct queries properly. How should the queries in the CSV be constructed to retrieve DOIs from Elsevier, say?

Unable to obtain html/pdf files

Hello

I have tried the following to download a paper:

import articledownloader
from articledownloader.articledownloader import ArticleDownloader

downloader = ArticleDownloader(els_api_key='I have used my elsevier API key here')
sample_file=open('firstpdf.pdf','w')
downloader.get_pdf_from_doi('10.1016/j.jbiosc.2010.11.006','sample_file', 'elsevier')
sample_file.close()

In the above piece of code, I have used my elsevier API key.

The issue is that nothing gets written to the file I have created. I tried writing in HTML format, which did not help either. I was wondering if there is some issue with the method.

Kindly do the needful.

articledownloader not found

Hi

I have installed articledownloader using pip install articledownloader. When I try to run a Python script, it reports that articledownloader is not found.

It would be great to get any help on this.

Error while using article downloader

Hi,

I am getting an error.
C:\ProgramData\Anaconda3\lib\site-packages\articledownloader\articledownloader.py in
3 import re
4 import json
----> 5 import scrapers
6 from autologging import logged, traced
7 from csv import reader

ModuleNotFoundError: No module named 'scrapers'

I tried installing scraper and scrapers separately. I could not find scrapers anywhere.

Looking forward to the solution.

Issues with get_html_from_doi

I found that get_html_from_doi could not extract the full text as .html for Springer and Wiley; only the abstract was obtained. But when I used get_pdf_from_doi, the full text was obtained as .pdf. Are there any issues with get_html_from_doi, or is there some other problem?
Thank you.

Code issue

Our group used to use this package for some research, but it no longer seems to work. We suspect the publisher websites have been upgraded: Wiley and ACS no longer work, while Springer still does. I'm curious whether anyone else can still get this code to work.
