greenelab / greenblack

Does green OA via preprinting reduce Sci-Hub usage?

License: Creative Commons Zero v1.0 Universal

Languages: Jupyter Notebook 99.84%, Python 0.16%

Topics: sci-hub, biorxiv, preprints, unpaywall, journals, publication, publishing, dataset, notebook

greenblack's Introduction

Work in progress

This codebase has not undergone internal peer review as per the Greene Lab policy.

Environment

This repository uses conda to manage its environment as specified in environment.yml. Install the environment with:

conda env create --file=environment.yml

Then use conda activate greenblack and conda deactivate to activate or deactivate the environment.

greenblack's People

Contributors

dhimmel

greenblack's Issues

Extracting access status of journal articles from Unpaywall

Unpaywall has a bulk download of their database available at https://unpaywall.org/products/snapshot. Completing a survey is required to obtain a download link. I filled out the survey and got the following download link:

https://s3-us-west-2.amazonaws.com/unpaywall-data-snapshots/unpaywall_snapshot_2018-09-24T232615.jsonl.gz

The website does not mention whether the database is released under a license. I think it's likely that the contents of the database are not subject to copyright, or that our project would be fair use, so I will probably include subsets of the bulk download in this repository.
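A minimal sketch of how access status might be pulled from the snapshot without loading it all into memory. It assumes the file is gzip-compressed JSON Lines (as the .jsonl.gz extension suggests) and that each record has `doi` and `is_oa` keys; those field names are an assumption about the snapshot schema and should be checked against the actual file. The function name `extract_access_status` and the output layout are hypothetical.

```python
import gzip
import json


def extract_access_status(path, dois):
    """Stream a gzipped JSON Lines file and collect access status for selected DOIs.

    Assumes each non-empty line is a JSON object with 'doi' and 'is_oa' keys
    (an assumption about the snapshot schema). DOIs are matched case-insensitively.
    """
    wanted = {doi.lower() for doi in dois}
    rows = []
    with gzip.open(path, mode='rt', encoding='utf-8') as read_file:
        for line in read_file:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            doi = (record.get('doi') or '').lower()
            if doi in wanted:
                rows.append({'doi': doi, 'is_oa': record.get('is_oa')})
    return rows
```

Passing the journal DOIs from data/01.preprints.tsv as `dois` would keep only the records relevant to this project, which is one way to produce a subset small enough to store in the repository.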

Multiple bioRxiv preprints link to the same journal publication

As part of this project, we're working with the Rxivist database of bioRxiv preprints. We've extracted a portion of this database and stored it in the file data/01.preprints.tsv.

Based on this dataset, 28 journal articles are each linked to by multiple bioRxiv preprints. In other words, bioRxiv sometimes records multiple preprints as having been published as the same journal article.

The following Python code identifies them:

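import pandas

# Read the preprint table from a fixed commit so the result is reproducible
url = 'https://github.com/greenelab/greenblack/raw/8f4502d5f62064dd483fbec18d21e5e31c35dc03/data/01.preprints.tsv'
# dropna removes rows with any missing value (e.g. preprints without a journal DOI)
preprint_df = pandas.read_csv(url, sep='\t').dropna()
# bioRxiv preprints with duplicated journal publications
# keep=False marks every row of a duplicated journal_doi, not only the repeats
duplicate_df = preprint_df[preprint_df.journal_doi.duplicated(keep=False)].sort_values(['journal_doi', 'preprint_doi'])
duplicate_df.to_csv('duplicated.tsv', sep='\t', index=False)
# Number of distinct journal DOIs linked to by more than one preprint
duplicate_df.journal_doi.nunique()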

Here's the output table:

rxivist_preprint_id preprint_date preprint_doi journal_date journal_doi
9880 2017-06-27 10.1101/156331 2017-11-24 10.1007/s00401-017-1789-4
9828 2017-07-18 10.1101/165373 2017-11-24 10.1007/s00401-017-1789-4
16018 2016-12-07 10.1101/092221 2017-10-13 10.1007/s12559-017-9518-9
4696 2017-01-02 10.1101/097675 2017-10-13 10.1007/s12559-017-9518-9
10702 2014-09-04 10.1101/008755 2015-04-17 10.1016/j.ajhg.2015.03.004
10428 2016-04-05 10.1101/046995 2015-04-17 10.1016/j.ajhg.2015.03.004
20822 2017-05-12 10.1101/137190 2018-05-21 10.1016/j.cognition.2018.04.017
20709 2017-12-11 10.1101/231837 2018-05-21 10.1016/j.cognition.2018.04.017
30027 2015-10-14 10.1101/029066 2017-10-30 10.1038/s41598-017-14523-5
29724 2017-05-17 10.1101/138784 2017-10-30 10.1038/s41598-017-14523-5
10577 2015-09-08 10.1101/026278 2017-02-23 10.1038/srep43054
26249 2015-12-08 10.1101/033944 2017-02-23 10.1038/srep43054
10256 2016-09-30 10.1101/078360 2017-05-05 10.1073/pnas.1704442114
10029 2017-04-03 10.1101/122218 2017-05-05 10.1073/pnas.1704442114
20878 2016-10-07 10.1101/079699 2017-12-04 10.1080/01480545.2017.1405971
20857 2017-01-12 10.1101/099952 2017-12-04 10.1080/01480545.2017.1405971
28398 2016-11-23 10.1101/085795 2017-01-06 10.1093/jxb/erw488
28406 2016-11-16 10.1101/088153 2017-01-06 10.1093/jxb/erw488
26406 2015-07-07 10.1101/022061 2016-04-27 10.1098/rsob.160009
8583 2015-09-18 10.1101/027151 2016-04-27 10.1098/rsob.160009
9475 2017-03-20 10.1101/118729 2018-02-15 10.1101/gr.230433.117
10042 2017-03-21 10.1101/119016 2018-02-15 10.1101/gr.230433.117
26141 2016-03-24 10.1101/045369 2016-09-19 10.1111/jeb.12972
25961 2016-08-18 10.1101/070318 2016-09-19 10.1111/jeb.12972
17987 2018-03-28 10.1101/290866 2018-05-17 10.1152/japplphysiol.00012.2018
17938 2018-05-16 10.1101/324020 2018-05-17 10.1152/japplphysiol.00012.2018
25942 2016-08-30 10.1101/070003 2017-03-29 10.1371/journal.pcbi.1005375
25731 2017-01-27 10.1101/103739 2017-03-29 10.1371/journal.pcbi.1005375
20939 2014-11-29 10.1101/011908 2015-03-25 10.1371/journal.pone.0119337
18271 2014-12-26 10.1101/013268 2015-03-25 10.1371/journal.pone.0119337
3791 2017-10-10 10.1101/201251 2018-04-26 10.1371/journal.pone.0196135
3785 2017-10-19 10.1101/205542 2018-04-26 10.1371/journal.pone.0196135
9091 2018-05-09 10.1101/317891 2018-07-31 10.1371/journal.pone.0197699
8894 2018-07-09 10.1101/365130 2018-07-31 10.1371/journal.pone.0197699
12107 2017-02-14 10.1101/108639 2017-11-30 10.1523/jneurosci.1724-17.2017
14821 2017-06-30 10.1101/157628 2017-11-30 10.1523/jneurosci.1724-17.2017
10615 2015-06-07 10.1101/020529 2015-09-10 10.1534/g3.115.021659
26423 2015-06-12 10.1101/020826 2015-09-10 10.1534/g3.115.021659
10483 2016-02-03 10.1101/038729 2016-07-29 10.1534/genetics.116.187369
10155 2017-01-09 10.1101/098095 2016-07-29 10.1534/genetics.116.187369
26064 2016-05-26 10.1101/055517 2017-03-17 10.1534/genetics.116.196303
25909 2016-09-28 10.1101/078279 2017-03-17 10.1534/genetics.116.196303
10215 2016-11-17 10.1101/088260 2017-05-26 10.1534/genetics.116.198424
10213 2016-11-17 10.1101/088385 2017-05-26 10.1534/genetics.116.198424
7850 2017-04-03 10.1101/123554 2017-11-02 10.3390/e19110584
7427 2017-05-22 10.1101/140913 2017-11-02 10.3390/e19110584
22979 2017-03-31 10.1101/122580 2017-10-18 10.7554/elife.27356
23938 2017-04-08 10.1101/125765 2017-10-18 10.7554/elife.27356
22956 2017-04-19 10.1101/128595 2017-12-14 10.7554/elife.27827
19859 2017-11-01 10.1101/212274 2017-12-14 10.7554/elife.27827
9875 2017-04-24 10.1101/130054 2017-07-17 10.7554/elife.28069
9997 2017-04-28 10.1101/131995 2017-07-17 10.7554/elife.28069
16093 2016-11-03 10.1101/085548 2018-01-08 10.7554/elife.28927
4204 2017-06-28 10.1101/157263 2018-01-08 10.7554/elife.28927
11378 2017-04-07 10.1101/125419 2018-10-16 10.7554/elife.34870
11082 2018-01-25 10.1101/253872 2018-10-16 10.7554/elife.34870
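Two rows in the table (preprints 10.1101/046995 and 10.1101/098095) have a preprint date later than the journal date, which may help in deciding which links are spurious. Below is a minimal sketch, assuming the column names shown above, that flags such rows. It also shows one possible way to collapse duplicates by keeping the earliest preprint per journal DOI. That rule is an illustrative assumption, not a method the project has settled on, and both function names are hypothetical.

```python
import pandas


def flag_postdated_preprints(duplicate_df):
    """Return rows whose preprint_date is later than journal_date."""
    preprint_date = pandas.to_datetime(duplicate_df['preprint_date'])
    journal_date = pandas.to_datetime(duplicate_df['journal_date'])
    return duplicate_df[preprint_date > journal_date]


def keep_earliest_preprint(preprint_df):
    """Keep one row per journal_doi: the preprint with the earliest preprint_date.

    Assumes dates are ISO-formatted strings (YYYY-MM-DD), so they sort
    chronologically. Ties are broken by preprint_doi. This rule is an
    illustrative assumption, not the project's method.
    """
    return (
        preprint_df
        .sort_values(['journal_doi', 'preprint_date', 'preprint_doi'])
        .drop_duplicates(subset='journal_doi', keep='first')
    )
```

Applied to `duplicate_df` from the code above, `flag_postdated_preprints(duplicate_df)` would return the postdated rows for manual inspection before any automatic rule is applied.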
