InBase

License: CC0-1.0

InBase provides a convenient pandas DataFrame of the 585 inteins in the unmaintained inteins.com InBase database. The protein sequences are available as Biopython SeqRecord objects; the rest of the inteins.com metadata is unchanged.

InBase was collected using scrapy and can be updated as detailed in the "Update Database" section below.

Installation

pip install --user git+https://github.com/omsai/inbase

Usage

from inbase import INBASE

# See first few lines of all inteins.
INBASE.head()
# See first intein (.ix was removed in pandas 1.0; .iloc selects by position).
INBASE.iloc[0]
# Access Biopython SeqRecord information of first intein.
INBASE.iloc[0]['Intein aa Sequence']
# List the domains of life, then count archaeal inteins.
INBASE['Domain of Life'].unique()
(INBASE['Domain of Life'] == 'Archaea').sum()
# Count all inteins.
len(INBASE)
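
The per-domain count above can be generalized to every domain at once with pandas' value_counts. The following is only a minimal sketch: it uses a small stand-in DataFrame instead of the real INBASE object, and the rows, the "Intein Name" column, and the helper name count_by_domain are made up for illustration. Only the 'Domain of Life' column name and the 'Archaea' value come from the usage lines above.

```python
import pandas as pd

# Stand-in for `from inbase import INBASE`. The rows and the "Intein Name"
# column are hypothetical; only 'Domain of Life' is taken from the README.
inteins = pd.DataFrame({
    "Intein Name": ["example-1", "example-2", "example-3", "example-4"],
    "Domain of Life": ["Archaea", "Bacteria", "Archaea", "Eukarya"],
})


def count_by_domain(frame):
    """Return the number of rows for each 'Domain of Life' value."""
    return frame["Domain of Life"].value_counts()


counts = count_by_domain(inteins)
print(counts)
```

With the package installed, passing INBASE instead of the stand-in frame should give the same kind of summary for the real data.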

Development Environment

Virtual environments and tests are orchestrated using tox. Install tox using pip:

pip install --user tox

Make sure that ~/.local/bin or similar is in your path per PEP 370.

Install without tests:

tox --notest -e py27

Update Database

Unfortunately, scrapy does not provide an update function to check against the existing JSON data, so the whole database has to be redownloaded. This only takes a few seconds. First, clone this repository and create a development environment as described in the "Development Environment" section above. Then initialize the data environment with the scrapy extras package:

tox --notest -e data

Check the current number of inbase records:

cat data/inbase.json | wc -l | xargs expr -2 +
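
The shell one-liner above infers the record count from the number of lines in the file. As an alternative, here is a minimal sketch that parses the file and counts the items directly. It assumes data/inbase.json is a single JSON array, which is what scrapy's JSON feed export writes; the function name count_records is hypothetical and not part of this package.

```python
import json
from pathlib import Path


def count_records(path):
    """Return the number of items in a file holding one JSON array."""
    with Path(path).open(encoding="utf-8") as handle:
        records = json.load(handle)
    # Fail loudly if the file is not the expected array of records.
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    return len(records)


if __name__ == "__main__":
    target = Path("data/inbase.json")
    if target.exists():
        print(count_records(target))
```

Unlike the line count, this does not depend on how the array is split across lines.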

Redownload the data:

rm data/inbase.json
.tox/data/bin/scrapy runspider -o data/inbase.json inbase/update.py

Check the new number of records:

cat data/inbase.json | wc -l | xargs expr -2 +

If there are indeed more records, update the Manifest checksums, re-run the data tests, commit the changes to your git repository, and submit a pull request:

version=$(date +%Y%m%d.1)
sed -i -E "s#(version=').*('.+)#\1${version}\2#" setup.py
.tox/data/bin/gemato create --hashes "MD5 SHA1 SHA256" data/
tox -e data
git commit setup.py data/* -m "MAINT: Update inbase database on $(date -I)"
git push
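
The sed substitution in the commands above can be hard to read at a glance. The following sketch reproduces the same pattern with Python's re module to show what it does: it keeps the text before and after the quoted version string and swaps in a date-based version ending in ".1". The function name bump_version and the sample setup.py line are hypothetical.

```python
import re
from datetime import date


def bump_version(setup_text, today=None):
    """Replace the quoted value after version=' with YYYYMMDD.1."""
    today = today or date.today()
    version = today.strftime("%Y%m%d") + ".1"
    # Same pattern as the sed command: group 1 is "version='", group 2 is the
    # closing quote plus the rest of the line. \g<1> is used instead of \1
    # because the new version starts with a digit.
    return re.sub(
        r"(version=').*('.+)",
        r"\g<1>" + version + r"\g<2>",
        setup_text,
    )


print(bump_version("    version='20180101.1',\n", date(2024, 1, 2)))
```

Note that re.sub replaces every matching line, whereas sed without the g flag replaces the first match on each line; for a setup.py with a single version line the result is the same.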

Tests

Run all non-data tests using:

tox

Debug failing tests:

tox --pdb

If you add dependencies and get import errors, you need to recreate the tox environment:

tox --recreate

If your text editor does not report lint errors interactively, editing the files is likely to introduce many linter errors, which the tox unit tests will catch. If you use Emacs, you can configure it for Python development by installing elpy.
