Coder Social home page Coder Social logo

givemedata's Introduction

givemedata

A convenient way to use and explore multiple databases from Python

What is this?

Using multiple data sources may become a pain to manage. You have different credentials here and there, different db names, different drivers and wrappers. You don't like that, do you?

  • Givemedata is a Python tool that allows to define a single configuration file containing all the data sources that you are using
  • Then it provides a simple way to access all of those data sources leveraging the power of Pandas

Here is a simple example:

Step 1 (Installation):

pip install givemedata

Step 2 (Configuration):

Create a configuration file in YAML format - either in your home directory to use just by yourself or in /etc/givemedata/ to provide the configuration to multiple users at once (e.g. if you're using JupyterHub)

The file must be named either givemedata.yaml or .givemedata.yaml (in case if you prefer to have a hidden file)

Example of configuration:

Work:
  Prod:
    WebApp:
      RO: postgresql://read_only_user:[email protected]:5432/web
      RW: postgresql://read_write_user:[email protected]:5432/web
    Analytics: postgresql://analytics_prod_user:[email protected]:5432/analytics
  Stage:
    Analytics: postgresql://analytics_stage_user:[email protected]:5432/analytics
Personal:
  ML_class: postgresql://me:[email protected]:5432/ml

As you can see givemedata supports hierarhical structure where you can logically place your data sources in a way that is most intuitive and clear.

Basically you define a dictionary-like stucture that can be recursively nested. Data sources are defined as connection strinsg like this: <DB_TYPE>://<DB_USER>:<DB_PASSWORD>@<IP>:<PORT>/web_app_backend

Step 3 (Loading givemedata object):

In a Python console of your choice (IPython, Jupyter, an IDE, whatever..):

from givemedata import Data

Data is an interface object created by givemedata based on the config you've just specified. It has the same nested structure as the config and even supports autocompletion :)

Let's assume we want to address the Analytics db on production. This can me made as easy as this:

db = Data.Work.Prod.Analytics

Step 4 (Using data sources):

The DB object in givemedata have a few methods attached to it:

  • sql(query, *, limit) -> returns an SQL query result as a Pandas DataFrame
  • public_tables (property) -> returns a dataframe with information about all public tables in the DB.
  • all_tables (property) -> returns a dataframe with information about all tables in the DB - both public and service.
  • public_fields (property) -> all public fields
  • all_fields (property) -> all fields, public and service
These properties also provide some helpers:
  • (for both - fields and tables)

  • search(term_as_string) -> searches the table/field names by the given term -> returns object similar to itself so can be chained

  • sample(df_index_as_int) -> displays sample of table rows or field values

  • (for tables only)

  • describe(df_index_as_int) -> displays meta data about the table - fields, datatypes, etc

Limitations & Other Notes

  • By now only PostgreSQL is supported as the project is in the earliest possible development stage. This was made to support my own workflow but I have plans to improve this and add support for different databases such as MySQL, SQLite3, Cassandra, etc
  • Only read operations are supported (this is the first priority in the road map, so it will be fixed soon)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.