gtoonstra / databook
A facebook for data
License: Apache License 2.0
Use an API to connect to Tableau and extract the query or queries used to populate a dashboard. Also try to use the API to find out which datasource the dashboard connects to, so that the database can be identified.
I've written a library called "sqlineage" that helps to find the tables involved in a query, which should make it possible to establish relationships between the Tableau dashboard and the underlying datasources.
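To make the idea concrete, here is a minimal sketch of linking a dashboard to the tables its queries touch. It does not use sqlineage (its API is not shown here); a naive regular expression stands in for the parser, and the names `referenced_tables` and `link_dashboard` are hypothetical. It only catches plain `FROM`/`JOIN` references and will miss CTEs, quoted identifiers and similar cases.

```python
import re
from typing import List, Tuple

# Naive stand-in for a real SQL parser: grabs the identifier that follows
# FROM or JOIN. Schema-qualified names (schema.table) are allowed.
_TABLE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def referenced_tables(sql: str) -> List[str]:
    """Return distinct lower-cased table names in order of first appearance."""
    seen: List[str] = []
    for name in _TABLE_RE.findall(sql):
        key = name.lower()
        if key not in seen:
            seen.append(key)
    return seen


def link_dashboard(dashboard: str, queries: List[str]) -> List[Tuple[str, str]]:
    """Build (dashboard, table) edges that could be loaded into the graph."""
    edges: List[Tuple[str, str]] = []
    for query in queries:
        for table in referenced_tables(query):
            edge = (dashboard, table)
            if edge not in edges:
                edges.append(edge)
    return edges


if __name__ == "__main__":
    sql = "SELECT o.id FROM sales.orders o JOIN customers c ON o.cid = c.id"
    for edge in link_dashboard("weekly_sales", [sql]):
        print(edge)
```

Each edge pairs a dashboard name with one table, which is the shape a graph import would need; a real implementation would swap the regex for a proper parser.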
The template operator should implement a method to 'audit' the transfer of data from one source to a destination and write some metadata somewhere describing the transfer (at a minimum the source db/schema/table and the destination) and perhaps some statistics like rowcount, when it was last run, etc.
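As a rough illustration of what such an audit record might contain, the sketch below models the minimum fields named above plus the optional statistics. The class name `TransferAudit`, the field names and the JSON-lines output are all assumptions made for illustration, not part of databook or Airflow.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _utc_now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class TransferAudit:
    """Hypothetical metadata describing one source-to-destination transfer."""
    source_db: str
    source_schema: str
    source_table: str
    destination: str
    row_count: Optional[int] = None               # optional statistic
    last_run: str = field(default_factory=_utc_now)


def audit_line(record: TransferAudit) -> str:
    """Serialize the record as one JSON line, ready to append to a log or file."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    record = TransferAudit("dwh", "public", "orders", "hive.staging.orders", row_count=1200)
    print(audit_line(record))
```

An operator's 'audit' method could build one such record per run and hand it to whatever store ends up holding the metadata.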
Use SQLAlchemy to build a generic "metadata extractor" for a database. Allow some tables to be filtered out, then build a JSON extract file with column names, primary key, comments and data type information. This can be used on a table info page.
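The shape of such an extract can be sketched as follows. To stay dependency-free this example deliberately swaps in the standard-library `sqlite3` module instead of SQLAlchemy, so it only works against SQLite and omits comments (SQLite has no column comments). The function name `extract_metadata` and the JSON layout are assumptions.

```python
import json
import sqlite3
from typing import Dict, Iterable, List


def extract_metadata(conn: sqlite3.Connection, skip_tables: Iterable[str] = ()) -> List[Dict]:
    """Collect column names, declared types and primary keys per table."""
    skip = set(skip_tables)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    extract = []
    for table in tables:
        if table in skip:
            continue  # filtered out on request
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute('PRAGMA table_info("%s")' % table).fetchall()
        pk_cols = sorted((c for c in cols if c[5] > 0), key=lambda c: c[5])
        extract.append({
            "table": table,
            "columns": [{"name": c[1], "type": c[2]} for c in cols],
            "primary_key": [c[1] for c in pk_cols],
        })
    return extract


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.execute("CREATE TABLE tmp_scratch (x TEXT)")
    print(json.dumps(extract_metadata(conn, skip_tables=["tmp_scratch"]), indent=2))
```

A SQLAlchemy-based version would produce the same structure from its inspection interface and would work across database backends, which is the point of the proposal.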
Extract metadata about Hive tables, turn it into import files and run them through the importer.
Right now https://github.com/gtoonstra/databook#prerequisites clearly states:
"You'll need a Mac or Linux with a docker installation to run the sample deployment of databook."
I don't own a Mac, and I'd like to use a combination of WSL (an Ubuntu-based subsystem that keeps getting closer and closer to being a "capable enough" Linux) and Docker on my Windows machine instead of having to wrangle a VM.
This issue (which I guess I just volunteered to work on fixing) is just a place to track what I come across:
Today I just got one step closer to it working, as company IT here finally let Win 10 version 1709 (Fall Creators Update) out of the bag. It resolved an issue where gunicorn couldn't run because WSL was missing /proc/<pid>/status in Win 10 version 1703 (this BTW also affected airflow webserver with Airflow 1.9.0).
So, long story short: to run databook webserver on WSL bash, a prerequisite is that you need to be on v1709.
Docker for Windows uses Windows mount point names. This doesn't work well with the current docker-compose files. I hope https://nickjanetakis.com/blog/setting-up-docker-for-windows-and-wsl-to-work-flawlessly#ensure-volume-mounts-work will provide an elegant fix for that.
more...? I hope not ...
When I have it working, my plan is to submit a PR on the README or something ...
Write a crawler for a GitHub repository to extract SQL code (DDL) and extract some metadata about it:
Then add this metadata to an input file that enriches the graph database.
@gtoonstra, just FYI: Lyft has built a data portal project named Amundsen, which is also inspired by Airbnb's data portal. Our project is now open source:
The last repo is an extractor/model library intended to be used in an Airflow DAG. We included some examples of how to use that library in an Airflow DAG.
Thanks,
The group's page can visualize a bit of information, but there's very little interactivity at the moment. The "add link" doesn't work and no information about the group is present (except for the memberships).
Make it possible to share some data with the group, or otherwise add links to where this shared information is stored (Confluence, a wiki, etc.).