pinterest / querybook

Querybook is a Big Data Querying UI, combining collocated table metadata and a simple notebook interface.

Home Page: https://www.querybook.org

License: Apache License 2.0

Shell 0.13% JavaScript 2.52% HTML 0.04% Makefile 0.05% Dockerfile 0.06% Python 30.80% Mako 0.01% TypeScript 58.34% CSS 0.01% SCSS 4.26% Mustache 0.03% MDX 3.76%
metastore analyses hive presto notebook typescript flask celery charting

querybook's Issues

Query Execution Access Control

  • Adds control over who can access query executions, logs, and results.
  • Adds request-for-access functionality to query executions.

Add the ability to customize welcome & no environment messages

Story
As a user, I find it confusing when I go to DataHub and it does not tell me why I cannot see any environments.
As an admin, I want to give pointers to new users when they first visit DataHub.

Acceptance

  • Welcome & No environment messages can be customized through something similar to the plugins model
  • Admins can use markdown to provide a custom message

Filter example queries by query engine

Story
As a user, I want to view only the query examples for a certain query engine.

Assumption

  • Adding a query engine filter would not affect the counts of the other filters (by user, by table join)

Acceptance

  • Users can filter query examples so that only examples run with the selected query engine show up

Scheduled DataDoc V2

Story
As a user, I want to export multiple query results externally.

Acceptance

  • Scheduled DataDocs allow exporting multiple query results, with custom exporter settings for each

Add private DataDoc to search functionality

Currently, private DataDocs are not indexed in Elasticsearch, to keep the logic simple. Since most DataDocs will be private by default with FGAC, it is essential to make them searchable from Elasticsearch. The new Elasticsearch table for DataDocs should include two more fields: public and readable_user_ids. The second field, readable_user_ids, should include every user who can access the private DataDoc.
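One way the two proposed fields could be used at search time is sketched below. The field names public and readable_user_ids come from the issue; the helper name buildDataDocQuery, the title field, and the overall query shape are assumptions for illustration, not Querybook's actual search code.

```typescript
// Shape of the request body this sketch produces (assumed, simplified).
interface DataDocSearchBody {
    query: {
        bool: {
            must: object[];
            filter: object[];
        };
    };
}

// Hypothetical helper: match on keywords, but only return DataDocs that are
// public or that explicitly list the searching user in readable_user_ids.
function buildDataDocQuery(keywords: string, userId: number): DataDocSearchBody {
    return {
        query: {
            bool: {
                // `title` is an assumed field name used only for this example.
                must: [{ match: { title: keywords } }],
                filter: [
                    {
                        bool: {
                            should: [
                                { term: { public: true } },
                                { term: { readable_user_ids: userId } },
                            ],
                            minimum_should_match: 1,
                        },
                    },
                ],
            },
        },
    };
}
```

Putting the access check in a filter clause keeps it out of relevance scoring, so a private DataDoc is neither boosted nor penalized for being private.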

Better unit test system with example test data

As a developer, I want to set up data source unit tests quickly with some example data in the database.

Acceptance:

  • Use demo data to set up tests
  • Fixtures should be function-level, so they can be swapped in and out
  • Create one or more unit test examples that use this data

Auto format breaks when encountering s3 urls

expected formatting:

DELETE JAR s3://test-bucket/hadoopusrs/prod/test-0.5-SNAPSHOT/test-0.5-SNAPSHOT.jar;
ADD JAR s3://test-bucket/hadoopusrs/bob/test-0.5-SNAPSHOT/test-0.5-SNAPSHOT.jar;

-> same

actual formatting

DELETE JAR s3://test-bucket/hadoopusrs/prod/test-0.5-SNAPSHOT/test-0.5-SNAPSHOT.jar;
ADD JAR s3://test-bucket/hadoopusrs/bob/test-0.5-SNAPSHOT/test-0.5-SNAPSHOT.jar;

->

DELETE JAR s3: / / test - bucket / hadoopusrs / prod / test -0.5 - SNAPSHOT / test -0.5 - SNAPSHOT.jar;
ADD JAR s3://test-bucket/hadoopusrs/bob/test-0.5-SNAPSHOT/test-0.5-SNAPSHOT.jar;
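One possible workaround, sketched here under assumptions, is to mask URI-like tokens before formatting and restore them afterwards. The names formatPreservingUris, Formatter, and URI_PATTERN are hypothetical, and the formatter is passed in as a plain function rather than being Querybook's actual formatter.

```typescript
// Any function that takes SQL text and returns formatted SQL text.
type Formatter = (sql: string) => string;

// Matches scheme://... up to the next whitespace or semicolon (an assumption
// about what counts as a URI in statements like ADD JAR).
const URI_PATTERN = /[a-zA-Z][a-zA-Z0-9+.-]*:\/\/[^\s;]+/g;

function formatPreservingUris(sql: string, format: Formatter): string {
    const uris: string[] = [];
    // Replace each URI with an identifier-like placeholder the formatter
    // should leave untouched.
    const masked = sql.replace(URI_PATTERN, (uri: string) => {
        uris.push(uri);
        return `__URI_${uris.length - 1}__`;
    });
    const formatted = format(masked);
    // Put the original URIs back in place of the placeholders.
    return formatted.replace(
        /__URI_(\d+)__/g,
        (placeholder: string, index: string) => uris[Number(index)] ?? placeholder
    );
}
```

The placeholder could in principle collide with real query text, so a production version would need a token that cannot appear in user input.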

Improve ElasticSearch for code search

When a DataDoc contains xxx.yyy, searching for yyy returns nothing, and users have to search for xxx.yyy to find the result. The strategy will be to provide multiple analyzers so that code and rich text are analyzed differently.
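The behavior wanted from a code-oriented analyzer can be illustrated with a small tokenizer that emits both the full dotted name and each of its parts. This is only a sketch of the idea in application code; codeTokens is a hypothetical name, and an actual solution would be configured as an Elasticsearch analyzer rather than written like this.

```typescript
// Hypothetical tokenizer: for "xxx.yyy" it yields "xxx.yyy", "xxx" and "yyy",
// so a search for either the qualified or the bare name can match.
function codeTokens(text: string): string[] {
    const tokens: string[] = [];
    const add = (token: string): void => {
        if (tokens.indexOf(token) < 0) {
            tokens.push(token);
        }
    };
    // Words may contain letters, digits, underscores and dots.
    const words = text.toLowerCase().match(/[\w.]+/g) ?? [];
    for (const word of words) {
        const parts = word.split(".").filter((part) => part.length > 0);
        if (parts.length === 0) {
            continue;
        }
        add(parts.join("."));
        for (const part of parts) {
            add(part);
        }
    }
    return tokens;
}
```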

DataDoc Date Range filter does not work

The request does not return when you add a filter for start date or end date.
Things to check:

  • Why does the search request fail when a start date or end date is used as a filter?
  • There should be an error UI when the request fails, instead of being stuck at the loading state.

Many to many query engine <-> environment + orderable query engine

Story
As an admin, I want to add the same query engine to different environments without worrying about duplicating the config.
As an admin, I want to be able to order the query engines in the dropdown so that I can present them in a different order to users.

Assumption

  • A query engine and environment should be joined with an intermediate table
  • Extending the single-environment check to multiple environments should be easy

Acceptance

  • Admins can add the same query engine to multiple environments
  • Admins can order environment via drag and drop UI and it gets reflected in the query engine selector / query status etc
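The intermediate-table idea can be sketched as an in-memory model. The interface and field names below (QueryEngineEnvironment, engineOrder, and so on) are assumptions for illustration and do not reflect the actual schema.

```typescript
// One row of the hypothetical join table between query engines and environments.
interface QueryEngineEnvironment {
    queryEngineId: number;
    environmentId: number;
    // Position of the engine within this environment's dropdown.
    engineOrder: number;
}

// Returns the engine ids of one environment in their configured order.
function engineIdsForEnvironment(
    links: QueryEngineEnvironment[],
    environmentId: number
): number[] {
    return links
        .filter((link) => link.environmentId === environmentId)
        .sort((a, b) => a.engineOrder - b.engineOrder)
        .map((link) => link.queryEngineId);
}
```

Storing the order on the join row, rather than on the engine, lets the same engine sit at different positions in different environments without duplicating its config.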

Show frequent users of a table

As a user, I want to see who the frequent users of a table are so I can ask them questions.

Assumption:
Use query samples to obtain info about the common query runners

Acceptance:

  • A new UI that shows a list of top 10 users of a table and their frequency
  • (optional) ability to filter query samples by users
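The assumption above, deriving frequent users from query samples, could look roughly like the following. The QuerySample shape and the function name topTableUsers are hypothetical; only the top-10 limit comes from the acceptance criteria.

```typescript
// Assumed minimal shape of a query sample: who ran a query against which table.
interface QuerySample {
    uid: number;
    tableId: number;
}

interface UserFrequency {
    uid: number;
    count: number;
}

// Counts samples per user for one table and returns the most frequent users.
function topTableUsers(
    samples: QuerySample[],
    tableId: number,
    limit: number = 10
): UserFrequency[] {
    const counts = new Map<number, number>();
    for (const sample of samples) {
        if (sample.tableId !== tableId) {
            continue;
        }
        counts.set(sample.uid, (counts.get(sample.uid) ?? 0) + 1);
    }
    return Array.from(counts, ([uid, count]) => ({ uid, count }))
        // Highest count first; ties broken by uid for a stable result.
        .sort((a, b) => b.count - a.count || a.uid - b.uid)
        .slice(0, limit);
}
```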

[BUG] Using EXTRACT from presto syntax breaks syntax highlighting

Problem:
The sql-lexer assumes that anything that is a VARIABLE type following a FROM statement is a table and breaks the suggestions.

Root cause:
Presto allows a FROM clause in front of things other than table names

The types supported by the extract function vary depending on the field to be extracted. Most fields support all date and time types.
extract(field FROM x) → bigint
Returns field from x.

Code where this fails:

        while (!stream.eol()) {
            // here the match fails, and because nothing gets consumed it goes off in an infinite loop if the match is handled
            // Maybe the right thing to do is, if there's no match, break out of the stream matching?
            const match = stream.match(/^([_\w\d]+|`.*`)\.?/, true);

            // this fails and kicks you out of the loop, but then the suggestions stop working
            if (match[1]) {
                let part = match[1];
                if (part.charAt(0) === '`') {
                    // remove first and last char
                    part = part.slice(1, -1);
                }
                parts.push(part);
            }
        }

short snippet of what caused this:

    SELECT *
    FROM table_2
    JOIN table_1
        ON table_1.field_1 = table_2.field_2
        AND extract(YEAR FROM field_1_date) = table_2.field_year
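A minimal sketch of the "break out when there is no match" idea from the code comments follows. MiniStream is a hypothetical stand-in that mimics only the eol and match calls used in the excerpt, and readNameParts is an illustrative name. The sketch guards against the null match only; it does not decide whether the FROM inside extract() should start a table context at all.

```typescript
// Hypothetical stand-in for the editor's string stream, for illustration only.
class MiniStream {
    private readonly text: string;
    private pos: number = 0;

    constructor(text: string) {
        this.text = text;
    }

    eol(): boolean {
        return this.pos >= this.text.length;
    }

    // Matches at the current position; advances only when consume is true.
    match(pattern: RegExp, consume: boolean): RegExpMatchArray | null {
        const result = this.text.slice(this.pos).match(pattern);
        if (result && consume) {
            this.pos += result[0].length;
        }
        return result;
    }
}

function readNameParts(stream: MiniStream): string[] {
    const parts: string[] = [];
    while (!stream.eol()) {
        const match = stream.match(/^([_\w\d]+|`.*`)\.?/, true);
        if (!match) {
            // Nothing was consumed: stop instead of looping forever
            // or reading a property of null.
            break;
        }
        let part = match[1] ?? "";
        if (part.charAt(0) === "`") {
            // remove first and last char
            part = part.slice(1, -1);
        }
        parts.push(part);
    }
    return parts;
}
```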

Cell Deletion UX

  • Disable deleting a cell with backspace
  • Add a keyboard shortcut for deletion, with confirmation

Add a user setting for query results text size

There is a user setting for editor text size. It would be nice to have a similar setting for the text size of the query results.
We can also reuse such a setting for query results size.

Table warning system

Add a table warning system in DataHub where users can add their own warning messages for a table. The warning message will be shown by the linter while the user is writing code.

Datahub Notification Plugin

Create a Notifier plugin model that allows different orgs to add new notification services, such as MS Teams. The Notifier will handle sending query completion messages as well as doc permission change messages to DataHub users.

Querybook is not aware of default schema information

The default schema name is currently assumed to be 'default', which does not apply in all cases, since it can be overridden in the connection string. The setting also differs between query engines; for example, SQLite's default is actually 'main' instead of 'default'.

Acceptance

  • The backend should perform query analysis using dynamic default schema information derived from the language and connection settings
  • The frontend should fetch this information from the backend and perform a similar analysis
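The lookup described above might be sketched as follows. The 'main' default for SQLite and the 'default' fallback come from the issue; the function names, the language key "sqlite", and the way the connection-level override is passed in are assumptions.

```typescript
// Per-language defaults; only the SQLite entry is taken from the issue.
const LANGUAGE_DEFAULT_SCHEMA: Record<string, string> = {
    sqlite: "main",
};

// A schema set on the connection wins, then the language default,
// then the generic 'default' fallback.
function resolveDefaultSchema(language: string, connectionSchema?: string): string {
    return connectionSchema || LANGUAGE_DEFAULT_SCHEMA[language] || "default";
}

// Qualifies a bare table name with the resolved default schema.
function qualifyTableName(
    table: string,
    language: string,
    connectionSchema?: string
): string {
    if (table.indexOf(".") >= 0) {
        return table;
    }
    return `${resolveDefaultSchema(language, connectionSchema)}.${table}`;
}
```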

Exporter V2

Together with #202, this should improve the exporting experience.

Story
As a user, I want to export my entire DataHub query results without worrying about the preview size.

Acceptance

  • The size of the exported query results is limited not by the size of the query result but by the maximum size the exporter accepts
  • The export process should now be async and should optionally report progress

Improve row samples

Add field selection to row samples; by default, all columns are selected.
Users can export the raw query.
Users can copy the result to the clipboard as TSV.
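Field selection combined with TSV output could be sketched like this. The function name toTsv and the choice to replace embedded tabs and newlines with spaces are assumptions; the issue does not say how such characters should be handled.

```typescript
// Serializes the selected columns of a row sample as tab-separated text.
// By default every column is selected, mirroring the proposal above.
function toTsv(
    columns: string[],
    rows: string[][],
    selected: string[] = columns
): string {
    // Keep only selected columns that actually exist, in the selected order.
    const indexes = selected
        .map((name) => columns.indexOf(name))
        .filter((index) => index >= 0);
    // Assumed policy: collapse tabs and line breaks so cells stay on one line.
    const clean = (value: string): string => value.replace(/[\t\r\n]+/g, " ");
    const lines = [indexes.map((index) => clean(columns[index] ?? "")).join("\t")];
    for (const row of rows) {
        lines.push(indexes.map((index) => clean(row[index] ?? "")).join("\t"));
    }
    return lines.join("\n");
}
```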

Don't show table fields if there is no information

This change will apply to the following views:

  • Tooltip view
  • Sidebar view
  • Full table view

Fields such as partition, hive metastore information, and query users should all be hidden if there is no information to show.

Add vscode support for DataHub

Story
As a user, I want to use VS Code to develop DataHub with a minimal amount of effort.

Acceptance

  • Suggest a list of extensions that are essential for DataHub development (prettier, black, etc.)
  • Suggest some standard VS Code settings (formatting on save, etc.)
  • Add a devcontainer.json so that users can easily launch DataHub with VS Code
