---
description: >-
  Apache Pinot is a real-time distributed OLAP datastore purpose-built for
  low-latency, high-throughput analytics, and perfect for user-facing
  analytical workloads.
---

Introduction

Apache Pinot™ is a real-time distributed online analytical processing (OLAP) datastore. Use Pinot to ingest and immediately query data from streaming or batch data sources (including Apache Kafka, Amazon Kinesis, Hadoop HDFS, Amazon S3, Azure ADLS, and Google Cloud Storage).

{% hint style="info" %} We'd love to hear from you! Join us in our Slack channel to ask questions, troubleshoot, and share feedback. {% endhint %}

Apache Pinot includes the following:

  • Ultra low-latency analytics even at extremely high throughput.
  • Columnar data store with several smart indexing and pre-aggregation techniques.
  • Scaling up and out with no upper bound.
  • Consistent performance based on the size of your cluster and an expected query per second (QPS) threshold.

It's perfect for user-facing real-time analytics and other analytical use cases, including internal dashboards, anomaly detection, and ad hoc data exploration.

{% embed url="https://youtu.be/_lqdfq2c9cQ" %} What is Apache Pinot? (and User-Facing Analytics) by Tim Berglund {% endembed %}

User-facing real-time analytics

User-facing analytics refers to the analytical tools exposed to the end users of your product. In a user-facing analytics application, all users receive personalized analytics on their devices, resulting in hundreds of thousands of queries per second. Query volume grows quickly in proportion to the number of active users of the app, and the underlying event streams can reach millions of events per second. Data ingested into Pinot is immediately available for analytics, with query latencies under one second.

User-facing real-time analytics requires the following:

  • Fresh data. The system needs to be able to ingest data in real time and make it available for querying, also in real time.
  • Support for high-velocity, highly dimensional event data from a wide range of actions and from multiple sources.
  • Low latency. Queries are triggered by end users interacting with apps, resulting in hundreds of thousands of queries per second with arbitrary patterns.
  • Reliability and high availability.
  • Scalability.
  • Low cost to serve.

Why Pinot?

Pinot is designed to execute OLAP queries with low latency. It works well where you need fast analytics, such as aggregations, on both mutable and immutable data.

User-facing, real-time analytics

Pinot was originally built at LinkedIn to power rich interactive real-time analytics applications, such as Who Viewed Profile, Company Analytics, Talent Insights, and many more. UberEats Restaurant Manager is another example of a user-facing analytics app built with Pinot.

Real-time dashboards for business metrics

Pinot can perform typical analytical operations such as slice and dice, drill down, roll up, and pivot on large-scale, multidimensional data. For instance, at LinkedIn, Pinot powers dashboards for thousands of business metrics. Connect various business intelligence (BI) tools such as Superset, Tableau, or Power BI to visualize data in Pinot.
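To make this concrete, here is a minimal sketch of the kind of query a dashboard might issue, written in the standard SQL that Pinot accepts. The `pageViews` table and its columns are hypothetical, chosen only to illustrate the shape of a slice-and-dice aggregation:

```sql
-- Hypothetical table: pageViews(country STRING, deviceType STRING, views LONG).
-- Slice page views by country and device type, then rank the largest segments.
SELECT
  country,
  deviceType,
  SUM(views) AS totalViews,
  COUNT(*)   AS records
FROM pageViews
GROUP BY country, deviceType
ORDER BY SUM(views) DESC
LIMIT 10
```

Adding or removing GROUP BY columns in queries like this gives the drill-down and roll-up views that such dashboards are built from.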

Enterprise business intelligence

For analysts and data scientists, Pinot works well as a highly scalable data platform for business intelligence. Pinot converges big data platforms with the traditional role of a data warehouse, making it a suitable replacement for analysis and reporting.

Enterprise application development

For application developers, Pinot works well as an aggregate store that sources events from streaming data sources, such as Kafka, and makes them available for querying using SQL. You can also use Pinot to aggregate data across a microservice architecture into one easily queryable view of the domain.

Pinot tenants prevent any possibility of sharing ownership of database tables across microservice teams. Developers can create their own query models of data from multiple systems of record depending on their use case and needs. As with all aggregate stores, query models are eventually consistent.
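As a rough sketch of what querying such an aggregate store can look like, the query below counts recent order events by status. The `orderEvents` table, its columns, and the Kafka-backed ingestion are assumptions made for illustration; `DISTINCTCOUNT` and `ago()` are Pinot aggregation and time functions, and the filter assumes a millisecond epoch timestamp column:

```sql
-- Hypothetical real-time table fed by a Kafka topic of order events.
-- eventTimeMillis is assumed to be a millisecond epoch timestamp column.
SELECT
  orderStatus,
  COUNT(*)                  AS orders,
  DISTINCTCOUNT(customerId) AS uniqueCustomers
FROM orderEvents
WHERE eventTimeMillis > ago('PT15M')  -- only events from the last 15 minutes
GROUP BY orderStatus
```

Because rows ingested from the stream become queryable almost immediately, a view like this tracks the upstream services closely while remaining, as noted above, eventually consistent.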

Get started

If you're new to Pinot, take a look at our Getting Started guide:

{% content-ref url="basics/getting-started/" %} getting-started {% endcontent-ref %}

To start importing data into Pinot, see how to import batch and stream data:

{% content-ref url="basics/data-import/" %} data-import {% endcontent-ref %}

To start querying data in Pinot, check out our Query guide:

{% content-ref url="users/user-guide-query/" %} user-guide-query {% endcontent-ref %}

Learn

For a conceptual overview that explains how Pinot works, check out the Concepts guide:

{% content-ref url="basics/concepts/" %} concepts {% endcontent-ref %}

To understand the distributed systems architecture that explains Pinot's operating model, take a look at our basic architecture section:

{% content-ref url="basics/architecture.md" %} architecture.md {% endcontent-ref %}


pinot-docs's Issues

Running the Kafka-based quickstart examples results in an error

When I try running

docker run \
    -p 9000:9000 \
    apachepinot/pinot:0.9.3 QuickStart \
    -type stream

or

docker run \
    -p 9000:9000 \
    apachepinot/pinot:0.9.3 QuickStart \
    -type stream_json_index

I get this error:

***** Starting Kafka *****
***** Starting meetup data stream and publishing to Kafka *****
javax.websocket.DeploymentException: Handshake error.
...
Caused by: org.glassfish.tyrus.core.HandshakeException: Response code was not 101: 404.
	at org.glassfish.tyrus.client.TyrusClientEngine.processResponse(TyrusClientEngine.java:299)
	at org.glassfish.tyrus.container.grizzly.client.GrizzlyClientFilter.handleHandshake(GrizzlyClientFilter.java:322)
	at org.glassfish.tyrus.container.grizzly.client.GrizzlyClientFilter.handleRead(GrizzlyClientFilter.java:291)

The other quickstart examples, those that don't use Kafka, run without error.
Is there some step I'm missing? Do these quickstart examples assume I already have Kafka running?

Superset now requires a SECRET_KEY environment variable

The Docker instructions for Superset need to be changed from:

    docker run \
        --network pinot-demo \
        --name=superset \
        -p 8088:8088 \
        -d apachepinot/pinot-superset:latest

to:

    docker run \
        --network pinot-demo \
        --name=superset \
        -p 8088:8088 \
        -e SUPERSET_SECRET_KEY= \
        -d apachepinot/pinot-superset:latest
