[Foundation Course] Apache Ignite Essentials: Key Design Principles for Building Data-Intensive Applications

This project is designed for a free instructor-led training on Ignite's essential capabilities and architecture internals. Check the complete schedule and join one of our upcoming training sessions.

Setting Up Environment

Java Development Kit (JDK), version 8 or later
Apache Maven 3.0 or later
Your favorite IDE, such as IntelliJ IDEA, or Eclipse, or a simple text editor.

Clone The Project

Clone the training project with Git or download it as an archive:

```
git clone https://github.com/GridGain-Demos/ignite-essentials-developer-training.git
```

Optionally, open the project in your favorite IDE, such as IntelliJ IDEA or Eclipse, or use a simple text editor together with the command-line instructions prepared for all the samples.

Starting Ignite Cluster

Start a two-node Ignite cluster:

Open a terminal window and navigate to the root directory of this project.
Use Maven to create a core executable JAR with all the dependencies. Build the JAR even if you plan to start the sample code with IntelliJ IDEA or Eclipse, because other tools use it throughout the class:
```
mvn clean package -P core
```
If you see build errors, a firewall or proxy server may be blocking access to GridGain's external Maven repository, which hosts the module that connects to Control Center.
Start the first cluster node (or just start the app with IntelliJ IDEA or Eclipse):
```
java -cp libs/core.jar training.ServerStartup
```
Open another terminal window and start the second node:
```
java -cp libs/core.jar training.ServerStartup
```

Both nodes auto-discover each other, and you'll have a two-node cluster ready for the exercises.

Connecting to GridGain Control Center

You use GridGain Control Center throughout the course to see how Ignite distributes records, to execute and optimize SQL queries, and to monitor the state of the cluster.

Go to https://portal.gridgain.com.
Create an account to sign in to Control Center.
Just in case, generate a new token for the cluster (the default token expires 5 minutes after the cluster starts):
- Open a terminal window and navigate to the root directory of this project.
- Generate the token. ManagementCommandHandler is the tool behind the management.sh|bat script of the Ignite Agent distribution package; in this training you call it directly to skip extra downloads:
```
java -cp libs/core.jar org.gridgain.control.agent.commandline.ManagementCommandHandler --token
```
Register the cluster with Control Center using the token.

Creating Media Store Schema and Loading Data

Now you need to create the Media Store schema and load the cluster with sample data. Use the SQLLine tool to do that:

Open a terminal window and navigate to the root directory of this project.
Assuming that you've already assembled the core executable JAR with all the dependencies, launch a SQLLine process:
```
java -cp libs/core.jar sqlline.SqlLine
```

Connect to the cluster:

```
!connect jdbc:ignite:thin://127.0.0.1/ ignite ignite
```

Load the Media Store database:
```
!run config/media_store.sql
```

Keep the connection open, as you'll use it in the following exercises.

Data Partitioning - Checking Data Distribution

With the Media Store database loaded, you can check how Ignite distributed the records within the cluster:

Open the Caches Screen of Control Center.
While on that screen, follow the instructor's walkthrough of how the records are spread across the cluster.

Optionally, scale out the cluster by starting a third node. You'll see that some partitions are rebalanced to the new node.
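To make the rebalancing behaviour more concrete, here is a minimal, self-contained Java sketch of how a key can map to a partition and a partition to a node. It is a simplification and not Ignite's actual affinity code: the class name, node names, and the bit-mixing function are made up for illustration. The only Ignite fact it relies on is the default of 1024 partitions and the general idea of rendezvous hashing, where each partition goes to the node with the highest weight.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Simplified key -> partition -> node mapping; not Ignite's real implementation. */
class PartitionSketch {
    static final int PARTITIONS = 1024; // Ignite's default partition count

    /** Maps a key to a partition by hashing (simplified). */
    static int partition(Object key) {
        return Math.floorMod(key.hashCode(), PARTITIONS);
    }

    /** Rendezvous hashing: the node with the highest weight owns the partition. */
    static String owner(int partition, List<String> nodes) {
        String best = null;
        long bestWeight = Long.MIN_VALUE;
        for (String node : nodes) {
            long weight = mix(Objects.hash(node, partition));
            if (best == null || weight > bestWeight) {
                best = node;
                bestWeight = weight;
            }
        }
        return best;
    }

    /** Scrambles the bits so that weights are spread out (illustrative mixer). */
    static long mix(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        return h;
    }

    public static void main(String[] args) {
        List<String> two = Arrays.asList("node-1", "node-2");
        List<String> three = Arrays.asList("node-1", "node-2", "node-3");
        int moved = 0;
        for (int p = 0; p < PARTITIONS; p++) {
            // A partition changes owner only if the new node wins it.
            if (!owner(p, two).equals(owner(p, three))) {
                moved++;
            }
        }
        System.out.println("Key 42 maps to partition " + partition(42));
        System.out.println("Partitions moved to node-3: " + moved + " of " + PARTITIONS);
    }
}
```

With this scheme, adding a node never shuffles partitions between the existing nodes; a partition either stays where it was or moves to the newcomer, which is the effect you observe on the Caches screen.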

Affinity Co-location - Optimizing Complex SQL Queries With JOINs

Ignite supports SQL for data processing including distributed joins, grouping and sorting. In this section, you're going to run basic SQL operations as well as more advanced ones.

Querying Single Table

Go to the SQL Notebooks Screen of Control Center.

Run the following query to find the 20 longest tracks:

```
SELECT trackid, name, MAX(milliseconds / (1000 * 60)) as duration FROM track
WHERE genreId < 17
GROUP BY trackid, name ORDER BY duration DESC LIMIT 20;
```

Joining Two Non-Colocated Tables

Modify the previous query to show the artist and genre of each track. You do this by adding a LEFT JOIN with the Artist table and a JOIN with the Genre table:

```
SELECT track.trackId, track.name as track_name, genre.name as genre, artist.name as artist,
MAX(milliseconds / (1000 * 60)) as duration FROM track
LEFT JOIN artist ON track.artistId = artist.artistId
JOIN genre ON track.genreId = genre.genreId
WHERE track.genreId < 17
GROUP BY track.trackId, track.name, genre.name, artist.name ORDER BY duration DESC LIMIT 20;
```

Once you run the query, you'll see that the artist column is blank for some records. That's because the Track and Artist tables are not co-located, so a node does not have all the data it needs locally during the join phase.

Select the Allow non-colocated joins checkbox on the Control Center screen.
Run the query again to see a complete and correct result.
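The blank artist values can be reproduced with a toy model. The Java sketch below is not Ignite code: the two-node cluster, the sample rows, and the modulo placement are all made up for illustration. It only shows what happens when a LEFT JOIN can read just the Artist rows stored on the same node as the Track row.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy two-node model; rows and modulo placement are hypothetical. */
class LocalJoinSketch {
    static final int NODES = 2;

    static final class Track {
        final int trackId;
        final int artistId;
        final String name;

        Track(int trackId, int artistId, String name) {
            this.trackId = trackId;
            this.artistId = artistId;
            this.name = name;
        }
    }

    /** LEFT JOIN where each node reads only the Artist rows it stores itself. */
    static Map<String, String> localLeftJoin(List<Track> tracks, Map<Integer, String> artists) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Track t : tracks) {
            int trackNode = t.trackId % NODES;   // Track is placed by TrackId
            int artistNode = t.artistId % NODES; // Artist is placed by ArtistId
            // An Artist row stored on another node is invisible to a local-only join.
            String artist = trackNode == artistNode ? artists.get(t.artistId) : null;
            result.put(t.name, artist);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> artists = new LinkedHashMap<>();
        artists.put(2, "Artist Two");
        List<Track> tracks = Arrays.asList(
            new Track(10, 2, "Track A"),  // track and artist both land on node 0
            new Track(11, 2, "Track B")); // track lands on node 1, artist on node 0
        // prints {Track A=Artist Two, Track B=null}
        System.out.println(localLeftJoin(tracks, artists));
    }
}
```

Enabling non-colocated joins corresponds to letting the node that holds "Track B" fetch the missing Artist row from the other node, at the cost of extra network traffic.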

Joining Two Co-located Tables

The non-colocated joins used above come with a performance penalty: if the nodes shuffle large data sets during the join phase, query performance suffers. However, you can co-locate the Track and Artist tables and avoid non-colocated joins altogether:

Search for the CREATE TABLE Track command in the media_store.sql file.
Replace PRIMARY KEY (TrackId) with PRIMARY KEY (TrackId, ArtistId).
Co-locate Tracks with Artists by adding affinityKey=ArtistId to the parameter list of the WITH ... clause.
Because you changed the primary and affinity keys at runtime, you need to update the Ignite metadata before recreating the table:
- Open a terminal window and navigate to the root directory of this project.
- Enable the experimental features (Mac and Linux):
```
export IGNITE_ENABLE_EXPERIMENTAL_COMMAND=true
```
- Enable the experimental features (Windows):
```
set IGNITE_ENABLE_EXPERIMENTAL_COMMAND=true
```
- Clean the metadata for the Track object:
```
java -cp libs/core.jar org.apache.ignite.internal.commandline.CommandHandler --meta remove --typeName training.model.Track
```
- Clean the metadata for the TrackKey object:
```
java -cp libs/core.jar org.apache.ignite.internal.commandline.CommandHandler --meta remove --typeName training.model.TrackKey
```
Recreate the table using the SQLLine tool:
- Launch SQLLine from a terminal window:
```
java -cp libs/core.jar sqlline.SqlLine
```
- Connect to the cluster:
```
!connect jdbc:ignite:thin://127.0.0.1/ ignite ignite
```
- Load the Media Store database:
```
!run config/media_store.sql
```
In Control Center, run the join query once again without non-colocated joins enabled. The artist column is now filled in for every record, because each Track is stored on the same cluster node as its Artist.
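The effect of the affinity key can be sketched in a few lines of plain Java. This is an illustration under simplifying assumptions, not Ignite's implementation: the class and method names are hypothetical, and the partition is computed with a plain hash modulo 1024. The point is that a Track row keeps the composite key (TrackId, ArtistId), yet only ArtistId decides where the row is stored.

```java
/** Illustrates placement by affinity key; hashing scheme is simplified. */
class AffinityKeySketch {
    static final int PARTITIONS = 1024; // Ignite's default partition count

    /** An Artist row is placed by its primary key. */
    static int artistPartition(int artistId) {
        return Math.floorMod(Integer.hashCode(artistId), PARTITIONS);
    }

    /** A Track row has the key (trackId, artistId) but is placed by artistId only. */
    static int trackPartition(int trackId, int artistId) {
        // trackId is deliberately ignored: it identifies the row, not its location.
        return Math.floorMod(Integer.hashCode(artistId), PARTITIONS);
    }

    public static void main(String[] args) {
        int artistId = 7;
        for (int trackId = 100; trackId < 103; trackId++) {
            System.out.println("Track " + trackId
                + " -> partition " + trackPartition(trackId, artistId)
                + ", Artist " + artistId
                + " -> partition " + artistPartition(artistId));
        }
    }
}
```

Because every Track of an artist lands in the same partition as the Artist row, the join can be resolved locally on each node.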

Running Co-located Compute Tasks

Run training.ComputeApp, which uses Apache Ignite compute capabilities to calculate the top 5 paying customers. The compute task executes on every cluster node, iterates through the local records, and returns a partial result to the application, which merges the partial results.
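The per-node calculation followed by a merge can be sketched without a cluster. The Java code below is not the code of training.ComputeApp; the class, the Spend type, and the sample figures are hypothetical. It assumes that all records of one customer live on a single node, so each node can report a complete total per customer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Per-node top-N followed by a merge in the application; sample data is made up. */
class TopCustomersSketch {
    static final class Spend {
        final String customer;
        final double total;

        Spend(String customer, double total) {
            this.customer = customer;
            this.total = total;
        }
    }

    /** Runs "on a node": keeps only the n biggest spenders among the given records. */
    static List<Spend> topN(List<Spend> records, int n) {
        return records.stream()
            .sorted(Comparator.comparingDouble((Spend s) -> s.total).reversed())
            .limit(n)
            .collect(Collectors.toList());
    }

    /** Runs "in the application": combines the partial results and trims to n. */
    static List<Spend> merge(List<List<Spend>> partials, int n) {
        List<Spend> all = new ArrayList<>();
        partials.forEach(all::addAll);
        return topN(all, n);
    }

    public static void main(String[] args) {
        List<Spend> node1 = Arrays.asList(
            new Spend("A", 50), new Spend("B", 30), new Spend("C", 10));
        List<Spend> node2 = Arrays.asList(
            new Spend("D", 40), new Spend("E", 20));
        // Each node sends back at most two entries instead of all of its records.
        List<Spend> top = merge(Arrays.asList(topN(node1, 2), topN(node2, 2)), 2);
        for (Spend s : top) {
            System.out.println(s.customer + " " + s.total);
        }
    }
}
```

Sending only the local top entries keeps the network traffic small: the application receives at most n entries per node, regardless of how many records each node stores.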

Build an executable JAR with the applications' classes (or just start the app with IntelliJ IDEA or Eclipse):
```
mvn clean package -P apps
```

Run the app in the terminal:

```
java -cp libs/apps.jar training.ComputeApp
```

Check the logs of the ServerStartup processes (your Ignite server nodes) to see that the calculation was executed across the cluster.

Modify the computation logic:

Update the logic to return the top 10 paying customers.
Rebuild the executable JAR with the applications' classes (or just start the app with IntelliJ IDEA or Eclipse):
```
mvn clean package -P apps
```

Run the app again:

```
java -cp libs/apps.jar training.ComputeApp
```
