A tool to automate analytic platform evaluations
Barometer helps customers collect the data points needed to select and configure services for a given workload.
The Barometer tool was created by the AWS Prototyping team (EMEA).
- Description
- Use cases
- Pre-requisites
- Installing
- Deployment
- Quickstart
- Run Benchmark Only
- Bring your own workload
- Architecture
- Cleanup
- See Also
Barometer deploys a CDK stack that is used to run benchmarking experiments. An experiment is a combination of a platform and a workload, which can be defined using the cli-wizard provided by the Barometer tool. See Quickstart for an example of running an experiment.
- Comparison of service performance: Redshift vs Redshift Serverless
- Comparison of configurations: Redshift dc2 vs ra3 node type
- Performance impact of a feature: Redshift AQUA vs Redshift WLM
- Selecting the right tool for the job: Athena vs Redshift for your workload
- Registering your custom platform: Redshift vs My Own Database
- Registering your custom workload: My own dataset vs Redshift
- Run benchmarking only on my platform
- Bring your own workload (dataset, ddl and queries to benchmark)
Barometer supports the following combinations as experiments:
- Supported platforms:
- Supported workloads:
- Docker: Install the Docker service and the Docker CLI. This tool uses Docker to build images and run containers.
- A minimum of 2 GB of disk space for building and deploying the Docker image
Clone this repository, then run the following command in the barometer directory (the root of the git project):
docker build -t barometer .
- Run one of the commands below to deploy barometer to your AWS account.
# Example 1: Passing local aws credentials to the docker container for deployment (deploying in eu-west-1 region)
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock -v ~/.aws:/root/.aws barometer deploy eu-west-1
# Example 2: Using AWS profile (ex: dev) to deploy
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock -v ~/.aws:/root/.aws -e AWS_PROFILE=dev barometer deploy eu-west-1
# Example 3: Passing aws region as environment variable
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock \
-v ~/.aws:/root/.aws -e AWS_PROFILE=dev \
-e AWS_REGION=eu-west-1 barometer deploy
# Example 4: Using an AWS access key ID and secret access key to deploy (with an optional session token for temporary credentials)
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock \
-e AWS_ACCESS_KEY_ID=<my-aws-access-key-id> \
-e AWS_SECRET_ACCESS_KEY=<my-aws-secret-access-key> \
-e AWS_SESSION_TOKEN=<my-session-token> \
barometer deploy eu-west-1
- Once barometer is successfully deployed to your AWS account, run one of the commands below to start the cli-wizard.
# Example 1: Passing local aws credentials to the docker container for running wizard (deployed in eu-west-1 region)
docker run -it -v /var/run/docker.sock:/var/run/docker.sock -v ~/.aws:/root/.aws \
--name barometer-wizard \
barometer wizard eu-west-1
# Example 2: Using AWS profile (ex: dev) to run wizard
docker run -it -v /var/run/docker.sock:/var/run/docker.sock -v ~/.aws:/root/.aws -e AWS_PROFILE=dev \
--name barometer-wizard \
barometer wizard eu-west-1
# Example 3: Using an AWS access key ID and secret access key to run the wizard (with an optional session token for temporary credentials)
docker run -it -v /var/run/docker.sock:/var/run/docker.sock \
-e AWS_ACCESS_KEY_ID=<my-aws-access-key-id> \
-e AWS_SECRET_ACCESS_KEY=<my-aws-secret-access-key> \
-e AWS_SESSION_TOKEN=<my-session-token> \
--name barometer-wizard \
barometer wizard eu-west-1
# Example 4: Reusing wizard configurations
docker start -ia barometer-wizard
# Example 5: Persisting wizard configurations
docker run -it -v /var/run/docker.sock:/var/run/docker.sock -v ~/.aws:/root/.aws \
-v ~/storage:/build/cli-wizard/storage \
--name barometer-wizard \
barometer wizard eu-west-1
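The deployment and wizard commands above share most of their arguments. As a minimal sketch (not part of Barometer), the helper below assembles the same docker run argument list in Python. The function name barometer_command and its parameters are hypothetical; the mounts, the AWS_PROFILE variable, the container name, and the storage path are the ones shown in the examples.

```python
import os
import shlex


def barometer_command(action, region, profile=None, storage_dir=None):
    """Build a `docker run` argument list mirroring the examples above.

    Hypothetical helper for illustration only; it does not run anything.
    """
    if action not in ("deploy", "wizard"):
        raise ValueError("action must be 'deploy' or 'wizard'")

    # "~" is expanded here because no shell is involved when the list is
    # passed to subprocess.run().
    aws_dir = os.path.expanduser("~/.aws")
    args = [
        "docker", "run", "-it",
        "-v", "/var/run/docker.sock:/var/run/docker.sock",
        "-v", f"{aws_dir}:/root/.aws",
    ]
    if action == "deploy":
        # The deploy examples use a throwaway container.
        args.insert(2, "--rm")
    else:
        # The wizard examples keep a named container so it can be restarted
        # with `docker start -ia barometer-wizard`.
        args += ["--name", "barometer-wizard"]
        if storage_dir:
            host_dir = os.path.abspath(storage_dir)
            args += ["-v", f"{host_dir}:/build/cli-wizard/storage"]
    if profile:
        args += ["-e", f"AWS_PROFILE={profile}"]
    return args + ["barometer", action, region]


if __name__ == "__main__":
    print(shlex.join(barometer_command("wizard", "eu-west-1", profile="dev")))
```

The returned list can be passed to subprocess.run() as is, or printed first to review the command before running it.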
This option can be used as Benchmark your own platform or Bring your own platform.
You can directly benchmark any database with this option. The option is available under Manage Experiments > Run benchmarking only. Depending on where the database is hosted, follow the steps below as prerequisites for the run benchmark only option.
- Create a new Secrets Manager secret with values in the JSON format defined below. All properties are case-sensitive and required except dbClusterIdentifier.
{
"username": "database-user",
"password": "*******",
"engine": "redshift",
"host": "my-database-host.my-domain.com",
"port": 5439,
"dbClusterIdentifier": "redshift-cluster-1",
"dbname": "dev"
}
- Add a tag to the secret: Tag name = ManagedBy, Tag value = BenchmarkingStack. This gives Barometer permission to use it.
- Upload your benchmarking queries to the DataBucket (bucket created by BenchmarkingStack, available as an Output) in a new folder with any name (for example: my-benchmarking-queries). Note: the queries can have any name and will be executed in sorted order of their names.
s3://benchmarkingstack-databucket-random-id
my-benchmarking-queries
|
| +-- query1.sql
| +-- query2.sql
- Allow network connections from QueryRunnerSG (available as an Output of BenchmarkingStack) to your database security group.
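Because the secret's property names are case-sensitive, a quick local check before creating the secret can catch typos. The following is a minimal sketch, not Barometer's own validation: the function name check_secret is hypothetical, and the required and optional property names are taken from the JSON format shown above.

```python
import json

# Property names from the secret format above; only dbClusterIdentifier is optional.
REQUIRED_KEYS = {"username", "password", "engine", "host", "port", "dbname"}
OPTIONAL_KEYS = {"dbClusterIdentifier"}


def check_secret(secret_json):
    """Return a list of problems found in the secret string (empty if none)."""
    secret = json.loads(secret_json)
    present = set(secret)
    problems = []
    for key in sorted(REQUIRED_KEYS - present):
        # Names are case-sensitive, so "Host" does not satisfy "host".
        near = sorted(p for p in present if p.lower() == key.lower())
        if near:
            problems.append(f"property '{near[0]}' should be spelled '{key}'")
        else:
            problems.append(f"missing required property '{key}'")
    return problems


if __name__ == "__main__":
    example = json.dumps({
        "username": "database-user",
        "password": "*******",
        "engine": "redshift",
        "host": "my-database-host.my-domain.com",
        "port": 5439,
        "dbname": "dev",
    })
    print(check_secret(example) or "secret looks complete")
```

The check only covers property names; it makes no assumption about which engine values Barometer accepts.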
In addition to steps 1, 2 and 3 above (both in the same VPC), follow the steps below to establish a VPC peering connection between BenchmarkingVPC and the VPC where the database is hosted.
- Go to the VPC console > Peering connections menu in the left navigation
- Create a new peering connection selecting both VPCs (BenchmarkingVPC and DatabaseVPC)
- Accept the peering connection request from the Actions menu
- Go to VPC > Route tables and select any route table associated with a BenchmarkingStack subnet
- Add a new route with Destination = CIDR range of the DatabaseVPC and Target = peering connection id (starts with pcx-)
- Repeat steps 4 and 5 for the route table associated with the second BenchmarkingStack subnet
- Go to VPC > Route tables and select the route table associated with the DatabaseVPC subnet (if using the default VPC, select the only route table available)
- Add a new route with Destination = 10.0.0.0/16 and Target = peering connection id (starts with pcx-)
- Follow the last step (step 4, allow network connection) from both in the same VPC above.
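VPC peering only works when the two VPCs have non-overlapping CIDR ranges, so it can help to check the DatabaseVPC range before creating the peering connection. The sketch below is an illustration under stated assumptions: the function name peering_routes is hypothetical, and 10.0.0.0/16 is assumed to be the BenchmarkingVPC range because it is the destination used for the DatabaseVPC route above. The function only computes the two route entries described in the steps and does not call any AWS API.

```python
import ipaddress

# Assumed BenchmarkingVPC range, taken from the route destination in the steps above.
BENCHMARKING_VPC_CIDR = "10.0.0.0/16"


def peering_routes(database_vpc_cidr, peering_id):
    """Return the route entries to add on each side of the peering connection."""
    bench = ipaddress.ip_network(BENCHMARKING_VPC_CIDR)
    database = ipaddress.ip_network(database_vpc_cidr)
    if bench.overlaps(database):
        raise ValueError(
            f"{database} overlaps {bench}; VPC peering needs non-overlapping ranges"
        )
    if not peering_id.startswith("pcx-"):
        raise ValueError("peering connection ids start with 'pcx-'")
    return {
        # Added to the route table of each BenchmarkingStack subnet.
        "benchmarking_route_tables": {"destination": str(database), "target": peering_id},
        # Added to the route table of the DatabaseVPC subnet.
        "database_route_table": {"destination": str(bench), "target": peering_id},
    }


if __name__ == "__main__":
    for side, route in peering_routes("172.31.0.0/16", "pcx-0123456789abcdef0").items():
        print(side, route)
```

If the ranges overlap, the function raises an error early instead of leaving a peering connection that cannot route traffic.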
You can bring your own workload to Barometer for benchmarking. In this context, a workload is defined as files arranged in a specific structure in your S3 bucket. To bring your own workload, follow the steps below as prerequisites.
- Prepare the workload in your S3 bucket. It should follow the folder structure defined below. You can create a folder with the name of your workload (ex: my-workload) at any level in your S3 bucket. The root of your workload folder should have three sub-directories called volumes, ddl and benchmarking-queries.
  - volumes sub-directory: this directory contains the scale factors for your workload. For example, your workload may have datasets available in 1gb, 50gb and 1tb scales. You can create as many scale factors as you want, with a minimum of one. Within each scale factor sub-directory you should have a directory matching each table name, with all table data in .parquet format under it.
  - ddl sub-directory: this directory contains the DDL scripts that create tables for each platform. For example, DDL scripts for the redshift platform should go under the redshift folder, and DDL specific to mysql should be placed under its own directory matching the platform name. You can place more than one DDL script; they will be executed in order of their names.
  - benchmarking-queries sub-directory: this directory contains the benchmarking queries for each platform. You can place more than one benchmarking query file; they will be executed in order of their names per user session.
# Requires my-workload (can be any name) to follow convention on s3 bucket
my-workload
| +-- volumes
| | +-- 1gb
| | | +-- table_name_1
| | | | +-- file-1.parquet
| | | | +-- file-2.parquet
| | | +-- table_name_2
| | | | +-- file-1.parquet
| | | | +-- file-2.parquet
| +-- ddl
| | +-- redshift
| | | +-- ddl.query1.sql
| | | +-- ddl.query2.sql
| | +-- mysql
| | | +-- ddl.query.sql
| +-- benchmarking-queries
| | +-- redshift
| | | +-- query1.sql
| | | +-- query2.sql
| | +-- mysql
| | | +-- query1.sql
| | | +-- query2.sql
- Run the cli-wizard and go to Manage workload > Add new workload to import your workload. The wizard will validate and import the workload if structure validation is successful.
- The wizard will print a bucket policy while importing your workload. Please update your S3 bucket's bucket policy with the printed one.
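Before running the wizard, the folder convention above can be checked locally against a listing of object keys. This is a minimal sketch and not the wizard's actual validation: the function name check_workload and its messages are hypothetical, and the rules encoded are only the ones stated above (three sub-directories, at least one scale factor, .parquet table data, per-platform scripts).

```python
from collections import defaultdict


def check_workload(keys, workload="my-workload"):
    """Check object keys against the workload layout.

    Returns (problems, queries), where queries maps each platform to its
    benchmarking query files in plain string sort order.
    """
    prefix = workload.rstrip("/") + "/"
    scales = set()
    ddl = defaultdict(list)
    queries = defaultdict(list)
    problems = []
    for key in keys:
        # Skip other prefixes and zero-byte "folder" placeholder keys.
        if not key.startswith(prefix) or key.endswith("/"):
            continue
        parts = key[len(prefix):].split("/")
        if parts[0] == "volumes" and len(parts) == 4:
            # volumes/<scale factor>/<table name>/<file>
            scales.add(parts[1])
            if not parts[3].endswith(".parquet"):
                problems.append(f"not a .parquet file: {key}")
        elif parts[0] == "ddl" and len(parts) == 3:
            ddl[parts[1]].append(parts[2])
        elif parts[0] == "benchmarking-queries" and len(parts) == 3:
            queries[parts[1]].append(parts[2])
        else:
            problems.append(f"unexpected key: {key}")
    if not scales:
        problems.append("volumes needs at least one scale factor")
    if not ddl:
        problems.append("ddl needs at least one platform directory")
    if not queries:
        problems.append("benchmarking-queries needs at least one platform directory")
    return problems, {platform: sorted(names) for platform, names in queries.items()}


if __name__ == "__main__":
    listing = [
        "my-workload/volumes/1gb/table_name_1/file-1.parquet",
        "my-workload/ddl/redshift/ddl.query1.sql",
        "my-workload/benchmarking-queries/redshift/query2.sql",
        "my-workload/benchmarking-queries/redshift/query10.sql",
    ]
    print(check_workload(listing))
```

If the execution order is a plain string sort (an assumption here), query10.sql sorts before query2.sql, so zero-padded names such as query02.sql keep the intended order.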
In this project, you can find a BYOW example (custom-workload directory). You can create the same structure as described above by copying these three directories (SQL and DDL statements, and the dataset) to your S3 bucket. You can then use the Barometer cli-wizard to configure it as a "BYOW from S3" workload and run it.
- Contains 5 SQL OLAP-like queries (.sql files).
- It disables the Redshift query results cache
- It tags the sessions for better monitoring.
- It creates three tables: one Fact table and two dimensions.
- It doesn't specify any Distribution Styles, nor Sort keys. Redshift will create these automatically, based on the workloads. You're free to change these, to analyze their query plans and performance.
- A small (less than 30MB) dataset, containing the data for the 3 tables above in Apache Parquet format.
- User deploys Barometer Benchmarking Stack
- Barometer Benchmarking stack creates infrastructure & step function workflows
- User uses cli-wizard to define & run experiments which triggers experiment runner workflow internally
- Workflow deploys, benchmarks & destroys platform (additional cloudformation stack to deploy service, e.g. Redshift Cluster)
- Workflow creates persistent dashboard registering metrics
- User uses this dashboard to compare benchmarking results
- To clean up any platform, delete the stack whose name starts with the platform name. Example: redshift-xyz
- Go to the CloudFormation service and select the stack named BenchmarkingStack to delete it (or run cdk destroy from the cdk-stack folder)
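The cleanup rule above can be expressed as a small filter over stack names. The sketch below is illustrative only: the function name stacks_to_delete is hypothetical, the list of stack names is assumed to come from your own CloudFormation listing, and placing platform stacks before BenchmarkingStack simply mirrors the order of the steps above.

```python
def stacks_to_delete(stack_names, platform_names, include_benchmarking_stack=True):
    """Pick the stacks to delete: platform stacks first, then BenchmarkingStack."""
    names = list(stack_names)
    # Platform stacks are those whose name starts with a platform name,
    # for example "redshift-xyz" for the "redshift" platform.
    ordered = sorted(
        name for name in names
        if any(name.startswith(platform) for platform in platform_names)
    )
    if include_benchmarking_stack and "BenchmarkingStack" in names:
        ordered.append("BenchmarkingStack")
    return ordered


if __name__ == "__main__":
    existing = ["BenchmarkingStack", "redshift-xyz", "unrelated-stack"]
    print(stacks_to_delete(existing, ["redshift"]))
```

Reviewing the returned list before deleting anything is a cheap safeguard, since a short platform name could also match an unrelated stack.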
- Architectural & design concepts driving this project
- Benchmarking Stack infrastructure
- Cli Wizard
- How to add new platform support
- How to add new workload support