confluentinc / confluent-hybrid-cloud-workshop

Confluent Hybrid Cloud Workshop

License: Apache License 2.0

CSS 28.37% Shell 6.96% Dockerfile 0.21% Python 11.03% HCL 44.54% HTML 5.98% Smarty 0.18% JavaScript 2.73%

confluent-hybrid-cloud-workshop's Introduction

Confluent Hybrid-Cloud Workshop

Overview

This repository allows you to configure and provision a cloud-based workshop on your preferred cloud provider (GCP, AWS, or Azure). Each workshop participant connects to their own virtual machine, which acts as a pseudo on-premises datacenter. A single Confluent Cloud cluster is shared by all workshop participants.

For a single workshop participant, the logical architecture looks like this.

(Diagram: logical architecture for a single workshop participant)

From a physical architecture point of view, each component, except for Confluent Cloud, is hosted on the participant's virtual machine.

Each workshop participant works through a series of labs to build the following ksqlDB Supply & Demand application.

(Diagram: the ksqlDB Supply & Demand application built during the labs)

Prerequisites

Things to know before starting

This repository creates the required Confluent Cloud resources (environment, cluster, API keys, and so on). The cluster type defaults to Basic. If you need a Standard or Dedicated cluster, see the Terraform provider documentation for the required changes: https://registry.terraform.io/providers/confluentinc/confluent/latest/docs/resources/confluent_kafka_cluster

Provisioning a Workshop

Create an empty directory somewhere that will contain your workshop configuration file.

mkdir ~/myworkshop

Copy workshop-example-<cloud provider>.yaml to workshop.yaml in the directory you just created.

cp workshop-example-<cloud provider>.yaml ~/myworkshop/workshop.yaml

Edit ~/myworkshop/workshop.yaml and make the required changes.
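
If you want to sanity-check workshop.yaml before provisioning, a minimal sketch along these lines can help (illustrative only; the key names follow the GCP example used by the quick-create script further down this page and may differ by cloud provider and repository version):

import sys
import yaml

# keys expected at the top of workshop.yaml (GCP example; adjust for your provider)
REQUIRED = ["name", "participant_count", "participant_password", "core"]
REQUIRED_CORE = ["cloud_provider", "region", "vm_type"]

with open(sys.argv[1]) as fh:
    workshop = yaml.safe_load(fh)["workshop"]

missing = [k for k in REQUIRED if k not in workshop]
missing += [f"core.{k}" for k in REQUIRED_CORE if k not in workshop.get("core", {})]
print("Missing keys:", ", ".join(missing) if missing else "none")

Run it as, for example, python3 check_workshop.py ~/myworkshop/workshop.yaml (the script name is just an example).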

Change your current directory to the root of the repository

cd ~/confluent-hybrid-cloud-workshop

To start provisioning the workshop, run workshop-create.py and pass in your workshop directory

python3 workshop-create.py --dir ~/myworkshop

You may need to install the following Python dependencies before running the script:

python3 -m pip install boto3
python3 -m pip install google
python3 -m pip install google-api-python-client
python3 -m pip install azure-cli
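
If you are unsure whether these SDKs are already installed, here is a quick illustrative check from Python (the module names correspond to boto3, google-api-python-client, and azure-cli respectively):

# verify that the cloud SDK modules import cleanly
for module in ("boto3", "googleapiclient.discovery", "azure.cli.core"):
    try:
        __import__(module)
        print(f"{module}: OK")
    except ImportError as exc:
        print(f"{module}: missing ({exc})")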

When you are finished with the workshop, you can destroy it using workshop-destroy.py:

python3 workshop-destroy.py --dir ~/myworkshop

Troubleshooting

If you ever need root access inside the Docker containers, use:

docker exec --interactive --tty --user root --workdir / kafka-connect-ccloud bash

See this blog post for more info

License

This project is licensed under the Apache License 2.0; see the LICENSE.md file for details.

confluent-hybrid-cloud-workshop's People

Contributors

eric-asuncion, gianlucanatali, justinrlee, lyoung-confluent, moliele, tjunderhill, tmcgrath


confluent-hybrid-cloud-workshop's Issues

Cannot re-run apply if the BigQuery extension is used

If you use the BigQuery extension, re-running terraform apply will fail with the following error:
Error: Error updating Dataset "projects/XXX/datasets/test_tf": googleapi: Error 400: Cannot remove all owners

You might find yourself in this situation if the creation of the workshop environments failed for some reason. Re-running ./workshop-create.py works if you are not using the BigQuery extension.
This issue is caused by the problem described here: hashicorp/terraform-provider-google#5152

A new approach to handling access has been released, so we should switch to it now: https://www.terraform.io/docs/providers/google/r/bigquery_dataset_access.html
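
Until the Terraform configuration is switched over, one possible workaround (a sketch only, not part of this repository; the project, dataset, and email below are placeholders) is to inspect the dataset's access entries and re-add an OWNER with the google-cloud-bigquery client before re-running terraform apply:

# requires: pip install google-cloud-bigquery
from google.cloud import bigquery

client = bigquery.Client(project="<your-project>")
dataset = client.get_dataset("<your-project>.test_tf")

# list the current access entries, then append an owner so the update
# no longer tries to remove all owners
for entry in dataset.access_entries:
    print(entry.role, entry.entity_type, entry.entity_id)

entries = list(dataset.access_entries)
entries.append(bigquery.AccessEntry(role="OWNER",
                                    entity_type="userByEmail",
                                    entity_id="<owner-email>"))
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])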

Provisioner SSH Access Failed (Connection Timeout)

During the provisioning process, it seems that the Terraform "null_resource" provisioner is not able to connect to the EC2 instance.

This is the error that I get:
"Error: timeout - last error: SSH authentication failed (dc01@18...**:22): ssh: handshake failed: ssh: unable to authenticate, attempted methods [none], no supported methods remain"

Any ideas?
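
One way to narrow this down is to confirm that password-based SSH to the VM works at all from outside Terraform. A minimal sketch (the host and password are placeholders, the dc01 user is taken from the error above, and it assumes paramiko is installed):

# quick SSH reachability/authentication check against a workshop VM
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("<vm-public-ip>", username="dc01",
               password="<participant_password>", timeout=10)
_, stdout, _ = client.exec_command("hostname")
print(stdout.read().decode().strip())
client.close()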

Wrong screenshot in lab 15

(Screenshot from lab 15)
This screenshot shows the cluster as CC (Confluent Cloud), which is confusing because the new topic should be created on the on-prem cluster.

Default is latest for KSQL

In the workshop documentation we say the default should be earliest, but people in the lab found that the default was latest.
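
If participants run into this, the offset reset can also be pinned per request. A minimal sketch against the ksqlDB REST API (the server URL and stream name are placeholders, assuming the workshop's ksqlDB server listens on its default port 8088):

# run a push query with auto.offset.reset forced to 'earliest'
import requests

resp = requests.post(
    "http://localhost:8088/query",
    headers={"Content-Type": "application/vnd.ksql.v1+json"},
    json={
        "ksql": "SELECT * FROM <your_stream> EMIT CHANGES LIMIT 5;",
        "streamsProperties": {"ksql.streams.auto.offset.reset": "earliest"},
    },
    stream=True,
)
for line in resp.iter_lines():
    if line:
        print(line.decode())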

Customers, Products, and Suppliers do not flow to Google BigQuery

At the end of the lab, I see transactional data in BigQuery, but not the customers, products, or suppliers data.

In my local Confluent Control Center, I see this data in the respective topics, and Control Center properly shows the values (so it is schema-aware).

In Confluent Cloud (which is where the connector is configured to pull from), I see the data, but for the values I see the binary representation of the Avro-encoded data. I suspect that, for whatever reason, the Confluent Cloud cluster is unable to deserialize the data?

The connector is running


Name                 : DC01_GCS_SINK
Class                : io.confluent.connect.gcs.GcsSinkConnector
Type                 : sink
State                : RUNNING
WorkerId             : kafka-connect-ccloud:18084

 Task ID | State   | Error Trace
---------------------------------
 0       | RUNNING |
---------------------------------

with no errors.

Could you comment on why I don't see any errors? How can I view messages that the connector "skipped" when running in KSQL mode?
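
One way to verify that the customer/product/supplier topics really contain Schema Registry-encoded Avro is to consume a few records with a schema-aware consumer. A minimal sketch using confluent-kafka-python (the endpoints, credentials, and topic name are placeholders):

# consume one record from Confluent Cloud and deserialize it via Schema Registry
from confluent_kafka import DeserializingConsumer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroDeserializer

schema_registry = SchemaRegistryClient({
    "url": "https://<sr-endpoint>",
    "basic.auth.user.info": "<sr-api-key>:<sr-api-secret>",
})

consumer = DeserializingConsumer({
    "bootstrap.servers": "<ccloud-bootstrap-servers>",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "<ccloud-api-key>",
    "sasl.password": "<ccloud-api-secret>",
    "group.id": "avro-sanity-check",
    "auto.offset.reset": "earliest",
    "value.deserializer": AvroDeserializer(schema_registry),
})

consumer.subscribe(["<customers-topic>"])
msg = consumer.poll(10.0)
if msg is not None and msg.error() is None:
    print(msg.value())   # prints a dict if the value is valid Schema Registry Avro
consumer.close()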

Leverage of Cloud Shell on GCP

I know that Azure also has an equivalent, and AWS recently released one, but I'm familiar with GCP and built a script that helps the general public quickly deploy this workshop using Cloud Shell. It's a free service and it comes pre-loaded with everything you need for this workshop.

It requires going to console.cloud.google.com and running Cloud Shell with a selected project. I extended workshop-create.py and named it workshop-quickcreate-gcp.py.

It creates the myworkshop directory, copies the YAML file, creates a key for the Compute Engine default service account (if one doesn't exist), and re-creates the YAML file for GCP using easy-to-follow prompts. After cloning the repo, I placed the script in the confluent-hybrid-cloud-workshop directory.

Ran it:
python3 workshop-quickcreate-gcp.py --dir ~/myworkshop

Here is the code:


import argparse
import os
import yaml
import json
import shutil
import fileinput
import re
import glob
from subprocess import run, PIPE
from pathlib import Path

print()
print()

parser = argparse.ArgumentParser()
parser.add_argument('--dir', help="Workshop directory", required=True)
args = parser.parse_args()

key_location = f'{Path.home()}/key.json'

def create_sa_key():
    s_a = run(['gcloud iam service-accounts list | grep "Compute Engine default"'],
              shell = True,
              stdout = PIPE,
              stderr = PIPE).stdout.decode('utf-8').split()[-2].strip()

    proc = run([f'gcloud iam service-accounts keys create ~/key.json --iam-account {s_a}'],
               shell = True,
               stdout = PIPE,
               stderr = PIPE)
    print(proc.stdout.decode('utf-8'))

if os.path.exists(key_location):
    with open(key_location, 'r') as fh:
        ky = json.load(fh)
        try:
            print('Using service account: ' + ky['client_email'])
        except:
            create_sa_key()
else:
    create_sa_key()

print()

# Create the workshop directory and seed it with the GCP example config
run([f'mkdir -p {args.dir}'], shell = True, stdout = PIPE, stderr = PIPE)
run([f'cp workshop-example-gcp.yaml {args.dir}/workshop.yaml'], shell = True, stdout = PIPE, stderr = PIPE)

with open(f'{args.dir}/workshop.yaml', 'r') as fh:
    wkshp_yaml = yaml.full_load(fh)


wkshp_yaml['workshop']['core']['project'] = os.environ['DEVSHELL_PROJECT_ID']
wkshp_yaml['workshop']['core']['credentials_file_path'] = key_location

try:
    wkshp_yaml['workshop'].pop('extensions')
except:
    pass

def iterate_config_variables(var_name, var_type, var_value = None, gcp_list = None, gcp_filter = None, gcp_zone = None):

    print()
    message = f'Select a value for {var_name} (default = "{var_value}"): '
    if gcp_list in ['regions', 'zones', 'machine-types']:
        print(f'Fetching {gcp_list}...')
        command = f'gcloud compute {gcp_list} list'

        if gcp_zone:
            command += f' --zones {gcp_zone}'

        if gcp_filter:
            command += f' | grep "{gcp_filter}"'
        opts = run([command],
                    shell = True,
                    stdout = PIPE,
                    stderr = PIPE).stdout.decode('utf-8').split('\n')[1:-1]
        opts_dict = {i+1:item.split()[0].strip() for i, item in enumerate(opts)}
        for i, item in opts_dict.items():
            print(f'{i}\t{item}')
        message = f'Select a number from the options listed above (default = {var_value}): '
    while(True):
        var_input = input(message) or var_value
        if gcp_list:
            try:
                var_input = opts_dict[int(var_input)]
                break
            except:
                print('Not a valid option. Try again...')

        elif var_input:
            try:
                var_input = var_type(var_input)
                break
            except:
                print('Not a valid option. Try again...')
    return var_input

wkshp_yaml['workshop']['name'] = iterate_config_variables('workshop name',
                                                          str,
                                                          var_value = 'dc')
wkshp_yaml['workshop']['participant_count'] = iterate_config_variables('number of participants',
                                                                       int,
                                                                       var_value = 1)
wkshp_yaml['workshop']['participant_password'] = iterate_config_variables('workshop password',
                                                                          str,
                                                                          var_value = 'workshop123!')
reg = iterate_config_variables('region',
                               str,
                               var_value = 18,
                               gcp_list = 'regions')
wkshp_yaml['workshop']['core']['region'] = reg

zne = iterate_config_variables('zone',
                               str,
                               var_value = 1,
                               gcp_list = 'zones',
                               gcp_filter = reg)
wkshp_yaml['workshop']['core']['region_zone'] = zne

wkshp_yaml['workshop']['core']['vm_type'] = iterate_config_variables('machine type',
                                                                     str,
                                                                     var_value = 4,
                                                                     gcp_list = 'machine-types',
                                                                     gcp_filter = 'n1-standard',
                                                                     gcp_zone = zne)

wkshp_yaml['workshop']['core']['ccloud_bootstrap_servers'] = iterate_config_variables('ccloud bootstrap servers',
                                                                                      str)
wkshp_yaml['workshop']['core']['ccloud_api_key'] = iterate_config_variables('ccloud api key',
                                                                                      str)
wkshp_yaml['workshop']['core']['ccloud_api_secret'] = iterate_config_variables('ccloud api secret',
                                                                                      str)

print()
print('Writing this configuration to YAML file...')
print(json.dumps(wkshp_yaml, indent = 4))

with open(f'{args.dir}/workshop.yaml', 'w') as fh:
    yaml.dump(wkshp_yaml, fh)

docker_staging=os.path.join(args.dir, ".docker_staging")
terraform_staging=os.path.join(args.dir, ".terraform_staging")

# Open and parse configuration file
with open( os.path.join(args.dir, "workshop.yaml"), 'r') as yaml_file:
    try:
        config = yaml.safe_load(yaml_file)
    except yaml.YAMLError as exc:
        print(exc)

def copytree(src, dst):
  if not os.path.exists(dst):
    os.makedirs(dst)
    shutil.copystat(src, dst)
  lst = os.listdir(src)
  for item in lst:
    s = os.path.join(src, item)
    d = os.path.join(dst, item)
    if os.path.isdir(s):
      copytree(s, d)
    else:
      shutil.copy2(s, d)

if int(config['workshop']['participant_count']) > 35:
  print()
  print("*"*70)
  print("WARNING: Make sure your Confluent Cloud cluster has enough free partitions")
  print("to support this many participants. Each participant uses ~50 partitions.")
  print("*"*70)
  print()
  while True:
    val = input('Do You Want To Continue (y/n)? ')
    if val == 'y':
      break
    elif val =='n':
      exit()

#----------------------------------------
# Create the Terraform staging directory
#----------------------------------------

# Copy core terraform files to terraform staging
copytree(os.path.join("./core/terraform", config['workshop']['core']['cloud_provider']), terraform_staging)
copytree("./core/terraform/common", os.path.join(terraform_staging, "common"))

# Copy extension terraform files to terraform staging
if 'extensions' in config['workshop'] and config['workshop']['extensions'] != None:
    for extension in config['workshop']['extensions']:
        if os.path.exists(os.path.join("./extensions", extension, "terraform")):
            copytree(os.path.join("./extensions", extension, "terraform"), terraform_staging)

# Create Terraform tfvars file
with open(os.path.join(terraform_staging, "terraform.tfvars"), 'w') as tfvars_file:

    # Process high level
    for var in config['workshop']:
        if var not in ['core', 'extensions']:
            tfvars_file.write(str(var) + '="' + str(config['workshop'][var]) + "\"\n")
    for var in config['workshop']['core']:
        if var != 'cloud_provider':
            tfvars_file.write(str(var) + '="' + str(config['workshop']['core'][var]) + "\"\n")
    if 'extensions' in config['workshop'] and config['workshop']['extensions'] != None:
        for extension in config['workshop']['extensions']:
            if os.path.exists(os.path.join("./extensions", extension, "terraform")):
                if config['workshop']['extensions'][extension] != None:
                    for var in config['workshop']['extensions'][extension]:
                        tfvars_file.write(str(var) + '="' + str(config['workshop']['extensions'][extension][var]) + "\"\n")

#----------------------------------------------------------------------------
# Create the Docker staging directory, this directory is uploaded to each VM
#----------------------------------------------------------------------------
# remove stage directory
if os.path.exists(docker_staging):
    shutil.rmtree(docker_staging)
# Create staging directory and copy the required docker files into it
os.mkdir(docker_staging)
os.mkdir(os.path.join(docker_staging, "extensions"))
copytree("./core/docker/", docker_staging)
# Copy asciidoc directory to .docker_staging
copytree(os.path.join("./core/asciidoc"), os.path.join(docker_staging, "asciidoc"))
# Deal with extensions
if 'extensions' in config['workshop'] and config['workshop']['extensions'] != None:
    # Add each extensions asciidoc file as an include in the main workshop.adoc file
    includes = []
    include_str=""
    for extension in config['workshop']['extensions']:
        if os.path.isdir(os.path.join("./extensions", extension, "asciidoc")):
            includes.append(glob.glob(os.path.join("./extensions", extension, "asciidoc/*.adoc"))[0])
    # Build extension include string
    for include in includes:
        include_str += 'include::.' + include + '[]\n'

    # Add extension includes to core workshop.adoc
    for line in fileinput.input(os.path.join(docker_staging, "asciidoc/workshop.adoc"), inplace=True):
        line=re.sub("^#EXTENSIONS_PLACEHOLDER#",include_str,line)
        print(line.rstrip())
    # Copy extension asciidoc files to docker staging
    for extension in config['workshop']['extensions']:
        if os.path.isdir(os.path.join("./extensions", extension, "asciidoc")):
            copytree(os.path.join("./extensions", extension, "asciidoc"), os.path.join(docker_staging, "extensions", extension, "asciidoc"))
    # Copy extension images to docker staging
    for extension in config['workshop']['extensions']:
        if os.path.isdir(os.path.join("./extensions", extension, "asciidoc/images")):
            copytree(os.path.join("./extensions", extension, "asciidoc/images"), os.path.join(docker_staging, "asciidoc/images"))
    # Copy extension docker files to docker staging and create docker .env file
    for extension in config['workshop']['extensions']:
        if os.path.isdir(os.path.join("./extensions", extension, "docker")):
            copytree(os.path.join("./extensions", extension, "docker"), os.path.join(docker_staging, "extensions", extension, "docker"))
            # Create .env file for docker
            if config['workshop']['extensions'][extension] != None:
                for var in config['workshop']['extensions'][extension]:
                    with open(os.path.join(docker_staging, "extensions", extension, "docker/.env"), 'a') as env_file:
                        env_file.write(var + '=' + config['workshop']['extensions'][extension][var] + "\n")
else:
    for line in fileinput.input(os.path.join(docker_staging, "asciidoc/workshop.adoc"), inplace=True):
        line=re.sub("^#EXTENSIONS_PLACEHOLDER#","",line)
        print(line.rstrip())


#-----------------
# Create Workshop
#-----------------

os.chdir(terraform_staging)

# Terraform init
os.system("terraform init")

# Terraform plan
os.system("terraform plan")

# Terraform apply
os.system("terraform apply -auto-approve")

# Show workshop details
os.system("terraform output -json external_ip_addresses > workshop_details.out")
if os.path.exists("workshop_details.out"):
    with open('workshop_details.out') as wd:
        ip_addresses = json.load(wd)
        print("*" * 65)
        print("\n WORKSHOP DETAILS\n Copy & paste into Google Sheets and share with the participants\n")
        print("*" * 65)
        print('=SPLIT("SSH USERNAME,GETTING STARTED URL,PARTICIPANT NAME/EMAIL",",")')
        for id, ip_address in enumerate(ip_addresses, start=1):
            print('=SPLIT("dc{:02d},http://{}", ",")'.format(id, ip_address))
        #print('=SPLIT("{}-{},http://{}", ",")'.format(config['workshop']['name'], id, ip_address))

    os.remove("workshop_details.out")
