rook-on-bare-metal-workshop's Introduction

This repository is Experimental, meaning that it is based on untested ideas or techniques, is not yet established or finalized, or involves a radically new and innovative style. Support is best effort (at best!), and we strongly encourage you NOT to use this in production.

Rook-on-Bare-Metal Workshop

Welcome to the "Rook on Bare Metal Workshop"! This hands-on workshop takes you through using Rook to provide stateful storage atop bare metal physical services to containerized workloads. You'll be provided with dedicated bare metal infrastructure (physical hosts) installed as Kubernetes nodes, lab instructions, and an instructor to guide you through all the steps.

If at any time you break your environment or otherwise get yourself stuck, don't panic. Simply take another unassigned lab environment and pick up where you left off. We'll rebuild the broken environment and make it available for someone else.

A run through of this workshop is available from Cephalocon 2019 on YouTube: https://youtu.be/vGsnaNekRBo

Student Prerequisites

For this workshop, you'll be using a remote Kubernetes cluster so there is no need to install any software on your laptop. When attending, please be sure to bring and have preinstalled:

  • Wi-Fi-equipped laptop
  • SSH client (for example, PuTTY)
  • Web browser

While you can be a Kubernetes and Rook beginner to take this workshop, we do expect you to have some basic familiarity with Linux and command line execution including running a text editor such as 'vi'.

Agenda

  • Lab Assignments and Verifying Cluster Setup Lab01
  • Installing and Using Rook with Ceph Lab10
  • Deploying an Application with Stateful Storage Lab20
  • Using Rook with CockroachDB Lab30
  • Growing your Storage Cluster Lab40
  • Simulating and Recovering from a Storage Node Failure Lab50
  • Object Storage with RadosGW Lab60
  • Defining a PersistentVolumeClaim Lab70
  • Monitoring Rook with Prometheus & Grafana Lab80
  • Rolling Upgrade of the Ceph Cluster Lab90

Student Instructions

Requirements

Before starting you will need SSH on your laptop.

Students at an in-person workshop, please start at Lab01

Running your own Workshop

You are welcome to take this workshop and run it at your own event! If you're interested in running this workshop, please read the instructions on how to set up student lab environments in the setup README.

rook-on-bare-metal-workshop's People

Contributors

dependabot[bot], displague, ianychoi, johnhaan, johnstudarus, miouge1, rainleander, vielmetti

rook-on-bare-metal-workshop's Issues

multi-node URL

When there are two nodes, the "get nodes" command might not return the correct nodes.

IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[].address}')

This might be impacting Lab 80 getting the URL.
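One way to make the lookup less dependent on list order is to name the node and select the address by type. A minimal sketch, assuming the lab needs the node's InternalIP; the helper name node_internal_ip is hypothetical and not part of the workshop:

```shell
# Hypothetical helper: print the InternalIP of a named node.
# Assumes kubectl is already configured for the lab cluster.
node_internal_ip() {
  node="$1"
  kubectl get node "$node" \
    -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}'
}

# Example (not run here): IP=$(node_internal_ip node1)
```

Whether InternalIP or ExternalIP is the right type depends on how the lab nodes are reached, so treat the filter value as an assumption.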

inconsistent node names

lab30@studarus-lab:/$ ssh node1 hostname
node1
lab30@studarus-lab:/$ ssh node2 hostname
lab30-k8s-node-1

The Terraform configuration needs to be modified to produce meaningful, consistent hostnames.

Confusing ceph command

Some people might be confused about where to type the ceph commands.
I think it would be better to show the prompt before the ceph commands.

Ansible version update

Per PR #68 please review a change suggested to bring Ansible to a newer point release, as a consequence of an automated review.

I'll defer to @johnstudarus to help assess whether this change is fully compatible or if it's going to trigger a cascade of other changes.

tag storage node and add to Rook later

Cluster.yml adds in the storage node right away. Can it be added later via a tag?

Use the master at first (via RookDir FS) and then add storage in a later lab. Students add a tag "storage" to the node and then Rook adds it? (new Lab # needed)

Uniform Standards Request: Experimental Repository

Hello!

We believe this repository is Experimental and therefore needs the following files updated:

If you feel the repository should be maintained or end of life or that you'll need assistance to create these files, please let us know by filing an issue with https://github.com/packethost/standards.

Packet maintains a number of public repositories that help customers to run various workloads on Packet. These repositories are in various states of completeness and quality, and being public, developers often find them and start using them. This creates problems:

  • Developers using low-quality repositories may infer that Packet generally provides a low quality experience.
  • Many of our repositories are put online with no formal communication with, or training for, customer success. This leads to a below average support experience when things do go wrong.
  • We spend a huge amount of time supporting users through various channels, when much of this support work could be eliminated with better upfront planning, documentation and testing.

To that end, we propose three tiers of repositories: Private, Experimental, and Maintained.

As a resource and example of a maintained repository, we've created https://github.com/packethost/standards. This is also where you can file any requests for assistance or modification of scope.

The Goal

Our repositories should be the example to which adjacent, competing projects look for inspiration.

A repository should not look entirely different from other repositories in its ecosystem (for example, a different layout, testing model, or logging model) without reason or recommendation from the community's subject matter experts.

We should share our improvements with each ecosystem while seeking and respecting the feedback of these communities.

Whether or not strict guidelines have been provided for the project type, our repositories should ensure that the same components are offered across the board. How these components are provided may vary, based on the conventions of the project type. GitHub provides general guidance on this which they have integrated into their user experience.

m1.xlarge.x86 support

The m1.xlarge.x86 systems are available in greater numbers than the c2.medium.x86 allowing more student environments to run at once. However, the m1.xlarge.x86 doesn't (by default) have a spare second drive for Rook to use.
The proposal is to remove the second drive (/dev/sdb) from the RAID 1 mirror (/dev/sda and /dev/sdb), making the second drive available for Rook.

Commands to fail and remove the second drive's partitions from the arrays, wipe the drive, and make it available for Rook to use:
mdadm /dev/md126 --fail /dev/sdb3
mdadm /dev/md126 --remove /dev/sdb3
mdadm /dev/md127 --fail /dev/sdb2
mdadm /dev/md127 --remove /dev/sdb2
wipefs -a /dev/sdb

setup/defaults/main.yml:
plan_k8s_nodes: m1.xlarge.x86

Ansible version and CVE-2019-10156

In kubernetes-sigs/kubespray#5049 I note that the Ansible version pinned for Kubespray (and thus for this workshop) is 2.7.8, which is subject to the CVE mentioned.

If and when Kubespray pulls its version of Ansible forward to 2.7.12 to address this issue, this repo should follow suit.

There are reasons not to go to Ansible 2.8.x yet so it's probably best to pin to a specific version of 2.7.

volumes fail to attach

Happens on both node1 and node2 when following the Wordpress/MySQL deployments with PVC. The containers are stuck creating forever because the volumes never mount on the node. RadosGW works. Perhaps FlexVolumeDriver settings?

From lab103 on node1:

kubectl -n rook-ceph logs rook-ceph-agent-vjtpn

2019-07-16 13:38:23.009925 E | flexdriver: Attach volume replicapool/pvc-1f16a55b-a7cc-11e9-926b-0cc47ae5490a failed: failed to attach volume replicapool/pvc-1f16a55b-a7cc-11e9-926b-0cc47ae5490a: failed to map image replicapool/pvc-1f16a55b-a7cc-11e9-926b-0cc47ae5490a cluster rook-ceph. failed to map image replicapool/pvc-1f16a55b-a7cc-11e9-926b-0cc47ae5490a: Failed to complete 'rbd': signal: interrupt. . output:

lab file ownership

When the main.yml playbook is run, the resulting files under the lab directories are all owned by root. These need to be owned by the lab user.
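Until the playbook is fixed, ownership could be corrected by hand. A minimal sketch, assuming each lab user has its own directory; the helper name fix_lab_ownership and the /home/lab30 path in the example are hypothetical:

```shell
# Hypothetical helper: give a lab user ownership of their lab directory.
# Arguments: <user> <directory>. Run as root when the files are owned by root.
fix_lab_ownership() {
  user="$1"
  dir="$2"
  # Look up the user's primary group; fail if the user does not exist.
  group=$(id -gn "$user") || return 1
  chown -R "$user:$group" "$dir"
}

# Example (not run here): fix_lab_ownership lab30 /home/lab30
```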

Bare Metal vs Public Cloud

Primer on what needs to be done differently on bare metal versus running Kubernetes on a public cloud.

Exposing services (no load balancer)
Raw devices

enable password logins

By default, logins are not allowed via passwords. They need to be enabled using:

cat <<EOF >> /etc/ssh/sshd_config
Match user lab*
PasswordAuthentication yes
EOF
service sshd restart
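Appending to sshd_config on every setup run would duplicate the Match block. A minimal sketch of an append that is safe to repeat, assuming the same two configuration lines; the helper name enable_lab_password_logins is hypothetical:

```shell
# Hypothetical helper: append the Match block to an sshd config file only once.
# Argument: path to the config file (for example /etc/ssh/sshd_config).
enable_lab_password_logins() {
  conf="$1"
  if grep -q '^Match user lab\*' "$conf"; then
    return 0  # block already present, nothing to do
  fi
  cat <<EOF >> "$conf"
Match user lab*
PasswordAuthentication yes
EOF
}

# Example (not run here, needs root):
#   enable_lab_password_logins /etc/ssh/sshd_config && service sshd restart
```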

warn if homedir exists

When running setup (for new lab environments), the playbook should complain if the home directories already exist before executing.
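The check itself is small. A minimal shell sketch of the idea, not the playbook change; the helper name check_homedirs_absent and the /home base directory in the example are assumptions:

```shell
# Hypothetical pre-flight check: report lab users whose home directory exists.
# Arguments: <base directory> <user>...; returns 1 if any directory exists.
check_homedirs_absent() {
  base="$1"
  shift
  status=0
  for user in "$@"; do
    if [ -d "$base/$user" ]; then
      echo "warning: $base/$user already exists" >&2
      status=1
    fi
  done
  return "$status"
}

# Example (not run here): check_homedirs_absent /home lab01 lab02 || exit 1
```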

cockroachdb port - no svc cockroachdb-admin - Lab30

root@node1:~ # IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[].address}')
root@node1:~ # PORT=$(kubectl -n rook-cockroachdb get svc cockroachdb-admin -o jsonpath='{.spec.ports[].nodePort}')
Error from server (NotFound): services "cockroachdb-admin" not found

root@node1:~ # kubectl get svc --all-namespaces
NAMESPACE          NAME                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
default            kubernetes                ClusterIP   10.233.0.1      <none>        443/TCP                  26h
ingress-nginx      default-backend           ClusterIP   10.233.24.98    <none>        80/TCP                   26h
kube-system        coredns                   ClusterIP   10.233.0.3      <none>        53/UDP,53/TCP,9153/TCP   26h
kube-system        kubernetes-dashboard      ClusterIP   10.233.25.18    <none>        443/TCP                  26h
rook-ceph          ceph-dashboard-external   NodePort    10.233.51.55    <none>        8443:32241/TCP           51m
rook-ceph          rook-ceph-mgr             ClusterIP   10.233.62.110   <none>        9283/TCP                 115m
rook-ceph          rook-ceph-mgr-dashboard   ClusterIP   10.233.19.21    <none>        8443/TCP                 115m
rook-ceph          rook-ceph-mon-a           ClusterIP   10.233.28.93    <none>        6789/TCP                 116m
rook-ceph          rook-ceph-mon-b           ClusterIP   10.233.6.185    <none>        6789/TCP                 116m
rook-ceph          rook-ceph-mon-c           ClusterIP   10.233.3.39     <none>        6789/TCP                 116m
rook-cockroachdb   cockroachdb-public        ClusterIP   10.233.63.216   <none>        26257/TCP,8080/TCP       11m
rook-cockroachdb   rook-cockroachdb          ClusterIP   None            <none>        26257/TCP,8080/TCP
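The listing shows cockroachdb-public as a ClusterIP service with port 8080, so it has no nodePort to look up. As a possible workaround, the port could be forwarded with kubectl. A minimal sketch, assuming the admin UI is what listens on 8080; the helper name forward_cockroachdb_ui is hypothetical:

```shell
# Hypothetical helper: forward a local port to the CockroachDB admin UI.
# Assumes the UI is served on port 8080 of the cockroachdb-public service,
# as the service listing suggests. This blocks until interrupted.
forward_cockroachdb_ui() {
  local_port="${1:-8080}"
  kubectl -n rook-cockroachdb port-forward svc/cockroachdb-public "${local_port}:8080"
}

# Example (not run here): forward_cockroachdb_ui 9090
```

By default the forward only listens on the machine where kubectl runs, so this does not replace a NodePort service for students connecting from a laptop.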

python-netaddr missing

Ran: ansible-playbook -i labs.ini main.yml

Got:
"fatal: [node1]: FAILED! => {"msg": "The conditional check 'kube_service_addresses | ipaddr('net')' failed. The error was: The ipaddr filter requires python-netaddr be installed on the ansible controller"}

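The error says the netaddr Python module is missing on the Ansible controller. A minimal sketch for checking this before running the playbook; the helper name has_python_module is hypothetical, and the interpreter must be the one Ansible itself runs under, which is an assumption here:

```shell
# Hypothetical helper: succeed if a Python module can be imported.
# Arguments: <module> [interpreter]; the interpreter defaults to python3.
has_python_module() {
  module="$1"
  python="${2:-python3}"
  "$python" -c "import $module" 2>/dev/null
}

# Example (not run here): install netaddr on the controller if it is missing.
#   has_python_module netaddr || pip install netaddr
```
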
rgw secrets - Lab 60

Docs are currently looking for the following secret:
rook-ceph-object-user-my-store-my-user

But the following is available:
rook-ceph-rgw-my-store-keyring
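To see which secrets actually exist for the object store, the secret names can be listed and filtered. A minimal sketch, assuming the rook-ceph namespace and the my-store name from above; the helper name list_store_secrets is hypothetical:

```shell
# Hypothetical helper: list secrets in a namespace whose name mentions a store.
# Arguments: <namespace> <store name>.
list_store_secrets() {
  ns="$1"
  store="$2"
  kubectl -n "$ns" get secrets -o name | grep -- "$store"
}

# Example (not run here): list_store_secrets rook-ceph my-store
```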
