Apache Solr on Amazon Elastic Kubernetes Service

This repository contains sample configuration files for installing Apache Solr on Amazon Elastic Kubernetes Service (EKS), along with the files required to run the demo. It walks through the installation and configuration of the following components:

An Amazon EKS Cluster with three managed node groups
An Apache Solr cluster, also known as SolrCloud
A ZooKeeper ensemble, required by SolrCloud
Apache Solr auto-scaler to scale Solr replicas
Prometheus to extract custom metrics from the SolrCloud cluster for use by the Horizontal Pod Autoscaler
Horizontal Pod Autoscaler (HPA) to scale the Pods within the managed node groups, and Cluster Autoscaler (CA) to scale the compute for the EKS cluster

Getting Started

Pre-requisites

An AWS Account.
An AWS Cloud9 workspace, set up by following the AWS Cloud9 setup instructions.
Install the Kubernetes tools eksctl and kubectl, and the AWS CLI.
Create an IAM role for your Cloud9 workspace.
Attach the IAM role to the Cloud9 workspace.
Update the IAM settings for your Cloud9 workspace.

Use the following steps to create the Solr environment

From a terminal in your Cloud9 workspace, clone this git repository and set the directory:

git clone <repo_url> apache-solr-k8s-main
cd apache-solr-k8s-main/config

Create an Amazon EKS cluster using eksctl. Note: replace <region of choice> with the AWS Region in which you want to deploy your EKS cluster, for example --region=us-west-2.

eksctl create cluster --version=1.21 \
--name=solr8demo \
--region=<region of choice> \
--node-private-networking \
--alb-ingress-access \
--asg-access \
--without-nodegroup

Create the Managed Node Groups in private subnets within the cluster using:

⚠️ The managed node groups config file uses the EC2 instance type m5.xlarge, which is not free tier eligible, so your AWS account will incur EC2 charges. For Amazon Elastic Kubernetes Service pricing details, refer to the Amazon EKS pricing page.

eksctl create nodegroup -f managedNodegroups.yml

Install Helm and add the chart repositories that are used later to install Prometheus:

curl -sSL https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
helm repo add stable https://charts.helm.sh/stable/
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Install the Kubernetes Metrics Server:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Verify that the metrics-server deployment is running the desired number of pods with the following command.

kubectl get deployment metrics-server -n kube-system

Install the ZooKeeper ensemble for SolrCloud:

kubectl create configmap zookeeper-ensemble-config --from-env-file=zk-config.properties
kubectl apply -f zookeeper.yml

Check the status of the pods in the ZooKeeper StatefulSet by running the following command:

kubectl get pods -l app=zk

The output should look similar to the following:

NAME   READY   STATUS    RESTARTS   AGE
zk-0   1/1     Running   0          4h4m
zk-1   1/1     Running   0          4h3m
zk-2   1/1     Running   0          4h3m
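Rather than re-running kubectl get pods until every pod shows Running, the readiness check can be scripted. The following is a minimal sketch, assuming kubectl is already configured for the cluster. The function name wait_for_pods and the default timeout are illustrative and are not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): block until every pod that
# matches a label selector reports the Ready condition, or until the timeout.
wait_for_pods() {
  local selector="$1"          # label selector, e.g. app=zk
  local timeout="${2:-300s}"   # assumed default; tune for your cluster
  kubectl wait --for=condition=Ready pod -l "$selector" --timeout="$timeout"
}

# Example usage (uncomment to run against your cluster):
# wait_for_pods app=zk 300s
```

The same helper should work for any of the label selectors used in this walkthrough, since it only wraps kubectl wait.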

Install Solr and the Solr metrics exporter:

kubectl create configmap solr-cluster-config --from-env-file=solr-config.properties
kubectl apply -f solr-cluster.yml
kubectl apply -f solr-exporter.yml

Check status of the Solr pods

kubectl get pods -l app=solr-app

Expected output

NAME     READY   STATUS    RESTARTS   AGE
solr-0   1/1     Running   0          3h59m
solr-1   1/1     Running   0          3h59m
solr-2   1/1     Running   0          3h58m

Verify that the Solr exporter service is running on port 9983. This matters because the HPA depends on Solr metrics that Prometheus scrapes from this service and exposes to Kubernetes.

kubectl get service/solr-exporter-service

Expected output (Note: CLUSTER-IP will likely be different)

NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
solr-exporter-service   ClusterIP   10.100.205.122   <none>        9983/TCP   4h1m

Update the prom.yml Prometheus configuration file with the solr-exporter-service cluster IP and port.

Find the solr-exporter-service cluster IP address using the command below:

kubectl get service/solr-exporter-service -o jsonpath='{.spec.clusterIP}'

Update the prometheus.yml property in the prom.yml file as shown below, replacing <solr-exporter-service-IP> with the cluster IP returned by the command above. Save the file.

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets:
        - localhost:9090
  - job_name: solr
    scheme: http
    static_configs:
      - targets: ['<solr-exporter-service-IP>:9983']
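The manual edit can also be scripted. The sketch below assumes that prom.yml still contains the literal placeholder <solr-exporter-service-IP>. The helper name set_exporter_ip is hypothetical and is not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): replace the
# <solr-exporter-service-IP> placeholder in a Prometheus values file.
set_exporter_ip() {
  local file="$1" ip="$2"
  local tmp
  tmp="$(mktemp)" || return 1
  # Write to a temporary file first so a failed substitution never
  # leaves a truncated config behind.
  sed "s/<solr-exporter-service-IP>/${ip}/g" "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example usage (uncomment to run against your cluster):
# ip="$(kubectl get service/solr-exporter-service -o jsonpath='{.spec.clusterIP}')"
# set_exporter_ip prom.yml "$ip"
```

Because the cluster IP changes whenever the service is recreated, the substitution needs to be repeated against a fresh copy of prom.yml in that case.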

Install the Prometheus adapter and Prometheus:

helm install prometheus-adapter prometheus-community/prometheus-adapter \
--set prometheus.url=http://prometheus-server.default.svc.cluster.local \
--set prometheus.port=80 \
--values=adapterConfig.yml

helm install prometheus prometheus-community/prometheus \
--values prom.yml

Configure Horizontal Pod Autoscaler (HPA) and Cluster Autoscaler (CA) using kubectl:

kubectl apply -f hpa.yml
kubectl apply -f cluster-autoscaler-autodiscover.yaml

Verify that the HPA has been set up correctly:

kubectl describe hpa

Expected output

Name:                             solr-hpa
Namespace:                        default
Labels:                           <none>
Annotations:                      <none>
CreationTimestamp:                Wed, 22 Dec 2021 19:25:18 +0000
Reference:                        StatefulSet/solr
Metrics:                          ( current / target )
  "solr_metrics" (target value):  4021 / 50k
Min replicas:                     3
Max replicas:                     20
StatefulSet pods:                 20 current / 20 desired
Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    ReadyForNewScale  recommended size matches current size
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from external metric solr_metrics(nil)
  ScalingLimited  True    TooManyReplicas   the desired replica count is more than the maximum replica count
Events:           <none>

⚠️ The "solr_metrics" value may be 0 or a low number right after the Solr deployment is set up. It is expected to rise once Solr receives client requests. Also note that maxReplicas in the hpa.yml config file is set to 10. maxReplicas defines the maximum number of pods the HPA can scale up to, so consider changing it to meet the needs of your Solr deployment.
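To make the replica counts in the kubectl describe hpa output easier to reason about, the sketch below reproduces the scaling rule given in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue), clamped to the configured minimum and maximum. It is a simplification, because the real controller also applies a tolerance band and stabilization windows that are ignored here. The function name and sample values are illustrative.

```shell
#!/usr/bin/env bash
# Illustrative only: approximate the HPA replica calculation for a metric
# with a target value. Integer inputs are assumed.
desired_replicas() {
  local current="$1" metric="$2" target="$3" min="$4" max="$5"
  # Ceiling of (current * metric / target) using integer arithmetic.
  local desired=$(( (current * metric + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}

# 3 replicas, metric at twice a 50000 target, limits 3..10
desired_replicas 3 100000 50000 3 10   # prints 6
```

Under this rule, a metric well below the target leaves the StatefulSet at the minimum replica count, and a sustained metric far above the target is capped at maxReplicas.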

Obtain the SolrCloud Administration UI URL by running kubectl get services solr-service from a terminal in your Cloud9 workspace. The URL has the form http://<xxxxx>.<region>.elb.amazonaws.com:8983.
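If you prefer not to copy the hostname out of the table output, the URL can be assembled from the service status. This is a minimal sketch, assuming solr-service is exposed through a load balancer that reports a hostname (as the elb.amazonaws.com form suggests). The helper name solr_admin_url is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): print the Solr Admin UI URL
# from the load balancer hostname assigned to a Service.
solr_admin_url() {
  local service="${1:-solr-service}"
  local host
  host="$(kubectl get service "$service" \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')"
  if [ -z "$host" ]; then
    # The hostname is empty until the load balancer has been provisioned.
    echo "load balancer hostname not assigned yet for $service" >&2
    return 1
  fi
  echo "http://${host}:8983"
}

# Example usage (uncomment to run against your cluster):
# solr_admin_url solr-service
```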

Create a Solr Collection named Books using the Solr Administration UI and upload the sample data file data/books.json.

Configure the SolrCloud autoscaler by setting a Search Rate Trigger. The autoscaler config can be set using the endpoint http://<xxxxx>.<region>.elb.amazonaws.com:8983/api/cluster/autoscaling/:

curl -X POST -H 'Content-type:application/json' -d '{
            "set-trigger": {
                  "name" : "search_rate_trigger",
                  "event" : "searchRate",
                  "collections" : "Books",
                  "metric" : "QUERY./select.requestTimes:1minRate",
                  "aboveRate" : 10.0,
                  "belowRate" : 0.01,
                  "waitFor" : "30s",
                  "enabled" : true,
                  "actions" : [
                        {
                        "name" : "compute_plan",
                        "class": "solr.ComputePlanAction"
                        },
                        {
                        "name" : "execute_plan",
                        "class": "solr.ExecutePlanAction"
                        }
                  ]
            }
}' http://<xxxxx>.<region>.elb.amazonaws.com:8983/api/cluster/autoscaling/
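After posting the trigger, it can be useful to read the configuration back and confirm that search_rate_trigger is present. The sketch below assumes the Solr 8 autoscaling framework, where a GET on the same v2 path returns the current autoscaling configuration. The helper name get_autoscaling_config is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): fetch the current autoscaling
# configuration so the trigger definition can be inspected.
get_autoscaling_config() {
  local base_url="$1"   # e.g. http://<xxxxx>.<region>.elb.amazonaws.com:8983
  # Strip a trailing slash, if any, before appending the API path.
  curl -sS "${base_url%/}/api/cluster/autoscaling"
}

# Example usage (uncomment and substitute your own load balancer URL):
# get_autoscaling_config "http://<xxxxx>.<region>.elb.amazonaws.com:8983"
```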

Testing the deployment

A Python script is included in the scripts directory which can be used to test the deployment.

Change to the scripts directory and make the script executable:

cd scripts
chmod 744 ./submit_mc_pi_k8s_requests_books.py

Install the required dependencies

sudo python3 -m pip install -r ./requirements.txt

Run the script

python3 ./submit_mc_pi_k8s_requests_books.py -p 1 -r 1 -i 1

To run a short load test, increase the values of the -p, -r, and -i flags:

python3 ./submit_mc_pi_k8s_requests_books.py -p 100 -r 30 -i 30000000 > result.txt

Review the result.txt file to ensure you are getting search query responses from Solr.

Cleaning up

Use the following steps to clean up the Solr environment.

Uninstall Horizontal Pod Autoscaler (HPA) and Cluster Autoscaler (CA):

kubectl delete -f hpa.yml
kubectl delete -f cluster-autoscaler-autodiscover.yaml

Uninstall Solr:

kubectl delete -f solr-cluster.yml
kubectl delete configmap solr-cluster-config
kubectl delete -f solr-exporter.yml

Uninstall Zookeeper:

kubectl delete -f zookeeper.yml 
kubectl delete configmap zookeeper-ensemble-config

Delete the Managed Node Groups:

eksctl delete nodegroup -f managedNodegroups.yml

Delete the Amazon EKS cluster:

eksctl delete cluster --name=solr8demo

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
