cloudfoundry / app-autoscaler-release

Automated scaling for apps running on Cloud Foundry

License: Apache License 2.0


app-autoscaler-release's Introduction

App-AutoScaler

The App-AutoScaler automatically adjusts the compute resources of Cloud Foundry applications through:

  • Dynamic scaling based on application performance metrics
  • Scheduled scaling based on time

The App-AutoScaler has the following components:

  • api: provides public APIs to manage scaling policies
  • servicebroker: implements the Cloud Foundry service broker API
  • metricsgateway: collects and filters loggregator events via the loggregator v2 API
  • metricsserver: transforms loggregator events into app-autoscaler performance metrics (metricsgateway + metricsserver together replace metricscollector)
  • metricsforwarder: receives custom metrics and forwards them to loggregator via the v2 ingress API
  • eventgenerator: aggregates memory metrics, evaluates scaling rules and triggers events for dynamic scaling
  • scheduler: manages the schedules in a scaling policy and triggers events for scheduled scaling
  • scalingengine: takes the scaling actions based on dynamic scaling rules or schedules
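
Both scaling modes are driven by a single scaling policy, managed through the api component. A minimal policy sketch, assembled from the policy fragments that appear later on this page (all values are illustrative):

{
  "instance_min_count": 1,
  "instance_max_count": 4,
  "scaling_rules": [{
    "metric_type": "memoryused",
    "stat_window_secs": 300,
    "breach_duration_secs": 600,
    "threshold": 90,
    "operator": ">=",
    "cool_down_secs": 300,
    "adjustment": "+1"
  }],
  "schedules": {
    "timezone": "America/New_York",
    "specific_date": [{
      "start_date_time": "2018-05-30T17:45",
      "end_date_time": "2018-05-30T17:50",
      "instance_min_count": 2,
      "instance_max_count": 4,
      "initial_min_instance_count": 3
    }]
  }
}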

Development

System requirements

Database requirement

The App-AutoScaler supports Postgres and MySQL, using Postgres as the default backend data store. Both are run locally via Docker images, so ensure that Docker is working on your system before running the tests.

Setup

Note: all of the setup is encapsulated in the Makefile targets, so you can run the test targets (test | integration) directly and they will set up and start the tests.

To set up for development, first clone this project:

$ git clone https://github.com/cloudfoundry/app-autoscaler.git

Generate scheduler test certs

Initialize the Database

Note: the Makefile will initialize the database before running the tests if this has not already been done.

  • Postgres

    make init-db
  • MySQL

    make init-db db_type=mysql

Generate TLS Certificates

Create the certificates.

Note:

  • on macOS, certstrap is installed automatically; on other operating systems it needs to be pre-installed
  • the Makefile will create the certificates before running the tests if they have not already been created
make test-certs

Unit tests

The default database is Postgres.

  • Postgres:
make test

To use a specific Postgres version:

make clean # only needed when changing versions, to refresh the running Docker image
make test POSTGRES_TAG=x.y

where:

  • x is the major version
  • y is the minor version (this can be left out to get the most recent patch)

(A concrete version-pinning example for both databases appears at the end of this subsection.)

  • MySQL:

    make test db_type=mysql

To use a specific MySQL version:

make clean # only needed when changing versions, to refresh the running Docker image
make test db_type=mysql MYSQL_TAG=x.y

where:

  • x is the major version
  • y is the minor version (this can be left out to get the most recent patch)
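
For example, to run the unit tests against pinned database versions (the tags here are illustrative):

make clean
make test POSTGRES_TAG=12.4
make clean
make test db_type=mysql MYSQL_TAG=8.0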

Integration tests

The default database is Postgres.

  • Postgres:
make integration

To use a specific Postgres version:

make clean # only needed when changing versions, to refresh the running Docker image
make integration POSTGRES_TAG=x.y

where:

  • x is the major version
  • y is the minor version (this can be left out to get the most recent patch)

  • MySQL:

    make integration db_type=mysql

To use a specific MySQL version:

make clean # only needed when changing versions, to refresh the running Docker image
make integration db_type=mysql MYSQL_TAG=x.y

where:

  • x is the major version
  • y is the minor version (this can be left out to get the most recent patch)

Build App-AutoScaler

make build

Clean up

You can use make clean to remove:

  • the database (Postgres or MySQL)
  • autoscaler build artifacts

Coding Standards

Autoscaler uses golangci-lint and Checkstyle for its code base. Refer to the style-guide.

Bosh Release for app-autoscaler service

Purpose

The purpose of this BOSH release is to deploy and set up the app-autoscaler service.

Usage

Bosh Lite Deployment

  • Install Bosh-cli-v2

  • Install and start BOSH-Deployment, following its README.

  • Install CF-deployment

  • Create a new autoscaler client. The UAA CLI is required here to create a new UAA client id.

    • Install the UAA CLI, uaac.

      gem install cf-uaac
    • Obtain uaa_admin_client_secret

      bosh interpolate --path /uaa_admin_client_secret /path/to/cf-deployment/deployment-vars.yml
    • Use the uaac target uaa.YOUR-DOMAIN command to target your UAA server and obtain an access token for the admin client.

      uaac target uaa.bosh-lite.com --skip-ssl-validation
      uaac token client get admin -s <uaa_admin_client_secret>
    • Create a new autoscaler client

      uaac client add "autoscaler_client_id" \
          --authorized_grant_types "client_credentials" \
          --authorities "cloud_controller.read,cloud_controller.admin,uaa.resource" \
          --secret <AUTOSCALE_CLIENT_SECRET>
  • Create and upload App-Autoscaler release

    git clone https://github.com/cloudfoundry/app-autoscaler-release
    cd app-autoscaler-release
    make go-mod-tidy vendor db scheduler
    bosh create-release
    bosh -e YOUR_ENV upload-release
  • Deploy app-autoscaler with the newly created autoscaler client

    As of the App-Autoscaler v2.0 release, App-Autoscaler retrieves application metrics with the loggregator V2 API via gRPC over a mutual TLS connection.

    A valid TLS certificate for accessing the Loggregator Reverse Log Proxy is therefore required. When deploying on bosh-lite, the easiest way is to provide the loggregator certificates generated by cf-deployment.

    bosh -e YOUR_ENV -d app-autoscaler \
        deploy templates/app-autoscaler-deployment.yml \
        --vars-store=bosh-lite/deployments/vars/autoscaler-deployment-vars.yml \
        -l <PATH_TO_CF_DEPLOYMENT_VAR_FILES> \
        -v system_domain=bosh-lite.com \
        -v cf_client_id=autoscaler_client_id \
        -v cf_client_secret=<AUTOSCALE_CLIENT_SECRET> \
        -v skip_ssl_validation=true
  • Deploy autoscaler with cf deployment mysql database

    Note: this is temporarily blocked by cf-deployment pull request #881. If you would like to use the cf MySQL database, please apply the set-autoscaler-db.yml from that pull request when deploying cf-deployment.

    The latest Autoscaler release adds support for a MySQL database, so Autoscaler can connect to the same MySQL database as cf-deployment. Use the operations file example/operation/cf-mysql-db.yml, which includes the cf database host, password and tls.ca cert.

    bosh -e YOUR_ENV -d app-autoscaler \
        deploy templates/app-autoscaler-deployment.yml \
        --vars-store=bosh-lite/deployments/vars/autoscaler-deployment-vars.yml \
        -l <PATH_TO_CF_DEPLOYMENT_VAR_FILES> \
        -v system_domain=bosh-lite.com \
        -v cf_client_id=autoscaler_client_id \
        -v cf_client_secret=<AUTOSCALE_CLIENT_SECRET> \
        -v skip_ssl_validation=true \
        -o example/operation/cf-mysql-db.yml
  • Deploy autoscaler with external postgres database and mysql database

    bosh -e YOUR_ENV -d app-autoscaler \
        deploy templates/app-autoscaler-deployment.yml \
        --vars-store=bosh-lite/deployments/vars/autoscaler-deployment-vars.yml \
        -l <PATH_TO_CF_DEPLOYMENT_VAR_FILE> \
        -l <PATH_TO_DATABASE_VAR_FILE> \
        -v system_domain=bosh-lite.com \
        -v cf_client_id=autoscaler_client_id \
        -v cf_client_secret=<AUTOSCALE_CLIENT_SECRET> \
        -v skip_ssl_validation=true \
        -o example/operation/external-db.yml

The DATABASE_VAR_FILE should look like the following:

database:
  name: <database_name>
  host: <database_host>
  port: <database_port>
  scheme: <database_scheme>
  username: <database_username>
  password: <database_password>
  sslmode: <database_sslmode>
  tls:
    ca: |
      -----BEGIN CERTIFICATE-----

      -----END CERTIFICATE-----

The table below describes each of the variables:

Property Description
database.name The database name.
database.host The database server ip address or hostname.
database.port The database server port.
database.scheme The database scheme. Currently Autoscaler supports "postgres" and "mysql".
database.username The username of the database specified above in "database.name".
database.password The password of the user specified above in "database.username".
database.sslmode The SSL mode of the database connection. For "postgres", six values are allowed: disable, allow, prefer, require, verify-ca and verify-full; please refer to the Postgres SSL definition when defining database_sslmode. For "mysql", six values are allowed: false, true, skip-verify, preferred, verify-ca and verify_identity; please refer to the Mysql SSL definition (Golang) and Mysql Connector SSL documentation.
database.tls.ca PEM-encoded certificate authority for secure TLS communication. Only required when sslmode is verify-ca or verify-full (Postgres) or verify_identity (MySQL); it can be omitted for other sslmode values.
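
For reference, a filled-in sketch of such a var file (every value below is an illustrative assumption):

database:
  name: autoscaler
  host: 10.0.16.5
  port: 5432
  scheme: postgres
  username: autoscaler
  password: example-password
  sslmode: verify-ca
  tls:
    ca: |
      -----BEGIN CERTIFICATE-----
      ...
      -----END CERTIFICATE-----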

Run linting

Linting can be run through make:

make lint

Autofix can be triggered by providing the following options:

OPTS=--fix RUBOCOP_OPTS=-A make lint

Register service

Log in to Cloud Foundry as an admin user, and use the following command to register the app-autoscaler service:

cf create-service-broker autoscaler <brokerUserName> <brokerPassword> <brokerURL>
  • brokerUserName: the user name used to authenticate with the service broker. Its default value is autoscaler_service_broker_user.
  • brokerPassword: the password used to authenticate with the service broker. It is stored in the file passed to the --vars-store flag (bosh-lite/deployments/vars/autoscaler-deployment-vars.yml in the example). You can find it by searching for autoscaler_service_broker_password.
  • brokerURL: the URL of the service broker

All these parameters are configured in the BOSH deployment. If you are using the default values of the deployment manifest, register the service with the command below.

cf create-service-broker autoscaler autoscaler_service_broker_user `bosh int ./bosh-lite/deployments/vars/autoscaler-deployment-vars.yml --path /autoscaler_service_broker_password` https://autoscalerservicebroker.bosh-lite.com

Acceptance test

Refer to AutoScaler UAT guide to run acceptance test.

Use service

To use the service to auto-scale your applications, log in to Cloud Foundry as an admin user, and use the following command to enable service access for all or specific orgs:

cf enable-service-access autoscaler [-o ORG]

The following commands don't require admin rights, but the user needs to be a Space Developer. Create the service instance, and then bind your application to the service instance with the policy as a parameter:

cf create-service autoscaler  autoscaler-free-plan  <service_instance_name>
cf bind-service <app_name> <service_instance_name> -c <policy>
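
For example, with the scaling policy saved to a local file (the app, instance, and file names here are illustrative):

cf create-service autoscaler autoscaler-free-plan my-autoscaler
cf bind-service my-app my-autoscaler -c policy.json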

Remove the service

Log in to Cloud Foundry as an admin user, and use the following commands to remove all the service instances and the service broker of app-autoscaler from Cloud Foundry:

cf purge-service-offering autoscaler
cf delete-service-broker autoscaler

Monitoring the service

The app-autoscaler provides a number of externally available health endpoints that can be used to check the state of each component. Each health endpoint is protected with basic auth (apart from the api server); the usernames are listed in the table below, and the passwords are available in CredHub.

Component Health URL Username Password Key
eventgenerator https://autoscaler-eventgenerator.((system_domain))/health eventgenerator /autoscaler_eventgenerator_health_password
metricsforwarder https://autoscaler-metricsforwarder.((system_domain))/health metricsforwarder /autoscaler_metricsforwarder_health_password
metricsgateway https://autoscaler-metricsgateway.((system_domain))/health metricsgateway /autoscaler_metricsgateway_health_password
metricsserver https://autoscaler-metricsserver.((system_domain))/health metricsserver /autoscaler_metricsserver_health_password
scalingengine https://autoscaler-scalingengine.((system_domain))/health scalingengine /autoscaler_scalingengine_health_password
operator https://autoscaler-operator.((system_domain))/health operator /autoscaler_operator_health_password
scheduler https://autoscaler-scheduler.((system_domain))/health scheduler /autoscaler_scheduler_health_password

These endpoints can be disabled by using the ops file example/operations/disable-basicauth-on-health-endpoints.yml.
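
For example, to check the eventgenerator component on a bosh-lite.com system domain (a sketch; the password is looked up in CredHub under the key from the table above):

HEALTH_PASSWORD=... # value of /autoscaler_eventgenerator_health_password in CredHub
curl -u "eventgenerator:${HEALTH_PASSWORD}" https://autoscaler-eventgenerator.bosh-lite.com/health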

You can follow the development progress on Pivotal Tracker.

Deploy and offer Autoscaler as a service

Go to the app-autoscaler-release project for details on how to deploy App-AutoScaler with BOSH.

Use Autoscaler service

Refer to the user guide for details on how to use the Auto-Scaler service, including policy definition, supported metrics, the public API specification and the command line tool.

License

This project is released under version 2.0 of the Apache License.

app-autoscaler-release's People

Contributors

aadeshmisra, anubhav-gupta1, app-autoscaler-ci-bot, aqan213, asalan316, bonzofenix, boyang9527, cdlliuy, dependabot[bot], donacarr, fraenkel, garethjevans, geigerj0, ghaih, itsouvalas, joergdw, kanekoh, kevinjcross, kongjicdl, mvach, olivermautschke, paltanmoy, peterellisjones, pradyutsarma, qibobo, renovate[bot], rohitsharma04, salzmannsusan, silvestre, zyjiaobj


app-autoscaler-release's Issues

job "metircsgateway" fails to start in instance "asnozzle" because of missing policy_json table

While deploying app-autoscaler using the Readme.md, I ran into the following issue:
the "metricsgateway" job in "asnozzle" kept failing, and the logs showed the following messages:

{"data":{"addr":"0.0.0.0:6503","session":"10"},"log_level":1,"log_time":"2019-08-28T12:25:06Z","message":"metricsgateway.health-server.new-health-server","source":"metricsgateway","timestamp":"1566995106.178214312"}
{"data":{},"log_level":1,"log_time":"2019-08-28T12:25:06Z","message":"metricsgateway.starting metricsgateway","source":"metricsgateway","timestamp":"1566995106.178723574"}
{"data":{"interval":5000000000,"session":"4"},"log_level":1,"log_time":"2019-08-28T12:25:06Z","message":"metricsgateway.AppManager.started","source":"metricsgateway","timestamp":"1566995106.178894281"}
{"data":{"session":"5"},"log_level":1,"log_time":"2019-08-28T12:25:06Z","message":"metricsgateway.Dispather.dispatcher-started","source":"metricsgateway","timestamp":"1566995106.178979874"}
{"data":{"session":"2"},"log_level":1,"log_time":"2019-08-28T12:25:06Z","message":"metricsgateway.WSHelper.setup-new-ws-connection","source":"metricsgateway","timestamp":"1566995106.179043293"}
{"data":{"error":"pq: relation \"policy_json\" does not exist","query":"SELECT app_id FROM policy_json","session":"1"},"log_level":2,"log_time":"2019-08-28T12:25:06Z","message":"metricsgateway.policy-db.get-appids-from-policy-table","source":"metricsgateway","timestamp":"1566995106.179763317"}
{"data":{"error":"pq: relation \"policy_json\" does not exist","session":"4"},"log_level":2,"log_time":"2019-08-28T12:25:06Z","message":"metricsgateway.AppManager.retrieve-app-ids","source":"metricsgateway","timestamp":"1566995106.179866314"}
{"data":{"error":"pq: relation \"policy_json\" does not exist","query":"SELECT app_id FROM policy_json","session":"1"},"log_level":2,"log_time":"2019-08-28T12:25:06Z","message":"metricsgateway.policy-db.get-appids-from-policy-table","source":"metricsgateway","timestamp":"1566995106.180225134"}
{"data":{"error":"pq: relation \"policy_json\" does not exist","session":"4"},"log_level":2,"log_time":"2019-08-28T12:25:06Z","message":"metricsgateway.AppManager.retrieve-app-ids","source":"metricsgateway","timestamp":"1566995106.180329800"}
{"data":{"error":"pq: relation \"policy_json\" does not exist","query":"SELECT app_id FROM policy_json","session":"1"},"log_level":2,"log_time":"2019-08-28T12:25:06Z","message":"metricsgateway.policy-db.get-appids-from-policy-table","source":"metricsgateway","timestamp":"1566995106.180671692"}

I figured this could be because the tables are actually created by the "pre-start" script of the "apiserver" job, part of the "asapi" instance, which is not updated before the "asnozzle" instance.
https://github.com/cloudfoundry/app-autoscaler-release/blob/master/jobs/apiserver/templates/pre-start.erb#L46

My temporary solution:
Reshuffle the sequence in which instances are defined in the template/app-autoscaler-deployment.yml file such that "asnozzle" is defined before "asapi".
At least this resolved the issue for us.

Blobs not found when uploading release?

Followed the directions, cloned the repo, ran the update, and created the release successfully. But when I try to upload the release to my BOSH director:

  • Cannot find blob named 'apiserver/b05314ee056c084dc8cc5f1df532877b8468d62d' with SHA1 'e3e20c84fba48918acfd2048b394343c9511edd7'
  • Cannot find blob named 'metricscollector/85f0c39bcb9ef477c708d18802b4fbbfd097a7d5' with SHA1 '624ef5956dd9b9a22878ab750c99dd6ecb9c15df'
  • Cannot find blob named 'pruner/a9b04121b5fab39485f4a681698b452da0118aee' with SHA1 'd40e9d14e5a4902c9555293eea80bff26a633f4f'
  • Cannot find blob named 'scalingengine/4c4cfeeabd9e0c0e53b44a9563e209edf6eb6230' with SHA1 '6a17cd56ff164fe5cd272518b0beb782b611903c'
  • Cannot find blob named 'eventgenerator/11123ed87e47a95b8ae473c87a433bce908a5ceb' with SHA1 'a6b403300d88d06db49c18d06db9a9e65f97e1b0'

Abnormal Liquibase lock blocks app-autoscaler from starting

App-autoscaler uses Liquibase to maintain the database change sets, and runs the Liquibase process as a pre-start job.
When a Liquibase process runs, it inserts a DB lock record and then removes it on normal completion. But in some odd situations (e.g. if the Liquibase update was interrupted), the DB lock is not removed and blocks any further autoscaler startup.

See details at https://www.liquibase.org/documentation/databasechangeloglock_table.html

Currently, Liquibase does not implement a TTL for the DB lock, so we need to work around this with the listLocks and releaseLocks commands.

Detailed steps to reproduce and fix the problem:

Step 1: run a Liquibase update manually, and press Ctrl+C to break its execution.

java -cp "$DB_DIR/target/lib/*" liquibase.integration.commandline.Main --url "$DBURI" --username=$USER --password=$PASSWORD --driver=org.postgresql.Driver --changeLogFile=$API_DIR/db/api.db.changelog.yml update
Starting Liquibase at Tue, 20 Aug 2019 05:05:11 UTC (version 3.6.3 built at 2019-01-29 11:34:48)
^Cautoscaler-api/1:/var/vcap/jobs/apiserver/bin# ^C

Step 2: a DB lock is now left in the ICD database:

autoscaler-api/1:/var/vcap/jobs/apiserver/bin# java -cp "$DB_DIR/target/lib/*" liquibase.integration.commandline.Main --url "$DBURI" --username=$USER --password=$PASSWORD --driver=org.postgresql.Driver listLocks
Starting Liquibase at Tue, 20 Aug 2019 05:05:28 UTC (version 3.6.3 built at 2019-01-29 11:34:48)
Database change log locks for autoscaler@jdbc:postgresql://external-db.uaa.svc.cluster.local:31404/autoscaler?sslmode=require
 - autoscaler-api-1.autoscaler-api-set.cf.svc.cluster.local (172.30.51.196) at Aug 20, 2019, 5:05:12 AM
Liquibase command 'listLocks' was executed successfully.

Step 3: run the update command from step 1 again. It now hangs, just as we observed in @travagli's cluster.

To fix it, we can execute the releaseLocks command as below:

autoscaler-api/1:/var/vcap/jobs/apiserver/bin# java -cp "$DB_DIR/target/lib/*" liquibase.integration.commandline.Main --url "$DBURI" --username=$USER --password=$PASSWORD --driver=org.postgresql.Driver releaselocks
Starting Liquibase at Tue, 20 Aug 2019 05:08:36 UTC (version 3.6.3 built at 2019-01-29 11:34:48)
Successfully released all database change log locks for 'autoscaler@jdbc:postgresql://external-db.uaa.svc.cluster.local:31404/autoscaler?sslmode=require'
Liquibase command 'releaselocks' was executed successfully.

autoscaler-api/1:/var/vcap/jobs/apiserver/bin# java -cp "$DB_DIR/target/lib/*" liquibase.integration.commandline.Main --url "$DBURI" --username=$USER --password=$PASSWORD --driver=org.postgresql.Driver listLocks
Starting Liquibase at Tue, 20 Aug 2019 05:08:47 UTC (version 3.6.3 built at 2019-01-29 11:34:48)
Database change log locks for autoscaler@jdbc:postgresql://external-db.uaa.svc.cluster.local:31404/autoscaler?sslmode=require
 - No locks
Liquibase command 'listLocks' was executed successfully.

autoscaler-api/1:/var/vcap/jobs/apiserver/bin# java -cp "$DB_DIR/target/lib/*" liquibase.integration.commandline.Main --url "$DBURI" --username=$USER --password=$PASSWORD --driver=org.postgresql.Driver --changeLogFile=$API_DIR/db/api.db.changelog.yml update
Starting Liquibase at Tue, 20 Aug 2019 05:08:52 UTC (version 3.6.3 built at 2019-01-29 11:34:48)
Liquibase: Update has been successful.

Need a link for bosh-dns setup in docs.

Bosh-dns is not enabled by default for all deployments up to now, so it is necessary to add a doc link to the BOSH and cf-deployment docs explaining how to enable bosh-dns. Otherwise, the deployment of autoscaler will be problematic.

https://bosh.io/docs/dns/#links
"for the entire Director via the Director job configuration director.local_dns.use_dns_addresses property that, if enabled, affects all deployments by default. We are planning to eventually change this configuration to true by default."

https://github.com/cloudfoundry/cf-deployment#bosh-runtime-config

cf-deployment requires that you have uploaded a runtime config for BOSH DNS prior to deploying your foundation. We recommend that you use the one provided by the bosh-deployment repo:

bosh update-runtime-config bosh-deployment/runtime-configs/dns.yml --name dns

How to use this release with bosh director?

I am having problems with route_registrar and consul. I am not able to figure out the configuration based on our Cloud Foundry deployment.
Any help would be highly appreciated.

Thanks

Dependency gone?

Hi,

I followed the README to deploy the autoscaler with BOSH; after running the command bosh create-release, the following errors were encountered:
...
Downloading: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-clean-plugin/2.5/maven-clean-plugin-2.5.pom
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:07 min
[INFO] Finished at: 2018-09-10T03:23:20-07:00
[INFO] Final Memory: 8M/111M
[INFO] ------------------------------------------------------------------------
[ERROR] Plugin org.apache.maven.plugins:maven-clean-plugin:2.5 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.apache.maven.plugins:maven-clean-plugin:jar:2.5: Could not transfer artifact org.apache.maven.plugins:maven-clean-plugin:pom:2.5 from/to central (https://repo.maven.apache.org/maven2): Connect to repo.maven.apache.org:443 [repo.maven.apache.org/151.101.48.215] failed: Connection timed out -> [Help 1]
...
[FATAL] Non-resolvable parent POM for org.cloudfoundry.autoscaler:scheduler:1.0-SNAPSHOT: Could not transfer artifact org.springframework.boot:spring-boot-starter-parent:pom:1.5.2.RELEASE from/to spring-snapshots (https://repo.spring.io/libs-snapshot): Connect to repo.spring.io:443 [repo.spring.io/35.241.58.96] failed: Connection timed out and 'parent.relativePath' points at no local POM @ line 5, column 10
...
It seems the dependency is gone; please kindly help.
BR//HAO

App autoscaler fails to start, settings.json invalid.

Attempting to deploy app autoscaler 3.0.0 results in the apiserver failing to start with a message that settings.json is invalid.

From apiserver.stderr.log:

Error: settings.json is invalid
at module.exports (/var/vcap/data/packages/apiserver/54a1afb682ac915030a74cc3c9a91ab2554025e8/app.js:16:11)
at Object. (/var/vcap/data/packages/apiserver/54a1afb682ac915030a74cc3c9a91ab2554025e8/index.js:25:56)
at Module._compile (module.js:652:30)
at Object.Module._extensions..js (module.js:663:10)
at Module.load (module.js:565:32)
at tryModuleLoad (module.js:505:12)
at Function.Module._load (module.js:497:3)
at Function.Module.runMain (module.js:693:10)
at startup (bootstrap_node.js:191:16)
at bootstrap_node.js:612:3
/var/vcap/data/packages/apiserver/54a1afb682ac915030a74cc3c9a91ab2554025e8/app.js:16
throw new Error('settings.json is invalid');

From apiserver.stdout.log:

{"timestamp":"2019-11-26T03:05:17.836Z","source":"autoscaler:apiserver","text":"Invalid configuration: minBreachDurationSecs is required","log_level":"error"}
{"timestamp":"2019-11-26T03:05:28.689Z","source":"autoscaler:apiserver","text":"Invalid configuration: minBreachDurationSecs is required","log_level":"error"}
{"timestamp":"2019-11-26T03:05:40.031Z","source":"autoscaler:apiserver","text":"Invalid configuration: minBreachDurationSecs is required","log_level":"error"}
{"timestamp":"2019-11-26T03:05:51.275Z","source":"autoscaler:apiserver","text":"Invalid configuration: minBreachDurationSecs is required","log_level":"error"}
{"timestamp":"2019-11-26T03:06:02.027Z","source":"autoscaler:apiserver","text":"Invalid configuration: minBreachDurationSecs is required","log_level":"error"}
{"timestamp":"2019-11-26T03:06:13.550Z","source":"autoscaler:apiserver","text":"Invalid configuration: minBreachDurationSecs is required","log_level":"error"}

From settings.json:
"minBreachDurationSecs": 30, "minCoolDownSecs": 30, "httpClientTimeout": 5000

apiserver fails when deploying app autoscaler using bosh DNS

Trying to deploy a greenfield install of app autoscaler that uses BOSH DNS. However, the apiserver fails with the following error:

L Error: Action Failed get_task: Task 916d5245-4549-4acd-47e3-1bee834e78a4 result: 1 of 3 pre-start scripts failed. Failed Jobs: apiserver. Successful Jobs: route_registrar, bosh-dns.

In /var/vcap/sys/log/apiserver/pre-start.stdout.log it shows a failed connection attempt to autoscalerpostgres.service.cf.internal. I am using the bosh-dns.yml that is in the examples dir of the app-autoscaler release folder. It shows a domain for autoscalerpostgres.service.cf.internal. I have not modified the bosh-dns.yml.

Here is more of the log from the apiserver/pre-start.stderr.log:

Starting Liquibase at Tue, 23 Jul 2019 17:35:18 UTC (version 3.6.3 built at 2019-01-29 11:34:48)
Unexpected error running Liquibase: org.postgresql.util.PSQLException: The connection attempt failed.
liquibase.exception.DatabaseException: liquibase.exception.DatabaseException: org.postgresql.util.PSQLException: The connection attempt failed.
at liquibase.integration.commandline.CommandLineUtils.createDatabaseObject(CommandLineUtils.java:132)
at liquibase.integration.commandline.Main.doMigration(Main.java:974)
at liquibase.integration.commandline.Main.run(Main.java:199)
at liquibase.integration.commandline.Main.main(Main.java:137)
Caused by: liquibase.exception.DatabaseException: org.postgresql.util.PSQLException: The connection attempt failed.
at liquibase.database.DatabaseFactory.openConnection(DatabaseFactory.java:254)
at liquibase.database.DatabaseFactory.openDatabase(DatabaseFactory.java:149)
at liquibase.integration.commandline.CommandLineUtils.createDatabaseObject(CommandLineUtils.java:97)
... 3 common frames omitted
Caused by: org.postgresql.util.PSQLException: The connection attempt failed.
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:292)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
at org.postgresql.jdbc.PgConnection.(PgConnection.java:195)
at org.postgresql.Driver.makeConnection(Driver.java:454)
at org.postgresql.Driver.connect(Driver.java:256)
at liquibase.database.DatabaseFactory.openConnection(DatabaseFactory.java:246)
... 5 common frames omitted
Caused by: java.net.UnknownHostException: autoscalerpostgres.service.cf.internal
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:221)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:402)
at java.base/java.net.Socket.connect(Socket.java:591)
at org.postgresql.core.PGStream.(PGStream.java:70)
at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:91)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:192)
... 10 common frames omitted

CA certificate rotation causes interruption in executed schedules

If you have more than one CA cert on the scheduler job (e.g. while doing a CA cert rotation), only one certificate will end up in the Java trust store:

$ cat /var/vcap/jobs/scheduler/config/certs/scalingengine/ca.crt
-----BEGIN CERTIFICATE-----
MIIE7jCCAtagAwIBAgIBATANBgkqhkiG9w0BAQsFADAXMRUwEwYDVQQDEwxhdXRv
...
RRCLIcypYA/ld2RGB9wq/9Fj
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIE7jCCAtagAwIBAgIBATANBgkqhkiG9w0BAQsFADAXMRUwEwYDVQQDEwxhdXRv
...
CsZEnYFcqsE/g5jJj0S/YeNG
-----END CERTIFICATE-----

$ /var/vcap/jobs/scheduler/bin/install_crt_truststore test scalingengine/ca.crt
$ /var/vcap/packages/java/bin/keytool -list -v -keystore /var/vcap/data/certs/test/cacerts 
Enter keystore password:  

Keystore type: JKS
Keystore provider: SUN

Your keystore contains 1 entry
...

This means that the scheduler will not trust one of the certificates, and there will be a period during which schedules cannot be executed:

org.cloudfoundry.autoscaler.scheduler.util.error.SchedulerInternalException: Error connecting to scaling engine, failed with error: I/O error on DELETE request for "https://scalingengine.service.cf.internal:6104/v1/apps/d0910498-eabe-4014-8f42-4d9f77003bd9/active_schedules/7958": sun.security.validator.ValidatorException: PKIX path validation failed: java.security.cert.CertPathValidatorException: Path does not chain with any of the trust anchors; nested exception is javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path validation failed: java.security.cert.CertPathValidatorException: Path does not chain with any of the trust anchors for app id: d0910498-eabe-4014-8f42-4d9f77003bd9 and schedule id: 7,958 to delete active schedule.

We probably need to split up the certs, following https://stackoverflow.com/questions/14660767/keytool-importing-multiple-certificates-in-single-file, in:

manage_truststore () {
  operation=$1
  $JDK_HOME/bin/keytool -$operation -file $CERT_FILE -keystore $TRUST_STORE_FILE -storeType pkcs12 -storepass $PASSWORD -noprompt -alias $CERT_ALIAS >/dev/null 2>&1
}
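
A minimal sketch of that split-and-import approach, assuming GNU csplit and reusing the variables from the function above:

# split the PEM bundle into one file per certificate
csplit -z -f split-ca- "$CERT_FILE" '/-----BEGIN CERTIFICATE-----/' '{*}'

# import each certificate under its own alias so none is overwritten
for cert in split-ca-*; do
  $JDK_HOME/bin/keytool -importcert -file "$cert" \
    -keystore "$TRUST_STORE_FILE" -storeType pkcs12 \
    -storepass "$PASSWORD" -noprompt -alias "$CERT_ALIAS-$cert"
done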

Question on autoscaling

I configured autoscaling in my dev version of Cloud Foundry (cf-release 256).

I have applied the below policy to the application.

Scenario:

My application has one instance only.
At start_date_time my application scales from 1 to 3 successfully.
But at end_date_time it doesn't scale back to the original count, which is 1.

Per the sample catalog definition it should scale down. (I have a feeling that I am unable to understand the specific_date schedule.) Any advice is great.
instance_min_count:
  type: integer
  minimum: 1
  description: The number of instances to scale down to once the recurrence period ends
instance_max_count:
  type: integer
  minimum: 1
  description: Maximum number of instances to scale up to during the recurrence period
initial_min_instance_count:
  type: integer
  minimum: 1
  description: The number of instances to scale up to as soon as the recurrence period starts

  "instance_min_count": 1,
  "instance_max_count": 4,
  "schedules": {
    "timezone": "America/New_York",
    "specific_date": [{
      "start_date_time": "2018-05-30T17:45",
      "end_date_time": "2018-05-30T17:50",
      "instance_min_count": 2,
      "instance_max_count": 4,
      "initial_min_instance_count": 3
    }]
  }
}```

Need scripts to download all blobs for fissile?

Hello,

There is no src for most of the package dependencies of the jobs; does that mean I need to create separate scripts to download all blobs for the packages when I want to use fissile to render the release into Docker images?

Thanks and Regards.
HAO

can bosh cli v1 handle the bosh links?

Hi,

Since we are using cf-release, bosh cli v2 can't be used in this case.
There is one link method in the file app-autoscaler-release/jobs/metricscollector/templates/metricscollector.yml.erb; can bosh cli v1 handle this method?

bosh cluster deployment

Currently, only a Bosh Lite deployment is provided. However, that is not enough and does not satisfy everyone.
Do you have any plans for a bosh cluster deployment with a Director?

Consul replacement?

The current app-autoscaler-release is not deployable due to its templates relying on CF Consul, which does not exist anymore.
Any plans to fix that? Add bosh-dns and bosh-dns-aliases to the release/templates?

Please configure GITBOT

Pivotal uses GITBOT to synchronize GitHub issues and pull requests with Pivotal Tracker.
Please add your new repo to the GITBOT config-production.yml in the Gitbot configuration repo.
If you don't have access you can send an ask ticket to the CF admins. We prefer teams to submit their changes via a pull request.

Steps:

  • Fork this repo: cfgitbot-config
  • Add your project to config-production.yml file
  • Submit a PR

If there are any questions, please reach out to [email protected].

create-release scripts not honoring proxy?

I have a Linux box with no internet access but with access via a proxy. I have set http_proxy and https_proxy on the box, and I can curl -x successfully through the proxy. However, when I run bosh create-release it gets a good chunk of the way through but then dies on the apiserver part: Added package 'apiserver/b0ab0e6e8317cd7292c8230d491e13f232f623c7'

It looks like those scripts are not honoring the proxy. I was watching my squid proxy logs throughout the create-release process and it was getting a lot of traffic until the apiserver portion started; then there wasn't another entry in the proxy log while the apiserver part was running, and it eventually failed. It's not using the http(s)_proxy settings. Here are a couple of snippets of the error:

apache-maven-3.3.9/bin/mvnyjp apache-maven-3.3.9/conf/ apache-maven-3.3.9/conf/logging/ apache-maven-3.3.9/conf/logging/simplelogger.properties apache-maven-3.3.9/conf/settings.xml apache-maven-3.3.9/conf/toolchains.xml apache-maven-3.3.9/lib/ext/ apache-maven-3.3.9/lib/ext/README.txt [INFO] Scanning for projects... [INFO] [INFO] ------------------------------------------------------------------------ [INFO] Building db 1.0-SNAPSHOT [INFO] ------------------------------------------------------------------------ Downloading: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-clean-plugin/2.5/maven-clean-plugin-2.5.pom [INFO] ------------------------------------------------------------------------ [INFO] BUILD FAILURE [INFO] ------------------------------------------------------------------------ [INFO] Total time: 02:09 min [INFO] Finished at: 2018-03-28T15:39:32+00:00 [INFO] Final Memory: 9M/111M [INFO] ------------------------------------------------------------------------ [ERROR] Plugin org.apache.maven.plugins:maven-clean-plugin:2.5 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.apache.maven.plugins:maven-clean-plugin:jar:2.5: Could not transfer artifact org.apache.maven.plugins:maven-clean-plugin:pom:2.5 from/to central (https://repo.maven.apache.org/maven2): Connect to repo.maven.apache.org:443 [repo.maven.apache.org/151.101.44.215] failed: Connection timed out -> [Help 1]

Mar 28, 2018 3:41:54 PM org.apache.maven.wagon.providers.http.httpclient.impl.execchain.RetryExec execute INFO: I/O exception (java.net.NoRouteToHostException) caught when processing request to {s}->https://repo.maven.apache.org:443: No route to host Mar 28, 2018 3:41:54 PM org.apache.maven.wagon.providers.http.httpclient.impl.execchain.RetryExec execute INFO: Retrying request to {s}->https://repo.maven.apache.org:443

scheduler vm namespace clash with cf scheduler (0.30.0)

cf-deployment introduced the use of a VM named 'scheduler' in the cf-deployment 0.28.0 release. This is causing a namespace clash when consul_agents register with the consul_server(s), as there are now two clients both attempting to register with the name scheduler-0.

We are seeing repeated problems with the autoscaler being able to pass the autoscaler acceptance test suite when running against cf-deployment 0.30.0, and believe this is caused by the namespace clash within Consul.

Support for sending metrics over Loggregator

As a CF component, it would be nice if the autoscaler published health and metrics over loggregator. CF operators have all had to integrate loggregator into their monitoring systems to monitor all of the CF core components. It would be nice if the autoscaler supported the same metrics system as all the other CF core components. This would allow operators who wish to add the autoscaler to their system to not have to monitor it differently from any other CF component.

Today the autoscaler provides a Prometheus endpoint, which is great for those using Prometheus to monitor their Cloud Foundry deployments but not so great for those who don't.

With the rewrite to Golang, integration with loggregator might be simpler, since loggregator provides a common Golang library for this purpose.

Not an issue but another CPU query

Hi, we're currently trying out this release in one of our dev envs and noticed that the policy for the cpu metric can only be between 1-100%.

I can find the code that implements this restriction, which is fair enough. In our env, though, CPU can go up to 200% as our Diego cells are dual-core.

I'm running through the code to see which cpu metric is pulled out, and I'm guessing it's the same cpuPercentage from the loggregator v2 API as used by the command 'cf app [app name]', which shows the cpu used. Is that right?

It wouldn't be using AbsoluteCPUUsage, no? And I'm guessing not AbsoluteCPUEntitlement, as that's experimental.

Any info would be good, as I've been going down a rabbit hole following the code :-)

Think you still need cf_admin_password to deploy

The readme has recently been updated with the following command for deploying autoscaler:

bosh -e YOUR_ENV -d app-autoscaler \
    deploy templates/app-autoscaler-deployment-fewer.yml \
    --vars-store=bosh-lite/deployments/vars/autoscaler-deployment-vars.yml \
    -v system_domain=bosh-lite.com \
    -v cf_client_id=autoscaler_client_id \
    -v cf_client_secret=autoscaler_client_secret \
    -v skip_ssl_validation=true

I only got it to work with the following:

bosh -e YOUR_ENV -d app-autoscaler \
    deploy templates/app-autoscaler-deployment-fewer.yml \
    --vars-store=bosh-lite/deployments/vars/autoscaler-deployment-vars.yml \
    -v system_domain=bosh-lite.com \
    -v cf_admin_password= \
    -v cf_client_id=autoscaler_client_id \
    -v cf_client_secret=autoscaler_client_secret \
    -v skip_ssl_validation=true

Without it, you get an error like:

Failed to find variable '//app-autoscaler/cf_admin_password' from config server: HTTP Code '404', Error: 'The request could not be completed because the credential does not exist or you do not have sufficient authorization.'

Failed to recurse into submodule path 'src/app-autoscaler'

Hi

I'm having issues when ./scripts/update runs:

Submodule path 'src/gopkg.in/yaml.v2': checked out 'a3f3340b5840cee44f372bddb5880fcbc419b46a'
Failed to recurse into submodule path 'src/app-autoscaler'

I'm following main installation steps https://github.com/cloudfoundry-incubator/app-autoscaler-release#bosh-lite-deployment

I have run "git submodule update --init --recursive" on appautoscaler repo but it didn't work

Has anyone seen this error before? Am I missing a step?

Thanks

Bosh Bionic Stemcell Support and State of the project

Dear App-Autoscaler maintainers and contributors,

we tried rolling out the app-autoscaler BOSH release with the latest Bionic stemcell (v 0.28) and did not succeed.
With the old Xenial stemcells and the same configuration, we are able to run the service as intended.

Therefore some general questions:

  • Are there any plans to make the BOSH release deployable with the Bionic stemcell?
  • Is the project still maintained, or has it reached the end of its life?
  • If it has reached the end of its life: are there alternatives that we could offer to our platform users for providing autoscaling features for Cloud Foundry?

Kind regards,
Julian

Need better instructions on `go mod vendor` before creating the release

Currently, we have set GOPROXY=off in the packaging scripts for the Golang packages, which stops go build from trying to access the internet when building each binary, but... this requires go mod download && go mod vendor to be run before creating the BOSH release. This will remain a manual step until we have a pipeline that creates a release automatically.

We need to document this somewhere.
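
The manual step in question, as it would currently be run before creating the release (a sketch; the working directory is an assumption):

# run from each Go module's directory (directory layout is an assumption)
go mod download && go mod vendor
bosh create-release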

Bound app does not work (RuntimeException, InvocationTargetException)

Issue

When I bind an Autoscaler service instance to a sample app and then restage, the app does not work.

$ cf create-service-broker autoscaler username password https://servicebroker.service.cf.internal:6101
$ cf enable-service-access autoscaler
$ cf create-service autoscaler autoscaler-free-plan autoscaler1
$ cf bind-service spring-music autoscaler1 -c '{"instance_min_count":1,"instance_max_count":4,"scaling_rules":[{"metric_type":"memoryused","stat_window_secs":300,"breach_duration_secs":600,"threshold":30,"operator":"<","cool_down_secs":300,"adjustment":"-1"},{"metric_type":"memoryused","stat_window_secs":300,"breach_duration_secs":600,"threshold":90,"operator":">=","cool_down_secs":300,"adjustment":"+1"}]}'
$ cf restage spring-music
...
Successfully destroyed container

0 of 1 instances running, 1 starting
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 crashed
Failed to watch staging of app spring-music in org cloudlab / space dev as admin...
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]OUT 2017-07-06 09:11:47.489  INFO 7 --- [           main] .b.l.ClasspathLoggingApplicationListener : Application failed to start with classpath: [file:/home/vcap/app/, jar:file:/home/vcap/app/lib/tomcat-embed-core-8.0.33.jar!/,
...
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR Exception in thread "main" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:62)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at java.lang.Thread.run(Thread.java:745)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR Caused by: java.lang.reflect.InvocationTargetException
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at java.lang.reflect.Method.invoke(Method.java:498)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:54)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	... 1 more
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR Caused by: java.lang.NullPointerException
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.springframework.cloud.cloudfoundry.CloudFoundryServiceInfoCreator.uriKeyMatchesScheme(CloudFoundryServiceInfoCreator.java:65)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.springframework.cloud.cloudfoundry.CloudFoundryServiceInfoCreator.accept(CloudFoundryServiceInfoCreator.java:26)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.springframework.cloud.cloudfoundry.RelationalServiceInfoCreator.accept(RelationalServiceInfoCreator.java:23)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.springframework.cloud.cloudfoundry.RelationalServiceInfoCreator.accept(RelationalServiceInfoCreator.java:15)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.springframework.cloud.AbstractCloudConnector.getServiceInfo(AbstractCloudConnector.java:60)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.springframework.cloud.AbstractCloudConnector.getServiceInfos(AbstractCloudConnector.java:40)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.springframework.cloud.Cloud.getServiceInfos(Cloud.java:89)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.cloudfoundry.samples.music.config.SpringApplicationContextInitializer.getCloudProfile(SpringApplicationContextInitializer.java:64)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.cloudfoundry.samples.music.config.SpringApplicationContextInitializer.initialize(SpringApplicationContextInitializer.java:44)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.springframework.boot.SpringApplication.applyInitializers(SpringApplication.java:640)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.springframework.boot.SpringApplication.createAndRefreshContext(SpringApplication.java:343)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.springframework.boot.SpringApplication.run(SpringApplication.java:307)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.cloudfoundry.samples.music.Application.main(Application.java:15)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	... 6 more
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]OUT Exit status 0

Context

  • CF: v247
  • App-autoscaler-release: master
  • Postgres: v17
---
director_uuid: {UUID}

name: app-autoscaler-release

## Release Details ###
releases:
  - name: app-autoscaler
    version: latest
  - name: postgres
    url: https://bosh.io/d/github.com/cloudfoundry/postgres-release
    version: '17'
    sha1: b062e32a5409ccd4e4161337c48705c793a58412
  - name: paasta-controller
    version: '2.0'

## Network Section ##
networks: {NETWORK CONFIG}

## Resource Pool ##
resource_pools:
  - name: small
    network: default
    stemcell:
      name: bosh-openstack-kvm-ubuntu-trusty-go_agent
      version: latest
    cloud_properties:
      name: random
      instance_type: m1.small
      availability_zone: nova

## Disk Pool ##
disk_pools:
  - name: default
    disk_size: 1024

## Canary details ##
update:
  canaries: 1
  canary_watch_time: 1000-300000
  max_in_flight: 3
  update_watch_time: 1000-300000

## Compilation ##
compilation:
  workers: 2
  network: default
  reuse_compilation_vms: true
  cloud_properties:
    name: random
    instance_type: m1.small
    availability_zone: nova

## Jobs ##
jobs:
  - name: postgres
    instances: 1
    update:
      serial: true
    resource_pool: small
    networks:
      - name: default
        static_ips:
          - {POSTGRES_IP}
    templates:
      - {name: consul_agent, release: paasta-controller}
      - {name: postgres, release: postgres}
    properties:
      databases:
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      consul:
        agent:
          mode: client
          services:
            postgres:
              check:
                tcp: 127.0.0.1:5432
                interval: 30s
                timeout: 10s

  - name: apiserver
    instances: 1
    networks:
      - name: default
    resource_pool: small
    templates:
      - {name: consul_agent, release: paasta-controller}
      - {name: apiserver, release: app-autoscaler}
    properties:
      api_server:
        db_config:
          idle_timeout: 1000
          max_connections: 10
          min_connections: 0
        port: 6100
        ca_cert: {CA_CERT}
        server_cert: {SERVER_CERT}
        server_key: {SERVER_KEY}
        scheduler:
          ca_cert: {CA_CERT}
          client_cert: {CLIENT_CERT}
          client_key: {CLIENT_KEY}
      policy_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      consul:
        agent:
          mode: client
          services:
            apiserver: {}



  - name: scheduler
    instances: 1
    networks:
      - name: default
    resource_pool: small
    templates:
      - {name: consul_agent, release: paasta-controller}
      - {name: scheduler, release: app-autoscaler}
    properties:
      scheduler:
        port: 6102
        job_reschedule_interval_millisecond: 10000
        job_reschedule_maxcount: 6
        notification_reschedule_maxcount: 3
        ca_cert: {CA_CERT}
        server_cert: {SERVER_CERT}
        server_key: {SERVER_KEY}
        scaling_engine:
          ca_cert: {CA_CERT}
          client_cert: {CLIENT_CERT}
          client_key: {CLIENT_KEY}
      scheduler_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      consul:
        agent:
          mode: client

  - name: servicebroker
    instances: 1
    networks:
      - name: default
    resource_pool: small
    templates:
      - {name: consul_agent, release: paasta-controller}
      - {name: servicebroker, release: app-autoscaler}
    properties:
      service_broker:
        db_config:
          idle_timeout: 1000
          max_connections: 10
          min_connections: 0
        port : 6101
        ca_cert: {CA_CERT}
        server_cert: {SERVER_CERT}
        server_key: {SERVER_KEY}
        username: username
        password: password
        http_request_timeout: 5000
        catalog:
          services:
          - id: autoscaler-guid
            name: autoscaler
            description: Automatically increase or decrease the number of application instances based on a policy you define.
            bindable: true
            plans:
            - id: autoscaler-free-plan-id
              name: autoscaler-free-plan
              description: This is the free service plan for the Auto-Scaling service.
        api_server:
          ca_cert: {CA_CERT}
          client_cert: {CLIENT_CERT}
          client_key: {CLIENT_KEY}
      binding_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      consul:
        agent:
          mode: client
          services:
            servicebroker: {}

  - name: pruner
    instances: 1
    networks:
      - name: default
    resource_pool: small
    templates:
      - {name: consul_agent, release: paasta-controller}
      - {name: pruner, release: app-autoscaler}
    properties:
      appmetrics_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      instancemetrics_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      scalingengine_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      pruner:
        logging:
          level: debug
      consul:
        agent:
          mode: client

  - name: metricscollector
    instances: 1
    networks:
      - name: default
    resource_pool: small
    templates:
      - {name: consul_agent, release: paasta-controller}
      - {name: metricscollector, release: app-autoscaler}
    properties:
      instancemetrics_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      policy_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      cf: {CF_INFO}
      metricscollector:
        logging:
          level: debug
        server:
          port: 6103
        ca_cert: {CA_CERT}
        server_cert: {SERVER_CERT}
        server_key: {SERVER_KEY}
      consul:
        agent:
          mode: client


  - name: eventgenerator
    instances: 1
    networks:
      - name: default
    resource_pool: small
    templates:
      - {name: consul_agent, release: paasta-controller}
      - {name: eventgenerator, release: app-autoscaler}
    properties:
      appmetrics_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      policy_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      eventgenerator:
        logging:
          level: debug
        scaling_engine:
          ca_cert: {CA_CERT}
          client_cert: {CLIENT_CERT}
          client_key: {CLIENT_KEY}
        metricscollector:
          ca_cert: {CA_CERT}
          client_cert: {CLIENT_CERT}
          client_key: {CLIENT_KEY}
      consul:
        agent:
          mode: client

  - name: scalingengine
    instances: 1
    networks:
      - name: default
    resource_pool: small
    templates:
      - {name: consul_agent, release: paasta-controller}
      - {name: scalingengine, release: app-autoscaler}
    properties:
      scalingengine_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      scheduler_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      policy_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      cf: {CF_INFO}
      scalingengine:
        logging:
          level: debug
        server:
          port: 6104
        ca_cert: {CA_CERT}
        server_cert: {SERVER_CERT}
        server_key: {SERVER_KEY}
        consul:
          cluster: http://127.0.0.1:8500
      consul:
        agent:
          mode: client

properties:
  consul:
    agent:
      domain: cf.internal
      log_level: warn
      servers:
        lan:
        - {CONSUL_IP}
    agent_cert: {AGENT_CERT}
    agent_key: {AGENT_KEY}
    ca_cert: {CA_CERT}
    dns_config: null
    encrypt_keys:
      - {ENCRYPT_KEY}
    server_cert: {SERVER_CERT}
    server_key: {SERVER_KEY}

Question

How do I resolve these errors?

If autoscaler.operator.require_consul: false then autoscaler.operator.lock.consul_cluster_config should be set to null.

When require_consul is false, the operator job still attempts to connect to Consul.

In operator/main.go:

if conf.Lock.ConsulClusterConfig != "" {
	consulClient, err := consuladapter.NewClientFromUrl(conf.Lock.ConsulClusterConfig)
	if err != nil {
		logger.Fatal("new consul client failed", err)
	}
...

Because there is a default value for consul_cluster_config in the operator job spec, the value is never "" unless you explicitly set it.
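
A sketch of the workaround from the issue title, explicitly overriding the job spec default in the deployment manifest (the exact property nesting is an assumption based on this issue):

properties:
  autoscaler:
    operator:
      require_consul: false
      lock:
        consul_cluster_config: null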

CheckServiceExists does not correctly check for service offering with cf7 CLI

if strings.Contains(string(version.Out.Contents()), "version 7") {
    serviceExists = cf.Cf("marketplace", "-e", cfg.ServiceName).Wait(cfg.DefaultTimeoutDuration())
} else {
    serviceExists = cf.Cf("marketplace", "-s", cfg.ServiceName).Wait(cfg.DefaultTimeoutDuration())
}
Expect(serviceExists).To(Exit(0), fmt.Sprintf("Service offering, %s, does not exist", cfg.ServiceName))

This does not correctly check that the service offering exists with cf7 marketplace, as it exits with code 0 even when no service offering was found:

[Update]: When the -e flag is specified, and no service offering with that name is found, the exit code returned is 0. This is in contrast to the cf CLI v6, which returned exit code 1 in this case.
(c.f. cloudfoundry/docs-cf-cli#71)

Instead, the output of the cf7 marketplace command needs to be parsed.
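
A rough sketch of the kind of parsing that could work for both CLI versions, assuming the offering name appears as the first column of a line in the cf marketplace table output (the exact column layout may vary by CLI version):

package main

import (
	"bufio"
	"fmt"
	"strings"
)

// marketplaceContainsOffering scans `cf marketplace` output for a line whose
// first column matches the wanted service offering. This sidesteps the cf7
// behaviour of exiting 0 even when no offering is found.
func marketplaceContainsOffering(output, offering string) bool {
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) > 0 && fields[0] == offering {
			return true
		}
	}
	return false
}

func main() {
	out := "offering     plans      description\nautoscaler   standard   Scales apps\n"
	fmt.Println(marketplaceContainsOffering(out, "autoscaler")) // true
	fmt.Println(marketplaceContainsOffering(out, "missing"))    // false
}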

wrong pid for servicebroker in the pid file.

@qibobo Hi qibobo, one more issue found:
In the autoscaler-api pod, the servicebroker and apiserver processes show "Does not exist" in the monit summary, as below:
Process 'crond' Running
File 'cron_bin' Accessible
File 'cron_rc' Accessible
Directory 'cron_spool' Accessible
Process 'rsyslogd' Running
File 'rsyslogd_bin' Accessible
File 'rsyslog_file' Timestamp failed
Process 'servicebroker' Does not exist
Process 'route_registrar' Running
Process 'apiserver' Does not exist
File 'post-start' Does not exist
System 'autoscaler-api-int.hcf.svc' Running

I tried to start them manually with monit validate -v:
'servicebroker' Error testing process id [86445] -- No such process
'servicebroker' process is not running
'servicebroker' trying to restart
'servicebroker' Error testing process id [86445] -- No such process
'servicebroker' Error testing process id [86445] -- No such process
'servicebroker' start: /var/vcap/jobs/servicebroker/bin/servicebroker
'servicebroker' Error testing process id [86445] -- No such process

While checking the corresponding pid file, it shows a different pid:
root@autoscaler-api-int:/var/vcap/monit# more /var/vcap/sys/run/servicebroker/servicebroker.pid
86565
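
One way to confirm the mismatch is to check whether the pid recorded in the pid file is actually a live process, which appears to be exactly what monit is tripping over. A minimal diagnostic sketch (Linux-only; the path is taken from the report above):

package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"syscall"
)

// pidFileAlive reads a pid file and reports whether that pid is a live process.
func pidFileAlive(path string) (int, bool, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return 0, false, err
	}
	pid, err := strconv.Atoi(strings.TrimSpace(string(raw)))
	if err != nil {
		return 0, false, err
	}
	// Signal 0 checks for existence without actually signalling the process.
	return pid, syscall.Kill(pid, 0) == nil, nil
}

func main() {
	pid, alive, err := pidFileAlive("/var/vcap/sys/run/servicebroker/servicebroker.pid")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("pid file says %d, alive: %v\n", pid, alive)
}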

Updating instance asactors failed during the app-autoscaler deployment with databasechangeloglock relation already exists error

I consistently ran into this error for any new deployment of app-autoscaler (release 3.0.1):

Task 154346 | 22:10:49 | Preparing deployment: Preparing deployment (00:00:02)
Task 154346 | 22:10:53 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 154346 | 22:10:53 | Creating missing vms: postgres_autoscaler/2c1bd12b-b73d-4108-9820-99a30064e0bb (0)
Task 154346 | 22:10:53 | Creating missing vms: asactors/5ce03077-fdf8-4bfb-9d7e-b45507fd7b4c (0)
Task 154346 | 22:10:53 | Creating missing vms: asmetrics/69e58377-3a81-40fc-841a-03892de2f026 (0)
Task 154346 | 22:10:53 | Creating missing vms: asnozzle/3595b7f5-c04d-4add-b3a4-09d460d8ee20 (0)
Task 154346 | 22:10:53 | Creating missing vms: asapi/06ba43c1-b86f-4040-b0d8-d5e28a8c1686 (0) (00:01:08)
Task 154346 | 22:12:02 | Creating missing vms: asactors/5ce03077-fdf8-4bfb-9d7e-b45507fd7b4c (0) (00:01:09)
Task 154346 | 22:12:02 | Creating missing vms: asnozzle/3595b7f5-c04d-4add-b3a4-09d460d8ee20 (0) (00:01:09)
Task 154346 | 22:12:02 | Creating missing vms: postgres_autoscaler/2c1bd12b-b73d-4108-9820-99a30064e0bb (0) (00:01:09)
Task 154346 | 22:12:02 | Creating missing vms: asmetrics/69e58377-3a81-40fc-841a-03892de2f026 (0) (00:01:09)
Task 154346 | 22:12:06 | Updating instance postgres_autoscaler: postgres_autoscaler/2c1bd12b-b73d-4108-9820-99a30064e0bb (0) (canary) (00:00:21)
Task 154346 | 22:12:27 | Updating instance asactors: asactors/5ce03077-fdf8-4bfb-9d7e-b45507fd7b4c (0) (canary) (00:00:52)
L Error: Action Failed get_task: Task 45355f8b-f7af-4267-6967-b90c3bc6d985 result: 2 of 4 pre-start scripts failed. Failed Jobs: scalingengine, operator. Successful Jobs: bosh-dns, scheduler.
Task 154346 | 22:13:19 | Error: Action Failed get_task: Task 45355f8b-f7af-4267-6967-b90c3bc6d985 result: 2 of 4 pre-start scripts failed. Failed Jobs: scalingengine, operator. Successful Jobs: bosh-dns, scheduler.

when checking /var/vcap/sys/log/scalingengine/pre-start.stdout.log, I found this stacktrace:

Starting Liquibase at Wed, 16 Sep 2020 22:13:02 UTC (version 3.6.3 built at 2019-01-29 11:34:48)
Unexpected error running Liquibase: liquibase.exception.DatabaseException: Error executing SQL SELECT COUNT(*) FROM public.databasechangeloglock: ERROR: current transaction is aborted, commands ignored until end of transaction block
liquibase.exception.LockException: liquibase.exception.UnexpectedLiquibaseException: liquibase.exception.DatabaseException: Error executing SQL SELECT COUNT(*) FROM public.databasechangeloglock: ERROR: current transaction is aborted, commands ignored until end of transaction block
at liquibase.lockservice.StandardLockService.acquireLock(StandardLockService.java:289)
at liquibase.lockservice.StandardLockService.waitForLock(StandardLockService.java:207)
at liquibase.Liquibase.update(Liquibase.java:184)
at liquibase.Liquibase.update(Liquibase.java:179)
at liquibase.integration.commandline.Main.doMigration(Main.java:1220)
at liquibase.integration.commandline.Main.run(Main.java:199)
at liquibase.integration.commandline.Main.main(Main.java:137)
Caused by: liquibase.exception.UnexpectedLiquibaseException: liquibase.exception.DatabaseException: Error executing SQL SELECT COUNT(*) FROM public.databasechangeloglock: ERROR: current transaction is aborted, commands ignored until end of transaction block
at liquibase.lockservice.StandardLockService.isDatabaseChangeLogLockTableInitialized(StandardLockService.java:173)
at liquibase.lockservice.StandardLockService.init(StandardLockService.java:121)
at liquibase.lockservice.StandardLockService.acquireLock(StandardLockService.java:246)
... 6 common frames omitted
Caused by: liquibase.exception.DatabaseException: Error executing SQL SELECT COUNT(*) FROM public.databasechangeloglock: ERROR: current transaction is aborted, commands ignored until end of transaction block
at liquibase.executor.jvm.JdbcExecutor.execute(JdbcExecutor.java:70)
at liquibase.executor.jvm.JdbcExecutor.query(JdbcExecutor.java:138)
at liquibase.executor.jvm.JdbcExecutor.query(JdbcExecutor.java:146)
at liquibase.executor.jvm.JdbcExecutor.queryForObject(JdbcExecutor.java:154)
at liquibase.executor.jvm.JdbcExecutor.queryForObject(JdbcExecutor.java:169)
at liquibase.executor.jvm.JdbcExecutor.queryForInt(JdbcExecutor.java:190)
at liquibase.executor.jvm.JdbcExecutor.queryForInt(JdbcExecutor.java:185)
at liquibase.lockservice.StandardLockService.isDatabaseChangeLogLockTableInitialized(StandardLockService.java:162)
... 8 common frames omitted
Caused by: org.postgresql.util.PSQLException: ERROR: current transaction is aborted, commands ignored until end of transaction block
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:308)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:307)
at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:293)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:270)
at org.postgresql.jdbc.PgStatement.executeQuery(PgStatement.java:224)
at liquibase.executor.jvm.JdbcExecutor$QueryStatementCallback.doInStatement(JdbcExecutor.java:419)
at liquibase.executor.jvm.JdbcExecutor.execute(JdbcExecutor.java:57)
... 15 common frames omitted
Caused by: org.postgresql.util.PSQLException: ERROR: relation "databasechangeloglock" already exists
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:308)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:307)
at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:293)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:270)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:266)
at liquibase.executor.jvm.JdbcExecutor$ExecuteStatementCallback.doInStatement(JdbcExecutor.java:352)
at liquibase.executor.jvm.JdbcExecutor.execute(JdbcExecutor.java:57)
at liquibase.executor.jvm.JdbcExecutor.execute(JdbcExecutor.java:125)
at liquibase.executor.jvm.JdbcExecutor.execute(JdbcExecutor.java:109)
at liquibase.lockservice.StandardLockService.init(StandardLockService.java:97)

I noticed that there is another open issue (#207) related to
databasechangeloglock, and I'm not sure if it's the same issue. If it's the same issue, please mark this issue as duplicate and close it. If not, what could be causing this error?

Note: As a temporary workaround, I ran bosh deploy again and it seemed to fix the error.

Autoscaler shows incorrect version in stratos

Hi all,

We have found that the autoscaler version is not showing correctly in stratos.

We have just updated autoscaler from v3.0.0 to v3.0.1 on stratos v3.2.1
Nevertheless - in Stratos -> Cloud Foundry -> Summary -> Autoscaler Version is showing 3.0.0 (should be 3.0.1).

Stratos gets the version from the response to autoscaler.(cf system endpoint)/v1/info.
Looks like this comes from https://github.com/cloudfoundry/app-autoscaler/blob/3c85c748d2e9f315f86b216c6ce340416b515800/api/config/info.json
There must be something on the build/release side that updates that, though, as it's set to 001.

The version numbers shown in stratos can be configured in https://github.com/cloudfoundry/app-autoscaler-release/blob/master/jobs/golangapiserver/spec#L97 which seems a suboptimal way of doing this sort of thing.

Can we have a fix for that?

Thanks a lot,

Does the metrics introduced in the new autoscaler work with old cloudfoundry version?

@qibobo Hi Qibobo, compared with the old app-autoscaler (https://github.com/cfibmers/open-Autoscaler/tree/486d818e7047123339df45a4b7b0c9d15666fe51), which only supported one metric type (Memory), the latest app-autoscaler supports 4 scaling metrics: memoryused, memoryutil, responsetime, and throughput. The Cloud Foundry version we are using in our project is quite old (cf-release v251). Do those new metrics need enhancements in a newer Cloud Foundry version to be able to scale based on them? Thanks a lot!

Allow encrypted DB connections

All of the database connection strings appear to be hard-coded with "sslmode=disable", which prevents secure connections.
It would be nice to be able to configure this for environments that require secure connections.
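
For illustration, a configurable DSN could look like the sketch below; the sslmode and sslrootcert parameter names follow the lib/pq driver's conventions, and all values here are placeholders:

package main

import (
	"fmt"
	"net/url"
)

// buildDSN assembles a Postgres URL with a configurable sslmode instead of a
// hard-coded "sslmode=disable".
func buildDSN(user, pass, host, db, sslMode, caPath string) string {
	u := url.URL{
		Scheme:   "postgres",
		User:     url.UserPassword(user, pass),
		Host:     host,
		Path:     "/" + db,
		RawQuery: "sslmode=" + sslMode,
	}
	if caPath != "" {
		u.RawQuery += "&sslrootcert=" + caPath
	}
	return u.String()
}

func main() {
	fmt.Println(buildDSN("postgres", "secret", "10.0.0.5:5432", "autoscaler",
		"verify-full", "/var/vcap/jobs/scalingengine/config/certs/ca.crt"))
}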

DB password exposed in the log when db connect failed.

When the connection to the db fails, the db password is exposed in the log file; this is a security issue.
Error logs:

2019/12/03 07:37:39 failed-to-connection-to-database, dburl:postgres://xxx:[email protected]:5432/autoscaler?sslmode=verify-full&sslrootcert=/var/vcap/jobs/scalingengine/config/certs/scalingengine_db/ca.crt,  err:pq: password authentication failed for user "xxx"
failed to connect to database:

Checking the code, I found that the line below prints the DB url including the password:

log.Printf("failed-to-connection-to-database, dburl:%s, err:%s\n", dbUrl, err)
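
One way to avoid the leak is to mask the password before logging. A minimal sketch using net/url (the URL below is a placeholder):

package main

import (
	"fmt"
	"net/url"
)

// redactDBURL masks the password portion of a database URL so connection
// failures can be logged without leaking credentials.
func redactDBURL(dbURL string) string {
	u, err := url.Parse(dbURL)
	if err != nil {
		return "<unparseable db url>"
	}
	if u.User != nil {
		if _, hasPassword := u.User.Password(); hasPassword {
			u.User = url.UserPassword(u.User.Username(), "REDACTED")
		}
	}
	return u.String()
}

func main() {
	dbURL := "postgres://user:secret@db.internal:5432/autoscaler?sslmode=verify-full"
	fmt.Printf("failed-to-connection-to-database, dburl:%s\n", redactDBURL(dbURL))
}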

Error: Bad Certificate

I successfully deployed the app-autoscaler-release on AWS.

Everything works well except for the metricscollector and scalingengine APIs.

[screenshot: 2017-06-16 16-17-08]

I can use the apiserver APIs.

apiserver/740d157f-8e3f-43fc-bd0b-28d3b43075aa:~$ curl https://apiserver.service.cf.internal:6100/v1/policies/45c39971-41c6-4fb2-b999-a4fc33068329 --insecure
{"instance_max_count":4,"instance_min_count":1,"scaling_rules":[{"adjustment":"-1","breach_duration_secs":600,"cool_down_secs":300,"metric_type":"memoryused","operator":"<","stat_window_secs":300,"threshold":30},{"adjustment":"+1","breach_duration_secs":600,"cool_down_secs":300,"metric_type":"memoryused","operator":">=","stat_window_secs":300,"threshold":90}],"schedules":{"recurring_schedule":[{"days_of_week":[1,2,3],"end_time":"18:00","initial_min_instance_count":5,"instance_max_count":10,"instance_min_count":1,"start_time":"10:00"},{"days_of_month":[5,15,25],"end_date":"2099-07-23","end_time":"19:30","initial_min_instance_count":5,"instance_max_count":10,"instance_min_count":3,"start_date":"2099-06-27","start_time":"11:00"},{"days_of_week":[4,5,6],"end_time":"18:00","instance_max_count":10,"instance_min_count":1,"start_time":"10:00"},{"days_of_month":[10,20,30],"end_time":"19:30","instance_max_count":10,"instance_min_count":1,"start_time":"11:00"}],"specific_date":[{"end_date_time":"2099-06-15T13:59","initial_min_instance_count":2,"instance_max_count":4,"instance_min_count":1,"start_date_time":"2099-06-02T10:00"},{"end_date_time":"2099-02-19T23:15","initial_min_instance_count":3,"instance_max_count":5,"instance_min_count":2,"start_date_time":"2099-01-04T20:00"}],"timezone":"Asia/Shanghai"}}

When I try to access the metricscollector, it says that the certificate is not valid.

apiserver/740d157f-8e3f-43fc-bd0b-28d3b43075aa:~$ curl https://metricscollector.service.cf.internal:6103 --insecure
curl: (35) error:14094412:SSL routines:SSL3_READ_BYTES:sslv3 alert bad certificate

curl -v https://metricscollector.service.cf.internal:6103 --cacert ca.crt 
* Rebuilt URL to: https://metricscollector.service.cf.internal:6103/
* Hostname was NOT found in DNS cache
*   Trying 10.244.4.7...
* Connected to metricscollector.service.cf.internal (10.244.4.7) port 6103 (#0)
* successfully set certificate verify locations:
*   CAfile: ca.crt
  CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Server key exchange (12):
* SSLv3, TLS handshake, Request CERT (13):
* SSLv3, TLS handshake, Server finished (14):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Client key exchange (16):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSLv3, TLS alert, Server hello (2):
* error:14094412:SSL routines:SSL3_READ_BYTES:sslv3 alert bad certificate
* Closing connection 0
curl: (35) error:14094412:SSL routines:SSL3_READ_BYTES:sslv3 alert bad certificate

I attached my yml file.

app-autoscaler-release.zip

Any help would be really appreciated!

metricsgateway job failed on asnozzle VM with certificate error

I just deployed the app-autoscaler BOSH release 3.0.1 and it failed on updating the instance asnozzle:

Task 153575 | 21:51:51 | Updating instance asnozzle: asnozzle/d86a1ac1-2ffe-4912-97c8-63eff5dee550 (0) (canary) (00:05:17)
L Error: 'asnozzle/d86a1ac1-2ffe-4912-97c8-63eff5dee550 (0)' is not running after update. Review logs for failed jobs: metricsgateway
Task 153575 | 21:57:08 | Error: 'asnozzle/d86a1ac1-2ffe-4912-97c8-63eff5dee550 (0)' is not running after update. Review logs for failed jobs: metricsgateway

Task 153575 Started Fri Sep 11 21:49:35 UTC 2020
Task 153575 Finished Fri Sep 11 21:57:08 UTC 2020
Task 153575 Duration 00:07:33
Task 153575 error

Updating deployment:
Expected task '153575' to succeed but state is 'error'

Exit code 1

When I checked the /var/vcap/sys/log/metricsgateway/metricsgateway.stdout.log, I found a lot of occurrences of this error:

{"data":{"error":"x509: certificate is valid for metricsserver.service.cf.internal, *.asmetrics.default.app-autoscaler.bosh, not de4b3b4d-de80-40d3-832e-67a7f49c6bf6.asmetrics.vlan200-cfar.app-autoscaler.bosh"},"log_level":2,"log_time":"2020-09-11T23:01:27Z","message":"metricsgateway.failed to start emitter","source":"metricsgateway","timestamp":"1599865287.905046940"}

It looked like the CN/SAN of the certificate for the metrics server does not match the DNS name used for the metric_server_addrs parameter in the metricsgateway.yml:

$ cat /var/vcap/jobs/metricsgateway/config/metricsgateway.yml
logging:
  level: info
envelop_chan_size: 1000
nozzle_count: 3
metric_server_addrs: ['wss://0e8d79dc-253f-4128-985d-d86cf161f902.asmetrics.vlan200-cfar.app-autoscaler.bosh:7103']
...

Any idea what is wrong and how to fix this problem?
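
One way to see exactly which names the server's certificate carries is to dial the endpoint and print the presented SANs. A diagnostic sketch: the address is the one from the error message, and InsecureSkipVerify is used only because we want to inspect the certificate rather than trust it (if the server also demands a client certificate, the handshake may still fail, but the SANs in the error message can be compared the same way):

package main

import (
	"crypto/tls"
	"fmt"
)

func main() {
	addr := "de4b3b4d-de80-40d3-832e-67a7f49c6bf6.asmetrics.vlan200-cfar.app-autoscaler.bosh:7103"
	conn, err := tls.Dial("tcp", addr, &tls.Config{InsecureSkipVerify: true})
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	defer conn.Close()
	// Print the CN and SAN entries of every certificate the server presented.
	for _, cert := range conn.ConnectionState().PeerCertificates {
		fmt.Println("subject:", cert.Subject.CommonName, "SANs:", cert.DNSNames)
	}
}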

example/operation/external-db.yml

Can we use db_scheme: mysql instead of db_scheme: postgres in the external-db.yml file to deploy autoscaler with an external MySQL database? We are using the mysql db scheme.

certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name after go upgrade.

After upgrading to go1.15, it looks like we are hitting this error when running any of the golang services - e.g.

2021/07/22 11:20:27 failed-to-connection-to-database, dburl:postgres://postgres:*REDACTED*@autoscalerpostgres.service.cf.internal:5432/autoscaler?sslmode=verify-full&sslrootcert=/var/vcap/jobs/scalingengine/config/certs/scalingengine_db/ca.crt,  err:x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=x509ignoreCN=0

A possible workaround is to set the environment variable GODEBUG=x509ignoreCN=0.
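
The underlying requirement from Go 1.15 onwards is that the hostname being verified must appear in the certificate's SAN list, not just in the CN. A minimal self-signed sketch showing a SAN-bearing template with crypto/x509 (the hostname and validity period are illustrative):

package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

func main() {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "autoscalerpostgres.service.cf.internal"},
		// Go 1.15+ matches hostnames against DNSNames (SANs), not the CN.
		DNSNames:              []string{"autoscalerpostgres.service.cf.internal"},
		NotBefore:             time.Now(),
		NotAfter:              time.Now().Add(365 * 24 * time.Hour),
		KeyUsage:              x509.KeyUsageDigitalSignature | x509.KeyUsageCertSign,
		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		BasicConstraintsValid: true,
		IsCA:                  true,
	}
	der, err := x509.CreateCertificate(rand.Reader, &tmpl, &tmpl, &key.PublicKey, key)
	if err != nil {
		panic(err)
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		panic(err)
	}
	fmt.Println("issued cert with SANs:", cert.DNSNames)
}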

Error: undefined method `link' in /var/vcap/jobs-src/eventgenerator/templates/eventgenerator.yml.erb

@qibobo Hi Qibobo, I tried to integrate the new autoscaler service with our old cloud foundry version, the same way as you bumped it into scf, but when I tried to start the autoscaler-metrics, there are following error encountered:
/var/vcap/jobs-src/eventgenerator/templates/eventgenerator.yml.erb:21:in `get_binding': undefined method `link' for #<Bosh::Template::EvaluationContext:0x0055a3cd047ce0> (NoMethodError)
    from /opt/hcf/configgin/lib/ruby/lib/ruby/2.1.0/erb.rb:850:in `eval'
    from /opt/hcf/configgin/lib/ruby/lib/ruby/2.1.0/erb.rb:850:in `result'
    from /opt/hcf/configgin/lib/app/lib/generate.rb:30:in `generate'
    from /opt/hcf/configgin/lib/app/bin/configgin:47:in `block (2 levels) in <main>'
    from /opt/hcf/configgin/lib/app/bin/configgin:36:in `each'
    from /opt/hcf/configgin/lib/app/bin/configgin:36:in `block in <main>'
    from /opt/hcf/configgin/lib/app/bin/configgin:33:in `each'
    from /opt/hcf/configgin/lib/app/bin/configgin:33:in `<main>'
Could you help to check?
Thanks and Regards.
HAO

Failed: Release SHA1 does not match the expected SHA1

I cloned the repository today and successfully uploaded the release.

minjeong@ubuntu:~/workspace/GitHub/PaaSXpert-AutoScaler/app-autoscaler-release$ bosh releases
Acting as user 'admin' on 'Bosh Lite Director'

+----------------+----------+-------------+
| Name           | Versions | Commit Hash |
+----------------+----------+-------------+
| app-autoscaler | 0+dev.1  | af3ece9f    |
| cf             | 268*     | 4057a140+   |
| cf-mysql       | 32       | 6c0314b     |
| cf-redis       | 428.0.0  | 2d766084+   |
| cflinuxfs2     | 1.138.0* | c88004ab+   |
| diego          | 1.23.0*  | edb126ad    |
| garden-runc    | 1.9.0*   | 3f4312b5    |
| grootfs        | 0.21.0   | f896e94     |
| routing        | 0.142.0  | af830ed7+   |
+----------------+----------+-------------+
(*) Currently deployed
(+) Uncommitted changes

Releases total: 9

When I tried to deploy the yml file, I received the following error.

Release manifest: /home/minjeong/workspace/GitHub/PaaSXpert-AutoScaler/app-autoscaler-release/dev_releases/app-autoscaler/app-autoscaler-0+dev.1.yml
Acting as user 'admin' on 'Bosh Lite Director'

Copying packages
----------------
common
nodejs
servicebroker
golang1.7
scalingengine
scheduler
pruner
metricscollector
java
eventgenerator
apiserver
db


Copying jobs
------------
servicebroker
scalingengine
scheduler
pruner
metricscollector
eventgenerator
apiserver


Copying license
---------------
license

Generated /tmp/d20170719-31102-19jxg35/d20170719-31102-1sn865/release.tgz
Release size: 390.9M

Verifying manifest...
Extract manifest                                             OK
Manifest exists                                              OK
Release name/version                                         OK


Uploading release
release.tgz:    96% |oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo     | 375.3MB  29.2MB/s ETA:  00:00:00
Director task 30
  Started extracting release > Extracting release. Done (00:00:04)

  Started verifying manifest > Verifying manifest. Done (00:00:00)

  Started resolving package dependencies > Resolving package dependencies. Done (00:00:00)

  Started creating new packages
  Started creating new packages > common/306e7eb1a8187457885ce91eb4bc22f5aed734e8. Done (00:00:00)
  Started creating new packages > nodejs/78d4ee5eeb7010fd7d15a1d2986992942940229f. Done (00:00:01)
  Started creating new packages > servicebroker/26d3c21dec7897b57b82d5208e73130b7f4e8ac2. Done (00:00:00)
  Started creating new packages > golang1.7/651d77736c6087be1ca1df72eb8e4d2e701778f9. Done (00:00:02)
  Started creating new packages > scalingengine/32d7cedf61f0db125fcdecdcb84a3032d01a331c. Done (00:00:00)
  Started creating new packages > scheduler/8459c8ef0e82f345720e2fbbab5924a74159da0e. Done (00:00:02)
  Started creating new packages > pruner/0a8952978a0c226f84db1a057df244df2cbd7975. Done (00:00:00)
  Started creating new packages > metricscollector/90223e57b1985adc1667c8bb9abd10764cdaac43. Done (00:00:00)
  Started creating new packages > java/d6f4a8bb4e3bfb6c4f3121f231f1a0c569d7fdaf. Done (00:00:02)
  Started creating new packages > eventgenerator/0ddd32b54bed4d93072336948cdb639691c05ee5. Done (00:00:00)
  Started creating new packages > apiserver/83eccb99910f547942da04b38eeb4681305cfec0. Done (00:00:03)
  Started creating new packages > db/5eb59eeffe739dcf942a45cc0c6e06996cbd8f45. Done (00:00:00)
     Done creating new packages (00:00:10)

  Started creating new jobs
  Started creating new jobs > servicebroker/945e3aa0cfa958275c5e7ed0f8d4fd8cb3fa6cb3. Done (00:00:01)
  Started creating new jobs > scalingengine/4398d6bc3b0d7236d21c9ac25258c192771b54f6. Done (00:00:00)
  Started creating new jobs > scheduler/190ace43d570ee3942012713847e9df492961aa2. Done (00:00:00)
  Started creating new jobs > pruner/480130a24acdd8079c6955f8c61890cc2a8788c7. Done (00:00:00)
  Started creating new jobs > metricscollector/4e4c7a25f668d80ead187d3ab8fb33e85f4497d3. Done (00:00:00)
  Started creating new jobs > eventgenerator/fa282337b2b68dc80c7cdf7fab0fce43a037495a. Done (00:00:00)
  Started creating new jobs > apiserver/db00aaf50a50ee6cbd7d31ca8b29a2f6eff7e938. Done (00:00:00)
     Done creating new jobs (00:00:01)

  Started release has been created > app-autoscaler/0+dev.1. Done (00:00:00)

Task 30 done

Started		2017-07-19 12:07:57 UTC
Finished	2017-07-19 12:08:12 UTC
Duration	00:00:15
release.tgz:    96% |oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo     | 376.3MB  13.0MB/s Time: 00:00:28

Release uploaded
Acting as user 'admin' on 'Bosh Lite Director'
Checking whether release postgres/17 already exists...NO
Using remote release 'https://bosh.io/d/github.com/cloudfoundry/postgres-release'

Director task 31
  Started downloading remote release > Downloading remote release. Done (00:01:54)

  Started verifying remote release > Verifying remote release. Failed: Release SHA1 'ad62d5d7e4b7875316ecd5b972f26ee842c4b605' does not match the expected SHA1 'b062e32a5409ccd4e4161337c48705c793a58412' (00:00:00)

Error 30015: Release SHA1 'ad62d5d7e4b7875316ecd5b972f26ee842c4b605' does not match the expected SHA1 'b062e32a5409ccd4e4161337c48705c793a58412'

Task 31 error

For a more detailed error report, run: bosh task 31 --debug

The autoscaler API connection to the CC API must skip ssl validation if self signed certs on the CC are used.

If the CC API is using a self-signed cert or one provided by a private PKI, the Autoscaler API must have: autoscaler.cf.skip_ssl_validation: true

There is no option to supply a trusted root cert for nodejs. Ideally, the version of Node.js the job is using could be compiled to use the default system CA store, like the Node.js buildpack does. https://www.pivotaltracker.com/n/projects/1042066/stories/152254480

Question on MemoryUTIL

Hello ,

I have deployed autoscaler using master branch with the policy
{
  "instance_min_count": 1,
  "instance_max_count": 4,
  "scaling_rules": [{
    "metric_type": "memoryutil",
    "stat_window_secs": 60,
    "breach_duration_secs": 60,
    "threshold": 49,
    "operator": "<",
    "cool_down_secs": 60,
    "adjustment": "-1"
  }, {
    "metric_type": "memoryutil",
    "stat_window_secs": 60,
    "breach_duration_secs": 60,
    "threshold": 50,
    "operator": ">",
    "cool_down_secs": 60,
    "adjustment": "+1"
  }]
}

Below is the app usage.

According to the policy, the application should scale +1 based on memory usage, but it's not.

Am I doing anything incorrect?

covladmins-MacBook-Pro-6:FakePolicy nsharma$ cf app hello-world
Showing health and status for app hello-world in org paas / space properties as nsharma...
OK

requested state: started
instances: 1/1
usage: 1G x 1 instances
urls: hello-world-new.run.us2.covapp.io
last uploaded: Thu Jul 26 19:27:31 UTC 2018
stack: covs-internal-stack
buildpack: covs-java-III

     state     since                    cpu    memory         disk           details
#0   running   2018-07-27 01:10:30 PM   0.1%   679.2M of 1G   317.7M of 1G

eventgenerator can't start up.

Hi,
The eventgenerator.yml.erb file can't be processed successfully; please kindly check the following logs:
...
/var/vcap/jobs-src/eventgenerator/templates/eventgenerator.yml.erb:21:in `get_binding': undefined method `link' for #<Bosh::Template::EvaluationContext:0x00560eea0c1418> (NoMethodError)
    from /opt/hcf/configgin/lib/ruby/lib/ruby/2.1.0/erb.rb:850:in `eval'
    from /opt/hcf/configgin/lib/ruby/lib/ruby/2.1.0/erb.rb:850:in `result'
    from /opt/hcf/configgin/lib/app/lib/generate.rb:30:in `generate'
    from /opt/hcf/configgin/lib/app/bin/configgin:47:in `block (2 levels) in <main>'
    from /opt/hcf/configgin/lib/app/bin/configgin:36:in `each'
    from /opt/hcf/configgin/lib/app/bin/configgin:36:in `block in <main>'
    from /opt/hcf/configgin/lib/app/bin/configgin:33:in `each'
    from /opt/hcf/configgin/lib/app/bin/configgin:33:in `<main>'
...
Please kindly help, thanks!

App autoscaler fails to start, settings.json invalid.

Attempting to deploy app autoscaler 3.0.0 results in the apiserver failing to start with a message that settings.json is invalid.

from apiserver.stderr.log

Error: settings.json is invalid
at module.exports (/var/vcap/data/packages/apiserver/a8b8486d14bcb95da7869b8f30a4f2bbef6f1e05/app.js:16:11)
at Object.<anonymous> (/var/vcap/data/packages/apiserver/a8b8486d14bcb95da7869b8f30a4f2bbef6f1e05/index.js:25:56)
at Module._compile (module.js:652:30)
at Object.Module._extensions..js (module.js:663:10)
at Module.load (module.js:565:32)
at tryModuleLoad (module.js:505:12)
at Function.Module._load (module.js:497:3)
at Function.Module.runMain (module.js:693:10)
at startup (bootstrap_node.js:191:16)
at bootstrap_node.js:612:3
/var/vcap/data/packages/apiserver/a8b8486d14bcb95da7869b8f30a4f2bbef6f1e05/app.js:16
throw new Error('settings.json is invalid');

from apiserver.stdout.log

{"timestamp":"2020-03-20T05:41:05.864Z","source":"autoscaler:apiserver","text":"Invalid configuration: minBreachDurationSecs is required","log_level":"error"}
{"timestamp":"2020-03-20T05:41:45.986Z","source":"autoscaler:apiserver","text":"Invalid configuration: minBreachDurationSecs is required","log_level":"error"}
{"timestamp":"2020-03-20T05:42:26.115Z","source":"autoscaler:apiserver","text":"Invalid configuration: minBreachDurationSecs is required","log_level":"error"}
{"timestamp":"2020-03-20T05:43:06.282Z","source":"autoscaler:apiserver","text":"Invalid configuration: minBreachDurationSecs is required","log_level":"error"}
{"timestamp":"2020-03-20T05:43:46.384Z","source":"autoscaler:apiserver","text":"Invalid configuration: minBreachDurationSecs is required","log_level":"error"}
{"timestamp":"2020-03-20T05:44:26.513Z","source":"autoscaler:apiserver","text":"Invalid configuration: minBreachDurationSecs is required","log_level":"error"}

Note: For deployment we used app-autoscaler-deployment-v1.yml file.

Metricsforwarder logs ssl validation error when forwarding metrics

Using the deployment template I'm seeing this log output for the metrics-forwarder:

{"data":{"metric":{"app_guid":"9aa474dc-7b6d-4cb1-bbf9-2ffb7d23c0d7","instance_index":0,"name":"custom","unit":"test-unit","value":1000},"session":"4"},"log_level":0,"log_time":"2020-06-29T11:56:46Z","message":"metricsforwarder.custom_metrics_server.custom-metric-emit-request-received:","source":"metricsforwarder","timestamp":"1593431806.530002832"}
{"data":{"data":[{"code":14,"message":"all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate is valid for reverselogproxy, not metron\""}],"session":"4.1"},"log_level":0,"log_time":"2020-06-29T11:56:46Z","message":"metricsforwarder.custom_metrics_server.metric_forwarder.Error while flushing: %s","source":"metricsforwarder","timestamp":"1593431806.552621126"}
{"data":{"count":1,"session":"5"},"log_level":0,"log_time":"2020-06-29T11:57:13Z","message":"metricsforwarder.PolicyManager.policycount","source":"metricsforwarder","timestamp":"1593431833.567957401"}

I don't understand what the cause of the error message is:

transport: authentication handshake failed: x509: certificate is valid for reverselogproxy, not metron

I'm assuming that this happens while metricsforwarder is trying to forward to the local loggregator_agent.
