GMP-CF-0002 - Optionally Removing Consul in 1.3

Note: previously, this GMP had stated that 1.4.0 of the Cloud Foundry Genesis Kit would be when the Consul components were officially removed. 1.4.0 was released on May 2nd, 2019, without this removal. This GMP has been updated to instead target the 1.5.0 release.

Overview

With BOSH DNS a mature product and an integral component in many BOSH deployments, we have begun replacing Consul service discovery with BOSH DNS. The Cloud Foundry Genesis Kit v1.3 is the first step in this two-step phase. v1.3 colocates BOSH DNS and disables Consul DNS, but does not remove Consul entirely. The result is a seamless upgrade from v1.2.X while also leveraging BOSH DNS for service discovery.

It is our plan to fully remove Consul from the Cloud Foundry Genesis Kit in v1.5, which will include the deletion of all consul_etcd virtual machines. This is the second step of the two-step phase, which enables a zero-downtime upgrade path from Consul DNS to BOSH DNS.

However, if an operator wishes to remove all Consul related services from their Cloud Foundry deployment without waiting for v1.5, they may do so with a temporary feature flag implemented just for v1.3 called migrate-1.3-without-consul. This GMP will explain the steps necessary to use this feature.

Impact

The impact of this feature is minimal to the runtime and stability of your Cloud Foundry deployment. The decision to either apply or not apply this feature to your deployments will not impact future upgradability of this environment. No extra steps are necessary to maintain your Cloud Foundry environment after this GMP is completed.

There is no downtime incurred by applying this feature.

The Process

Determining Eligibility

Operators who wish to remove all Consul services from their Cloud Foundry deployment are only able to do so if their Genesis Cloud Foundry is version v1.3. To determine the version of the deployed Cloud Foundry environment, run the following command within your cf-deployments folder:

genesis info [environment name]

Where [environment name] is the name of the Cloud Foundry environment you wish to check.

The result will look like this:

CF Deployment for Environment 'genesis-lab'

  Last deployed about 30 minutes ago (08:32PM on Dec 19, 2018 UTC)
            by David
        to BOSH genesis-lab
  based on kit cf/1.3
          using Genesis v2.6.12
  with manifest .genesis/manifests/buffalo-lab.yml (redacted)

The based on kit line must read cf/1.3. If it does not, please upgrade your Cloud Foundry installation with the v1.3 Cloud Foundry Genesis Kit before starting.

If your based on kit line reads cf/1.3, and you wish to remove Consul entirely, please continue reading this guide.

Removing Consul

After ensuring the environment you wish to remove Consul from is on v1.3, edit your Genesis Environment Manifest and append migrate-1.3-without-consul to the feature list array. The end result will be something similar to:

---
kit:
  name:   cf
  version: 1.3
  features:
  - local-blobstore
  - local-ha-db
  - migrate-1.3-without-consul
params:
  env:   genesis-lab
  [...]

Once the feature has been added to your Genesis Environment Manifest, apply the changes with a genesis deploy:

genesis deploy [environment name]

This will remove any consul_agents colocated on VMs, as well as removing all consul_etcd VMs currently deployed. There will be no downtime during this deployment.

Deployments With Multiple Cell Instance Groups

If your deployment has multiple instance groups for cells (possibly for isolation segments), then you will need to add additional DNS aliases, or else the Auctioneer will not be able to locate Rep instances on the additional instance groups. As a result, applications will not be deployed to the additional cell instance groups.

To map the correct DNS aliases to the additional cell instance groups, add the following to your environment file.

addons:
- name: bosh-dns-aliases
  jobs:
  - name: bosh-dns-aliases
    release: bosh-dns-aliases
    properties:
      aliases:
      - domain: '_.cell.service.cf.internal'
        targets:
        - (( append ))
        - query: '_'
          instance_group: extra-cell-instance-group-1-name
          deployment: (( grab name ))
          network: (( grab params.cf_runtime_network ))
          domain: bosh
        - query: '_'
          instance_group: extra-cell-instance-group-2-name
          deployment: (( grab name ))
          network: (( grab params.cf_runtime_network ))
          domain: bosh

Add a block under the targets array for each additional instance group that you have. Change the instance_group value to the name of each additional cell instance group. To be clear, you do not need to add one for the instance group named cell.

Verification & Closing Notes (Final step)

Once the deployment is complete, you can verify that Consul is no longer used in your environment by running:

bosh -e [environment name] -d [deployment name] vms

Where [environment name] is the name of your BOSH director alias, and [deployment name] is the name of your Cloud Foundry environment.

If you do not see any consul_etcd VMs listed, the feature was successfully applied.

It is necessary to keep the migrate-1.3-without-consul feature until the v1.5 update is released. Genesis Cloud Foundry Kit v1.5 will remind you to remove the feature, as it will no longer be needed.

Help & Support

If you have concerns about the impact of this migration process, or need assistance running through it, please don't hesitate to find us in #help on Slack.