Saltside recently migrated our production applications to Kubernetes, using Helm to deploy them. This post describes how we built our chart to fit Saltside's unique business requirements.
Saltside makes classifieds (user posted ads, not secret information) sites. We operate in multiple markets: Tonaton.com, Bikroy.com, Efritin.com, and Ikman.lk. I'll refer to this as MARKET going forward. I suggest you check out the sites. You'll notice they look and feel the same. That's by design!
Each market's infrastructure is deployed, scaled, and operated separately from the others. This ensures each market's infrastructure scales correctly for its load and other requirements while guarding against failures propagating to other markets. Deploying separately requires different configuration for each MARKET. Configurations must be tested in different stages in the deployment pipeline; example stages are production or uat. I'll refer to this as STAGE going forward.
Naturally this propagates down to individual codebases. Each codebase must support different MARKET and STAGE values. Our product is composed of ~20 independent codebases that may themselves be composed of multiple deployed processes (such as a thrift server, a web server, or a worker consuming messages from a queue). I'll use COMPONENT to refer to components or codebases and PROCESS to refer to the processes inside each COMPONENT.
You may be thinking "this sounds like a lot of stuff!" You're right. It is. The total possible configuration is MARKET * STAGE * COMPONENT * PROCESS. You can learn more about our previous setup in my talk at DevOps Days India 2016. This is the jumping-off point for discussing our Kubernetes solution.
Requirements
The big idea is that we'll move from independently configured and deployed COMPONENT values to a centrally configured and deployed mono-repo. The implementation is no surprise given the title: the entire product (all configuration for MARKET, STAGE, COMPONENT, and PROCESS) is wrapped up in a single repo that builds Helm charts. I'm not going into the reasons why we chose Helm; there are plenty of articles, talks, and presentations on what Helm is and what it can do. Let's move on to the specifics we need from our Helm chart.
Market, Stage, Component and Process Configuration
We need MARKET, STAGE, COMPONENT, and PROCESS specific environment variables and secrets. There is a hierarchy between these four values: MARKET is the most general and PROCESS is the most specific. A specific COMPONENT may need different environment variables depending on MARKET or STAGE, and a PROCESS may need the same. Here's a concrete example: the same BUGSNAG_API_KEY may apply to all PROCESS values inside a given COMPONENT, but something like EXTERNAL_REDIRECT_URL may need different values depending on MARKET or STAGE.
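To make the hierarchy concrete, here is an illustrative sketch (the values are invented; the structure mirrors the env hierarchy shown later in this post):

```yaml
# Illustrative only: one COMPONENT (admin_service), one STAGE, two MARKET values
env:
  admin_service:                                    # COMPONENT
    production:                                     # STAGE
      bikroy:                                       # MARKET
        BUGSNAG_API_KEY: abc123                     # same for every PROCESS in the COMPONENT
        EXTERNAL_REDIRECT_URL: https://bikroy.com   # varies by MARKET and STAGE
      ikman:
        BUGSNAG_API_KEY: abc123
        EXTERNAL_REDIRECT_URL: https://ikman.lk
```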
A Single Values File
We want no separate values files and no requirement to pass -f or --set to helm install or helm upgrade. There are too many possible configurations; forcing engineers to manage those files or pass command line flags is simply unmanageable in our problem domain.
Encrypted Secrets Values
Our current solution is subpar in this area so we’d like to, at a minimum, encrypt sensitive data that’s committed to SCM.
Preconfigured Charts for MARKET and STAGE
This is somewhat of a follow-up to the previous point. The idea here is that anyone can run a command like helm install $MARKET and everything will work as expected. Preconfigured versions for different STAGE values are communicated through different semantic versions. This requirement opens up many use cases, such as automatic environment creation from topic branches or allowing any engineer to stand up test environments against arbitrary configurations.
Easier or Better than our Current Solution
Different COMPONENT and PROCESS configurations should be as easy as our current solution or easier. This one is subjective, so here's some background. Different COMPONENT values have roughly the same shape. Each COMPONENT uses a single Docker image, and its different PROCESS values run as different containers using that Docker image. This may require a different command, ports, environment variables, or other things. It generally boils down to expressing configuration at our abstraction level rather than Helm's.
Deployable to any Kubernetes Cluster
The chart should be deployable on any Kubernetes cluster. This means it should be self-contained; specifically, it should contain an image pull secret for access to private Docker images.
Those are the high-level requirements for the chart. Let's see how these requirements drive the implementation.
Implementation at 10,000ft
The requirements dictate a single values.yaml file and a build process that can spit out preconfigured charts for different MARKET and STAGE values. Here's what this looks like for us:
- Everything is declared in values.yaml with a ton of YAML anchors (more on that later).
- script/lint-values tests values.yaml for correct semantics. Example: each combination of MARKET, STAGE, COMPONENT, and PROCESS has declared environment variables (so range in the chart templates works as expected).
- Secret values are kept in a secrets.yaml managed with git-crypt. We initially tried using Ansible vault but aborted that effort. More on that later as well.
- A parameterized build script injects the correct values for MARKET and STAGE into values.yaml before building the final chart. The production STAGE produces a chart named MARKET-VERSION. The sandbox STAGE produces a chart named MARKET-VERSION.betaN. Any STAGE is supported through different version modifiers. The versioning information is determined by the commit and git branch. (A sketch of the parameterized chart metadata follows this list.)
- Chart templates expect a specific structure in $.Values to generate each Service, Deployment, Pod, etc. This allows defining everything about COMPONENT and PROCESS in values.yaml without creating new templates.
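Here is a hedged sketch of what the parameterized config/Chart.yaml could look like. The ${CHART_MARKET} name is taken from the anchors example later in this post, while ${CHART_VERSION} is an illustrative name, not necessarily the exact variable we use:

```yaml
# config/Chart.yaml (sketch; rendered by the build before packaging)
name: "${CHART_MARKET}"
version: "${CHART_VERSION}"
description: Saltside product chart preconfigured for ${CHART_MARKET}
```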
In short, our mono-repo is a template for producing MARKET * STAGE Helm charts. Let's dive into implementation specifics.
Templates & Values
The chart templates are straightforward. There is one template each for Service, Deployment, and Secret. Each template produces multiple resources (separated by ---) in YAML by ranging over various $.Values. The templates essentially loop over $.Values.applications[].containers[] (where an application is a COMPONENT and its containers are PROCESS values). One Deployment is created for each COMPONENT and PROCESS combination because this is how our system scales. One Service is generated for each COMPONENT with ports for each PROCESS. "Our" is emphasized on purpose. Your application may be different.
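A minimal sketch of that Deployment loop (simplified and abbreviated; field names like image are illustrative, and this is not our exact template):

```yaml
# Sketch of templates/app_deployments.yaml: one Deployment per COMPONENT * PROCESS
{{- range $app := .Values.applications }}
{{- range $container := $app.containers }}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ $app.name }}-{{ $container.name }}
spec:
  selector:
    matchLabels:
      app: {{ $app.name }}
      process: {{ $container.name }}
  template:
    metadata:
      labels:
        app: {{ $app.name }}
        process: {{ $container.name }}
    spec:
      containers:
        - name: {{ $container.name }}
          image: {{ $app.image }}
          command: {{ toJson $container.command }}
{{- end }}
{{- end }}
```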
Our values.yaml and secrets.yaml rely on YAML anchors. This is especially useful for "overriding" the MARKET/STAGE/COMPONENT/PROCESS hierarchy. Initially this logic started out in the templates (one loop for each level). It was easier to move everything into values using YAML anchors.
The values.yaml defines images for each COMPONENT value, as well as common things like the MARKET and STAGE values. There are many shared environment variables (like MARKET, STAGE, RACK_ENV, or NODE_ENV) that are defined as anchors and appended to. Let's look at the main anchors. Anchors are prefixed with _ to prevent accidental clashes with real variables. They are also annotated with comments so readers know they are not intended for direct use, only for reference later in the file. Here is an example:
```yaml
_anchors:
  common_env: &COMMON_ENV
    APP_VERSION: "${CHART_ID}"
    APP_ENV: "${CHART_STAGE}"
    NODE_ENV: "${CHART_STAGE}"
    RACK_ENV: "${CHART_STAGE}"
    RAILS_ENV: "${CHART_STAGE}"
    MARKET: "${CHART_MARKET}"
    STATSD_URL: "udp://localhost:8125"
    LOG_LEVEL: info
    # TODO: AWS variables
```
The ${VARIABLE} syntax is special: it's substituted by the build process with envsubst. I'll cover that later on. This anchor is reused in each STAGE and COMPONENT declaration as shown below:
```yaml
env:
  admin_service:
    production: &ADMIN_SERVICE_PRODUCTION_ENV_VARS
      <<: *COMMON_ENV
    sandbox: &ADMIN_SERVICE_SANDBOX_ENV_VARS
      <<: *COMMON_ENV
      # FIXME: This is wrong!
      APP_ENV: development
      AWS_ACCESS_KEY_ID: foo
      AWS_SECRET_ACCESS_KEY: foo
```
Injecting the COMMON_ENV anchor into the COMPONENT * STAGE context is easy, as is adding new values (in the sandbox STAGE in this example). That COMPONENT * STAGE configuration is assigned a new anchor name which can be customized by MARKET or PROCESS later on. Here's a MARKET example:
```yaml
env:
  admin_service:
    production:
      bikroy:
        <<: *ADMIN_SERVICE_PRODUCTION_ENV_VARS
      ikman:
        <<: *ADMIN_SERVICE_PRODUCTION_ENV_VARS
      tonaton:
        <<: *ADMIN_SERVICE_PRODUCTION_ENV_VARS
      efritin:
        <<: *ADMIN_SERVICE_PRODUCTION_ENV_VARS
    sandbox:
      bikroy:
        <<: *ADMIN_SERVICE_SANDBOX_ENV_VARS
      ikman:
        <<: *ADMIN_SERVICE_SANDBOX_ENV_VARS
      tonaton:
        <<: *ADMIN_SERVICE_SANDBOX_ENV_VARS
      efritin:
        <<: *ADMIN_SERVICE_SANDBOX_ENV_VARS
```
It's not the prettiest but it works. It removes effort for shared configuration and makes it easy to override at all levels in the hierarchy. This same structure applies to secrets.yaml.
Here's an example COMPONENT and PROCESS configuration from values.yaml:
```yaml
- name: image_service
  tier: thrift
  service_type: ClusterIP
  containers:
    - name: thrift
      command:
        - "thrift-server"
      ports:
        container: 9090
        service: 9090
      livenessProbe:
        initialDelaySeconds: 5
        tcpSocket:
          port: 9090
      resources:
        sandbox:
          requests:
            memory: 128Mi
        production:
          requests:
            memory: 128Mi
            cpu: 50m
          limits:
            memory: 128Mi
            cpu: 100m
```
We mix custom structures and existing Kubernetes ones (like the livenessProbe and resources entries). This works well because that data may be dumped directly into the manifest.
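For instance, a template can pass those blocks straight through with toYaml. This is a simplified sketch (indentation and variable names are illustrative), not our actual template:

```yaml
# Sketch: inside the container entry of the Deployment template
          livenessProbe:
{{ toYaml $container.livenessProbe | indent 12 }}
          resources:
{{ toYaml (index $container.resources $.Values.stage) | indent 12 }}
```

The second lookup also shows how a stage-keyed block (like the resources example above) can be narrowed to the current STAGE before being dumped.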
This gist is our complete deployment template. There are things in it that I'm not going to describe in this post. Note the use of index in combination with $.Values.market, $.Values.stage, and $.Values.topology.pods. Also note that generating JSON init containers is quite strange inside YAML, since YAML is whitespace sensitive but JSON requires delimiters.
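To give a feel for the index pattern, here is a rough sketch of resolving the environment hierarchy for one COMPONENT, STAGE, and MARKET combination (the key names are illustrative; the real template lives in the gist):

```yaml
# Sketch: look up env vars for this COMPONENT at the current STAGE and MARKET
          env:
{{- range $key, $value := index $.Values.env $app.name $.Values.stage $.Values.market }}
            - name: {{ $key }}
              value: {{ $value | quote }}
{{- end }}
```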
Kubernetes Services are declared through a combination of custom data structures and the previously shown COMPONENT and PROCESS configuration.
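A sketch of what that Service generation might look like, reusing the service_type and ports values from the example above (again simplified; not our exact template):

```yaml
# Sketch of templates/app_services.yaml: one Service per COMPONENT, one port per PROCESS
{{- range $app := .Values.applications }}
---
apiVersion: v1
kind: Service
metadata:
  name: {{ $app.name }}
spec:
  type: {{ $app.service_type }}
  selector:
    app: {{ $app.name }}
  ports:
    {{- range $container := $app.containers }}
    - name: {{ $container.name }}
      port: {{ $container.ports.service }}
      targetPort: {{ $container.ports.container }}
    {{- end }}
{{- end }}
```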
Repo Organization & Building
I personally find this section the most interesting since it demonstrates what you can do with Helm when you throw some Bash into the mix. It requires a bit of funkiness to get everything to work. I'll touch on those bits the most because they highlight curious choices in Helm chart packaging.
Here’s the tree:
```
.
├── Makefile
├── README.md
├── VERSION
├── chart
│   ├── charts
│   └── templates
│       ├── _helpers.tpl
│       ├── app_deployments.yaml
│       ├── app_services.yaml
│       ├── sandbox.yaml
│       ├── secrets.yaml
│       └── smoke_test.yaml
├── config
│   ├── Chart.yaml
│   └── values.yaml
├── script
│   ├── await-release
│   ├── build
│   ├── ci
│   │   └── test
│   ├── clean-releases
│   ├── lint-manifest
│   ├── lint-values
│   ├── package
│   ├── publish
│   ├── test-chart
│   ├── test-release
│   └── yaml2json
├── secrets.yaml
```
Building
make coordinates the entire process. /chart and /config are the most relevant directories for this section. /chart contains files that are not parameterized and may be copied directly into the final output directory. /config contains files that are parameterized, so each must pass through envsubst before being copied into the final output directory. This is where ${MARKET} and friends are inserted into the final values.yaml file.
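As a rough illustration of that substitution step (using the ${CHART_*} names from the anchors example; the exact keys in our file may differ):

```yaml
# config/values.yaml (input, before envsubst)
market: "${CHART_MARKET}"
stage: "${CHART_STAGE}"
```

After envsubst runs with, say, CHART_MARKET=bikroy and CHART_STAGE=production, the built chart's values.yaml contains the literal market and stage values, and no flags are needed at install time.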
script/build coordinates the process. It determines the correct VERSION along with all the other template variables, creates a specifically named directory for the chart, copies over everything in /templates, templates everything in /config, and dumps that into the output directory. It's shared in its entirety as a gist.
You'll also notice our chart versioning semantics coded into that file. Astute readers will notice that this script does not package or publish the chart; it only prepares the directory. That is enough to install the chart for testing. Other scripts produce the final .tgz archive and upload it to our internal chart repository. There is nothing interesting in those scripts: they accept a directory as an argument and call the various helm commands.
This approach is working well enough for our production continuous delivery pipeline. The CD pipeline is an entire post in itself. Check back in the future for all the details once all the kinks are sorted out. I want to close out this post by discussing things we tried that didn't work out.
What Didn’t Work
- Ansible vault for secrets. Internally we use Ansible to automate processes and glue various workflows together. We did not want to commit secrets in plain text to source control, but we could accept encrypting them in source control. Ansible vault was a natural fit for us. It worked in the beginning, then we hit increasing friction beyond the dev stage. We generated a Kubernetes Secret using Ansible templates and variables. The chart required a $.Values.secret_name. This meant the secret had to exist in the cluster(s) where the chart would be installed. This became more problematic because we have multiple Kubernetes clusters for different stages (namely production and non-production). It also opened up the question of what to do if the chart itself failed tests: the secret would need to be deleted from the clusters. That is possible, but it's an avoidable problem. Everything became much easier when we switched to git-crypt, which made the chart self-contained. I advise teams adopting Helm to start and continue with git-crypt until it becomes a pain point.
- Duplicating configuration in templates. Notice the Deployment template (naturally) defines the template pods and their init containers. The init containers use the same Docker image as the COMPONENT to run its init PROCESS. Each PROCESS generally requires the same set of configuration values (such as COMPONENT specific values like MYSQL_URL, or MARKET or STAGE global values). Previous versions duplicated that generation in different parts of the templates. Things became much easier by moving everything into values.yaml through YAML anchors. I doubt other teams will encounter problems of this scope. Regardless, it's an interesting approach to keep in your back pocket.
- This one is tangentially related to the chart, but I think it's important to discuss since it impacts the chart's usability. Initially we wanted to pre-authenticate all cluster nodes to our Docker registry. We did this by updating the default ServiceAccount to use a custom image pull secret. This seemed like a good idea in the beginning, but turned out to be a bad one when it came down to it. All it did was introduce more prerequisites for where the chart could be installed. In the end it was easier to create an image pull secret as part of the chart (also not an issue, since git-crypt manages our plain text secrets) and use that in all the templates. This was another step in making the chart more self-contained. (A sketch of this pattern appears after this list.)
- The chart's flexibility correlates with that of the internal applications (COMPONENT and PROCESS). Our process currently has two stages: production and sandbox. Unfortunately our codebases are in various stages of being completely configured by environment variables and/or command line options. We cannot easily introduce a new stage because many codebases hard code logic on something like NODE_ENV or RACK_ENV. This is a reminder that hard-coded conditionals are an anti-pattern and should be avoided.
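Here is a minimal sketch of the in-chart image pull secret pattern mentioned above, assuming the dockerconfigjson payload lives in the git-crypt-managed secrets.yaml (the secret name and values key are illustrative):

```yaml
# Sketch: an image pull secret shipped with the chart itself
apiVersion: v1
kind: Secret
metadata:
  name: registry-credentials
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: {{ .Values.registry_dockerconfigjson }}
---
# Referenced from each Deployment's pod spec so any cluster can pull our private images:
#   spec:
#     imagePullSecrets:
#       - name: registry-credentials
```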
What’s Next?
We plan future posts documenting our test strategy, workflow around the chart, and continuous delivery (and hopefully deployment!) processes. Stay tuned for more information. Also, don't hesitate to leave a comment or ask a question. You can also find the SRE team on the Kubernetes Slack in #helm-users, #helm-dev, #in-users, and #kops if you want to chat with us.
Good luck out there and happy shipping!