Our “applications” haven’t been too exciting so far. We’ve created some nginx pods and sent a few HTTP requests, but these pods aren’t talking to each other. Kubernetes complements a microservice architecture, but even if you follow a monolithic application design approach, we can anticipate there will be at least some communication across pods within our cluster.
To better understand how pods are able to communicate with each other, let’s start by creating a new namespace for ourselves.
$ kubectl create namespace telephone
namespace/telephone created
Next, we need a way to issue arbitrary HTTP requests from inside the cluster. We’ll create a “helper” pod which we’ll use to send our HTTP requests.
$ kubectl run caller --image=alpine:3.19 --namespace=telephone --command -- sleep infinite
pod/caller created
Note that we’re using Alpine for our container image. We’re also supplying the --command
option, we haven’t seen that before. Without this option, the container will run using the ENTRYPOINT
specified by the image. For our nginx pods, ENTRYPOINT
provides the desired behavior (that is, run nginx), but the ENTRYPOINT
for Alpine runs a shell. Since there is no standard input connected to the shell, the process will exit immediately. By using --command
, we can specify a new entrypoint, which we set to a command that will run forever.
We’ll see momentarily why this is useful. First, let’s verify the pod is running.
$ kubectl get pods --namespace=telephone
NAME READY STATUS RESTARTS AGE
caller 1/1 Running 0 30s
Looks good. Next, we will use kubectl exec to send HTTP requests from within the container in this pod. This command is similar to docker exec – we specify a new process to run in the container, and the output will be shown in the terminal. Note that just like docker exec, we can only run commands that are available within the container image.
Here is how we can make a request to the Source Allies home page.
$ kubectl exec pod/caller --namespace=telephone -- wget -q -S https://www.sourceallies.com -O /dev/null
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 23195
Connection: close
x-amz-id-2: JbK9j2rVTyi6hcupIfeOkojTTifXPz0SGHdk88cnXkqZ6cr/DC0xInAW4iwD3esv866NLlsnrO0=
x-amz-request-id: AY4NMYGY3KKVRSTT
Date: Thu, 25 Jan 2024 19:40:53 GMT
Last-Modified: Wed, 24 Jan 2024 13:51:07 GMT
ETag: "7e835e07e20658bc5febfd483401fcae"
x-amz-server-side-encryption: AES256
Accept-Ranges: bytes
Server: AmazonS3
X-Cache: Miss from cloudfront
Via: 1.1 ee0949c654b72e5ceb330e8b3e825e32.cloudfront.net (CloudFront)
X-Amz-Cf-Pop: ORD53-C2
X-Amz-Cf-Id: Bcs22oqFKYhHLStEuye7JiSzXGYmpXFrOJbMaZRRx6SJLF0sx3QGFg==
Let's break down this command a bit:
- kubectl exec pod/caller --namespace=telephone runs kubectl exec against the caller pod in the telephone namespace.
- -- separates the kubectl exec options from the command to run in the container.
- wget -q -S https://www.sourceallies.com -O /dev/null is the command to run inside the container.
And here is the meaning of the options provided to our wget command:
- -q silences progress meters and other extraneous output
- -S displays the response headers
- -O /dev/null sends the body of the response to /dev/null (effectively discarding the response body)
We can use the --stdin (-i) and --tty (-t) options of kubectl exec to run interactive programs from within a container. For example, we can run and connect to a shell running inside the container.
$ kubectl exec pod/caller --stdin --tty --namespace=telephone -- sh
/ # cat /etc/os-release
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.19.0
PRETTY_NAME="Alpine Linux v3.19"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"
/ # uname -a
Linux caller 6.1.64-0-virt #1-Alpine SMP Wed, 29 Nov 2023 18:56:40 +0000 aarch64 Linux
/ # whoami
root
/ # exit
Being able to run commands interactively from within your application container is extremely handy for debugging.
The previous request was to an external resource, but how do we reach things inside the cluster? To see that in action, we need to create another pod.
$ kubectl run receiver --image=nginx:1.24 --namespace=telephone
pod/receiver created
As always, let’s verify the new pod is running.
$ kubectl get pods --namespace=telephone
NAME READY STATUS RESTARTS AGE
caller 1/1 Running 0 74s
receiver 1/1 Running 0 8s
In Kubernetes, every pod receives its own IP address. We can ask kubectl get to show pod IP addresses by specifying the output format with --output (-o). In our case, we'll use the wide output format.
$ kubectl get pods --output=wide --namespace=telephone
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
caller 1/1 Running 0 90s 10.42.0.120 lima-rancher-desktop <none> <none>
receiver 1/1 Running 0 24s 10.42.0.121 lima-rancher-desktop <none> <none>
In addition to the IP addresses, the wide output format also shows us which node each pod is running on. Assuming you're running Rancher Desktop or Docker Desktop as shown in the introductory blog post, you'll see the same node for all your pods since we're running a single node cluster.
Let's try using the IP address of the receiver pod as the target for our wget command. Note that your IP addresses will likely be different, so update this command with the IP address that you see.
$ kubectl exec pod/caller --namespace=telephone -- wget -q -S 10.42.0.121 -O /dev/null
HTTP/1.1 200 OK
Server: nginx/1.24.0
Date: Thu, 25 Jan 2024 19:57:56 GMT
Content-Type: text/html
Content-Length: 615
Last-Modified: Tue, 11 Apr 2023 01:45:34 GMT
Connection: close
ETag: "6434bbbe-267"
Accept-Ranges: bytes
Woah, it worked! The Server response header indicates that it was nginx that sent the response, but let's check our receiver logs to be sure. We'll use --tail in our kubectl logs command to grab the last five lines of output.
$ kubectl logs pod/receiver --tail=5 --namespace=telephone
2024/01/25 19:57:25 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2024/01/25 19:57:25 [notice] 1#1: start worker processes
2024/01/25 19:57:25 [notice] 1#1: start worker process 29
2024/01/25 19:57:25 [notice] 1#1: start worker process 30
10.42.0.120 - - [25/Jan/2024:19:57:56 +0000] "GET / HTTP/1.1" 200 615 "-" "Wget" "-"
Sure enough, the last line shows that nginx received a request from 10.42.0.120, which is the IP address of our caller pod. (Again, your pod IP addresses will likely be different.)
Before you start hard coding pod IP addresses into your application, let's see what happens if we delete and recreate our receiver pod.
$ kubectl delete pod/receiver --namespace=telephone
pod "receiver" deleted
$ kubectl run receiver --image=nginx:1.24 --namespace=telephone
pod/receiver created
Alright, now let’s list our pod IP addresses again.
$ kubectl get pods --output=wide --namespace=telephone
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
caller 1/1 Running 0 4m41s 10.42.0.120 lima-rancher-desktop <none> <none>
receiver 1/1 Running 0 41s 10.42.0.122 lima-rancher-desktop <none> <none>
Before, the IP address for receiver was 10.42.0.121, but now it is 10.42.0.122. This brings us to a key aspect of the Kubernetes networking model: pod IP addresses are ephemeral.
So, hard coding pod IP addresses in your application is a pathway to madness. You have no guarantees on which IP addresses will be assigned to your pods. But if that’s the case, what hope do we have for building applications that rely on other pods if we don’t know their IP addresses?
In the next section, we’ll start looking at the DNS service provided by the cluster. This DNS service is what allows us to tame these ephemeral IPs.
Before moving on, let’s clean up the pods and namespace we’ve created.
$ kubectl delete namespace/telephone
namespace "telephone" deleted
kubectl supports several output options. We used the wide format earlier in this post to view pod IP addresses, but this format also includes other information such as pod age and number of container restarts. If we only wanted the pod names and IPs, we can use custom-columns to only show these columns.
$ kubectl get pods --output=custom-columns=NAME:.metadata.name,IP:.status.podIP --namespace=telephone
NAME IP
caller 10.42.0.120
receiver 10.42.0.121
Using custom-columns requires knowledge of the underlying API resource format, but it can be handy for generating automated reports.
If you want to perform additional transformations or filtering on the output of kubectl get (e.g. as part of a script), you may want to use the json or yaml output formats, which return the underlying API resource as JSON or YAML, respectively.
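For example, assuming the jq utility is installed on your machine (it isn't part of kubectl, so treat this as an optional convenience), you could pull just the pod IPs out of the JSON output. The IPs shown match the ones we saw above; yours will differ.
$ kubectl get pods --output=json --namespace=telephone | jq -r '.items[].status.podIP'
10.42.0.120
10.42.0.121
The same data is available to any JSON- or YAML-aware tool or scripting language, so pick whatever fits your workflow.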
As we saw at the end of the previous blog post, pod IP addresses are ephemeral. To avoid the toil of updating IP addresses in our applications as pods are created and destroyed, Kubernetes relies on a faithful protocol that helps power the Internet: DNS.
When we run a pod, Kubernetes adjusts the container DNS resolution configuration file (/etc/resolv.conf) to include the DNS server running inside the cluster. This DNS server automatically creates an A/AAAA record for every pod running in the cluster. The domain name uses the following format:
pod-ip-address.my-namespace.pod.cluster-domain.example
Sadly, as you can see, the pod IP address is part of the domain. Despite the existence of the DNS record, we’d still need to know the pod IP address if we want to reach it from another application. Drat!
Fortunately, Kubernetes provides a separate resource to facilitate service discovery: the aptly-named service.
Essentially, a service functions as a cluster-internal load balancer for pods. Like pods, the cluster DNS server creates an A/AAAA record for every service. Here is the domain format:
my-svc.my-namespace.svc.cluster-domain.example
No IP address in this name! In most cases, we can shorten the domain to the following:
my-svc.my-namespace.svc
Despite the fact that service IP addresses are ephemeral, the domain name of a service is static. If we know the name and namespace of a service, we can connect to the corresponding application without worrying about the underlying IP addresses.
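The shortened form works because Kubernetes also writes DNS search domains into each container's /etc/resolv.conf. You can peek at the file from inside any pod; the nameserver address and cluster domain (cluster.local below) depend on your cluster, so treat these values as illustrative rather than exact:
$ kubectl exec <pod-name> --namespace=<namespace> -- cat /etc/resolv.conf
search <namespace>.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.43.0.10
Thanks to the search entries, a name like my-svc.my-namespace.svc (or even just my-svc from a pod in the same namespace) is expanded to the full record automatically.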
Let’s put together an example scenario so that we can see this behavior in action. To start, we’ll create a namespace for ourselves:
$ kubectl create namespace lake
namespace/lake created
Next, let’s look at an example service manifest:
apiVersion: v1
kind: Service
metadata:
name: fish
namespace: lake
spec:
selector:
role: fish
ports:
- name: http
port: 80
targetPort: 8080
This manifest specifies that any pods with the label role: fish in the lake namespace will be considered part of the fish service. The ports section specifies that requests received by the service on port 80 (port) will be forwarded to port 8080 on the pod (targetPort). Services only handle traffic on the specified ports, so there must be at least one entry in the ports list.
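As a quick aside, targetPort doesn't have to be a number. If the container declares a named port in the pod spec, the service can reference that name instead, which keeps the service manifest stable even if the container's port number changes. The port name web below is a hypothetical illustration, not something we'll apply in this example:
ports:
- name: http
  port: 80
  targetPort: web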
Let's create this service using the manifest directly. As a reminder, to create a resource from a manifest, save the manifest to a file and then run kubectl apply -f <filename>. Here is an example with bash:
$ cat <<EOF >service.yaml
apiVersion: v1
kind: Service
metadata:
name: fish
namespace: lake
spec:
selector:
role: fish
ports:
- name: http
port: 80
targetPort: 8080
EOF
$ kubectl apply -f service.yaml
service/fish created
Let’s verify the service exists:
$ kubectl get services --namespace=lake
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
fish ClusterIP 10.43.108.184 <none> 80/TCP 30s
So far so good! We haven’t created any pods in this namespace yet (let alone pods with a matching label), so there are zero pods included in this service. We can list the pod endpoints of the service to verify:
$ kubectl get endpoints --namespace=lake
NAME ENDPOINTS AGE
fish <none> 60s
As expected, the endpoints list is empty. Let's start adding pods to our namespace, using the --labels (-l) option to specify a label on the pods. We'll set the value to match the service label selector.
$ kubectl run fish-1 --image=jmalloc/echo-server:0.3.6 --labels=role=fish --namespace=lake
pod/fish-1 created
$ kubectl run fish-2 --image=jmalloc/echo-server:0.3.6 --labels=role=fish --namespace=lake
pod/fish-2 created
$ kubectl run fish-3 --image=jmalloc/echo-server:0.3.6 --labels=role=fish --namespace=lake
pod/fish-3 created
Let’s list our pods along with their IP addresses:
$ kubectl get pods --namespace=lake -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
fish-1 1/1 Running 0 36s 10.42.0.195 lima-rancher-desktop <none> <none>
fish-2 1/1 Running 0 31s 10.42.0.196 lima-rancher-desktop <none> <none>
fish-3 1/1 Running 0 26s 10.42.0.197 lima-rancher-desktop <none> <none>
And now, we’ll list the service endpoints again:
$ kubectl get endpoints --namespace=lake
NAME ENDPOINTS AGE
fish 10.42.0.195:8080,10.42.0.196:8080,10.42.0.197:8080 11m
Our service has picked up our pods! Note that the IP addresses listed match the pod IP addresses. Let’s create another pod that we can use to send HTTP requests inside the cluster.
$ kubectl run angler --image=alpine:3.19 --labels=role=angler --namespace=lake --command -- sleep infinite
pod/angler created
The label we specified does not match the label selector of the service, so this pod is not included in the service. Listing the service endpoints should show the same values as before:
$ kubectl get endpoints fish --namespace=lake
NAME ENDPOINTS AGE
fish 10.42.0.195:8080,10.42.0.196:8080,10.42.0.197:8080 11m
It’s time to make our first request:
$ kubectl exec pod/angler --namespace=lake -- wget -qO- fish.lake.svc
Request served by fish-3
HTTP/1.1 GET /
Host: fish.lake.svc
User-Agent: Wget
Connection: close
The fish-* pods are running an application that returns the details of the request along with the hostname of the pod. The hostname of a pod matches the name of the pod, and in this example, it was the fish-3 pod that received the request. Because the service does load balancing, you may see a different pod selected. In fact, if we keep sending requests, we'll likely see different pods chosen:
$ kubectl exec pod/angler --namespace=lake -- wget -qO- fish.lake.svc
Request served by fish-2
HTTP/1.1 GET /
Host: fish.lake.svc
User-Agent: Wget
Connection: close
This time, it was fish-2 that received the request. Services are a critical resource in Kubernetes since they facilitate horizontal scaling of workloads. Pods can be added and removed and the service will adjust the endpoints accordingly. For example, let's delete two of our pods then inspect the endpoints.
$ kubectl delete pod fish-2 fish-3 --namespace=lake
pod "fish-2" deleted
pod "fish-3" deleted
$ kubectl get endpoints --namespace=lake
NAME ENDPOINTS AGE
fish 10.42.0.195:8080 27m
There is just the one endpoint. If we pretend that the fish-* pods represent replicas of our application, we can start to see how we can scale our application in/out depending on load.
Manually creating and deleting the pod replicas is a bit tedious though. In the next blog post, we’ll look at another resource that will make it easier to manage pod replicas.
Let’s clean up before moving on:
$ kubectl delete namespace/lake
In this post, we looked at how pods communicate with each other. We saw that pod IP addresses are ephemeral, but we can use services to provide a stable domain name for our pods. In the next post, we’ll look at how to use Deployments to manage pod replicas and how to deploy our own applications.
We've only created one pod so far. Kubernetes wouldn't be very special if we could only run one pod, so let's try running multiple pods.
$ kubectl run app-1 --image=nginx:1.24
pod/app-1 created
$ kubectl run app-2 --image=nginx:1.24
pod/app-2 created
$ kubectl run app-3 --image=nginx:1.24
pod/app-3 created
It seems like Kubernetes was happy to create three pods. Let’s list our pods to verify.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
app-1 1/1 Running 0 83s
app-2 1/1 Running 0 66s
app-3 1/1 Running 0 17s
Consider that a given Kubernetes cluster may be used to run dozens of applications. What would this list look like if I created hundreds of pods?
Well, there’s no fancy truncation. This list would be very long. Fortunately, Kubernetes allows us to create namespaces so that we can group related resources together.
A namespace is like any other resource in Kubernetes, meaning we can use kubectl to list our namespaces.
$ kubectl get namespaces
NAME STATUS AGE
default Active 8d
kube-system Active 8d
kube-public Active 8d
kube-node-lease Active 8d
It looks like our local cluster already has more than one namespace!
- Namespaces whose names start with kube- contain resources used by the cluster itself.
- The default namespace is where new resources will go unless otherwise specified.
Since we haven't been specifying a namespace in our kubectl commands, all the pods we've been creating have been added under the default namespace. Let's go ahead and create a new namespace for ourselves.
$ kubectl create namespace app
namespace/app created
We can list our namespaces again to verify our new namespace exists.
$ kubectl get namespaces
NAME STATUS AGE
default Active 8d
kube-system Active 8d
kube-public Active 8d
kube-node-lease Active 8d
app Active 42s
There it is! Let’s now create a pod under this namespace.
$ kubectl run app-4 --image=nginx:1.24 --namespace=app
pod/app-4 created
Note the --namespace option at the end of the above command. --namespace can be added to most kubectl commands to specify which namespace to use. For example, here is how we can list our newly-created pod:
$ kubectl get pods --namespace=app
NAME READY STATUS RESTARTS AGE
app-4 1/1 Running 0 7s
Without --namespace we'd be listing the pods in the default namespace. In fact, removing --namespace is the same as explicitly setting the namespace to default.
$ kubectl get pods --namespace=default
NAME READY STATUS RESTARTS AGE
app-1 1/1 Running 0 19m
app-2 1/1 Running 0 19m
app-3 1/1 Running 0 18m
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
app-1 1/1 Running 0 19m
app-2 1/1 Running 0 19m
app-3 1/1 Running 0 18m
Many kubectl flags and options provide a long form (e.g. --namespace) and a short form (-n). The short form requires fewer keystrokes, but I will use the long form in this series since it is more descriptive.
When introducing new options, I will show the short form (if one exists) alongside the long form, so feel free to use the version you prefer.
Resource names need to be unique in a given namespace, but different namespaces can have resources with identical names.
For example, since a pod called app-1 exists in the default namespace, we cannot create another pod called app-1 in the default namespace:
$ kubectl run app-1 --image=nginx:1.24
Error from server (AlreadyExists): pods "app-1" already exists
However, we can certainly create a pod called app-1 in the app namespace:
$ kubectl run app-1 --image=nginx:1.24 --namespace=app
pod/app-1 created
We'll see this name duplication if we list out the pods in both namespaces:
$ kubectl get pods --namespace=app
NAME READY STATUS RESTARTS AGE
app-4 1/1 Running 0 5m5s
app-1 1/1 Running 0 15s
$ kubectl get pods --namespace=default
NAME READY STATUS RESTARTS AGE
app-1 1/1 Running 0 24m
app-2 1/1 Running 0 24m
app-3 1/1 Running 0 23m
It's important to emphasize that the app-1 pod in the default namespace and the app-1 pod in the app namespace are two different pods. They share nothing other than the name.
Beyond providing a convenient mechanism for organizing resources, namespaces are also central to Kubernetes' RBAC model and to controlling resource allocation. We won't go into detail about either of these topics, but these characteristics of namespaces are what facilitate the usage of multi-tenant clusters (multiple teams using the same cluster). However, even for single-tenant clusters, it's common to create a namespace for every workload.
You can also set the default namespace with
$ kubectl config set-context --current --namespace <namespace>
Deleting a namespace will automatically delete all resources in that namespace. Let’s delete our app
namespace and see this in action.
$ kubectl delete namespace app
namespace "app" deleted
Recall that this namespace contained two pods. If we try to list the pods in this namespace, we’ll see that no such resources exist.
$ kubectl get pods --namespace=app
No resources found in app namespace.
Indeed, our namespace is gone entirely.
$ kubectl get namespaces
NAME STATUS AGE
default Active 8d
kube-system Active 8d
kube-public Active 8d
kube-node-lease Active 8d
Before moving on, let's delete the other pods we created in the default namespace.
$ kubectl delete pod/app-1 pod/app-2 pod/app-3
pod "app-1" deleted
pod "app-2" deleted
pod "app-3" deleted
Namespaces are an important mechanism for grouping related resources, but they often aren’t sufficient for keeping things organized. What if we want to further segment resources in a namespace? What if we want to group together resources across namespaces?
In Kubernetes, labels allow us to tag resources with arbitrary key-value pairs. We can then query resources by their labels.
For example, let's say we have three applications, creatively named app-1, app-2, and app-3.
All three of these applications consist of a backend process and a frontend process.
Furthermore, suppose these applications have the following visibility: app-1 and app-2 are public-facing, while app-3 is internal-only.
Let’s set up this fictional environment in our cluster and see how we might label these resources. To start, we’ll create some namespaces for ourselves.
$ kubectl create namespace app-1
namespace/app-1 created
$ kubectl create namespace app-2
namespace/app-2 created
$ kubectl create namespace app-3
namespace/app-3 created
Next, we’ll create the pods representing the backend and frontend processes for each application.
$ kubectl run app-1-backend --image=nginx:1.24 --namespace=app-1
pod/app-1-backend created
$ kubectl run app-1-frontend --image=nginx:1.24 --namespace=app-1
pod/app-1-frontend created
$ kubectl run app-2-backend --image=nginx:1.24 --namespace=app-2
pod/app-2-backend created
$ kubectl run app-2-frontend --image=nginx:1.24 --namespace=app-2
pod/app-2-frontend created
$ kubectl run app-3-backend --image=nginx:1.24 --namespace=app-3
pod/app-3-backend created
$ kubectl run app-3-frontend --image=nginx:1.24 --namespace=app-3
pod/app-3-frontend created
For good measure, we’ll list the pods in each namespace and verify everything is in the right spot.
$ kubectl get pods --namespace=app-1
NAME READY STATUS RESTARTS AGE
app-1-backend 1/1 Running 0 89s
app-1-frontend 1/1 Running 0 80s
$ kubectl get pods --namespace=app-2
NAME READY STATUS RESTARTS AGE
app-2-backend 1/1 Running 0 68s
app-2-frontend 1/1 Running 0 61s
$ kubectl get pods --namespace=app-3
NAME READY STATUS RESTARTS AGE
app-3-backend 1/1 Running 0 56s
app-3-frontend 1/1 Running 0 51s
Without any other changes, how can I list all the public-facing pods? Well, I can't. I would already need to know the visibility of each application and run multiple kubectl commands.
However, we can remedy this situation by adding a visibility label to our pods. We can then query for pods by their visibility value.
Let's start by adding the proper visibility label to one of our pods.
$ kubectl label pod/app-1-frontend visibility=public --namespace=app-1
pod/app-1-frontend labeled
kubectl reported our pod was labeled, but does anything look different if we list the pods?
$ kubectl get pods --namespace=app-1
NAME READY STATUS RESTARTS AGE
app-1-backend 1/1 Running 0 6m34s
app-1-frontend 1/1 Running 0 6m25s
The output looks the same as before, other than updated ages. By default, kubectl get doesn't show resource labels. We can ask kubectl to show the value of a certain label by using the --label-columns (-L) option:
$ kubectl get pods --label-columns=visibility --namespace=app-1
NAME READY STATUS RESTARTS AGE VISIBILITY
app-1-backend 1/1 Running 0 12m
app-1-frontend 1/1 Running 0 12m public
We now see a "visibility" column, along with the value of that label for both pods.
- app-1-frontend shows the value "public". This is what we set it to previously.
- app-1-backend shows no value. We haven't set this label on this pod.
If we don't know what labels exist on our resources, we can use the --show-labels option to show all the labels that exist on a resource. The output formatting can get a little messy with this command if our resources have many labels, but it's useful for exploration.
$ kubectl get pods --show-labels --namespace=app-1
NAME READY STATUS RESTARTS AGE LABELS
app-1-backend 1/1 Running 0 14m run=app-1-backend
app-1-frontend 1/1 Running 0 13m run=app-1-frontend,visibility=public
So it seems our pods already have a run label! This label is added automatically when we use the kubectl run command to create pods. We won't use this run label for anything, but be aware that certain actions will add labels to our resources.
Let's go ahead and finish adding the correct visibility label to our other pods.
$ kubectl label pod/app-1-backend visibility=public --namespace=app-1
pod/app-1-backend labeled
$ kubectl label pod/app-2-backend pod/app-2-frontend visibility=public --namespace=app-2
pod/app-2-backend labeled
pod/app-2-frontend labeled
$ kubectl label pod/app-3-backend pod/app-3-frontend visibility=internal --namespace=app-3
pod/app-3-backend labeled
pod/app-3-frontend labeled
Alright, back to our original concern… how do we list all the pods that are public facing? Since our pods are now sufficiently labeled, we can use the --selector (-l) option to provide a label selector to our kubectl get command.
$ kubectl get pods --selector=visibility=public --namespace=app-1
NAME READY STATUS RESTARTS AGE
app-1-frontend 1/1 Running 0 21m
app-1-backend 1/1 Running 0 21m
Hm, that was disappointing. We only received two pods, but we expect to see four. The --namespace option is still narrowing the query to our app-1 namespace. If we want to query across all namespaces, do we remove it?
$ kubectl get pods --selector=visibility=public
No resources found in default namespace.
Nope! Remember, not specifying --namespace is the same as --namespace=default, so the previous command tried to list all pods matching the given label selector in the default namespace.
For what we're trying to accomplish, we need to use the --all-namespaces (-A) option.
$ kubectl get pods --selector=visibility=public --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
app-1 app-1-frontend 1/1 Running 0 23m
app-1 app-1-backend 1/1 Running 0 23m
app-2 app-2-backend 1/1 Running 0 22m
app-2 app-2-frontend 1/1 Running 0 22m
Much better! None of the "app-3" pods are included in the output. We can list those "app-3" pods by altering the value of the visibility label in our label selector.
$ kubectl get pods --selector=visibility=internal --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
app-3 app-3-backend 1/1 Running 0 24m
app-3 app-3-frontend 1/1 Running 0 24m
This example shows how we can use labels for simple, ad-hoc analysis, but labels can also be consumed by automated processes to address things like resource auditing and cost analysis. Depending on your organization’s needs, tools like Kyverno and OPA Gatekeeper can enforce the usage of certain labels.
Notably, as we’ll see in upcoming material, labels are also used by other Kubernetes resources to configure their behavior.
If you need to delete a label for any reason, you can use the following command.
kubectl label <type>/<name> <label_name>-
For example, to delete the visibility label from the app-1-frontend pod, we would run:
$ kubectl label pod/app-1-frontend visibility- --namespace=app-1
pod/app-1-frontend unlabeled
Before moving on, let’s delete the resources we’ve created. Remember that deleting a namespace automatically deletes all the resources under that namespace.
$ kubectl delete namespace/app-1 namespace/app-2 namespace/app-3
namespace "app-1" deleted
namespace "app-2" deleted
namespace "app-3" deleted
So far, we've been using kubectl to imperatively create our resources. This has been useful for getting off the ground, but Kubernetes generally favors a declarative approach. Rather than telling Kubernetes how we want to create our resources, we tell Kubernetes what to create.
Kubernetes resources are typically declared as YAML manifests, although JSON also works. Let’s look at an example pod manifest.
apiVersion: v1
kind: Pod
metadata:
name: example
namespace: default
spec:
containers:
- name: example
image: nginx:1.24
Alright, so there is a bit to unpack here. Let's break down the meaning of these fields.
apiVersion and kind specify the type of resource declared in this manifest. Every resource manifest includes these fields.
- Pods are part of apiVersion: v1, which represents the core API resources. Resources are versioned inside an API group.
- kind refers to a specific resource type inside the API version. Since we want a pod, kind is set to Pod. Resource kinds use PascalCase.
The metadata object contains various fields that are common across resources.
- name: The name of the resource. Every resource needs a name.
- namespace: Which namespace this resource belongs in. If this is not specified, the default namespace is used.
- metadata also contains fields to hold our labels, annotations, finalizers, and owner references, among others.
The spec object contains the resource specification. The resource type determines which fields are included under spec.
- Our pod's spec declares a single container named example that uses the image nginx:1.24. Note that the container name doesn't necessarily need to match the pod name, although pods created using kubectl run set the pod and container name to the same value.
Right now, this manifest is just words on a website. Somehow, we need to send this manifest to our cluster. Here is how we do that: save the manifest to a file, then run kubectl apply -f <filename>. Here is how that might look with bash:
$ cat <<EOF >pod.yaml
apiVersion: v1
kind: Pod
metadata:
name: example
namespace: default
spec:
containers:
- name: example
image: nginx:1.24
EOF
$ kubectl apply -f pod.yaml
pod/example created
And the same, but with PowerShell:
$ @"
apiVersion: v1
kind: Pod
metadata:
name: example
namespace: default
spec:
containers:
- name: example
image: nginx:1.24
"@ > pod.yaml
$ kubectl apply -f pod.yaml
pod/example created
For comparison, here is how we’d create the same pod using imperative commands:
$ kubectl run example --image=nginx:1.24 --namespace=default
On the surface, the imperative approach seems much simpler. Why go through the effort of creating a file and composing a resource manifest when we're going to run a kubectl command anyway?
Despite the verbosity, the declarative approach has a couple of big advantages over imperative resource creation:
- Manifests can be kept in version control, giving us a reviewable record of exactly what is running in the cluster.
- Manifests are easy to automate: point kubectl to the directory containing your manifests and it will ship the resources to the cluster. Should those manifests change, Kubernetes will respond appropriately.
The Kubernetes documentation provides some guidance on when to use which technique (imperative vs. declarative), but for brevity, we'll continue using imperative commands for the remainder of these posts.
Tip: We can append --dry-run=client -o yaml to our imperative commands to view the manifest of the underlying resource being created.
$ kubectl run example --image=nginx:1.24 --namespace=default --dry-run=client -o yaml
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: null
labels:
run: example
name: example
namespace: default
spec:
containers:
- image: nginx:1.24
name: example
resources: {}
dnsPolicy: ClusterFirst
restartPolicy: Always
status: {}
And if you prefer JSON, you can use -o json instead of -o yaml.
Deleting declarative resources is a straightforward process with kubectl using the command:
kubectl delete -f filename.yml
When you run this command, kubectl will delete the resources specified in filename.yml. However, keep in mind that any resources that are not specified in the file will not be deleted.
In this post, we learned about using namespaces, labels, and manifest files to organize and manage our resources. In the next post, we’ll look at how pods communicate with each other and the outside world.
Kubernetes is a divisive topic in the world of software development. There seems to be an ardent following of both promoters and detractors.
For some, Kubernetes is a herald to the upcoming golden age of cloud native software. We are on the cusp of reveling in workloads and infrastructure that are self-healing and always-available. We’ll no longer think in terms of “iteration cycles” since we’ll be delivering a constant stream of value into the world.
For others, Kubernetes is the tangled manifestation of an industry driven by hype and complexity. We now toil away on the inconsequential mess we’ve created for ourselves, rather than solving problems faced by businesses in the real world. We’ve prioritized job security over simplicity.
Perhaps it’s best to step away from the emotion and look at the numbers. According to the 2022 CNCF Annual Survey, nearly half of all organizations using containers run Kubernetes to deploy and manage at least some of those containers. Worldwide, there are 5.6 million developers using Kubernetes today.
And yet, I know there are many still wondering: what is Kubernetes?
The official kubernetes.io website describes Kubernetes as follows:
Kubernetes, also known as K8s, is an open-source system for automating deployment, scaling, and management of containerized applications.
I remember reading this sentence several years ago before I knew anything about Kubernetes. On the surface, this description did nothing to help me understand what it was. I didn't wake up like Neo from a training program, with all the kubectl commands suddenly at my disposal.
Rather, I continued on to the documentation page where I was met with a buffet of information. I remember feeling lost in the sea of concepts and cross-referenced materials. I was on a boat without a skipper, and the further I dove into the material, the more I felt like Kubernetes was beyond my capability of understanding.
Eventually, I realized I felt lost because my learning style just didn’t align with how the official docs are organized. The official docs are an amazing reference for Kubernetes, but the longer I go before I convert reading into doing, the more likely I am to forget the material.
This blog post, and the ones that follow, aim to provide that hands-on, “hello world” introduction to Kubernetes for others like me. Rather than serving as an exhaustive reference, these posts focus on the basics of Kubernetes by running commands against a real cluster. By the end, you should feel more familiar with some of the Kubernetes terminology and underlying mechanics. Consider these posts the “context onramp” into the official docs.
In the interest of keeping these posts concise, I've made a few assumptions about the things you already know: namely, a basic familiarity with containers and with working at a command line.
However, you do not need to be an expert in either of these things. We won't be building our own container images, and we won't be piping together long sequences of sed or awk. Mostly, we'll be running a single command, inspecting the output, then showing what that information means.
And since we’re going to learn by doing, it seems appropriate to start by running Kubernetes locally.
Kubernetes is not a single piece of software; it is a set of software components working together.
It is possible to run Kubernetes by manually installing and configuring each of these components. There are guides that explain how to install Kubernetes from scratch, but this is a rather tedious endeavor. There are tools (such as kubeadm) that automate many of the necessary tasks, but even these tools expect a level of administration knowledge that is beyond the scope of these blog posts.
In a production setting, you will likely use the hosted Kubernetes service provided by your favorite cloud provider. Examples include Amazon EKS, Google Kubernetes Engine (GKE), and Azure Kubernetes Service (AKS).
If you are managing your own data center, you may choose to use one of the numerous, commercially-backed Kubernetes distributions, such as Red Hat OpenShift or SUSE's Rancher (RKE2).
There are a lot of different Kubernetes offerings out there. These offerings simplify many of the administration tasks while offering their own unique functionality and integration opportunities, but most of these options are too heavyweight for our current needs. We need a local cluster that we can start and stop with ease.
Luckily for us, there are two freely* available pieces of software that provide a single-click installation of Kubernetes for the desktop: Docker Desktop and Rancher Desktop.
The sections below describe the setup process for these two pieces of software. Note that both options support Windows, macOS, and Linux.
* Docker Desktop is free for personal use, as explained on the pricing page.
For many, "Docker" is synonymous with "container". You may already have it available on your machine. After installing Docker Desktop, you can enable Kubernetes by following the official instructions.
Rancher Desktop is an open-source project (GitHub repo) that aims to bring Kubernetes to the desktop. When installing Rancher Desktop, you have a choice of using containerd or dockerd as the container runtime. We will be using dockerd.
kubectl is THE command line tool for interacting with Kubernetes. Before continuing, let's make sure kubectl is installed and available within our PATH. From your system's command prompt, run:
$ kubectl config current-context
You should see a single line of output whose value depends on how you are running Kubernetes.
- Docker Desktop: docker-desktop
- Rancher Desktop: rancher-desktop
This output tells us that kubectl is working and pointing to our local Kubernetes instance. If the output of the above command shows a different value, please review the installation directions for Docker Desktop / Rancher Desktop and make sure the chosen software is running.
Note: If you are using Rancher Desktop, when you first use kubectl you may see some extra messages like this:
$ kubectl config current-context
kubectl config current-context
I0509 15:57:19.646691 13564 versioner.go:58] invalid configuration: no configuration has been provided
I0509 15:57:19.724839 13564 versioner.go:64] No local kubectl binary found, fetching latest stable release version
I0509 15:57:19.993116 13564 versioner.go:84] Right kubectl missing, downloading version 1.24.0
Downloading https://storage.googleapis.com/kubernetes-release/release/v1.24.0/bin/darwin/amd64/kubectl
...
This is expected, as Rancher Desktop may delay the installation of kubectl until it is first used.
Assuming the previous command matches the suggested output, we can now verify that our local Kubernetes instance is running.
$ kubectl get nodes
The output of this command will also differ depending on how you’re running Kubernetes.
Docker Desktop:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
docker-desktop Ready control-plane 2m19s v1.28.2
Rancher Desktop:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
lima-rancher-desktop Ready control-plane,master 34d v1.28.5+k3s1
Note: the numbers in the VERSION column may be different for you.
If your output looks similar, then congrats! You are successfully running Kubernetes locally.
Alright, I lied a little. Before we dive further into kubectl commands, I'd like to introduce some initial Kubernetes concepts. These blog posts will absolutely be hands-on, but having the context for our actions will help with our understanding.
Kubernetes can do a lot of things, but I think understanding its purpose can best be summarized as a dialog.
Me: Hey Kubernetes, here are some machines for running applications. Let’s call them “nodes”.
Kubernetes: Sounds good. Terminology noted.
Me: Perfect. Also, here are my applications. They are packaged as containers.
Kubernetes: Looks good to me. What do you want me to do with this information?
Me: I’d like to run the containers on these nodes, and I want you to figure out how to make it work.
Kubernetes: I’m on it!
Ultimately, Kubernetes exists to help us run our containerized applications across a set of machines. These machines carry compute (CPU and memory) alongside storage and networking, and we’re not interested in specifying every detail to allow our applications to run and communicate. Instead, we’d rather tell Kubernetes our desired outcome and let Kubernetes figure out where things belong.
As suggested in the above dialog, the machines that are available to run our containers are called nodes, or sometimes “worker nodes”. A node can be either a bare-metal server or a virtual machine, but either way, the nodes run software that allows them to reach each other and run containerized workloads.
The nodes are registered to the control plane, which tracks the state of the cluster. The control plane itself is some software running on one or more machines. The software running on the control plane determines which containers run on which nodes.
How you set up all these machines can vary quite a bit: a local cluster might run the control plane and workloads on a single machine, while a large production cluster might spread a highly-available control plane and thousands of worker nodes across multiple data centers.
Regardless of how many machines we're using, the interface to Kubernetes remains the same. From the perspective of the application developer, there is little difference between a single-node cluster and a 15,000-node cluster apart from the redundancy and compute resources available. No functionality becomes "unlocked" after you've added your eighth node to the cluster, for example.
The control plane hosts an API server and a database. Similar to other APIs, we send HTTP requests to the API server to manipulate the resources in the database.
However, these API resources don't represent things like "customer", "cart", or "invoice". Rather, these API resources represent things like "a running container", "a network connection", or "application storage". While we could interact with Kubernetes entirely through its API endpoints, the kubectl tool provides a friendly wrapper around all these HTTP requests.
Only the API server connects to the database. All other components read and update the state of the cluster through the API server. This design leads to a key aspect of Kubernetes: the desired state of the cluster is entirely contained in the database. In fact, we can take a snapshot of the database and use it to restore the cluster if there was a critical failure.
Other software components (called controllers) compare the desired state of the cluster with the actual state. If the actual state doesn’t match the desired state, these controllers perform the necessary actions to bring the actual state closer to the desired state.
Unless you are a cluster administrator, you don’t need to worry about the detailed workings of these various components. However, I think some awareness of what’s happening behind the scenes helps demystify Kubernetes. With the initial concepts out of the way, the remainder of these blog post will focus on these different API resources and how we use them to run our applications.
We’re going to start with the most fundamental building block in Kubernetes, the pod. Put simply, a pod is collection of one or more containers along with an execution environment. For simplicity, the pods we’ll create in this series will only have a single container.
Let’s get our hands on the keyboard and run our first pod.
$ kubectl run my-first-pod --image=nginx:1.24
pod/my-first-pod created
If you're familiar with running containers with docker, this command will look familiar to you.
- With kubectl run we specify the name of our pod, which is "my-first-pod".
- We use the --image option to specify which image to use for the single container in our pod. In this case, we are using an nginx image from Docker Hub.
Let's now verify the pod is running.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
my-first-pod 1/1 Running 0 38s
The output of the previous command indicates that our pod, which contains a single container, is ready (1/1) and currently running. The ready (1/1) status signifies that the pod has one container, and this container is prepared to accept traffic. Additionally, the restart counter displays zero restarts, and the age of the pod is also provided.
Assuming your pod is also running with zero restarts, then congrats! You have successfully run nginx in your local Kubernetes cluster!
The nginx image we’re using binds to port 80 by default. Let’s try to send an HTTP request to our nginx instance:
$ curl http://localhost:80
curl: (7) Failed to connect to localhost port 80 after 0 ms: Couldn't connect to server
Unless you have another service listening on port 80 on your machine, you should see a “Couldn’t connect to server” error. Similar to Docker, containers running in Kubernetes have their own (virtual) network interfaces. We’ll look at cluster networking some more later on, but for now, let’s reassure the skeptic that nginx is running by first setting up port forwarding between our localhost and the pod.
$ kubectl port-forward pod/my-first-pod 8000:80
Forwarding from 127.0.0.1:8000 -> 80
Forwarding from [::1]:8000 -> 80
Requests to localhost:8000 will now be forwarded to port 80 of our pod. In a separate terminal, let's try sending our HTTP request again.
$ curl http://localhost:8000
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
Ta-da! We received the default nginx response. We can actually inspect the nginx logs and verify it wasn’t some other instance of nginx that someone sneakily ran on our machine.
$ kubectl logs pod/my-first-pod
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2024/01/24 22:24:49 [notice] 1#1: using the "epoll" event method
2024/01/24 22:24:49 [notice] 1#1: nginx/1.24.0
2024/01/24 22:24:49 [notice] 1#1: built by gcc 10.2.1 20210110 (Debian 10.2.1-6)
2024/01/24 22:24:49 [notice] 1#1: OS: Linux 6.1.64-0-virt
2024/01/24 22:24:49 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2024/01/24 22:24:49 [notice] 1#1: start worker processes
2024/01/24 22:24:49 [notice] 1#1: start worker process 29
2024/01/24 22:24:49 [notice] 1#1: start worker process 30
127.0.0.1 - - [24/Jan/2024:22:25:46 +0000] "GET / HTTP/1.1" 200 615 "-" "curl/8.4.0" "-"
The final log message shows our HTTP request. Satisfied with these results, we can now hit “ctrl-c” in the terminal running our port forward command to stop the port forwarding.
In fact, we no longer need this pod. Let’s delete it.
$ kubectl delete pod my-first-pod
pod "my-first-pod" deleted
We can get the pods again to verify it’s gone:
$ kubectl get pods
No resources found in default namespace.
The keen reader may notice some similarities with these commands. For the most part, kubectl commands use the following pattern:
kubectl <verb> <type> <name>
- verb indicates the desired action.
- type refers to the resource type. So far we've only talked about pods, but we'll see more resource types soon.
- name is the name of the resource.
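For example, the commands we've already run fit this shape, and describe is another handy verb that prints a detailed, human-readable summary of a resource:
$ kubectl get pod my-first-pod
$ kubectl describe pod my-first-pod
$ kubectl delete pod my-first-pod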
I don’t think I’d be too much of a reductionist to say that Kubernetes is all about pod manipulation. Given one or more containers, Kubernetes will help me configure the storage, networking, lifecycle, organization, and security of those containers.
With this in mind, everything we learn from here is about giving ourselves more tools to manipulate our pods. In the next post, we’ll look at how we can use namespaces and labels to organize our objects.
Ultimately, shelling out to CUDA was significantly easier than I expected. It came down to 4 steps.
For those of you that want to play along at home, a working minimal repo with all of the code samples I'm using in this post can be found here.
Before we begin, I’d like to take a moment and clarify the terminology I’m going to be using for this post.
host: refers to the computer the GPU is attached to
device: the GPU itself
kernel: function executed in parallel on the GPU
Beyond the fact that you're working in C++ (or C or Fortran), there are really only two jarring things about working with CUDA (in the easy case anyway):
- Host memory and device memory are separate.
- You have to annotate where your functions can be called from.
Ok, so what does any of that mean?
Well, the first one means that both the host and device have their own address space. You can have pointers to either address space, but dereferencing them on the wrong side will, most likely, cause your program to abort.
The second means that you have to tell the nvidia compiler (nvcc) where your function is callable from and where variables will be referenced from. There are four options.
- Functions with the __host__ attribute are callable from the host only. This includes basically all of the C++ standard library, along with STL data structure methods.
- Functions with the __global__ attribute (kernels) exist on the device, and are callable from either the host or the device. These functions operate in parallel.
- Functions with the __device__ attribute exist on the device and are only callable from the device.
- Functions with both the __host__ and __device__ attributes exist on both sides and are callable from either side. Unlike __global__ functions, these are more like the traditional serial functions we know and love.
Ok, with all of that out of the way, it's time to write some code. We need one of those fancy __global__ functions to run in parallel on the device. Here's an example of one that adds one number in an array to another number.
__global__ void add_kernel(double *a, double *b, size_t len) {
size_t i = blockIdx.x * blockDim.x + threadIdx.x;
if (i < len) {
a[i] = a[i] + b[i];
}
}
Wait, what are those magical blockIdx, blockDim, and threadIdx variables? Those are globals that the nvidia compiler makes available for you to figure out what unit of work you need to do. See, your kernel function is run in parallel in groups of threads known as a "block". A block can contain up to 1024 threads. blockIdx is which block you're running on, blockDim is the size of the block, and threadIdx is which thread this is. Using those three numbers, we can figure out which element of the array we're supposed to be operating on. We're accessing the single dimension we have located at the x property; you can have more dimensions (populated under the y and z properties), but their use is out of scope for this article.
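For example, with a block size of 256, the thread with blockIdx.x = 2 and threadIdx.x = 10 computes i = 2 * 256 + 10 = 522, so it is responsible for element 522 of the arrays.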
One additional important note: we need to compare our index to the length of the arrays we're working on. A whole number of blocks will run, so if your array isn't an exact multiple of the block size, you could end up overwriting memory you didn't intend to.
Now that we have the kernel, we need to call it. We can't quite call it like a normal function. You remember those magical blockIdx and blockDim variables? When we call the kernel, we need to tell the compiler how big of blocks to use, as well as how many blocks we want to run. To do this, CUDA uses syntax like this:
add_kernel<<<num_blocks, block_size>>>(dev_a, dev_b, len);
I've used integers for num_blocks and block_size in my example code since we're just working with arrays, but you can actually supply a struct of 3 integers called a dim3. Supplying this will populate the y and z dimensions of the block we talked about before.
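For a 1-D array, a common way to pick these numbers is to round the block count up so that every element is covered, and let the bounds check inside the kernel ignore the excess threads. The block size of 256 below is an arbitrary illustrative choice, not necessarily what the linked repo uses:
size_t block_size = 256;
size_t num_blocks = (len + block_size - 1) / block_size;  // ceiling division so the last partial block is included
add_kernel<<<num_blocks, block_size>>>(dev_a, dev_b, len);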
Alright, so we've covered writing and calling the kernels. We've got to be just about done, right? Well, not quite. We still have to get the data we want to operate on onto the device itself. The general pattern when calling a kernel is: allocate memory on the device, copy the inputs from the host to the device, launch the kernel, copy the results back from the device to the host, and free the device memory.
You can see a full example of calling a kernel here.
Creating the library is a relatively simple process. All you need to do is wrap your function in an extern "C" block so that go can call it. Technically, you could just put all of your code in the wrapper function itself, but this is cleaner to me, especially when you have many functions you're trying to make available on the go side.
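Here's a minimal sketch of what such a wrapper might look like for the add_kernel above. The signature matches the declaration in the cgo comment later in this post, but the error handling is deliberately crude and the wrapper in the linked repo may differ:
extern "C" int add_wrapper(double *a, double *b, size_t len) {
    double *dev_a = nullptr;
    double *dev_b = nullptr;
    size_t bytes = len * sizeof(double);

    // 1. Allocate device memory for both arrays.
    if (cudaMalloc(&dev_a, bytes) != cudaSuccess) return 1;
    if (cudaMalloc(&dev_b, bytes) != cudaSuccess) { cudaFree(dev_a); return 1; }

    // 2. Copy the inputs from the host to the device.
    cudaMemcpy(dev_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, bytes, cudaMemcpyHostToDevice);

    // 3. Launch the kernel; the bounds check in add_kernel handles the padding threads.
    size_t block_size = 256;
    size_t num_blocks = (len + block_size - 1) / block_size;
    add_kernel<<<num_blocks, block_size>>>(dev_a, dev_b, len);

    // 4. Copy the result (the kernel wrote into dev_a) back to the host.
    cudaError_t err = cudaMemcpy(a, dev_a, bytes, cudaMemcpyDeviceToHost);

    // 5. Free device memory and report success or failure.
    cudaFree(dev_a);
    cudaFree(dev_b);
    return err == cudaSuccess ? 0 : 1;
}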
The only real choice here is between a shared or a static library (if you're on linux, these will produce a .so or a .a file, respectively). A static library will be compiled into your go binary, and will be easier to deploy, but there may be legal issues with statically compiling in certain code, especially if you're working on a corporate application. If you choose the shared library, the .so file must exist in your LD_LIBRARY_PATH at runtime.
Examples for setting both of these up using cmake exist here. An important note is that, if you choose the static library, you need to tell the compiler to resolve device symbols (set_target_properties(examplestatic PROPERTIES CUDA_RESOLVE_DEVICE_SYMBOLS ON)), otherwise you will get errors at link time.
Ok, that’s the hard part done. Seriously, the rest is just normal cgo calls as if we weren’t using CUDA at all.
To call C code from Go, we need a magic comment at the top of our Go file (but after the package declaration) that's the rough equivalent of a header file in C. It's going to look something like:
package static
/*
#cgo LDFLAGS: -lexamplestatic -L${SRCDIR}/../example/build -lcuda -lcudart -lm
#include <stdlib.h>
int add_wrapper(double *a, double *b, size_t len);
*/
import "C"
The first -l option will be the name of your library without the preceding lib and without the trailing .so or .a. The -L option will be a path to the directory the .so or .a file is in.
If you have errors building, make sure there isn't a blank line between the comment and the import "C". This is an error.
All we have to do now is call the C function from go, I like to make a little wrapper function to return errors instead of C-style error codes. That would look a little like this:
func cudaAdd(a, b []float64) error {
if res := C.add_wrapper((*C.double)(&a[0]), (*C.double)(&b[0]), C.size_t(len(a))); res != 0 {
return fmt.Errorf("got bad error code from C.add %d", int(res))
}
return nil
}
All we're doing is passing the address of the first element of each slice. You may have some concern about casting a pointer to a float64 to a pointer to a C double, as there aren't a ton of guarantees about floating point format in C or C++, but nvidia adheres to the IEEE-754 floating point standard, which is the same standard go uses for its floating point numbers.
We've covered a lot, but I want to sneak in just one more topic. Writing raw CUDA (or C++ in general) can lead to tremendous performance gains, but working with CUDA can be unwieldy and the exact semantics aren't always obvious at a glance. Writing unit tests is imperative to have any confidence in your library. I've added an example of testing this code into my repo. Once you've wrapped the CUDA kernel in a C++ function, you can use any C/C++ testing library. I've opted for GoogleTest in this example project.
In order to provide a prod-like test environment, it is vital… nay, essential… that the test suite in your CI/CD process use actual GPUs. Accept no substitute for true nvidia hardware. Without such hardware in your pipeline, subsequent commits risk introducing bugs that go unnoticed.
I think we’re finally done here. I hope I saved at least one of you several hours of debugging some error message you’re getting. See you next time.
As we assembled the bed, I noticed several similarities between assembling furniture and delivering software.
I was not actively working out at home when I had the idea to create a workout room.
I was convinced the lack of dedicated space was the only reason.
If I just had a place I could set up a yoga mat and store the random exercise accoutrement I had acquired over the years, surely I would reach my fitness goals.
I purchased our murphy bed as soon as I had the idea to create a workout room.
As a result, the pile of boxes sat in my living room for months.
They eventually made their way upstairs to the guest room where they sat for several more weeks.
My problem was that I hadn’t found a routine that worked for me, not that I lacked space to work out.
After several days of getting up early to work out in my bedroom and watching yoga videos on my bedroom TV, I decided I was committed to doing yoga regularly and it would be easier to do if we finally finished converting the guest room.
At that point, I knew exactly what I needed the room to be and how I would use it.
There is an agile principle known as just in time development. The idea behind this principle is that business needs change rapidly and software design is a complicated process. Companies should avoid designing large complicated solutions prior to developing and instead focus on a simple, iterative development approach. Focus on immediate business needs and delivering value quickly. Businesses that tackle problems in non-optimal chunks run the risk of devoting resources (both time and money) to work that is not needed. An iterative approach leads to a higher quality product with more accurate requirements. Start simple and look for ways to quickly deliver value.
The first step in assembling furniture is to unbox all the pieces. We were doing this in a small room, so we began throwing all the boxes and foam packing into the hallway. When we finished, we discovered that we needed some tools outside of the room. We poked our heads into the hallway and realized that we had boxed ourselves in with all the boxes and trash. We had to spend several minutes sorting through the rubble in our hallway to be able to get downstairs. It also took several weeks to get all the trash out of our house. Our recycling bin could only hold so much at one time. Luckily we had the space to store all the waste.
Lean architecture design focuses on reducing waste, improving cycle time, and increasing the value delivered to the customer. A little time spent up front to think about the best way to deliver a product can save a lot of time down the road. To quote a wise client, “We can save a few minutes of planning with several hours of development.” Don’t rush into a project without taking the time to understand your use case and map out your plan of attack or you might find yourself in the middle of a giant mess.
Now that I knew how I was planning to use the room (as a yoga studio), I knew I needed a way to watch yoga videos. We purchased a TV for the new workout room, meaning there were two tasks standing in our way of having a functional workout space: assembling furniture and connecting a TV to our Google Home network.
We started the process with my husband setting up the TV while I unboxed all the pieces for the bed. This meant we were constantly getting in each other’s way and unable to discuss the best approach to each other’s task. I didn’t like the settings he chose for the TV. He didn’t know where I put any of the pieces that I unboxed. If we had both worked on the bed, there were many tasks that could have been shared or completed in parallel. We would have been more in sync about what was happening if we were both unboxing. One of us could have been reading the instructions to better sort the pieces as they were unboxed and we would have delivered the completed bed much faster.
It is tempting to start multiple projects at once. Working on multiple things in parallel does not make either get delivered faster and often causes development to slow down from the context switching. If you have two 6-month projects, would you rather have one of them in 6 months and the other at the end of the year, or wait until the end of the year (or longer) to get both? Even within teams that are focusing on one priority, there is a temptation to take on multiple tasks at once or to have each teammate own a story. It is physically impossible to do two things at once, so no one should ever be working on more than one task. Consider strategies like pair programming, where multiple developers collaborate on one user story. By having two people work on one story, you improve quality as well as minimize business risk. When two people are familiar with the decisions and trade-offs made on a story, the code is easier to maintain.
There were multiple points in the process of assembling the bed where communication broke down. My husband had opened the package of screws and noticed there were 5 types of screws. When I sorted the screws, I combined two similar looking types together. We ended up using the wrong screws in several parts of the bed. Had I stopped to clarify how many screws I should be seeing, the quality of our finished product would have been higher.
We also learned the importance of using a common language and communicating clearly. Comments like “hand me that” or “move it toward the wall” weren’t clear. I found myself frequently asking “What do you want me to hand you?” and “Which wall? There are 4!” We had to pause and have a conversation about how to give each other better directions.
Teams often jump into projects without taking the time to do team forming exercises like building a charter. Taking a couple hours to talk about your working styles and communication preferences (especially how you like to receive feedback) can make a big difference on the success of the project. Make sure everyone on the team is clear about the project goals and understands what is expected of them. Holding regular team building sessions can also be beneficial to improve communication throughout the team.
Don’t forget to include your stakeholders when thinking about communication. Our household is home to three cats who were not prepared for the transition we were making. At one point our kitten panicked and darted through the trash filled hallway. Make sure everyone impacted by your change is aware of the impact to them. Give them a chance to ask questions and make any preparations needed.
After the bed was assembled and the TV connected, I was finally able to start using the room for my daily yoga session! The room is just big enough to hold our workout equipment with enough space to stretch out my yoga mat. When I am in there, my husband doesn’t have room to use any of the weights or other workout equipment. We quickly realized that my office across the hall (which is about twice the size) would make a better workout room. We now need to haul all of the workout equipment and a very heavy murphy bed into a new room, so we can use the workout space at the same time.
It is important to identify a minimum viable product. What is the smallest piece of the project that would still deliver value? Complete that portion, get feedback from the end user, make adjustments, and then move onto the next slice of the project.
I had used the guest bedroom to work out a few times before we assembled the murphy bed but never with the intention of having someone else join me. Make sure you are thinking through how the product will actually be used. Keep your mind open to other options if something isn’t working. Avoid the sunk cost bias which leads you to continuing down a path you have invested in even once the cost outweighs the benefits. If I had kept an open mind early on and committed to testing my idea for working out at home iteratively, I would have saved my family a lot of work.
Whether you are assembling furniture or building enterprise software, applying better delivery practices can save both time and money. Effectively managing delivery leads to better prioritization, reduces risk, and maximizes value. Consider adopting these tips or working with one of Source Allies’ experienced portfolio managers to assess your delivery process for improvement opportunities.
]]>What you may not hear as much about are the practical uses of Machine Learning to make equipment more useful and easier to maintain. These cases go beyond the trivial use of ML to make a creative act quicker and give manufacturers the opportunity to solve some real problems with this new technology. These applications feature a tight integration of hardware, electronics, and software and require thoughtful system engineering to make sure that these different modes of development work closely together.
At Source Allies, we’ve helped manufacturing and agricultural companies apply system engineering approaches to help organizations use Machine Learning to increase the value they can provide with their concrete, real world products.
Before sharing some examples of how software can make hardware more valuable, we should probably explain what we mean when we say machine learning and systems engineering. Machine learning is a less glitzy, but quite useful, subfield of artificial intelligence that gives computers the ability to learn without explicitly being programmed. It provides machines the ability to perform complex activities such as analyzing images and detecting objects.
These activities come in handy when you want a machine that does a repeated activity that should only be done in certain circumstances. You can now give the machine the ability to determine whether it should do something, such as spray herbicide on a weed, and when it shouldn’t. When designed and configured appropriately, machine learning removes the need for a person to oversee the repeated actions of a machine. This way, those actions are more predictable and the person involved can focus on activities that are not so easily turned over to machine learning.
According to NASA, systems engineering is a “methodical, multi-disciplinary approach for the design, realization, technical management, operations, and retirement of a system”. A system is the combination of elements (hardware, software, equipment, facilities, personnel, processes, and procedures) that function together to produce a particular outcome. For NASA, a system could be the New Horizons spacecraft. Back here on earth, systems could be a tractor, an agricultural sprayer, valves, or lighting.
One of the keys to successfully designing and delivering a system is making sure that all its elements work together in harmony. That’s where systems engineering comes into play. People most familiar with developing software would love to take the same iterative, incremental approach to developing an entire system. Unfortunately, the laws of physics can get in the way when electronic circuits and metal casings can’t be changed as quickly - or cheaply - as software can.
As a result, if you’re working on a complete system you need to be more intentional in how you design and build your solution. You need an interdisciplinary approach to product development that includes people skilled in all the different technologies involved in the system. You use systems engineering to translate the overall requirements to the requirements specific to each subsystem. You also build traceability between the overall system requirements and the subsystem requirements. From that point, the experts in each subsystem follow the appropriate design and development approach for their technology, all while continuing to coordinate from an overall systems perspective.
Source Allies has contributed several times to these types of efforts, both from an overall systems engineering perspective as well as on the software aspects of applying machine learning.
For farmers, planting seed can be a complex process entailing planning and forecasting. Many agriculture companies invest in technology with hopes of improving overall annual yield. One of our Fortune 500 agriscience partners looked to Source Allies to help them improve the way farmers accessed critical planting data in real-time.
The original process from seed sale to planting and an evaluation of the crucial data was clunky. Farmers could not get access to planting recommendations with enough time to make modifications and have a real impact on overall yield.
We built a mobile application, Sync Service, that uses a small hardware component that allows farmers to wirelessly instruct their tractors how to plant our partners’ seed products, so they can achieve the highest yield.
We helped one of our large agriculture manufacturing clients develop and enhance their data-driven approach to monitor and improve their farming equipment’s quality and efficiency. We sourced machine data from displays, receivers, and an MTG system to track hours of operation, distinguishing between idle, working, and transport hours.
We incorporated acreage data to shed light on defect locations and causes. We also captured and ranked “cell strength” data on each machine to create a global map that showed signal strength levels.
This groundbreaking approach not only helps pinpoint areas with poor data reporting, such as Australia, but also reveals surprising gaps in supposedly well-covered regions like Wisconsin and Washington. Armed with this insight, our client strategized alternative methods to track machines, and ensure continued participation in software testing and data reporting.
In-office work has grown more inconsistent due to remote work. We helped one of our clients, a lighting manufacturer, develop a model to determine when lights should be on or off depending on facility usage at a given time and in a given room.
We developed a system that looks at the prior day’s events to determine when lights should be on or off in a particular room. We used that data to make predictions, which are relayed back to the lights each hour as needed to turn them off. The result of our work is a 20% reduction in energy costs.
One of our trucking clients used odometer data to create preventative maintenance plans. If that data is incorrect, it can cause trucks to miss needed maintenance, leading to lost time and money. Conversely, trucks may get maintenance before they should, leading to unnecessary spending.
We worked with our client to build a solution that showcases how ML can be used to bring awareness to these abnormal odometer readings, allowing our partner to make decisions quicker in order to get preventative maintenance plans back on track.
Our team established a system to detect incorrect odometer data. The system relays those data points to subject matter experts who update the incorrect data, which leads to quicker correction of preventative maintenance plans. The result is hours saved and less unnecessary maintenance on trucks resulting in a cost savings of hundreds of thousands of dollars.
There is a lot of hype about all the things you can do with AI. Many of the common mainstream examples of using AI yield questionable value, providing solutions that may be in search of a problem. At the same time, many companies that operate in the physical world are quietly applying machine learning to solve real world problems. If you’re one of those companies, Source Allies can help you engineer your system to achieve your vision. Reach out to find out how.
]]>I would like to share with you some advice for your internship/apprenticeship so you can get the most out of your experience.
Starting something new is scary, but no one expects you to be perfect when starting an internship. You are there to gain experience, so try something you have never done and grow from it. Mistakes are only mistakes when you do not learn from them.
Sometimes we forget that we can encourage feedback. Previously, I avoided feedback because I was afraid of the confrontation. However, feedback is some of the best learning and growth you can provide for yourself.
I am usually more of a listener and have been put on so many teams that I struggled to find my place when I started my apprenticeship. So, I asked a teammate their thoughts on where I can help and become better. She gave me tons of advice from her perspective, and I now know which topics I need to speak up on and that people want to hear from me rather than just listen. Listening is one of my greatest strengths but I am continuing to work on providing my own insights along with listening.
We all have our individual views of ourselves, but we will never know the impact we can make unless we learn about ourselves from others. This is a huge benefit of working with so many different people. They may all see you in different situations and you can learn from their views of you.
Your teammates want to help you, so get to know them and secure those valuable connections. Plus, if you get to know someone on a personal level, it makes it easier to ask questions/advice. People are valuable resources; who knows, you may be able to help them with something as well!
Think about everything you are doing now and how it is helping you reach your goals. Whether that is something small or big, think about your “why.” I think about my “why” in terms of my motivation. What keeps you going when things get tough and what pushes you to become the better version of yourself?
Even if you do not know where to start, try facing a new challenge. This goes hand in hand with not being afraid to fail. Try new things and use it as a learning opportunity to grow your expertise. My advice when tackling a new challenge is to research as much as you can about the topic. Once you have a general understanding, it is easier to ask questions and know where to take off.
I hope this provides some insight into an intern experience and how you can make the most of it! Every experience no matter how big or small at the beginning of your career is valuable, so be grateful for all of the opportunities.
If you have enjoyed hearing about Kamryn’s experiences and think Source Allies may be a good fit for you, please do not hesitate to apply!
]]>Creating a scheduled job to stop these resources during off-hours isn’t a new idea. Generally this involves a Lambda that has a bit of code to make the appropriate AWS calls. Instead, since Step Functions has added support for calling almost any AWS API natively, we can leverage a State Machine to shut down our database. In Cloudformation it looks like this:
AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31
Parameters:
  ScaleDownOffHours:
    Type: String
    Default: "false"
Conditions:
  ConfigureScaleDownOffHours: !Equals [ "true", !Ref ScaleDownOffHours ]
Resources:
  ...
  ScaleDownOffHoursStateMachine:
    Condition: ConfigureScaleDownOffHours
    Type: AWS::Serverless::StateMachine
    Properties:
      Definition:
        StartAt: ScaleDown
        States:
          ScaleDown:
            Type: Task
            Resource: "arn:aws:states:::aws-sdk:rds:stopDBCluster"
            Parameters:
              DbClusterIdentifier: !Ref DatabaseCluster
            End: true
  ...
We’re using an AWS::Serverless::StateMachine rather than an AWS::StepFunctions::StateMachine. This configuration leverages the Serverless transform and inlines some additional requirements to get this to run on a schedule.
First, we need to create an IAM Role that gives the State Machine permission to stop the database.
We can do that by adding a Policies property to the resource, and the Serverless transform will expand it into a full Role at deploy time:
ScaleDownOffHoursStateMachine:
  Condition: ConfigureScaleDownOffHours
  Type: AWS::Serverless::StateMachine
  Properties:
    ...
    Policies:
      - Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - rds:StopDBCluster
              - rds:StartDBCluster
            Resource:
              - !GetAtt DatabaseCluster.DBClusterArn
We want to scale down every day at 5 PM Central.
We can add an Events
property and the transform will expand that into other resources.
Those resources will kick off the state machine on the appropriate schedule.
ScaleDownOffHoursStateMachine:
  Condition: ConfigureScaleDownOffHours
  Type: AWS::Serverless::StateMachine
  Properties:
    ...
    Events:
      ScaleDown:
        Type: ScheduleV2
        Properties:
          ScheduleExpressionTimezone: America/Chicago
          ScheduleExpression: "cron(0 17 * * ? *)"
If we stop here, we have a single resource we can add to our template that is able to automatically shut down the database every day at 5 PM.
Additional states can be added to the state machine to stop other resources as well (such as an EC2 instance).
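For example, the ScaleDown state’s End: true could become a Next that hands off to another task state. The StopJumpBox state name and the JumpBoxInstance resource below are hypothetical, and the state machine’s policy would also need ec2:StopInstances:
        ScaleDown:
          Type: Task
          Resource: "arn:aws:states:::aws-sdk:rds:stopDBCluster"
          Parameters:
            DbClusterIdentifier: !Ref DatabaseCluster
          Next: StopJumpBox
        StopJumpBox:
          Type: Task
          Resource: "arn:aws:states:::aws-sdk:ec2:stopInstances"
          Parameters:
            InstanceIds:
              - !Ref JumpBoxInstance
          End: true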
One downside to this approach is that our environment is never started back up; we would have to do that manually.
We can modify the definition of our state machine to actually start resources as well.
Replace the Definition
element with:
ScaleDownOffHoursStateMachine:
  Condition: ConfigureScaleDownOffHours
  Type: AWS::Serverless::StateMachine
  Properties:
    ...
    Definition:
      StartAt: DetermineDirection
      States:
        DetermineDirection:
          Type: Choice
          Choices:
            - Variable: "$$.Execution.Input.source"
              StringEquals: aws.scheduler
              Next: ScaleDown
          Default: ScaleUp
        ScaleUp:
          Type: Task
          Resource: "arn:aws:states:::aws-sdk:rds:startDBCluster"
          Parameters:
            DbClusterIdentifier: !Ref DatabaseCluster
          End: true
        ScaleDown:
          Type: Task
          Resource: "arn:aws:states:::aws-sdk:rds:stopDBCluster"
          Parameters:
            DbClusterIdentifier: !Ref DatabaseCluster
          End: true
This definition will start the database if the state machine is not triggered by the scheduled event (such as manually). Let’s go even further by adding an event to start the database whenever we deploy a new version of our application:
ScaleDownOffHoursStateMachine:
  Condition: ConfigureScaleDownOffHours
  Type: AWS::Serverless::StateMachine
  Properties:
    ...
    Events:
      ...
      ScaleUp:
        Type: EventBridgeRule
        Properties:
          Pattern:
            source: [ "aws.cloudformation" ]
            account: [ !Ref AWS::AccountId ]
            detail-type: [ "CloudFormation Stack Status Change" ]
            detail:
              stack-id: [ !Ref AWS::StackId ]
              status-details:
                status: [ "UPDATE_IN_PROGRESS" ]
This event actually listens for the current stack to go into “UPDATE_IN_PROGRESS” state and starts the database in response. It isn’t a synchronous operation so it will still take a few moments before the application is usable.
This is just a sample of some of the ways to manage your non-production infrastructure. State machines are flexible enough that all sorts of innovative combinations can be supported. You could even set up a Wait state to automatically shut things down a certain amount of time after they are deployed, as sketched below. Take a look at the complete template on our Github repository.
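Here is a rough sketch of that Wait idea, reusing the ScaleUp and ScaleDown states from the definition above; the four-hour delay and the WaitForOffHours name are arbitrary examples, not recommendations:
        ScaleUp:
          Type: Task
          Resource: "arn:aws:states:::aws-sdk:rds:startDBCluster"
          Parameters:
            DbClusterIdentifier: !Ref DatabaseCluster
          Next: WaitForOffHours
        WaitForOffHours:
          Type: Wait
          Seconds: 14400   # four hours after a deploy-triggered scale up
          Next: ScaleDown
        ScaleDown:
          Type: Task
          Resource: "arn:aws:states:::aws-sdk:rds:stopDBCluster"
          Parameters:
            DbClusterIdentifier: !Ref DatabaseCluster
          End: true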
]]>When it comes to the frontend, however, these same organizations will often deploy a single Javascript code base to be the UI for a large part of the company’s functionality. In doing this, the cadence of feature releases, bug fixes, updates, and software choices is tightly coupled between teams. Breaking up the frontend brings with it additional challenges. Each logical area must appear to the end users as a cohesive experience. It must be consistent in styling, coloring and functionality. Additionally, there are usually components that must span across areas of the site, such as a common header or footer.
One of the most common attempts to solve these problems involves the creation of a shared UI library that all teams can import into their respective applications. This library is generally written using a UI framework that the company has settled on, such as React or Vue. The shared library strategy introduces two new issues, however. Firstly, using the library generally requires that all the applications use the same UI framework, and that they are within a narrow band of versions of that framework. If a team wants to update to a new major version of React, this can break compatibility with the library, so either they have to wait and update the entire organization at once, or multiple versions of the library must be maintained. Secondly, because these libraries are included at build time into each application’s bundle, they are only changed when a build happens. If a change is desired that spans across all applications, such as changing a social icon from an animal to a letter, a new version has to be released and deployed in all applications. These deploys will most likely not take place concurrently, so users will see an inconsistent view of these components as they navigate between pages.
The above downsides can of course be avoided by throwing money, discipline, and planning at the problem. This post lays out an alternate strategy to provide a common style across multiple disconnected applications without forcing technical or planning decisions of the individual teams.
I spoke to Jenn M, one of our UI/UX experts here at Source Allies, and asked her to come up with a simple example style guide for a fictional company. She quickly responded with the following, which is very similar to what we often see various UI designers produce for development teams:
In order to support our common styles, we will start by spinning up a new NPM project that we will statically serve and use to host the common assets shared between applications. Unlike an NPM library, every refresh of a user’s browser will pull the latest copy of shared assets. And, since all of the applications will use the same URL for those assets, the browser cache will stay consistent.
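As a minimal sketch of what that project might look like (the package name, port, and use of the http-server package are illustrative choices, not requirements), it can be little more than a package.json next to the asset files it serves:
{
  "name": "style-guide",
  "version": "1.0.0",
  "files": [
    "styles.css",
    "style-guide.js"
  ],
  "scripts": {
    "start": "http-server . -p 8080"
  },
  "devDependencies": {
    "http-server": "^14.1.1"
  }
}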
Every brand has a color palette that defines the specific colors that should be used for various parts of the site.
They rarely change, but if a developer is off by even a couple points then users will notice something is “off” as they navigate around.
The simplest way to share these across applications is to simply create a CSS file styles.css
and include a set of CSS custom variable declarations for these colors:
:root {
  --company-navy: #334d6e;
  --nested-blue: #577299;
  --background-blue: #e6ecf5;
  --company-orange: #c74f1e;
  --black: #000000;
  --text-grey: #767676;
  --white: #FFFFFF;
}
Every application can now color anything needed by including this style sheet and writing a rule such as color: var(--black). We are creating an API contract here. Rather than it being defined using JSON structures and HTTP path patterns, we are defining it by URLs and exposed variable names.
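For example, a consuming application could pull in the shared palette with a plain link tag and reference the variables from its own rules; the stylesheet URL and the .order-summary class here are just illustrative:
<link rel="stylesheet" href="https://style-guide.example.com/styles.css">
<style>
  /* Application-specific styling that leans on the shared palette */
  .order-summary {
    background-color: var(--background-blue);
    color: var(--company-navy);
    border: 1px solid var(--nested-blue);
  }
</style>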
Fonts, sizing, and common styling can be implemented the same way.
We add font declarations to our styles.css file by adding:
@import url('https://fonts.googleapis.com/css2?family=Roboto:ital,wght@0,100;0,300;0,400;0,500;0,700;0,900;1,100;1,300;1,400;1,500;1,700;1,900&display=swap');

:root {
  font-family: 'Roboto', sans-serif;
  font-size: 16pt;
  padding: 0;
  margin: 0;
}

h1 {
  font-size: 30pt;
}

h2 {
  font-size: 20pt;
}

h3 {
  font-size: 16pt;
  font-weight: bold;
}

nav {
  font-weight: bold;
  font-size: 1rem;
}
In this example we are setting the default font-family for the entire page. We are also adding another API to our contract: we are defining the font styling for the h1, h2, h3, and nav elements.
Be careful here; while it is tempting to define styling for a large number of elements, we do not know the context that those elements are used in.
One of the big killers of internal UI libraries is making a lot of assumptions about how components will be used and then having an explosion of complexity as countless flags and parameters have to be added to handle the scenarios.
Instead, focus on a few elements that are very consistently used.
Instead of styling the plain button element, we are forcing the applications to opt in to the button styling by defining classes (sai-primary-button and sai-secondary-button).
This way, if an application adds a button element for semantic reasons or needs to style it another way, they do not have to “undo” all of the common styling.
.sai-primary-button, .sai-secondary-button {
  text-transform: uppercase;
  font-weight: bold;
  border-radius: 5px;
  border-style: solid;
  padding: .5rem;
}

.sai-primary-button {
  border-color: var(--company-orange);
  background-color: var(--company-orange);
  color: var(--white);
}

.sai-primary-button:hover {
  border-color: var(--black);
  cursor: pointer;
}

.sai-primary-button:focus {
  border-color: var(--black);
}

.sai-secondary-button {
  border-color: var(--nested-blue);
  background-color: var(--white);
  color: var(--nested-blue);
}

.sai-secondary-button:hover {
  border-color: var(--company-navy);
  background-color: var(--nested-blue);
  color: var(--white);
  cursor: pointer;
}

.sai-secondary-button:focus {
  border-color: var(--nested-blue);
}
So far, we have been able to support a common style guide across applications while only relying on native web technologies (CSS). We have not forced the applications to do anything other than include a stylesheet and use classes and variable references on things as appropriate. Some parts of our site require not only a consistent style, but also consistent behavior. We need Javascript to support these components.
Web Components are a collection of web standards that allow us to define custom elements and then include them within a hosting page.
We can leverage this technology to create a sai-header
or sai-footer
custom element.
The individual applications can then put this at the top and bottom of each page regardless of what UI framework they are utilizing.
Additionally, custom web components use a shadow DOM to ensure that the styling of the hosting page does not bleed over into the component styling.
The downside is that web components are somewhat complicated to create and don’t lend themselves well to a reactive programming model. To aid in this we can leverage a newer library called Lit. This is a very small library that allows us to define a component along with a lifecycle for updating that component as things change (such as users clicking on menus or getting signed in).
Refer to the Lit Documentation for a complete reference.
For our custom header we create a header.ts
file with the following:
import { LitElement, html, css } from "lit";
import { customElement, property, state } from "lit/decorators.js";
import logoImage from './source-allies-logo-final.png';

@customElement("sai-header")
export class Header extends LitElement {
  static styles = css`
    @import url("/styles.css");

    :host {
      width: 100vw;
      display: block;
      position: sticky;
      z-index: 1000;
    }

    HEADER {
      display: flex;
      flex-direction: row;
      justify-content: space-between;
      background-color: var(--company-orange);
      color: var(--white);
      padding: .5rem;
    }

    /* additional styling */
  `;

  render() {
    const imgUrl = new URL(logoImage, import.meta.url).href;
    return html`
      <header>
        <!-- header content here -->
      </header>
      <nav>
        <!-- second level nav content -->
      </nav>
    `;
  }
}
Refer to our Github Repository for the complete file.
This file exports a single custom web component.
Notice how we are importing our styles.css
file at the top and then leveraging the color variables to ensure we align with the company colors.
Images and other assets can be referenced relative to this component and included as well.
The usage for the individual applications is similar to the styling. Each application includes a reference to where our style guide is deployed:
<script type="module" src="https://style-guide.example.com/style-guide.js"></script>
They then just include the common header like any other element:
<body>
  <sai-header></sai-header>
  <main>
    <h1>Page Title</h1>
    ...
  </main>
  <sai-footer></sai-footer>
</body>
Unlike iframe
elements or server side includes, passing information to and from these custom components is easy.
Attributes can be added to the element and the custom component will receive these.
The custom component can raise events and these can be received by the host page by registering event handlers on the element.
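As a quick illustration (the active-section attribute and the sai-sign-out event are hypothetical names; the real component’s API lives in the linked repository), a host page might do something like this:
<sai-header active-section="products"></sai-header>
<script type="module">
  const header = document.querySelector("sai-header");
  // React to a custom event raised by the shared header component.
  header.addEventListener("sai-sign-out", () => {
    window.location.assign("/logged-out");
  });
</script>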
Technical teams can now develop their applications using the technology of their choice, on the release cadence that works for them. The UI team can own and provide the standards needed to ensure a consistent brand. This is a simple, but powerful strategy to share common UI components and functionality between teams and applications without incurring a massive cost of maintenance or introducing large amounts of technical debt, with the only constraint being the native web standards that all modern browsers support. We encourage you to visit our Github Repository for a complete example of the above.
]]>Do you remember the first time a feature in an application surprised you? Have you ever thought, “Wow, that’s actually making my life easier?”
I remember the first time I clicked on a phone number while browsing a website on a mobile phone and it opened the dialer. “All I have to do is press send?” I thought. I used to have to write the number down and then type it back in.
Have you ever thought about how features like that get implemented? How did anyone ever figure out that people were looking up company phone numbers on their mobile phone’s browser and then immediately wanting to make a call?
I bet it was through user experience analysis. Someone observed that users were not only using their phone to do the initial lookup but also noticed the pain point that the users were writing down phone numbers and punching them in to make a call.
“If I had asked people what they wanted, they would have said faster horses.” - Henry Ford
This quote is often used to talk about the importance of innovation and that users don’t always know what to ask for. But how well do you know the pain points of your users? Are you aware of how they are using your applications day to day? How do you identify the features they may not even know how to ask for? A UX Product Review can help.
As a delivery consultant, I am constantly working with partners to help them build their roadmaps. We talk about the company’s goals, how their applications will support those goals, and how that impacts prioritization decisions. We look at technical improvements that are needed to modernize the application as well as feature enhancements.
I was recently able to incorporate a usability study conducted by one of our UX experts into this process for one of our manufacturing partners with phenomenal results.
In January, we sat down with our partner company and planned the usability study. The partner chose a date and time for a Source Allies UX expert to come to their location and interview users. Our development team prepped the UX expert so she understood the basic functions of the application and then she headed out to the partner site armed with donuts and coffee.
She spent a few days interviewing users and watching how they interacted with the application. She was able to talk to five different groups of users ranging from people working on the shop floor to employees behind the scenes in the accounting office.
She translated what she learned in those interviews into a list of four major pain points with specific details about why these were difficult for users. She worked with our development team to translate these pain points into future state recommendations that included both modernization and usability improvements.
When we met with the manufacturing partner, they could easily see how these recommendations impacted their team and who would benefit from each feature. In this conversation, we were able to more clearly articulate the future state vision for the application. The partner determined it would be beneficial to begin working on three of the four pain points identified and made plans to re-evaluate solutions for the fourth pain point later this year.
The development team took this information and created a product roadmap for the next year. At the end of the process, the partner’s leadership team has a vision for where their product is headed and a roadmap of product improvements where they can easily demonstrate how each feature provides value to their team. The planning process for the development team is much faster and everyone has peace of mind that the features being implemented are enhancing the user experience. The entire process was completed in less than a month, and the leadership team was able to make informed decisions after only a few hours.
A product review is a great way to ensure you have a positive user experience, but it is also a strategic business decision. It can not only help you create a competitive advantage but also identify operational inefficiencies and business risks. A user analysis can uncover faster ways of doing things if the software no longer meets the needs of the user due to evolving business processes. If users are not using the software as intended, a product review can even uncover business risks related to less secure workarounds created by users who are trying to accomplish tasks in a simpler way.
Check out this brochure to learn more about a User Experience Product Review. Contact us here to set up a discovery call!
]]>