Using testcontainers to manage containers for tests
Most applications rely upon other services such as databases and third party APIs. When we think about how we test the data access layer of our application, which is responsible for interacting with those databases or third party APIs, it is useful to consider our test pyramid.
For interactions with a database, our unit tests are usually concerned with regression testing and locking in the last good database query that we know worked. For interactions with third party APIs, our unit tests are usually concerned with ensuring that our application behaves the way that we expect it to, assuming that the third party API abides by its API contract with our system.
When we look at integration tests that are only concerned with testing the integration between our application and one other service, there are a few ways to implement this. We could have our application interact with a deployed instance of a database. But this tends to be flaky and error prone: multiple tests from different CI/CD pipelines running in parallel could corrupt each other's test data, and the availability of staging environments is not guaranteed given that they can also be used for testing changes. On the other hand, running our integration tests against a locally running container of the database can be more reliable.
testcontainers offers a nice API for running containers in a number of languages. The main benefits of using testcontainers over docker compose are that:
- You can start and stop containers within your test files in your preferred language
- testcontainers handles automatically removing containers after they have been used
- Containers are automatically assigned a new port, so you can run multiple test files in parallel and in isolation because each test file can spin up its own containers
Usage#
Installation#
npm i -D testcontainers
yarn add -D testcontainers
pnpm add -D testcontainers
ni -D testcontainers
Example service#
I will be using a service that interacts with DynamoDB as an example and all source code can be found here. In this case, createUser creates a User record in DynamoDB by sending a PutCommand with the partition key of email.
type User = {
  email: string;
};

export type UserDataAccessService = {
  createUser: (email: string) => Promise<User>;
};

export const createUserDataAccessService = ({
  tableName,
  dynamoClient,
}: {
  tableName: string;
  dynamoClient: DynamoClient;
}): UserDataAccessService => ({
  createUser: async (email) => {
    await dynamoClient.put({
      TableName: tableName,
      Item: {
        email,
      },
      ConditionExpression: 'attribute_not_exists(email)',
    });

    return {
      email,
    };
  },
});
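The test later in this post also calls a getUser method, which is not shown in the snippet above. A minimal sketch of how it might look, reusing the same dynamoClient wrapper (the method name and key shape are assumptions, not the repository's actual implementation):

// Hypothetical extension of the service type used by the test further down.
export type UserDataAccessService = {
  createUser: (email: string) => Promise<User>;
  getUser: (email: string) => Promise<User | undefined>;
};

// Added alongside createUser inside createUserDataAccessService:
getUser: async (email) => {
  const { Item } = await dynamoClient.get({
    TableName: tableName,
    Key: { email },
  });

  // Item is undefined when no record exists for that email.
  return Item as User | undefined;
},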
Setting up tests#
To use testcontainers we need a docker image. For DynamoDB, AWS maintains the amazon/dynamodb-local image.
The first thing we want to do in our test file is to start up the container. We need to define the image that we want to use with GenericContainer and specify the port that the database exposes.
import { GenericContainer } from 'testcontainers';

beforeAll(async () => {
  const dynamoDbContainer = new GenericContainer(
    'amazon/dynamodb-local',
  ).withExposedPorts(8000);
});
Then, we can start and stop our container using the start and stop methods.
import { GenericContainer, type StartedTestContainer } from 'testcontainers';

let container: StartedTestContainer;

beforeAll(async () => {
  const dynamoDbContainer = new GenericContainer(
    'amazon/dynamodb-local',
  ).withExposedPorts(8000);

  container = await dynamoDbContainer.start();
});

afterAll(async () => {
  await container.stop();
});
The first time you run your test, it may take an unexpectedly long time because you don't have the docker image stored locally and so testcontainers will be pulling the image. You can pull the image ahead of time with:
docker pull amazon/dynamodb-local
The next step is to configure our DynamoDB client. Since testcontainers will assign a random port to the running DynamoDB container, we will need to set the endpoint that the DynamoDB client should be talking to. We can determine the port that the container exposes by using the getMappedPort method.
As a sidenote, I am using a simple wrapper on top of DynamoDBDocumentClient, but this is not necessary.
Full code of the DynamoDB client wrapper
import {
  CreateTableCommand,
  type CreateTableCommandInput,
  type CreateTableCommandOutput,
  DynamoDBClient,
  type DynamoDBClientConfig,
} from '@aws-sdk/client-dynamodb';
import {
  DynamoDBDocumentClient,
  GetCommand,
  type GetCommandInput,
  type GetCommandOutput,
  PutCommand,
  type PutCommandInput,
  type PutCommandOutput,
  ScanCommand,
  type ScanCommandInput,
  type ScanCommandOutput,
} from '@aws-sdk/lib-dynamodb';

export type DynamoClient = {
  createTable: (
    input: CreateTableCommandInput,
  ) => Promise<CreateTableCommandOutput>;
  put: (input: PutCommandInput) => Promise<PutCommandOutput>;
  get: (input: GetCommandInput) => Promise<GetCommandOutput>;
  scan: (input: ScanCommandInput) => Promise<ScanCommandOutput>;
  documentClient: DynamoDBDocumentClient;
};

export const createDynamoClient = (
  config: DynamoDBClientConfig,
): DynamoClient => {
  const dynamoDbClient = new DynamoDBClient(config);
  const dynamoDbDocumentClient = DynamoDBDocumentClient.from(dynamoDbClient);

  return {
    createTable: (input) => dynamoDbClient.send(new CreateTableCommand(input)),
    put: (input) => dynamoDbDocumentClient.send(new PutCommand(input)),
    get: (input) => dynamoDbDocumentClient.send(new GetCommand(input)),
    scan: (input) => dynamoDbDocumentClient.send(new ScanCommand(input)),
    documentClient: dynamoDbDocumentClient,
  };
};
let dynamoClient: DynamoClient;

beforeAll(async () => {
  // ...
  container = await dynamoDbContainer.start();

  dynamoClient = createDynamoClient({
    endpoint: `http://${container.getHost()}:${container.getMappedPort(8000)}`,
    region: 'ap-southeast-2',
    credentials: {
      accessKeyId: 'dummy',
      secretAccessKey: 'dummy',
    },
  });
});
region and credentials are required; otherwise your DynamoDBClient will throw errors.
The last part of setup is creating the tables that we need.
beforeAll(async () => {
  // ...
  await dynamoClient.createTable({
    TableName: usersTableName,
    KeySchema: [
      {
        AttributeName: 'email',
        KeyType: 'HASH',
      },
    ],
    AttributeDefinitions: [
      {
        AttributeName: 'email',
        AttributeType: 'S',
      },
    ],
    BillingMode: 'PAY_PER_REQUEST',
  });
});
From here, we should be able to run our tests. Ensure that whatever container runtime you use (e.g. Docker Desktop) is up and running and configured to work with testcontainers.
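The test below also assumes that the data access service has been wired up against the container-backed client inside beforeAll. A minimal sketch of that wiring, where the usersTableName value and the import path are assumptions:

import {
  createUserDataAccessService,
  type UserDataAccessService,
} from './userDataAccessService'; // hypothetical path

const usersTableName = 'Users'; // assumed value for the constant used in the snippets above

let userDataAccessService: UserDataAccessService;

beforeAll(async () => {
  // ... container, dynamoClient and createTable setup from the previous snippets
  userDataAccessService = createUserDataAccessService({
    tableName: usersTableName,
    dynamoClient,
  });
});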
it('should create a new user record', async () => {
  await userDataAccessService.createUser('test@test.com');

  const createdUser = await userDataAccessService.getUser('test@test.com');

  expect(createdUser).toEqual<User>({
    email: 'test@test.com',
  });
});
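Since createUser sends its PutCommand with a ConditionExpression of attribute_not_exists(email), the same setup can also lock in the duplicate-email behaviour against the real local DynamoDB. A hypothetical follow-up test, assuming the conditional failure surfaces as the SDK's ConditionalCheckFailedException:

import { ConditionalCheckFailedException } from '@aws-sdk/client-dynamodb';

it('should not create a user when the email already exists', async () => {
  await userDataAccessService.createUser('test@test.com');

  // The second put should fail its condition check against the existing record.
  await expect(
    userDataAccessService.createUser('test@test.com'),
  ).rejects.toThrow(ConditionalCheckFailedException);
});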
Clearing the database#
While testcontainers is really good for running different test files in parallel and in isolation by spinning up different containers, container startup is still too slow to justify a new container for each individual test within a given test file. So you are usually better off clearing the data in your database between individual tests.
For example, I can clear all the data in my Users table after each test.
afterEach(async () => {
  await clearTable<User>({
    tableName: usersTableName,
    dynamoDbDocumentClient: dynamoClient.documentClient,
    keyAttributes: ['email'],
  });
});
Full code of the clearTable function
import {
  BatchWriteCommand,
  type BatchWriteCommandInput,
  type DynamoDBDocumentClient,
  paginateScan,
} from '@aws-sdk/lib-dynamodb';
import { cluster } from 'radash';

const DYNAMODB_MAX_BATCH_WRITE_LIMIT = 25 as const;

type DeleteRequest = NonNullable<
  BatchWriteCommandInput['RequestItems']
>[number][number];

export const clearTable = async <Item extends Record<string, unknown>>({
  tableName,
  dynamoDbDocumentClient,
  keyAttributes,
}: {
  tableName: string;
  dynamoDbDocumentClient: DynamoDBDocumentClient;
  keyAttributes: (keyof Item)[];
}) => {
  const paginator = paginateScan(
    {
      client: dynamoDbDocumentClient,
    },
    {
      TableName: tableName,
      AttributesToGet: keyAttributes as string[],
    },
  );

  const itemsToDelete: DeleteRequest[] = [];
  for await (const page of paginator) {
    const deleteRequests = page.Items?.map(
      (item): DeleteRequest => ({
        DeleteRequest: {
          Key: item,
        },
      }),
    );

    itemsToDelete.push(...(deleteRequests ?? []));
  }

  // splits into groups of size DYNAMODB_MAX_BATCH_WRITE_LIMIT
  // https://radash-docs.vercel.app/docs/array/cluster
  const deletionPromises = cluster(
    itemsToDelete,
    DYNAMODB_MAX_BATCH_WRITE_LIMIT,
  ).map((chunkItemsToDelete) =>
    dynamoDbDocumentClient.send(
      new BatchWriteCommand({
        RequestItems: {
          [tableName]: chunkItemsToDelete,
        },
      }),
    ),
  );

  await Promise.all(deletionPromises);
};
Limitations#
Local development and debugging#
Unfortunately, testcontainers does not provide a nice way to run long-lived container instances for use cases like running your application locally for development, or debugging why specific tests are failing with a database query client like DataGrip. This is because the containers expose a random port on startup and you cannot hardcode which port to expose.
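One rough workaround when debugging a single failing test is to log the endpoint of the started container (using the same getHost and getMappedPort methods from earlier) and pause the test on a breakpoint, so you can point a client at the running container before it is torn down. A minimal sketch, not an official testcontainers feature:

beforeAll(async () => {
  // ...
  container = await dynamoDbContainer.start();

  // Log the randomly assigned endpoint so a local client can connect while the test is paused.
  console.log(
    `dynamodb-local available at http://${container.getHost()}:${container.getMappedPort(8000)}`,
  );
});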
The team recommends setting up a proxy for accessing the container in this GitHub issue and mentions this article for how to set one up for debugging purposes.
It may also be possible to extend the GenericContainer class and override how it creates containers, but I have not investigated this.
Ultimately, I found that maintaining a separate docker compose configuration to run long-lived containers for local development was easier.
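For reference, a minimal docker compose sketch for a long-lived dynamodb-local container on a fixed port might look something like this (the file layout and port mapping are assumptions):

# docker-compose.yml
services:
  dynamodb-local:
    image: amazon/dynamodb-local
    ports:
      # Fixed host port, unlike the random port assigned by testcontainers.
      - '8000:8000'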
Global database clients#
If you rely on globally instantiated database clients, then testcontainers will be a nightmare to use. This is because testcontainers will automatically bind an available, random port. Usually, your database client will assume that the service is running on a fixed port or take it in as an environment variable. By the time you have started running a test file, your global database client has already read the environment variables, so it is too late to overwrite them with the port assigned by testcontainers. If you still want to figure out whether it's possible, you can check out this article, which goes into detail on how you may work around this problem.
// dynamoDbClient.ts
const dynamoDbClient = new DynamoDBClient();

export const globalDynamoDbDocumentClient =
  DynamoDBDocumentClient.from(dynamoDbClient);

// createUser.ts
import { globalDynamoDbDocumentClient } from './dynamoDbClient';

export const createUser = async (email: string) => {
  await globalDynamoDbDocumentClient.send(new PutCommand({...}));
};
Duplication of infrastructure definitions#
Another problem is maintaining multiple definitions of your databases. When we spin up our dynamodb-local container, we still need to populate it with the right tables. Since we usually define our infrastructure using IaC, which cannot be used to define the tables in our local container, we usually need to maintain a separate DynamoDB table definition for our tests. Maintaining duplicate table definitions can be error prone because the definitions can get out of sync, but it may be worth it because table schemas in databases like DynamoDB rarely change. Usually, the issue arises when you add a new index and your application relies on that index to perform certain queries.
If you have many applications spread across different codebases which interact with the same database, it may be worth publishing a new image of dynamodb-local containing the table definitions to ECR so that it is easy to distribute and you do not need to maintain the process of populating your table definitions when starting the containers. For reference, you can take a look at this Buildkite plugin.