Using testcontainers to manage containers for tests
Most applications rely upon other services such as databases and third party APIs. When we think about how we test the data access layer of our application, which is responsible for interacting with those databases or third party APIs, it is useful to consider our test pyramid.
For interactions with a database, our unit tests are usually concerned with regression testing and locking in the last good database query that we know worked. For interactions with third party APIs, our unit tests are usually concerned with ensuring that our application behaves the way that we expect it to, assuming that the third party API abides by its API contract with our system.
When we look at integration tests that are only concerned with testing the integration between our application and one other service, there are a few ways to implement this. We could have our application interact with a deployed instance of a database. But this tends to be flaky and error prone: multiple tests from different CI/CD pipelines running in parallel could corrupt each other's test data, and the availability of staging environments is not guaranteed given that they can also be used for testing changes. On the other hand, running our integration tests against a locally running container of the database can be more reliable.
testcontainers offers a nice API for running containers in a number of languages. The main benefits of using testcontainers over docker compose are that:
- You can start and stop containers within your test files in your preferred language
- testcontainers handles automatically removing containers after they have been used
- Containers are automatically assigned a new port, so you can run multiple test files in parallel and in isolation because each test file can spin up its own containers
Usage#
Installation#
npm i -D testcontainers
yarn add -D testcontainers
pnpm add -D testcontainers
ni -D testcontainers
Example service#
I will be using a service that interacts with DynamoDB as an example and all source code can be found here. In this case, createUser creates a User record in DynamoDB by sending a PutCommand with the partition key of email.
type User = {
  email: string;
};

export type UserDataAccessService = {
  createUser: (email: string) => Promise<User>;
};

export const createUserDataAccessService = ({
  tableName,
  dynamoClient,
}: {
  tableName: string;
  dynamoClient: DynamoClient;
}): UserDataAccessService => ({
  createUser: async (email) => {
    await dynamoClient.put({
      TableName: tableName,
      Item: {
        email,
      },
      ConditionExpression: 'attribute_not_exists(email)',
    });

    return {
      email,
    };
  },
});
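The test later in this post also calls a getUser method, which is not shown in the snippet above. A minimal sketch of how it might look, reusing the same dynamoClient wrapper (the method name and key shape are assumptions, not the repository's actual implementation):

// Hypothetical extension of the service type used by the test further down.
export type UserDataAccessService = {
  createUser: (email: string) => Promise<User>;
  getUser: (email: string) => Promise<User | undefined>;
};

// Added alongside createUser inside createUserDataAccessService:
getUser: async (email) => {
  const { Item } = await dynamoClient.get({
    TableName: tableName,
    Key: { email },
  });

  // Item is undefined when no record exists for that email.
  return Item as User | undefined;
},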
Setting up tests#
To use testcontainers we need a docker image. For DynamoDB, AWS maintains the amazon/dynamodb-local image.
The first thing we want to do in our test file is to start up the container. We need to define the image that we want to use with GenericContainer and specify the port that the database exposes.
import { GenericContainer } from 'testcontainers';

beforeAll(async () => {
  const dynamoDbContainer = new GenericContainer(
    'amazon/dynamodb-local',
  ).withExposedPorts(8000);
});
Then, we can start and stop our container using the start and stop methods.
import { GenericContainer, type StartedTestContainer } from 'testcontainers';

let container: StartedTestContainer;

beforeAll(async () => {
  const dynamoDbContainer = new GenericContainer(
    'amazon/dynamodb-local',
  ).withExposedPorts(8000);

  container = await dynamoDbContainer.start();
});

afterAll(async () => {
  await container.stop();
});
The first time you run your test, it may take an unexpectedly long time because you don't have the docker image stored locally and so testcontainers will be pulling the image. You can pull the image ahead of time with:
docker pull amazon/dynamodb-local
The next step is to configure our DynamoDB client. Since testcontainers will assign a random port to the running DynamoDB container, we will need to set the endpoint that the DynamoDB client should be talking to. We can determine the port that the container exposes by using the getMappedPort method.
As a sidenote, I am using a simple wrapper on top of DynamoDBDocumentClient, but this is not necessary.
Full code of the DynamoDB client wrapper
import {
  CreateTableCommand,
  type CreateTableCommandInput,
  type CreateTableCommandOutput,
  DynamoDBClient,
  type DynamoDBClientConfig,
} from '@aws-sdk/client-dynamodb';
import {
  DynamoDBDocumentClient,
  GetCommand,
  type GetCommandInput,
  type GetCommandOutput,
  PutCommand,
  type PutCommandInput,
  type PutCommandOutput,
  ScanCommand,
  type ScanCommandInput,
  type ScanCommandOutput,
} from '@aws-sdk/lib-dynamodb';

export type DynamoClient = {
  createTable: (
    input: CreateTableCommandInput,
  ) => Promise<CreateTableCommandOutput>;
  put: (input: PutCommandInput) => Promise<PutCommandOutput>;
  get: (input: GetCommandInput) => Promise<GetCommandOutput>;
  scan: (input: ScanCommandInput) => Promise<ScanCommandOutput>;
  documentClient: DynamoDBDocumentClient;
};

export const createDynamoClient = (
  config: DynamoDBClientConfig,
): DynamoClient => {
  const dynamoDbClient = new DynamoDBClient(config);
  const dynamoDbDocumentClient = DynamoDBDocumentClient.from(dynamoDbClient);

  return {
    createTable: (input) => dynamoDbClient.send(new CreateTableCommand(input)),
    put: (input) => dynamoDbDocumentClient.send(new PutCommand(input)),
    get: (input) => dynamoDbDocumentClient.send(new GetCommand(input)),
    scan: (input) => dynamoDbDocumentClient.send(new ScanCommand(input)),
    documentClient: dynamoDbDocumentClient,
  };
};
let dynamoClient: DynamoClient;

beforeAll(async () => {
  // ...
  container = await dynamoDbContainer.start();

  dynamoClient = createDynamoClient({
    endpoint: `http://${container.getHost()}:${container.getMappedPort(8000)}`,
    region: 'ap-southeast-2',
    credentials: {
      accessKeyId: 'dummy',
      secretAccessKey: 'dummy',
    },
  });
});
region and credentials are required; otherwise your DynamoDBClient will throw errors.
The last part of setup is creating the tables that we need.
beforeAll(async () => {
  // ...
  await dynamoClient.createTable({
    TableName: usersTableName,
    KeySchema: [
      {
        AttributeName: 'email',
        KeyType: 'HASH',
      },
    ],
    AttributeDefinitions: [
      {
        AttributeName: 'email',
        AttributeType: 'S',
      },
    ],
    BillingMode: 'PAY_PER_REQUEST',
  });
});
From here, we should be able to run our tests. Ensure that whatever container runtime you use (e.g. Docker Desktop) is up and running and configured to work with testcontainers.
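The test below also assumes that the data access service has been wired up against the container-backed client inside beforeAll. A minimal sketch of that wiring, where the usersTableName value and the import path are assumptions:

import {
  createUserDataAccessService,
  type UserDataAccessService,
} from './userDataAccessService'; // hypothetical path

const usersTableName = 'Users'; // assumed value for the constant used in the snippets above

let userDataAccessService: UserDataAccessService;

beforeAll(async () => {
  // ... container, dynamoClient and createTable setup from the previous snippets
  userDataAccessService = createUserDataAccessService({
    tableName: usersTableName,
    dynamoClient,
  });
});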
it('should create a new user record', async () => {
  await userDataAccessService.createUser('test@test.com');

  const createdUser = await userDataAccessService.getUser('test@test.com');

  expect(createdUser).toEqual<User>({
    email: 'test@test.com',
  });
});
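Since createUser sends its PutCommand with a ConditionExpression of attribute_not_exists(email), the same setup can also lock in the duplicate-email behaviour against the real local DynamoDB. A hypothetical follow-up test, assuming the conditional failure surfaces as the SDK's ConditionalCheckFailedException:

import { ConditionalCheckFailedException } from '@aws-sdk/client-dynamodb';

it('should not create a user when the email already exists', async () => {
  await userDataAccessService.createUser('test@test.com');

  // The second put should fail its condition check against the existing record.
  await expect(
    userDataAccessService.createUser('test@test.com'),
  ).rejects.toThrow(ConditionalCheckFailedException);
});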
Clearing the database#
While testcontainers is really good for running different test files in parallel and in isolation by spinning up different containers, container startup is still too slow to justify a new container for each individual test within a given test file. So you are usually better off clearing the data in your database between individual tests.
For example, I can clear all the data in my Users table after each test.
afterEach(async () => {
  await clearTable<User>({
    tableName: usersTableName,
    dynamoDbDocumentClient: dynamoClient.documentClient,
    keyAttributes: ['email'],
  });
});
Full code of the clearTable function
import {
  BatchWriteCommand,
  type BatchWriteCommandInput,
  type DynamoDBDocumentClient,
  paginateScan,
} from '@aws-sdk/lib-dynamodb';
import { cluster } from 'radash';

const DYNAMODB_MAX_BATCH_WRITE_LIMIT = 25 as const;

type DeleteRequest = NonNullable<
  BatchWriteCommandInput['RequestItems']
>[number][number];

export const clearTable = async <Item extends Record<string, unknown>>({
  tableName,
  dynamoDbDocumentClient,
  keyAttributes,
}: {
  tableName: string;
  dynamoDbDocumentClient: DynamoDBDocumentClient;
  keyAttributes: (keyof Item)[];
}) => {
  const paginator = paginateScan(
    {
      client: dynamoDbDocumentClient,
    },
    {
      TableName: tableName,
      AttributesToGet: keyAttributes as string[],
    },
  );

  const itemsToDelete: DeleteRequest[] = [];
  for await (const page of paginator) {
    const deleteRequests = page.Items?.map(
      (item): DeleteRequest => ({
        DeleteRequest: {
          Key: item,
        },
      }),
    );

    itemsToDelete.push(...(deleteRequests ?? []));
  }

  // splits into groups of size DYNAMODB_MAX_BATCH_WRITE_LIMIT
  // https://radash-docs.vercel.app/docs/array/cluster
  const deletionPromises = cluster(
    itemsToDelete,
    DYNAMODB_MAX_BATCH_WRITE_LIMIT,
  ).map((chunkItemsToDelete) =>
    dynamoDbDocumentClient.send(
      new BatchWriteCommand({
        RequestItems: {
          [tableName]: chunkItemsToDelete,
        },
      }),
    ),
  );

  await Promise.all(deletionPromises);
};
Limitations#
Local development and debugging#
Unfortunately, testcontainers does not provide a nice way to run long-lived container instances for use cases like running your application locally for development, or debugging why specific tests are failing with a database query client like DataGrip. This is because the containers expose a random port on startup and you cannot hardcode which port to expose.
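One rough workaround when debugging a single failing test is to log the endpoint of the started container (using the same getHost and getMappedPort methods from earlier) and pause the test on a breakpoint, so you can point a client at the running container before it is torn down. A minimal sketch, not an official testcontainers feature:

beforeAll(async () => {
  // ...
  container = await dynamoDbContainer.start();

  // Log the randomly assigned endpoint so a local client can connect while the test is paused.
  console.log(
    `dynamodb-local available at http://${container.getHost()}:${container.getMappedPort(8000)}`,
  );
});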
The team recommends setting up a proxy for accessing the container in this GitHub issue and mentions this article for how to set one up for debugging purposes.
It may also be possible to extend the GenericContainer class and override how it creates containers, but I have not investigated this.
Ultimately, I found that maintaining a separate docker compose configuration to run long-lived containers for local development was easier.
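For reference, a minimal docker compose sketch for a long-lived dynamodb-local container on a fixed port might look something like this (the file layout and port mapping are assumptions):

# docker-compose.yml
services:
  dynamodb-local:
    image: amazon/dynamodb-local
    ports:
      # Fixed host port, unlike the random port assigned by testcontainers.
      - '8000:8000'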
Global database clients#
If you rely on globally instantiated database clients, then testcontainers will be a nightmare to use. This is because testcontainers will automatically bind an available, random port. Usually, your database client will assume that the service is running on a fixed port or take it in as an environment variable. By the time you have started running a test file, your global database client has already read the environment variables, so it is too late to overwrite them with the port assigned by testcontainers. If you still want to figure out whether it's possible, you can check out this article, which goes into detail on how you may work around this problem.
// dynamoDbClient.ts
const dynamoDbClient = new DynamoDBClient();

export const globalDynamoDbDocumentClient =
  DynamoDBDocumentClient.from(dynamoDbClient);

// createUser.ts
import { globalDynamoDbDocumentClient } from './dynamoDbClient';

export const createUser = async (email: string) => {
  await globalDynamoDbDocumentClient.send(new PutCommand({...}));
};
Duplication of infrastructure definitions#
Another problem is maintaining multiple definitions of your databases. When we spin up our dynamodb-local container, we still need to populate it with the right tables. Since we usually define our infrastructure using IaC, which cannot be used to define the tables in our local container, we usually need to maintain a separate DynamoDB table definition for our tests. Maintaining duplicate table definitions can be error prone because the definitions can get out of sync, but it may be worth it because table schemas in databases like DynamoDB rarely change. Usually, the issue arises when you add a new index and your application relies on that index to perform certain queries.
If you have many applications spread across different codebases which interact with the same database, it may be worth publishing a new image of dynamodb-local containing the table definitions to ECR so that it is easy to distribute and you do not need to maintain the process of populating your table definitions when starting the containers. For reference, you can take a look at this Buildkite plugin.