Amazon Web Services

Large vEcoli workflows can be run cost-effectively on Amazon Web Services (AWS). This section covers setup starting from a fresh account, running workflows, and handling outputs.

Tip

In all instructions below, “navigate to X console” means to search for X in the AWS Console search bar and click on the corresponding service.

Fresh Account Setup

Note

This section about creating a fresh AWS account is only necessary if starting from scratch. If you have access to the Covert Lab GovCloud account, skip to Launch an EC2 Instance.

Create a new AWS account at AWS Sign Up. Once you have created your account, navigate to the AWS Console and familiarize yourself with the following services that vEcoli uses:

  • Batch: Manages compute resources for running workflow tasks

  • S3 (Simple Storage Service): Object storage for workflow outputs

  • EC2 (Elastic Compute Cloud): Virtual machines for running Nextflow and workflow tasks

  • ECR (Elastic Container Registry): Stores Docker images for workflow tasks

  • ECS (Elastic Container Service): Orchestrates Docker containers

  • CloudWatch: Monitors and logs AWS resources

  • IAM (Identity and Access Management): Manages access to AWS resources

The following sections will guide you through a minimal setup for running vEcoli workflows on a fresh AWS account. Users requiring more advanced configurations (e.g., custom VPCs, security groups, etc.) should refer to the AWS documentation for those services.

Installing and Configuring AWS CLI

Install the AWS CLI following the official documentation.

After installation, configure your AWS credentials:

aws configure

You will be prompted for:

  • AWS Access Key ID

  • AWS Secret Access Key

  • Default region name (e.g., us-west-2)

  • Default output format (json recommended)

Warning

Authentication to Stanford’s GovCloud account requires running aws configure sso with our custom start URL (contact admin). Set the profile name to default when prompted to avoid having to specify --profile <profile name> for every AWS CLI command.

Note

We strongly recommend choosing one specific region for all AWS resources to avoid unexpected cross-region data transfer costs.

Setting Up Required Services

Create VPC

Navigate to the VPC console and create a new VPC.

  1. Under “Resources to create”, select “VPC and more”.

  2. Under “Name tag auto-generation”, choose a name. The generated VPC will be called <your name>-vpc.

  3. We recommend choosing the maximum number of availability zones in your chosen region to maximize resource availability.

  4. Change the number of private subnets to 0.

  5. Set VPC endpoints to None.

  6. Leave other settings as default.

Create IAM Roles

vEcoli workflows require specific permissions. We recommend creating two IAM roles:

  1. Nextflow: For head VMs that run the Nextflow workflow manager

  2. Batch: For AWS Batch jobs that run workflow tasks

Before creating the Nextflow role, first create an IAM policy with the required permissions. To create an IAM policy, navigate to the IAM console, click “Policies” in the sidebar, and then click “Create policy”. Switch to the JSON tab and paste in these permissions under the Action list. In addition to those permissions, paste the following permissions to let runscripts.workflow build and push vEcoli Docker images to ECR using runscripts/container/build-and-push-ecr.sh:

  • ecr:CompleteLayerUpload

  • ecr:CreateRepository

  • ecr:InitiateLayerUpload

  • ecr:PutImage

  • ecr:UploadLayerPart

Nextflow also requires access to S3, which must be granted as described here. The easiest way to do this is to grant full S3 access to the Nextflow policy by pasting s3:* to the list of allowed actions.

Then, add "*" to the Resource list to make the allowed actions defined in this permission policy valid for all AWS resources. Click “Next”, give the policy a name and description, and create the policy.

After creating the policy, create a new IAM role for Nextflow by clicking “Roles” in the sidebar and then “Create role”. Select “AWS service” as the trusted entity type and “EC2” as the use case. Click “Next”, then attach the policy you just created. Click “Next”, give the role a name and description, and create the role.

For the Batch role, create a new IAM role as described above but attach the AWS-managed policies AmazonS3FullAccess and AmazonEC2ContainerServiceforEC2Role (see here for more details).

Setup Batch

AWS Batch manages the compute resources for running vEcoli workflows. Follow these steps to set it up:

  1. Navigate to the AWS Batch console

2. Click “Environments” in the sidebar, then “Create environment –> Compute environment” 2. Use the following settings to create the new compute environment:

  • Environment type: “Amazon Elastic Compute Cloud (Amazon EC2)”

  • Name: Choose a name

  • Orchestration type: “Managed”

  • Instance role: Select the Batch role you created earlier

  • Use EC2 Spot instances: True (optional, but strongly recommended for cost savings)

  • Maximum vCPUs: Set based on maximum # of lineages you want to run in parallel (AWS Batch will create and terminate VM instances as needed to meet demand up to this limit)

  • Allowed instance types: See below for recommendations

  • VPC ID: Select the VPC you created earlier

  • Subnets: Select all subnets in that VPC

  • Leave other settings as default

  1. Create a Job Queue:

    • Orchestration type: “Amazon Elastic Compute Cloud (Amazon EC2)”

    • Name: Choose a name. This is the name you will use for the batch_queue key in your config JSON.

    • Connected compute environments: Select your compute environment

We strongly recommend using the latest generation of general-purpose Graviton EC2 instances (M8g as of Feb 2026). These instances offer excellent price-performance for vEcoli workflows, which are CPU-bound and benefit from the lack of hyperthreading on Graviton processors. They also offer 4 GiB of memory per vCPU, which is the default memory/CPU allocation for each simulation.

Warning

Do not mix Graviton (ARM) and non-Graviton (x86) instances in the same compute environment. If you choose Graviton instances, make sure to use a Graviton instance (e.g., t4g.medium) for your head node as well (see below).

Tip

To retrieve the latest price/physical CPU for different instance types, run uv run runscripts/cloud_pricing/aws.py --region <your region>. Benchmark workflow performance on different instance types to find the best price-performance for your specific workflow configuration.

Launch an EC2 Instance

Create an Instance

Create a small EC2 instance to run Nextflow. Navigate to the EC2 console and launch a new instance with:

  • An Amazon Linux 2023 Amazon Machine Image (AMI), pick “64-bit (ARM)” architecture if using Graviton instances or “64-bit (x86)” if using non-Graviton instances

  • Instance type: Min. 4 GiB memory, must match Batch compute environment CPU architecture, try t4g.medium (Graviton) or t3.medium (non-Graviton)

  • Key pair: Create a new key pair or use an existing one to SSH into the instance

  • Network: Select the VPC you created earlier

  • Security group: Create a new one allowing SSH traffic from your IP only

  • Storage: 30 GiB gp3

  • Under Advanced details, set IAM instance profile to the Nextflow role you created (ECR for the Stanford GovCloud account).

Note

Run chmod 400 /path/to/your-key.pem on your private key file to ensure it has the correct permissions for SSH.

Warning

Remember to stop your EC2 instance when your workflow finishes to avoid unnecessary charges.

Connect to Your Instance

SSH into your newly created EC2 instance using the private key from above:

ssh -i /path/to/your-key.pem ec2-user@<instance-public-dns>

Install Dependencies

On the EC2 instance, install Git, Docker, and Java:

# Update package manager
sudo yum update -y

# Install Git, Java (required for Nextflow), and Docker
sudo yum install -y git java docker

# Start Docker service and enable on boot
sudo systemctl start docker
sudo systemctl enable docker

# Add your user to the docker group
sudo usermod -aG docker $USER

# Set AWS CLI default region (us-gov-west-1 for Stanford GovCloud)
aws configure set region <your-region>

# Log out and back in for group changes to take effect

Clone the vEcoli repository:

git clone https://github.com/CovertLab/vEcoli.git --filter=blob:none
cd vEcoli

Install uv, then create a new virtual environment and install S3FS:

curl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.bashrc
uv venv
uv pip install s3fs boto3

Run the following to automatically activate the virtual environment:

echo "source ~/vEcoli/.venv/bin/activate" >> ~/.bashrc
source ~/.bashrc

Finally, install Nextflow:

curl -s https://get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/
chmod +x /usr/local/bin/nextflow

Setup Workflow

Create S3 Bucket

vEcoli workflows use S3 for output storage. Create a new S3 bucket using the AWS Console or AWS CLI:

aws s3 mb s3://your-vecoli-bucket-name

Replace your-vecoli-bucket-name with a globally unique bucket name.

Danger

Do NOT use underscores in your bucket name. Use hyphens instead. Bucket names must be DNS-compliant.

Configure Your Workflow

Tell vEcoli to use your S3 bucket by setting the out_uri key under the emitter_arg key in your config JSON (see JSON Config Files). The URI should be in the form s3://your-vecoli-bucket-name. Remember to remove the out_dir key under emitter_arg if present.

On AWS, each job in a workflow (ParCa, sim 1, sim 2, etc.) is run using Docker containers managed by AWS ECS. vEcoli uses a build script to create and push Docker images to Amazon ECR.

Note

Files that match the patterns in .dockerignore are excluded from the Docker image.

The following configuration keys, in addition to the out_uri key under emitter_arg, are REQUIRED to run runscripts.workflow on AWS:

{
  "aws": {
    # Boolean, whether to build a fresh Docker image. If files that are
    # not excluded by .dockerignore did not change since your last build,
    # you can set this to false to skip building the image.
    "build_image": true,
    # Name of Docker image to build (or use directly, if build_image is false)
    "container_image": "vecoli-workflow",
    # AWS region (optional, defaults to us-gov-west-1)
    "region": "us-west-2",
    # AWS Batch job queue name (optional, defaults to "vecoli")
    "batch_queue": "vecoli"
  }
}

Tip

We strongly recommend setting progress_bar to false in your config JSON when running workflows on AWS to reduce the amount of generated logs, which are billed as described here.

Build and Push Image

The build process is handled automatically when you launch a workflow with build_image: true. However, you can also manually build and push images using the runscripts/container/build-and-push-ecr.sh script.

Running Workflows

After setting the required options in your configuration JSON, use screen or tmux to open a virtual console that will persist after your SSH connection is closed. In that console, invoke runscripts.workflow as normal to start a workflow:

python runscripts/workflow.py --config your_config.json

Note

Unlike workflows run locally, AWS workflows are run using containers with a snapshot of the repository at the time the workflow was launched. This means that any changes made to the repository after launching a workflow will not be reflected in that workflow.

Once your workflow has started, you can press “Ctrl+A D” (for screen) or “Ctrl+B D” (for tmux) to detach from the virtual console and close your SSH connection. The EC2 instance must continue to run until the workflow is complete. You can SSH into your instance and reconnect to the virtual terminal with screen -r or tmux attach to monitor progress.

Warning

AWS Batch Spot instances can be interrupted at any time. Analysis scripts that take more than a few hours to run should be excluded from workflow configurations and manually run using runscripts.analysis afterwards. If you require guaranteed compute, modify your Batch compute environment to not use Spot instances.

Handling Outputs

Once a workflow is complete, all outputs should be in your S3 bucket at the URI specified in the out_uri key under emitter_arg in the configuration JSON.

We strongly discourage users from downloading large amounts of data from S3, as that will incur significant data transfer charges. Instead, run analyses on an EC2 instance in the same region as your S3 bucket - this avoids data transfer fees.

Data stored in S3 incurs charges based on:

  1. Storage amount: Costs vary by region and storage class

  2. Storage duration: Charges are prorated

  3. Data transfer: Transfers out of AWS or between regions incur charges

  4. Request costs: GET, PUT, etc. have per-request costs

Storing terabytes of simulation data can cost $1000+/year. For cost management:

  • Delete workflow output data from S3 as soon as you finish your analyses

  • Consider using S3 Lifecycle policies to automatically move data to cheaper storage classes (e.g., S3 Glacier) after a certain period

  • Use S3 Intelligent-Tiering for automatic cost optimization

  • Run analyses on EC2 instances in the same region as your S3 bucket

If necessary, it is likely cheaper to re-run the workflow to regenerate data later than to keep it around long-term.

AWS Interactive Containers

Since all steps of the workflow are run inside Docker containers, it can be helpful to launch an interactive instance of the container for debugging. This is also useful for running standalone analyses on workflow outputs.

For simplicity, we recommend reusing the same EC2 instance that you created to launch workflows. If you need more compute power (e.g., to run ad-hoc analyses), you can change the instance type to a more powerful one following these instructions. Make sure to choose an instance type that matches the CPU architecture of your Docker image (Graviton/ARM vs. non-Graviton/x86) and remember to change it back to a smaller instance when done to save costs.

If you need more storage, create and attach a new EBS volume following these instructions. Then, follow these instructions to make the new volume available for use.

From inside your cloned repository on your EC2 instance, run:

runscripts/container/interactive.sh -r aws -i container_image

container_image should match the name in your config JSON (e.g., vecoli-workflow). A copy of the config JSON should be saved to your S3 bucket with the other output for reference (see Output).

Note

Inside the interactive container, you can use python or ipython directly in addition to the usual uv commands.

Inside the container, navigate to /vEcoli and add breakpoints as you see fit. Note the working directory (see Troubleshooting) of the Nextflow task you want to debug. Download the .command.run file for your task from S3 to a temporary debug directory and run it:

mkdir debug
cd debug
aws s3 cp s3://your-vecoli-bucket-name/path/to/workdir/.command.run .
chmod +x .command.run
./command.run

Warning

Any changes that you make to /vEcoli inside the container are discarded when the container terminates.

The files located in /vEcoli are a copy of your cloned repository (excluding files ignored by .dockerignore) at the time the workflow was launched. To start an interactive container that reflects the current state of your cloned repository, add the -d flag to start a “development” container.