Compute Environments in AWS Batch: How to Set Up

When you need to run model training and complex analysis as batch jobs at scale, AWS Batch can be a good solution. AWS provides this dedicated service to perform a large number of computing operations efficiently and without management overhead. This blog post covers the AWS Batch architecture and configuration principles for batch processing.

What Is AWS Batch?

AWS Batch is a cloud service provided by Amazon Web Services (AWS) designed to enable developers, engineers, and scientists to easily and efficiently run thousands of batch computing jobs on the AWS cloud. Batch computing is a way of processing large volumes of data by breaking the work into smaller units that can be processed simultaneously.

AWS Batch simplifies the process of deploying, managing, and scaling batch computing jobs. It automatically provisions compute resources and optimizes the allocation of these resources to deliver high throughput at low cost. With AWS Batch, you don’t need to install or manage batch computing software or server clusters that traditionally handle these tasks, making it easier to run complex computing jobs at scale.

AWS Batch provides the following key features:

  • Dynamic resource allocation. AWS Batch dynamically provisions resources, including the optimal quantity and type of compute resources (CPU or memory-optimized instances) based on the volume and specific resource requirements of the batch jobs submitted.
  • Managed compute environments. You can specify the compute resources for your jobs and AWS Batch will manage the underlying infrastructure, scaling up or down as needed to run jobs as efficiently as possible.
  • Job scheduling. AWS Batch queues jobs and plans and schedules their execution based on the compute resources available and the priority of the batch jobs. It ensures that higher-priority jobs are completed first and manages the execution dependencies between jobs, if any.
  • Container support. AWS Batch is integrated with Amazon Elastic Container Service (ECS) and supports Docker, allowing you to package your jobs into containers. This ensures that your compute environments are isolated and consistent, making them more secure and easier to manage.
  • Integration with AWS services. AWS Batch can be easily integrated with other AWS services such as Amazon S3, Amazon DynamoDB, Amazon RDS, AWS Lambda, and more, enabling you to build complex, scalable batch processing architectures.

Components of AWS Batch

To understand the AWS Batch service better, it’s important to know which main components are used. The primary AWS Batch components are:

  • Job definitions. These are templates that describe how jobs are to be run. A job definition specifies various settings related to the job, such as the Docker image to use, the vCPUs and memory requirements, the command to run, environment variables, a retry strategy, and data volumes, among other configurations. You can create multiple job definitions for different kinds of jobs you plan to run.
  • Job queues. Job queues are where jobs reside until they are scheduled to run on compute environments. You can have multiple job queues for different priority levels (for example, high, medium, low) or different types of jobs. Jobs in the queue are scheduled based on their priority and the compute environment’s order of assignment to the job queue.
  • Compute environments. AWS Batch compute environments are collections of computing resources that are used to run batch jobs. Compute environments can be managed and unmanaged.
  • Jobs. Jobs are the individual units of work that are submitted to AWS Batch. Each job runs a Docker container image, and you can specify the vCPUs, memory, and other requirements for each job. Jobs can be dependent, meaning a job can wait for the successful completion of another job or jobs before it runs (see the sketch after this list).
  • Scheduling. AWS Batch scheduling is the process that determines how jobs are assigned to compute resources. The scheduler evaluates job queues, priorities, and compute resource availability to run jobs efficiently, optimizing both compute resource utilization and job completion time.
  • Lambda functions (optionally). While not a core component, AWS Lambda can be used in conjunction with AWS Batch for various purposes, such as triggering jobs in response to events, processing job results, or dynamically modifying job queues or compute environments.
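
To make the relationship between these components concrete, here is a minimal sketch using the boto3 Python SDK that submits two jobs, where the second job runs only after the first completes successfully. The job queue and job definition names are hypothetical placeholders that must already exist in your account.

```python
import boto3

batch = boto3.client("batch")  # assumes AWS credentials and region are configured

# Submit a first job to a hypothetical queue using a hypothetical job definition
job_a = batch.submit_job(
    jobName="prepare-data",
    jobQueue="my-job-queue",       # placeholder queue name
    jobDefinition="my-job-def:1",  # placeholder name:revision
)

# Submit a second job that depends on the successful completion of the first
job_b = batch.submit_job(
    jobName="train-model",
    jobQueue="my-job-queue",
    jobDefinition="my-job-def:1",
    dependsOn=[{"jobId": job_a["jobId"]}],
)
print("Dependent job submitted:", job_b["jobId"])
```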

Understanding Compute Environments

Compute environments within AWS Batch represent the computing infrastructure that executes your batch jobs. They are essentially the environments where your computing resources reside. You can think of a compute environment as a pool of computational resources within AWS Batch that is managed and scaled by the service to run submitted batch jobs. These environments can be configured with specific types of compute resources, defined by instance types, or optimized for particular computing tasks.

There are two types of compute environments in AWS Batch:

1. Managed compute environments

In this type of setup, AWS manages the compute environment on your behalf. AWS Batch automatically manages the scaling and provisioning of the compute resources based on the job requirements. You only need to specify the desired instance types (or range), the minimum, desired, and maximum vCPUs, and other specifics like the allocation strategy. AWS Batch handles the rest, automatically scaling the computational resources up or down to match the workload, without manual intervention.

This environment supports two types of instances: On-Demand Instances and Spot Instances, with the option to combine them for cost optimization and increased capacity.
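
As a sketch of how such an environment can be defined programmatically, the boto3 call below creates a managed compute environment backed by Spot Instances. The role ARNs, subnet, and security group IDs are placeholders to replace with your own. To combine On-Demand and Spot capacity, you would typically create one compute environment of each type and attach both to the same job queue.

```python
import boto3

batch = boto3.client("batch")

batch.create_compute_environment(
    computeEnvironmentName="spot-ce",  # hypothetical name
    type="MANAGED",                    # AWS Batch provisions and scales the resources
    state="ENABLED",
    serviceRole="arn:aws:iam::123456789012:role/AWSBatchServiceRole",  # placeholder
    computeResources={
        "type": "SPOT",                # use "EC2" for On-Demand capacity instead
        "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
        "minvCpus": 0,                 # scale to zero when no jobs are queued
        "maxvCpus": 64,
        "instanceTypes": ["optimal"],  # let AWS Batch pick suitable instance sizes
        "subnets": ["subnet-aaaa1111"],          # placeholder
        "securityGroupIds": ["sg-bbbb2222"],     # placeholder
        "instanceRole": "arn:aws:iam::123456789012:instance-profile/ecsInstanceRole",
    },
)
```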

The advantages of managed compute environments are:

  • Automatic scaling. AWS Batch can automatically scale resources up or down based on the job queue requirements, which helps in optimizing costs and resource utilization.
  • Easy setup. Less configuration is needed from your side, as AWS manages the underlying infrastructure, including instances and scaling policies.
  • Cost-effectiveness. You can take advantage of Spot Instances within managed compute environments to save costs for your batch jobs.

2. Unmanaged compute environments

In an unmanaged compute environment, you manage your own compute resources. This means you are in control of setting up and scaling the cluster of EC2 instances or Spot Instances that will run your batch jobs. This option allows for more granular control over the computing environment but requires more setup and management effort on your part. It’s suitable for situations where specific configurations, custom AMIs (Amazon Machine Images), or specialized resource needs are involved.

Key attributes of compute environments are:

  • Compute resource types. You can specify the type of instances that your environment will use. This can be optimal (where AWS Batch selects the resource type automatically), specific instance types, or a mix that fits your job requirements.
  • Scaling policies. For managed environments, AWS Batch dynamically scales the compute resources up or down based on job submission and completion patterns, ensuring optimal cost-efficiency and performance.
  • Launch template support. You can specify EC2 launch templates for your compute environments, allowing for customization of EC2 instances in managed compute environments (see the sketch after this list).
  • Spot integration. AWS Batch supports using EC2 Spot Instances in both managed and unmanaged compute environments, offering cost savings for flexible workloads.
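
For the launch template attribute, here is a minimal sketch of the computeResources argument passed to create_compute_environment, referencing a pre-existing EC2 launch template; the template name and IDs are hypothetical.

```python
# Fragment of the computeResources argument to batch.create_compute_environment(),
# referencing a pre-existing EC2 launch template (hypothetical name).
compute_resources = {
    "type": "EC2",
    "minvCpus": 0,
    "maxvCpus": 32,
    "instanceTypes": ["m5.large"],
    "subnets": ["subnet-aaaa1111"],  # placeholder
    "instanceRole": "arn:aws:iam::123456789012:instance-profile/ecsInstanceRole",
    "launchTemplate": {
        "launchTemplateName": "my-batch-template",  # hypothetical template
        "version": "$Latest",
    },
}
```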

The advantages of unmanaged compute environments are:

  • Complete control. You have full control over the compute environment, including the types of instances and detailed configurations.
  • Customization. This is ideal for specific requirements that cannot be met by a managed compute environment, allowing for greater flexibility in terms of instance types, configurations, and scaling strategies.
  • Integration with existing infrastructure. If you already have configured environments or particular requirements for security, compliance, or the use of Reserved Instances, an unmanaged environment might be the right choice.

Use Cases for AWS Batch

AWS Batch can be used in a wide range of scenarios. The most common use cases, which showcase its efficiency, scalability, and versatility across different industries and applications, are listed below.

  • Large-scale data processing and analysis. Organizations dealing with substantial amounts of data, such as those in the fields of genomics, financial analysis, or environmental modeling, can use AWS Batch to process large datasets. The service can efficiently manage the computational resources needed to analyze high volumes of data in parallel, significantly reducing the time needed to process and analyze data, from hours or days to minutes.
  • Machine learning model training and inference. Data scientists and ML engineers can employ AWS Batch for training machine learning models on large datasets. AWS Batch can dynamically scale computing resources to meet the demands of various training jobs, from small-scale model tuning to large-scale, deep-learning model training across numerous GPUs. Similarly, it can handle batch inference tasks, processing large batches of inference requests efficiently.
  • Image or video processing. Media companies, content providers, or even scientific research institutions often need to process large collections of images or videos, whether it’s for rendering, transcoding, or analysis (for example, satellite image analysis for environmental monitoring). AWS Batch can scale to accommodate the processing of thousands of files concurrently, significantly speeding up the workflow.
  • Simulation and modeling workloads. For industries engaged in simulations (pharmaceuticals, automotive, aerospace, etc.), where thousands of simulations may be needed to model complex physical phenomena or to test various scenarios, AWS Batch enables the efficient running of these computationally intensive tasks. It ensures that each simulation has the required computational resources, potentially reducing the time to results from weeks to days or hours.
  • Financial risk modeling. Financial institutions can use AWS Batch to run complex risk models across large datasets. By dynamically scaling compute resources, AWS Batch ensures that risk assessments, which need to analyze vast amounts of historical financial data, can be completed rapidly, aiding in swift decision-making.
  • Software build and test pipelines. Software development teams can use AWS Batch to automate their build and test pipelines. For projects with large test suites or requiring builds on multiple platforms, AWS Batch can significantly reduce completion times by running tests in parallel and scaling to meet peak demands.

Step-by-Step Setup Guide

Let’s explore how to configure an AWS Batch job, compute environment, and other required components.

Preparation steps

Prepare the AWS IAM roles to be used for AWS Batch jobs:

  1. Navigate to the IAM Console. Open the AWS Management Console, search for the IAM service, and open it.
  2. Create AWSBatchServiceRole:
    • In the IAM dashboard, select Roles and then click Create role.
    • Select an AWS service as a trusted entity, select Batch, and then click Next: Permissions.
    • Attach the AWSBatchServiceRole policy. If you don’t see it, search for it in the search bar.
    • Click Next, name the role (for example, AWSBatchServiceRole), and create the role.
  3. Create EC2 Instance Role:
    • Repeat the role creation process, but this time select EC2 as the trusted entity.
    • Attach the AmazonEC2ContainerServiceforEC2Role policy or any other policy required for your use case.
    • Name the role (for example, ecsInstanceRole) and create it.
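
If you prefer scripting this preparation, the boto3 sketch below creates both roles and attaches the managed policies named above. The role names match the examples in the steps; adjust them to your conventions.

```python
import json
import boto3

iam = boto3.client("iam")

# Service role that allows AWS Batch to call other AWS services on your behalf
iam.create_role(
    RoleName="AWSBatchServiceRole",
    AssumeRolePolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "batch.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }),
)
iam.attach_role_policy(
    RoleName="AWSBatchServiceRole",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AWSBatchServiceRole",
)

# Instance role for the ECS container instances that run the jobs
iam.create_role(
    RoleName="ecsInstanceRole",
    AssumeRolePolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }),
)
iam.attach_role_policy(
    RoleName="ecsInstanceRole",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role",
)

# EC2 instances assume roles through an instance profile
iam.create_instance_profile(InstanceProfileName="ecsInstanceRole")
iam.add_role_to_instance_profile(
    InstanceProfileName="ecsInstanceRole", RoleName="ecsInstanceRole"
)
```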

Accessing AWS Batch

Type batch in the AWS services search field and click AWS Batch when this item is displayed.

[Screenshot: How to access AWS Batch]

The AWS Batch dashboard page should now open.

Creating a compute environment

Compute environments contain the Amazon ECS container instances that your jobs will run on.

Click Compute environments on the AWS Batch dashboard page, then hit Create to create a new compute environment.

[Screenshot: How to create an AWS Batch compute environment]

The Create compute environment wizard opens.

  1. Compute environment configuration.
    • Select a compute platform, for example, Amazon Elastic Compute Cloud (Amazon EC2).
    • Choose between Managed and Unmanaged. Managed environments are managed by AWS, while unmanaged environments are managed by you.
    • Fill out the form according to your needs. Enter a name for your compute environment (env01test in this example). Specify the roles created in the preparation step where required. Alternatively, you can create a new role on this screen if you haven’t created it previously. There are useful tips for each field to help you set the optimal value.

    [Screenshot: Creating an AWS Batch compute environment with EC2]

  2. Instance configuration. Set the needed vCPU (virtual central processing unit) parameters. Select the instance type. You can select Spot Instances or On-Demand Instances. Note that if the Minimum vCPUs parameter is set to 0, AWS resources are not wasted when there are no jobs to process (this is the recommended value).

    [Screenshot: Instance configuration for an AWS Batch compute environment]

  3. Network. You can leave the default settings. If you need to customize network settings, you can select existing VPC IDs and subnets or create new ones.
  4. Review. Check the configuration and save your compute environment. Hit Create compute environment.
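
Once created, the environment takes a short time to become usable. Here is a quick boto3 check, assuming the env01test name from this example:

```python
import boto3

batch = boto3.client("batch")

resp = batch.describe_compute_environments(computeEnvironments=["env01test"])
for ce in resp["computeEnvironments"]:
    # A usable environment reports status VALID and state ENABLED
    print(ce["computeEnvironmentName"], ce["state"], ce["status"])
```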

Creating a Job Queue

Job queues hold jobs until they are scheduled to run; each job resides in a single job queue. Job queues are associated with compute environments.

  1. Create a job queue. In the AWS Batch dashboard, click Job queues, then hit Create.

    [Screenshot: How to create a job queue]

  2. Select the orchestration type, such as Amazon EC2 (Fargate and EKS are other available options).

    [Screenshot: Creating a job queue for AWS Batch processing]

  3. Enter a name for your job queue and a priority (higher numbers have higher priority, 1 is the default value) in the job queue configuration section.
  4. Link your job queue to the compute environment created in the previous step (env01test).
  5. Click Create to finalize the job queue.

    [Screenshot: Job queue configuration]
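
The equivalent boto3 call is a short sketch; it links the queue to the env01test environment created above (the queue name is an example):

```python
import boto3

batch = boto3.client("batch")

batch.create_job_queue(
    jobQueueName="queue01test",  # example name
    state="ENABLED",
    priority=1,                  # queues with higher numbers are scheduled first
    computeEnvironmentOrder=[
        {"order": 1, "computeEnvironment": "env01test"},
    ],
)
```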

Creating a Job Definition

Job definitions define how jobs will run. They include the Docker image to use, vCPUs, memory requirements, and more. Job definition parameters can be overridden when running a job.

Navigate to Job definitions on the AWS Batch dashboard and click Create.

[Screenshot: How to create a job definition in AWS Batch]

  1. Job definition configuration. Specify whether the job will run on Amazon EC2, Fargate, or Amazon Elastic Kubernetes Service. Define a job definition name.

    [Screenshot: Job definition configuration for AWS Batch]

    • Set up the container properties, including the image, vCPUs, memory, command (if any), and environment variables.
    • Under the execution role, specify an IAM role that has the permissions to pull the Docker image and write logs to CloudWatch, if necessary.
    • Hit Next at each screen to continue.

    [Screenshot: General configuration for an AWS Batch job definition]

  2. Container configuration. Select the command syntax (bash or JSON). Enter the needed command in the command field using the specified syntax. You can select the environment configuration, such as the number of vCPUs and memory, and add environment variables.
  3. Linux and logging settings. You can configure Linux and logging settings, including user info, file system configuration, logging configuration, etc.
  4. Job definition review. Check your configuration. You can review the job definition configuration and copy the configuration text (script). Click Create job definition.
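
As a scripted alternative, a minimal job definition can be registered with boto3. The name, image, command, and environment variable below are illustrative placeholders:

```python
import boto3

batch = boto3.client("batch")

batch.register_job_definition(
    jobDefinitionName="jobdef01test",  # example name
    type="container",
    containerProperties={
        "image": "public.ecr.aws/amazonlinux/amazonlinux:latest",
        "command": ["echo", "hello from AWS Batch"],
        "resourceRequirements": [
            {"type": "VCPU", "value": "1"},
            {"type": "MEMORY", "value": "2048"},  # MiB
        ],
        "environment": [{"name": "STAGE", "value": "test"}],
    },
)
```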

Submitting a Job

Now that everything is set up, you can submit a job, which is a unit of work run by AWS Batch.

  1. Go to the Jobs section and click Submit new job.

    [Screenshot: How to submit a job for AWS Batch processing]

  2. Set the job configuration. Enter a job name. Choose a job definition and job queue you have created before. Set other additional parameters if needed.
  3. Configure job overrides (optional).
  4. Review job settings and hit Create job at the end.

    [Screenshot: AWS Batch job configuration, submitting a job]

The job should be triggered in a few seconds and transition to the RUNNING state.
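
The same submission can be scripted. Here is a sketch assuming the example names used above, with an optional command override matching step 3:

```python
import boto3

batch = boto3.client("batch")

resp = batch.submit_job(
    jobName="job01test",           # example name
    jobQueue="queue01test",
    jobDefinition="jobdef01test",
    # Optional job overrides, equivalent to step 3 above
    containerOverrides={"command": ["echo", "overridden command"]},
)
print("Submitted job:", resp["jobId"])
```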

Managing Compute Environments

You can monitor the AWS Batch job progress in the AWS Batch Dashboard under the Jobs section (Jobs > Select a job > Details). AWS Batch allows you to see real-time logs, the status of the job, and any output the job generates.

[Screenshot: AWS Batch job information]
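
Job status and logs can also be polled programmatically. Here is a sketch assuming a job ID returned by the previous step; jobs in this setup write to the default /aws/batch/job CloudWatch log group:

```python
import boto3

batch = boto3.client("batch")
logs = boto3.client("logs")

job_id = "00000000-0000-0000-0000-000000000000"  # replace with a real job ID

job = batch.describe_jobs(jobs=[job_id])["jobs"][0]
print("Status:", job["status"])  # e.g. RUNNABLE, RUNNING, SUCCEEDED, FAILED

# Once the job has started, its container reports a CloudWatch log stream
stream = job["container"].get("logStreamName")
if stream:
    events = logs.get_log_events(
        logGroupName="/aws/batch/job", logStreamName=stream
    )
    for event in events["events"]:
        print(event["message"])
```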

You can use the following AWS tools for monitoring and managing compute environments:

  • Amazon CloudWatch. Use this tool to monitor the metrics and logs of your batch jobs and compute environments. It is crucial for understanding job execution performance and debugging issues.
  • AWS CloudTrail. Log and monitor API calls to AWS Batch and other AWS services. This helps in auditing and tracking changes to the compute environments.
  • AWS Cost Explorer. Utilize it for monitoring and managing the costs associated with your compute environments, helping to identify opportunities for optimization.

Use the recommended practices for monitoring:

  • Regularly review AWS Batch metrics and logs in CloudWatch to identify performance bottlenecks or underutilized resources.
  • Continuously optimize job definitions and compute environments based on performance data.
  • Consider using AWS Lambda in combination with AWS Batch for event-driven batch processing (see the sketch after this list).
  • Stay informed about the latest AWS features and best practices for AWS Batch and related services.
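
For the Lambda suggestion above, here is a minimal handler sketch that submits one batch job per object landing in an S3 bucket. The queue and job definition names are placeholders, and the function's execution role would need batch:SubmitJob permission.

```python
import boto3

batch = boto3.client("batch")

def lambda_handler(event, context):
    """Submit one AWS Batch job per S3 object in the triggering event."""
    records = event.get("Records", [])
    for record in records:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        batch.submit_job(
            jobName="process-s3-object",
            jobQueue="queue01test",        # placeholder
            jobDefinition="jobdef01test",  # placeholder
            containerOverrides={
                "environment": [
                    {"name": "S3_BUCKET", "value": bucket},
                    {"name": "S3_KEY", "value": key},
                ]
            },
        )
    return {"submitted": len(records)}
```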

Management and optimization of compute environments in AWS Batch are continuous processes that involve monitoring performance, controlling costs, and adjusting strategies based on changing requirements and new AWS capabilities.

Conclusion

AWS Batch is suitable for a wide variety of applications, from data processing and rendering to machine learning model training and financial modeling. This is a fully managed AWS service. The ability to handle complex, compute-intensive batch jobs without the need to manage the underlying infrastructure makes AWS Batch a powerful tool for organizations looking to utilize the cloud for high-throughput computing needs. Don’t forget to configure a backup of your Amazon EC2 instances to protect data. NAKIVO Backup & Replication can help you protect EC2 data effectively.
