1. 程式人生 > >Optimizing Disk Usage on Amazon ECS

Optimizing Disk Usage on Amazon ECS

My colleague Jay McConnell sent a nice guest post that describes how to track and optimize the disk space used in your Amazon ECS cluster.

On October 4 Amazon ECS launched support for automated container and image cleanup. Read about it in the
documentation
.

Failure to monitor disk space utilization can cause problems that prevent Docker containers from working as expected. Amazon EC2 instance disks are used for multiple purposes, such as Docker daemon logs, containers, and images. This post covers techniques to monitor and reclaim disk space on the cluster of EC2 instances used to run your containers.

Amazon ECS is a highly scalable, high performance container management service that supports Docker containers and allows you to run applications easily on a managed cluster of Amazon EC2 instances. You can use ECS to schedule the placement of containers across a cluster of EC2 instances based on your resource needs, isolation policies, and availability requirements.

The ECS-optimized AMI stores images and containers in an EBS volume that uses the devicemapper storage driver in a direct-lvm configuration. As devicemapper stores every image and container in a thin-provisioned virtual device, free space for container storage is not visible through standard Linux utilities such as df. This poses an administrative challenge when it comes to monitoring free space and can also result in increased time troubleshooting task failures, as the cause may not be immediately obvious.

Disk space errors can result in new tasks failing to launch with the following error message:

 Error running deviceCreate (createSnapDevice) dm_task_run failed

NOTE: The scripts and techniques described in this post were tested against the ECS 2016.03.a AMI. You may need to modify these techniques depending on your operating system and environment.

Monitoring

You can use Amazon CloudWatch custom metrics to track EC2 instance disk usage. After a CloudWatch metric is created, you can add a CloudWatch alarm to alert you proactively, before low disk space causes a problem on your cluster.

Step 1: Create an IAM role

The first step is to ensure that the EC2 instance profile for the EC2 instances in the ECS cluster uses the “cloudwatch:PutMetricData” policy, as this is required to publish to CloudWatch.
In the IAM console, choose Policies, Create Policy. Choose Create Your Own Policy, name it “CloudwatchPutMetricData”, and paste in the following policy in JSON:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CloudwatchPutMetricData",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricData"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}

After you have saved the policy, navigate to Roles and select the role attached to the EC2 instances in your ECS cluster. Choose Attach Policy, select the “CloudwatchPutMetricData” policy, and choose Attach Policy.

Step 2: Push metrics to CloudWatch

#!/bin/bash

### Get docker free data and metadata space and push to CloudWatch metrics
### 
### requirements:
###  * must be run from inside an EC2 instance
###  * docker with devicemapper backing storage
###  * aws-cli configured with instance-profile/user with the put-metric-data permissions
###  * local user with rights to run docker cli commands
###
### Created by Jay McConnell

# install aws-cli, bc and jq if required
if [ ! -f /usr/bin/aws ]; then
  yum -qy -d 0 -e 0 install aws-cli
fi
if [ ! -f /usr/bin/bc ]; then
  yum -qy -d 0 -e 0 install bc
fi
if [ ! -f /usr/bin/jq ]; then
  yum -qy -d 0 -e 0 install jq
fi

# Collect region and instanceid from metadata
AWSREGION=`curl -ss http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r .region`
AWSINSTANCEID=`curl -ss http://169.254.169.254/latest/meta-data/instance-id`

function convertUnits {
  # convert units back to bytes as both docker api and cli only provide freindly units
  if [ "$1" == "b" ] ; then
    echo $2
  elif [ "$1" == "kb" ] ; then
    awk 'BEGIN{ printf "%.0f\n", '$2' * 1000 }'
  elif [ "$1" == "mb" ] ; then
    awk 'BEGIN{ printf "%.0f\n", '$2' * 1000**2 }'
  elif [ "$1" == "gb" ] ; then
    awk 'BEGIN{ printf "%.0f\n", '$2' * 1000**3 }'
  elif [ "$1" == "tb" ] ; then
    awk 'BEGIN{ printf "%.0f\n", '$2' * 1000**4 }'
  else
    echo "Unknown unit $1"
    exit 1
  fi
}

function getMetric {
  # Get freespace and split unit
  if [ "$1" == "Data" ] || [ "$1" == "Metadata" ] ; then
    docker info | awk '/'$1' Space Available/ {print tolower($5), $4}'
  else
    echo "Metric must be either 'Data' or 'Metadata'"
    exit 1
  fi
}

data=$(convertUnits `getMetric Data`)
aws cloudwatch put-metric-data --value $data --namespace ECS/$AWSINSTANCEID --unit Bytes --metric-name FreeDataStorage --region $AWSREGION
data=$(convertUnits `getMetric Metadata`)
aws cloudwatch put-metric-data --value $data --namespace ECS/$AWSINSTANCEID --unit Bytes --metric-name FreeMetadataStorage --region $AWSREGION

Next, set the script to be executable:

chmod +x /path/to/metricscript.sh

Now, schedule the script to run every 5 minutes via cron. To do this, create the file /etc/cron.d/ecsmetrics with the following contents:

*/5 * * * * root /path/to/metricscript.sh

This pulls both free data and metadata every 5 minutes and push them to CloudWatch with the namespace ECS/.

Disk cleanup

The next step is to clean up the disk, either automatically on a schedule or manually. This post covers cleanup of tasks and images; there is a great blog post, Send ECS Container Logs to CloudWatch Logs for Centralized Monitoring, that covers pushing log files to CloudWatch. Using CloudWatch Logs instead of local log files reduces disk utilization and provides a resilient and centralized place from which to manage logs.

Take a look at what you can do to remove unneeded containers and images from your instances.

Delete unused containers

Stopped containers should be deleted if they are no longer needed. The ECS agent, by default, deletes all containers that have exited every 3 hours. This behavior can be customized by adding the following to /etc/ecs/ecs.config:

ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=10m

This sets the frequency of the task to 10 minutes.

Delete unused images

By default the ECS agent will check for images to delete every 30 minutes, an image will only be deleted if it was pulled more than an hour prior, and a maximum of 5 images will be deleted per check.

These defaults can be changed with the following configuration options:

ECS_IMAGE_CLEANUP_INTERVAL=10m
ECS_IMAGE_MINIMUM_CLEANUP_AGE=15m
ECS_NUM_IMAGES_DELETE_PER_CYCLE=10

Setting the above will result in checks every 10 minutes that delete up to 10 unused images that are at least 15 minutes old.

Applying config changes

For this change to take effect, the ECS agent needs to be restarted, which can be done via ssh:

stop ecs; start ecs

To set this up for new instances, attach the following EC2 user data:

cat /etc/ecs/ecs.config | grep -v 'ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION' > /tmp/ecs.config
echo "ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=5m" >> /tmp/ecs.config
mv -f /tmp/ecs.config /etc/ecs/
stop ecs
start ecs

Conclusion

The techniques described in this post provide visibility into a critical component of running Docker—the disk space used on the cluster’s EC2 instances—and techniques to clean up unnecessary storage. If you have any questions or suggestions for other best practices, please comment below.

相關推薦

Optimizing Disk Usage on Amazon ECS

My colleague Jay McConnell sent a nice guest post that describes how to track and optimize the disk space used in your Amazon ECS cluster. —

How to Troubleshoot High or Full Disk Usage with Amazon Redshift

SELECT q.query, trim(q.cat_text) FROM ( SELECT query, replace( listagg(text,' ') WITHIN GROUP (ORDER BY sequence), '\\n', ' ') AS cat_text FROM

Running Windows Containers on Amazon ECS

This post was developed and written by Jeremy Cowan, Thomas Fuller, Samuel Karp, and Akram Chetibi. — Containers have revolutioni

Orchestrating GPU-Accelerated Workloads on Amazon ECS

My colleagues Brandon Chavis, Chad Schmutzer, and Pierre Steckmeyer sent a nice guest post that describes how to run GPU workloads on Amazon ECS.

Spotinst Elastigroup for Amazon ECS on AWS

This Quick Start sets up an AWS architecture for Spotinst Elastigroup for Amazon Elastic Container Service (Amazon ECS) and deploys it into your A

Have You Tried Delphi on Amazon Linux? (就是AWS用的Linux)

enables custom customers servers nbsp ble exists compile targe The new Delphi Linux compiler enables customers to take new or existing Wi

How to force https on amazon elastic beanstalk

conf 使用 quest The 沒有 last director 改變 etc 假設您已在負載平衡器安全組中啟用https,將SSL證書添加到負載平衡器,將443添加到負載平衡器轉發的端口,並使用Route 53將您的域名指向Elastic Beanstalk環境(或等

How to use common workflows on Amazon SageMaker notebook instances

Amazon SageMaker notebook instances provide a scalable cloud based development environment to do data science and machine learning. This blog post

Accelerate model training using faster Pipe mode on Amazon SageMaker

Amazon SageMaker now comes with a faster Pipe mode implementation, significantly accelerating the speeds at which data can be streamed from Amazon

Migrate to Apache HBase on Amazon S3 on Amazon EMR: Guidelines and Best Practices

This blog post provides guidance and best practices about how to migrate from Apache HBase on HDFS to Apache HBase on Amazon S3 on Amazon EMR.

time bushfire alerting with Complex Event Processing in Apache Flink on Amazon EMR and IoT sensor network | AWS Big Data Blog

Bushfires are frequent events in the warmer months of the year when the climate is hot and dry. Countries like Australia and the United States are

Selling plants on Amazon: A forest of untapped opportunity

Lauri Baker, Cheryl Boyer, Hikaru Peterson, and Audrey King researched this facet of the $14 billion US horticulture industry to establish a starting poin

Start Cyber Monday early and save $60 on Beats Solo3 Headphones on Amazon

We continue to find deals to be thankful for this weekend, and right now our eyes are on the Beats Solo3 Wireless headphones which are now $60 off on Amazo

The Guardian’s Migration from MongoDB to PostgreSQL on Amazon RDS

轉載一片mongodb 遷移pg 資料庫的文章 原文: https://www.infoq.com/news/2019/01/guardian-mongodb-postgresql The Guardian migrated their CMS's datastor

Large-Scale Machine Learning with Spark on Amazon EMR

This is a guest post by Jeff Smith, Data Engineer at Intent Media. Intent Media, in their own words: “Intent Media operates a platform for adverti

Getting Started on Amazon Web Services (AWS)

Amazon Web Services is Hiring. Amazon Web Services (AWS) is a dynamic, growing business unit within Amazon.com. We are currently hiring So

Troubleshoot Courier Fetch Error in Kibana on Amazon ES

Use one or more of the following methods to resolve the courier fetch error: Consider resizing your cluster Confirm that

Resolve "OutOfMemoryError" Hive Java Heap Space Exceptions on Amazon EMR that Occur when Hive Outputs the Query Results

export HIVE_CLIENT_HEAPSIZE=1024 export HIVE_METASTORE_HEAPSIZE=2048 export HIVE_SERVER2_HEAPSIZE=3072 if [ "$SERVICE" = "metastore" ] then exp

Host a Public Website on Amazon EC2 Using IIS

Amazon Web Services is Hiring. Amazon Web Services (AWS) is a dynamic, growing business unit within Amazon.com. We are currently hiring So

Amazon ECS Containers Keep Restarting

Amazon Web Services is Hiring. Amazon Web Services (AWS) is a dynamic, growing business unit within Amazon.com. We are currently hiring So