Which AWS service collects metrics from running EC2 instances?

I want to send memory and disk metrics from my Amazon Elastic Compute Cloud (Amazon EC2) instances to Amazon CloudWatch Metrics. How can I do this?

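Short description

Use the CloudWatch agent to collect memory and disk metrics from your EC2 instances and send them to CloudWatch.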

Resolution

Note: If you receive errors when running AWS Command Line Interface (AWS CLI) commands, make sure that you’re using the most recent AWS CLI version.

You can download and install the CloudWatch agent manually using the AWS CLI or you can integrate it with AWS Systems Manager Agent (SSM Agent). The CloudWatch agent is supported on both Linux and Windows systems. Use these steps to install the CloudWatch agent:

1.    Create an IAM role or user that allows the agent to collect metrics from the server and, optionally, to integrate with AWS Systems Manager. For an EC2 instance, attach the IAM role to the instance that you want to install the agent on.

2.    Download and install the agent package.

3.    Create the CloudWatch agent configuration file and specify the metrics that you want to collect.

This example shows a basic agent configuration file that reports memory usage and disk usage metrics on a Linux system:
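The example file is not reproduced on this page. As a minimal sketch, the following Python builds a configuration of that kind and prints it as JSON. It assumes the agent's `mem` and `disk` sections with the measurement names `mem_used_percent` and `used_percent`; the helper name `build_basic_config` is hypothetical, and the section and measurement names should be checked against the agent version you install.

```python
import json


def build_basic_config(interval_seconds=60):
    """Return a CloudWatch agent config dict for memory and disk usage.

    Assumption: the section names ("mem", "disk") and measurement names
    match the agent's Linux metrics; verify them for your agent version.
    """
    return {
        "metrics": {
            # Tag every metric with the ID of the instance that produced it.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {
                    "measurement": ["mem_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                },
                "disk": {
                    "measurement": ["used_percent"],
                    "resources": ["*"],  # all mounted file systems
                    "metrics_collection_interval": interval_seconds,
                },
            },
        }
    }


if __name__ == "__main__":
    # Print the JSON that would be saved as the agent configuration file.
    print(json.dumps(build_basic_config(), indent=4))
```

The printed JSON has the same overall shape as the configuration samples later on this page (`metrics`, `append_dimensions`, `metrics_collected`).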

EC2 instances running on Linux that use the Elastic Network Adapter (ENA) publish network performance metrics. Version 1.246396.0 and later of the CloudWatch agent enable you to import these network performance metrics into CloudWatch. When you import these network performance metrics into CloudWatch, they are charged as CloudWatch custom metrics.

For more information about the ENA driver, see Enabling enhanced networking with the Elastic Network Adapter (ENA) on Linux instances and Enabling enhanced networking with the Elastic Network Adapter (ENA) on Windows instances.

How you set up the collection of network performance metrics differs on Linux servers and Windows servers.

The following table lists the network performance metrics that ENA enables. When the CloudWatch agent imports these metrics into CloudWatch from Linux instances, it prepends ethtool_ to each metric name.

Metric and description

Name on Linux servers: bw_in_allowance_exceeded

Name on Windows servers: Aggregate inbound BW allowance exceeded

The number of packets queued and/or dropped because the inbound aggregate bandwidth exceeded the maximum for the instance.

This metric is collected only if you have listed it in the ethtool subsection of the metrics_collected section of the CloudWatch agent configuration file. For more information, see Collect network performance metrics.

Unit: None

Name on Linux servers: bw_out_allowance_exceeded

Name on Windows servers: Aggregate outbound BW allowance exceeded

The number of packets queued and/or dropped because the outbound aggregate bandwidth exceeded the maximum for the instance.

This metric is collected only if you have listed it in the ethtool subsection of the metrics_collected section of the CloudWatch agent configuration file. For more information, see Collect network performance metrics.

Unit: None

Name on Linux servers: conntrack_allowance_exceeded

Name on Windows servers: Connection tracking allowance exceeded

The number of packets dropped because connection tracking exceeded the maximum for the instance and new connections could not be established. This can result in packet loss for traffic to or from the instance.

This metric is collected only if you have listed it in the ethtool subsection of the metrics_collected section of the CloudWatch agent configuration file. For more information, see Collect network performance metrics.

Unit: None

Name on Linux servers: linklocal_allowance_exceeded

Name on Windows servers: Link local packet rate allowance exceeded

The number of packets dropped because the PPS of the traffic to local proxy services exceeded the maximum for the network interface. This impacts traffic to the DNS service, the Instance Metadata Service, and the Amazon Time Sync Service.

This metric is collected only if you have listed it in the ethtool subsection of the metrics_collected section of the CloudWatch agent configuration file. For more information, see Collect network performance metrics.

Unit: None

Name on Linux servers: pps_allowance_exceeded

Name on Windows servers: PPS allowance exceeded

The number of packets queued and/or dropped because the bidirectional PPS exceeded the maximum for the instance.

This metric is collected only if you have listed it in the ethtool subsection of the metrics_collected section of the CloudWatch agent configuration file. For more information, see Collect network performance metrics.

Unit: None

Linux setup

On Linux servers, the ethtool plugin enables you to import the network performance metrics into CloudWatch.

ethtool is a standard Linux utility that can collect statistics about Ethernet devices on Linux servers. The statistics it collects depend on the network device and driver. Examples of these statistics include rx_packets and tx_packets. When you use the ethtool plugin with the CloudWatch agent, you can also import these statistics into CloudWatch, along with the EC2 network performance metrics listed earlier in this section.

When the CloudWatch agent imports metrics into CloudWatch, it adds an ethtool_ prefix to the names of all imported metrics. So the standard ethtool statistic rx_packets is called ethtool_rx_packets in CloudWatch, and the EC2 network performance metric bw_in_allowance_exceeded is called ethtool_bw_in_allowance_exceeded in CloudWatch.
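The naming rule can be sketched in a few lines of Python. This illustrates the prefixing described above and is not the agent's own code; the function name `cloudwatch_metric_name` is hypothetical.

```python
ETHTOOL_PREFIX = "ethtool_"


def cloudwatch_metric_name(ethtool_stat):
    """Return the name under which an ethtool statistic appears in CloudWatch."""
    return ETHTOOL_PREFIX + ethtool_stat


if __name__ == "__main__":
    for stat in ("rx_packets", "bw_in_allowance_exceeded"):
        print(stat, "->", cloudwatch_metric_name(stat))
```

Keeping the prefix in mind matters when you later search for a metric or create an alarm, because the name in CloudWatch differs from the name you listed in the configuration file.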

On Linux servers, to import ethtool metrics, add an ethtool section to the metrics_collected section of the CloudWatch agent configuration file. The ethtool section can include the following subsections:

  • interface_include: Including this section causes the agent to collect metrics from only the interfaces that have names listed in this section. If you omit this section, metrics are collected from all Ethernet interfaces that aren't listed in interface_exclude.

    The default Ethernet interface is eth0.

  • interface_exclude: If you include this section, list the Ethernet interfaces that you don't want to collect metrics from.

    The ethtool plugin always ignores loopback interfaces.

  • metrics_include: This section lists the metrics to import into CloudWatch. It can include both standard statistics collected by ethtool and Amazon EC2 high-resolution network metrics.

The following example displays part of the CloudWatch agent configuration file. This configuration collects the standard ethtool metrics rx_packets and tx_packets, and the Amazon EC2 network performance metrics, from only the eth1 interface.

For more information about the CloudWatch agent configuration file, see Manually create or edit the CloudWatch agent configuration file.

"metrics": {
    "append_dimensions": {
        "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
        "ethtool": {
            "interface_include": [
                "eth1"
            ],
            "metrics_include": [
                "rx_packets",
                "tx_packets",
                "bw_in_allowance_exceeded",
                "bw_out_allowance_exceeded",
                "conntrack_allowance_exceeded",
                "linklocal_allowance_exceeded",
                "pps_allowance_exceeded"
            ]
        }
    }
}
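How interface_include and interface_exclude interact can be sketched as a small Python function. This models the rules stated above and is not the plugin's actual implementation. Two points are assumptions: that the loopback interface is named `lo`, and that interface_exclude still applies when interface_include is also present. The function name `select_interfaces` is hypothetical.

```python
def select_interfaces(all_interfaces, interface_include=None, interface_exclude=None):
    """Return the interfaces the rules above would collect metrics from."""
    include = set(interface_include or [])
    exclude = set(interface_exclude or [])
    selected = []
    for name in all_interfaces:
        if name == "lo":
            # Loopback is always ignored (assumed to be named "lo").
            continue
        if include and name not in include:
            # With an include list, only the listed interfaces qualify.
            continue
        if name in exclude:
            # Assumption: an excluded interface is skipped even if included.
            continue
        selected.append(name)
    return selected


if __name__ == "__main__":
    interfaces = ["lo", "eth0", "eth1"]
    print(select_interfaces(interfaces, interface_include=["eth1"]))
    print(select_interfaces(interfaces, interface_exclude=["eth0"]))
```

With the configuration shown above, only eth1 would be selected on a host that has lo, eth0, and eth1.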

Windows setup

On Windows servers, the network performance metrics are available through Windows Performance Counters, which the CloudWatch agent already collects metrics from. So you do not need a plugin to collect these metrics from Windows servers.

The following is a sample configuration file to collect network performance metrics from Windows. For more information about editing the CloudWatch agent configuration file, see Manually create or edit the CloudWatch agent configuration file.

{
    "metrics": {
        "append_dimensions": {
            "InstanceId": "${aws:InstanceId}"
        },
        "metrics_collected": {
            "ENA Packets Shaping": {
                "measurement": [
                    "Aggregate inbound BW allowance exceeded",
                    "Aggregate outbound BW allowance exceeded",
                    "Connection tracking allowance exceeded",
                    "Link local packet rate allowance exceeded",
                    "PPS allowance exceeded"
                ],
                "metrics_collection_interval": 60,
                "resources": [
                    "*"
                ]
            }
        }
    }
}

Viewing network performance metrics

After importing network performance metrics into CloudWatch, you can view these metrics as time series graphs, and create alarms that watch these metrics and notify you if they breach a threshold that you specify. The following procedure shows how to view ethtool metrics as a time series graph. For more information about setting alarms, see Using Amazon CloudWatch alarms.

Because all of these metrics are aggregate counters, you can use CloudWatch metric math functions such as RATE(METRICS()) to calculate the rate for these metrics in graphs or use them to set alarms. For more information about metric math functions, see Using metric math.
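To see why a rate is more useful than the raw value of an aggregate counter, the following Python sketch converts cumulative samples into per-second rates. It illustrates the idea behind a rate function and is not CloudWatch's implementation; the function name `per_second_rates` and the sample values are made up for illustration.

```python
def per_second_rates(samples):
    """Convert cumulative counter samples into per-second rates.

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    Returns a list of (timestamp_seconds, rate) for each consecutive pair.
    """
    rates = []
    for (t_prev, v_prev), (t_cur, v_cur) in zip(samples, samples[1:]):
        elapsed = t_cur - t_prev
        if elapsed <= 0:
            # Skip duplicate or out-of-order timestamps.
            continue
        rates.append((t_cur, (v_cur - v_prev) / elapsed))
    return rates


if __name__ == "__main__":
    # Hypothetical counter readings taken 60 seconds apart.
    samples = [(0, 100), (60, 100), (120, 160)]
    print(per_second_rates(samples))  # [(60, 0.0), (120, 1.0)]
```

A flat counter yields a rate of zero even when its absolute value is large, so an alarm on the rate fires only while packets are actually being queued or dropped. A counter that resets (for example, after a reboot) would produce a negative rate in this sketch.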

To view network performance metrics in the CloudWatch console

  1. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.

  2. In the navigation pane, choose Metrics.

  3. Choose the namespace for the metrics collected by the agent. By default, this is CWAgent, but you may have specified a different namespace in the CloudWatch agent configuration file.

  4. Choose a metric dimension (for example, Per-Instance Metrics).

  5. The All metrics tab displays all metrics for that dimension in the namespace. You can do the following:

    1. To graph a metric, select the check box next to the metric. To select all metrics, select the check box in the heading row of the table.

    2. To sort the table, use the column heading.

    3. To filter by resource, choose the resource ID, and then choose Add to search.

    4. To filter by metric, choose the metric name, and then choose Add to search.

  6. (Optional) To add this graph to a CloudWatch dashboard, choose Actions, and then choose Add to dashboard.
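The same view can also be requested programmatically. As a sketch under stated assumptions, the following Python builds the query list for the CloudWatch GetMetricData API: one query that fetches the raw counter and one that applies RATE to it. The function name `build_rate_queries`, the choice of the Maximum statistic, and the example instance ID are assumptions. The dimensions you pass must match exactly the dimensions under which the agent published the metric, which depends on your agent configuration.

```python
def build_rate_queries(metric_name, dimensions, namespace="CWAgent", period=60):
    """Build a MetricDataQueries list: raw counter (m1) plus its rate (r1).

    dimensions: dict of dimension name -> value, as published by the agent.
    """
    dims = [{"Name": k, "Value": v} for k, v in sorted(dimensions.items())]
    return [
        {
            "Id": "m1",
            "MetricStat": {
                "Metric": {
                    "Namespace": namespace,
                    "MetricName": metric_name,
                    "Dimensions": dims,
                },
                "Period": period,
                "Stat": "Maximum",  # assumption: a reasonable stat for a counter
            },
            "ReturnData": False,  # only the rate series is returned
        },
        {
            "Id": "r1",
            "Expression": "RATE(m1)",
            "Label": metric_name + " per second",
            "ReturnData": True,
        },
    ]


if __name__ == "__main__":
    queries = build_rate_queries(
        "ethtool_bw_in_allowance_exceeded",
        {"InstanceId": "i-0123456789abcdef0"},  # hypothetical instance ID
    )
    for query in queries:
        print(query)
```

The resulting list is the kind of value you would pass as `MetricDataQueries`, together with a start and end time, to an SDK call such as boto3's `get_metric_data`.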

To which AWS monitoring service does Amazon EC2 publish metrics at regular intervals?

Amazon CloudWatch. For metrics produced by certain AWS services, such as Amazon EC2, CloudWatch can aggregate data across dimensions. For example, if you search for metrics in the AWS/EC2 namespace but do not specify any dimensions, CloudWatch aggregates all data for the specified metric to create the statistic that you requested.

Which AWS service is used to analyze metrics?

Amazon CloudWatch monitors your Amazon Web Services (AWS) resources and the applications you run on AWS in real time. You can use CloudWatch to collect and track metrics, which are variables you can measure for your resources and applications.

Which AWS service allows you to monitor the performance of your EC2 instances to assist in troubleshooting?

For Windows, Amazon EC2 offers EC2Rescue, which you can use to examine your Windows instances to help identify common problems, collect log files, and help AWS Support troubleshoot your issues. You can also use EC2Rescue to analyze boot volumes from non-functional instances.

Which AWS service is used to collect and track performance?

Amazon CloudWatch collects and visualizes real-time logs, metrics, and event data in automated dashboards to streamline your infrastructure and application maintenance.