MLPerf Training - Power Measurement Rules

1. Preamble

This document describes how to measure Power for benchmarks in the MLPerf Training suite. It only outlines the additional steps to be considered for Power submissions compared with original Performance submissions. Please see the MLPerf Training rules that cover the submission, review, and publication process for MLPerf Training benchmarks in detail.

2. Overview

The MLPerf name and logo are trademarks. In order to refer to a result using the MLPerf name, the result must conform to the letter and spirit of the rules specified in this document. The MLPerf organization reserves the right to solely determine if a use of its name or logo is acceptable.

3. Requirements

3.1. Scope of Measurement

3.1.1. Measure at Node Level

A system under test (SUT) consists of a defined set of hardware and software resources that will be measured for performance and power. The hardware resources may include processors, accelerators, memories, disks, and interconnect. The software resources may include an operating system, compilers, libraries, and drivers that significantly affect the running time or power consumption of a benchmark. Each power submitter must document the system being measured in the system description file.

Example: for data center environments, think of a SUT (or system) as a host or a node. This definition should apply to Cloud, Enterprise, or HPC machine installs.

3.1.2. Measure All Nodes

All nodes participating in the performance runs must be measured.

3.1.3. Measure or Estimate Interconnects

If an interconnect is used to transmit weights, weight-updates, or activations, its power usage must be measured or estimated.

The estimated power of an interconnect is its maximum specified power consumption, weighted by the provisioning to a run. The total estimated power of interconnects is the sum of estimates over connections in the network topology; submission of networking topology or estimation details is optional.
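As a rough illustration of the estimate described above, the following Python sketch sums the maximum specified power of each connection, weighted by the fraction provisioned to the run. The `Link` type, its field names, and the sample wattages are hypothetical and are not defined by these rules.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One connection in the network topology (hypothetical structure)."""
    max_power_watts: float  # maximum specified power of the component
    provisioning: float     # fraction provisioned to the run, 0.0 to 1.0


def estimated_interconnect_power(links):
    """Sum max specified power weighted by provisioning over all connections."""
    for link in links:
        if not 0.0 <= link.provisioning <= 1.0:
            raise ValueError("provisioning must be a fraction between 0 and 1")
    return sum(link.max_power_watts * link.provisioning for link in links)


# Illustrative values only: a half-provisioned switch and a dedicated one.
links = [Link(200.0, 0.5), Link(150.0, 1.0)]
total_watts = estimated_interconnect_power(links)  # 200*0.5 + 150*1.0 = 250.0
```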

3.1.4. Not Required to Measure Other Components

Other components, such as remote storage and cooling that is not part of the participating nodes, are not required to be measured.

3.2. Measure at the PSUs

Node-level measurement is taken at the PSU and reports DC power. If the AC/DC PSUs in an AC system can only report input power, an AC/DC conversion efficiency factor may be used to derive the DC power of the system. The AC/DC efficiency of the PSU must be documented, e.g., using the 80 PLUS efficiency rating of the converter at the operating point.
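A minimal sketch of the AC-to-DC derivation, assuming the efficiency factor is expressed as a fraction and applied as a simple multiplier on the reported input power. The function name and the sample numbers are hypothetical.

```python
def dc_power_watts(ac_input_watts, efficiency):
    """Derive DC power from PSU-reported AC input power.

    efficiency is the documented AC/DC conversion efficiency at the
    operating point, as a fraction (for example 0.94 for 94%).
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be a fraction in (0, 1]")
    return ac_input_watts * efficiency


# Hypothetical reading: 1000 W at the PSU input, 94% efficient at this load.
dc_watts = dc_power_watts(1000.0, 0.94)  # roughly 940 W
```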

The PSU measurement must expose the power consumed by all the components of the node in a composite manner.

Using PDU measurement for individual nodes is also acceptable.

3.3. Measure all Runs

Follow the training rules for how many runs are needed for each benchmark.

3.4. Report 1 Measurement per Second

Power must be sampled at least once per second, with samples distributed evenly throughout the timed portion of a run.
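One way to sanity-check a power log against this requirement is to look at the gaps between consecutive sample timestamps. The sketch below is one possible interpretation, not an official checker; the function name and the `tolerance` parameter (slack for timer jitter) are assumptions.

```python
def meets_one_hertz(timestamps, tolerance=0.05):
    """Return True if consecutive samples are never more than
    1 second (plus a small tolerance) apart.

    timestamps: sample times in seconds, in increasing order.
    """
    if len(timestamps) < 2:
        return False
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return all(0 < gap <= 1.0 + tolerance for gap in gaps)


good_log = [0.0, 1.0, 2.0, 3.0]   # one sample per second
bad_log = [0.0, 1.0, 3.5, 4.0]    # contains a 2.5 s gap

good_ok = meets_one_hertz(good_log)  # True
bad_ok = meets_one_hertz(bad_log)    # False
```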

3.5. Measure max(a run, 60 power samples)

Power measurement begins just before a run starts and stops just afterwards. Power measurement must include all timed operations and may include a small fraction of untimed operations.

For runs that complete in fewer than 60 power samples, the suggested approach is to extend the training loop until the minimum required number of power samples is acquired. The extended period should use the same code path and should not skip any computation, to comply with the spirit of the rule. Continue reporting performance markers, such as epoch or block start/stop and throughput, in the extended period.
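The extension can be pictured as a loop that keeps executing the same training step after the quality target is reached, until enough power samples exist. The following is a toy simulation only: `steps_to_converge` and `samples_per_step` are hypothetical stand-ins for a real training loop and a real power logger.

```python
MIN_POWER_SAMPLES = 60


def run_with_minimum_samples(steps_to_converge, samples_per_step,
                             min_samples=MIN_POWER_SAMPLES):
    """Simulate a run that is extended until min_samples are collected.

    Every iteration stands for the same full training step; the extended
    iterations do not take a shortcut.
    Returns (total_steps, total_samples).
    """
    steps = 0
    samples = 0
    while steps < steps_to_converge or samples < min_samples:
        steps += 1                   # same code path as a normal step
        samples += samples_per_step  # power samples logged during the step
    return steps, samples


# A short run: converges after 10 steps, 4 power samples per step.
short_run = run_with_minimum_samples(10, 4)   # extended to (15, 60)
# A long run already exceeds the minimum and is not extended.
long_run = run_with_minimum_samples(100, 4)   # (100, 400)
```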

3.6. Target 5% Accuracy

A power meter is hardware and/or software that measures power and/or energy.

A power meter is considered to have x% accuracy at 1 Hz, if:

  1. It is a SPEC or revenue-grade power meter with specified accuracy better than x% at 1 Hz; or

  2. It shows accuracy within x% at y Hz when compared with a power meter that satisfies item 1, by the following test:

    1. Both meters measure the same SUT;

    2. Both meters report at a frequency of 1 Hz;

    3. The measurements cover different load conditions, e.g. idle + 2 different MLPerf benchmarks;

    4. The Olympic scores of the two meters, for 5 non-overlapping timed durations of 1 minute each, are within x% of each other.
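The comparison in the test above might be sketched as follows. This assumes that Olympic scoring means dropping the single highest and single lowest value and averaging the rest, that each meter contributes one average-power value per 1-minute window, and that the difference is taken relative to the reference meter. All three are assumptions, and the function names and sample values are hypothetical.

```python
def olympic_mean(values):
    """Mean after dropping one highest and one lowest value (assumed definition)."""
    if len(values) < 3:
        raise ValueError("need at least 3 values")
    trimmed = sorted(values)[1:-1]
    return sum(trimmed) / len(trimmed)


def meters_agree(candidate_windows, reference_windows, x_percent):
    """Compare two meters over the same non-overlapping 1-minute windows.

    Each argument holds one average power value (watts) per window.
    """
    candidate = olympic_mean(candidate_windows)
    reference = olympic_mean(reference_windows)
    difference_percent = abs(candidate - reference) / reference * 100.0
    return difference_percent <= x_percent


# Illustrative window averages in watts for 5 windows.
reference = [500.0, 510.0, 505.0, 495.0, 520.0]   # Olympic mean 505.0
candidate = [510.0, 520.0, 515.0, 505.0, 530.0]   # Olympic mean 515.0

within_5 = meters_agree(candidate, reference, 5.0)  # True (about 1.98% apart)
within_1 = meters_agree(candidate, reference, 1.0)  # False
```

In practice the same comparison would be repeated for each load condition.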

4. Scoring

The score represents the energy consumption to accomplish a given task. For training, the task is a training run.

The energy metric is calculated by summing up energy consumption from all SUTs. For a given SUT, the energy consumption is calculated as:

  1. sum(a power reading * duration between the previous reading and the current reading) for all readings covering the timed portion of a run, if power is measured;

  2. sum(maximum specified power consumption * duration of a run * provisioning ratio of the SUT for a run) for components whose power consumption is estimated (e.g. for interconnect).
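The two formulas above can be sketched in Python as follows. The data layout (a list of `(timestamp, watts)` pairs) and the function names are assumptions made for illustration; the first reading is used only to mark the start of the first interval.

```python
def measured_energy_joules(readings):
    """Energy from measured power.

    readings: list of (timestamp_seconds, power_watts), sorted by time.
    Each reading is multiplied by the time elapsed since the previous one.
    """
    energy = 0.0
    for (t_prev, _), (t_cur, p_cur) in zip(readings, readings[1:]):
        energy += p_cur * (t_cur - t_prev)
    return energy


def estimated_energy_joules(max_power_watts, run_seconds, provisioning_ratio):
    """Energy for a component whose power is estimated, e.g. interconnect."""
    return max_power_watts * run_seconds * provisioning_ratio


# Illustrative 3-second run sampled at 1 Hz.
readings = [(0.0, 400.0), (1.0, 420.0), (2.0, 410.0), (3.0, 430.0)]
measured = measured_energy_joules(readings)           # 420 + 410 + 430 = 1260.0
estimated = estimated_energy_joules(200.0, 3.0, 0.5)  # 200 * 3 * 0.5 = 300.0
total = measured + estimated                          # 1560.0
```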

Use Olympic scoring among multiple runs for power scores; choose the same runs as the performance score’s Olympic scoring.
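The point of choosing the same runs is that the runs are selected by performance, not by energy. A minimal sketch, assuming Olympic scoring drops exactly one fastest and one slowest run and averages the remainder (the training rules define the actual number of runs and drops per benchmark); the function names and values are hypothetical.

```python
def olympic_run_indices(run_times):
    """Indices of the runs kept after dropping the fastest and slowest run."""
    if len(run_times) < 3:
        raise ValueError("need at least 3 runs")
    order = sorted(range(len(run_times)), key=lambda i: run_times[i])
    return sorted(order[1:-1])


def power_score(run_times, run_energies):
    """Mean energy over the runs selected by the performance scoring."""
    kept = olympic_run_indices(run_times)
    return sum(run_energies[i] for i in kept) / len(kept)


# Illustrative values: time to train (minutes) and energy per run.
run_times = [10.0, 12.0, 11.0, 9.0, 13.0]
run_energies = [100.0, 120.0, 110.0, 95.0, 90.0]

kept_runs = olympic_run_indices(run_times)    # [0, 1, 2]
score = power_score(run_times, run_energies)  # (100 + 120 + 110) / 3 = 110.0
```

Note that dropping the highest and lowest energy values instead would select a different set of runs in this example.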