ByteMLPerf Inference General Perf Overview

Byte MLPerf is an AI accelerator benchmark that focuses on evaluating AI accelerators from a practical production perspective, including the ease of use and versatility of software and hardware.

Features
  • Models and runtime environments are more closely aligned with practical business use cases.
  • For ASIC hardware evaluation, besides assessing performance and accuracy, it also examines metrics such as compiler usability and coverage.
  • Performance and accuracy results obtained from testing on the open Model Zoo serve as reference metrics for evaluating ASIC hardware integration.

Architecture

The ByteMLPerf architecture is shown in the figure below:

(Figure: ByteMLPerf general framework)

Model Zoo List

The models supported by ByteMLPerf Inference General Perf are collected under the Model Zoo. From the perspective of access rights, they are currently divided into internal models and open models; each ByteMLPerf release ships with the open models included in that version.

Open model collection principles:

  • Basic models: ResNet-50, BERT, and Wide&Deep (WnD);
  • Popular models: models currently in wide industrial use;
  • SOTA models: state-of-the-art models in their corresponding business domains.

In addition to complete model structures, ByteMLPerf will also add typical model substructures (subgraphs) or individual ops, provided that no suitable open model containing such classic substructures can be found: for example, transformer encoders/decoders with different sequence lengths, common conv ops such as group conv, depthwise conv, and pointwise conv, and common RNN structures such as GRU and LSTM.
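To illustrate why these conv variants stress hardware differently, here is a quick parameter-count comparison. This is a minimal pure-Python sketch; the channel counts and kernel size are illustrative examples, not values taken from ByteMLPerf:

```python
def conv2d_params(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weight count of a 2D conv: each output channel convolves over
    c_in/groups input channels with a k x k kernel (bias omitted)."""
    assert c_in % groups == 0 and c_out % groups == 0
    return c_out * (c_in // groups) * k * k

c_in, c_out, k = 64, 128, 3            # illustrative sizes
standard  = conv2d_params(c_in, c_out, k)               # regular conv
grouped   = conv2d_params(c_in, c_out, k, groups=8)     # group conv
depthwise = conv2d_params(c_in, c_in, k, groups=c_in)   # depthwise: one filter per channel
pointwise = conv2d_params(c_in, c_out, 1)               # 1x1 pointwise conv

print(standard, grouped, depthwise, pointwise)  # 73728 9216 576 8192
```

Depthwise and pointwise convs carry far fewer weights per output element than a standard conv, so they tend to be memory-bound rather than compute-bound, which is exactly why they are worth benchmarking as separate substructures.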

| Model | Domain | Purpose | Framework | Dataset | Precision |
| :---- | :----- | :------ | :-------- | :------ | :-------- |
| resnet50-v1.5 | cv | regular | tensorflow, pytorch | imagenet2012 | fp32 |
| bert-base | nlp | regular | tensorflow, pytorch | squad-1.1 | fp32 |
| wide&deep | rec | regular | tensorflow | criteo | fp32 |
| videobert | mm | popular | onnx | cifar100 | fp32 |
| albert | nlp | popular | pytorch | squad-1.1 | fp32 |
| conformer | nlp | popular | onnx | none | fp32 |
| roformer | nlp | popular | tensorflow | cail2019 | fp32 |
| yolov5 | cv | popular | onnx | none | fp32 |
| roberta | nlp | popular | pytorch | squad-1.1 | fp32 |
| deberta | nlp | popular | pytorch | squad-1.1 | fp32 |
| swin-transformer | cv | popular | pytorch | imagenet2012 | fp32 |
| stable diffusion | cv | sota | onnx | none | fp32 |

Vendor List

The ByteMLPerf Inference General Perf vendor list is shown below:

| Vendor | SKU | Key Parameters | Supplement |
| :----- | :-- | :------------- | :--------- |
| Intel | Xeon | - | - |
| Stream Computing | STC P920 | Computation Power: 128 TFLOPS@FP16; Last Level Buffer: 8 MB, 256 GB/s; Level 1 Buffer: 1.25 MB, 512 GB/s; Memory: 16 GB, 119.4 GB/s; Host Interface: PCIe Gen4 x16, 32 GB/s; TDP: 160 W | STC Introduction |
| Graphcore | Graphcore® C600 | Compute: 280 TFLOPS@FP16, 560 TFLOPS@FP8; In-Processor Memory: 900 MB, 52 TB/s; Host Interface: dual PCIe Gen4 8-lane interfaces, 32 GB/s; TDP: 185 W | IPU Introduction |
| Moffett-AI | Moffett-AI S30 | Compute: 1440 TFLOPS@BF16 (32x sparse), 2880 TOPS@INT8 (32x sparse); Memory: 60 GB; Host Interface: dual PCIe Gen4 8-lane interfaces, 32 GB/s; TDP: 250 W | SPU Introduction |
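The peak-compute figures above are vendor-stated upper bounds; a common way to read measured benchmark results against them is achieved utilization, i.e. measured throughput divided by peak. A minimal sketch follows; the 96 TFLOPS figure is a hypothetical example, not a measured result:

```python
def utilization(measured_tflops: float, peak_tflops: float) -> float:
    """Fraction of an accelerator's peak compute actually sustained."""
    return measured_tflops / peak_tflops

# Example: the STC P920 lists 128 TFLOPS@FP16 peak; a workload sustaining
# 96 TFLOPS (hypothetical number) would run at 75% utilization.
print(f"{utilization(96, 128):.0%}")  # prints "75%"
```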

With ByteIR

The ByteIR Project is ByteDance's model compilation solution. ByteIR includes compiler, runtime, and frontends, and provides an end-to-end model compilation solution.

Although all ByteIR components (compiler/runtime/frontends) work together to provide an end-to-end solution, and all live under the umbrella of the same repository, each component can technically be used independently.

For more information, please refer to ByteIR.

Models supported by ByteIR:

| Model | Domain | Purpose | Framework | Dataset | Precision |
| :---- | :----- | :------ | :-------- | :------ | :-------- |
| resnet50-v1.5 | cv | regular | mhlo | imagenet2012 | fp32 |
| bert-base | nlp | regular | mhlo | squad-1.1 | fp32 |