Open Source & Reproducibility
ByteMLPerf offers an open-source AI accelerator benchmarking tool, ensuring easy access and utilization for companies and research institutions alike.
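As a rough illustration of how the suite can be picked up and driven from a script, the sketch below launches a single benchmark run from Python. The `byte_infer_perf/general_perf/launch.py` entry point, its `--task`/`--hardware_type` flags, and the example task and backend names are assumptions based on the public repository layout, not a guaranteed interface.

```python
# Hedged sketch: launch one ByteMLPerf workload from Python.
# The launch.py path, flags, task name, and backend name below are assumptions
# about the public repo layout, not a guaranteed interface.
import subprocess

def run_byte_mlperf(task: str, hardware_type: str) -> int:
    """Run one benchmark task against one accelerator backend."""
    cmd = [
        "python3", "byte_infer_perf/general_perf/launch.py",
        "--task", task,                    # e.g. "resnet50-torch-fp32" (assumed task name)
        "--hardware_type", hardware_type,  # e.g. "GPU" (assumed backend name)
    ]
    return subprocess.run(cmd, check=False).returncode

if __name__ == "__main__":
    raise SystemExit(run_byte_mlperf("resnet50-torch-fp32", "GPU"))
```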
Benchmark Focused on AI Accelerators
ByteMLPerf evaluates AI accelerators from a practical production perspective.
ByteMLPerf continually updates its benchmarks to reflect current business scenarios and the state-of-the-art (SOTA).
Evaluation goes beyond performance and accuracy, taking into account factors such as compiler usability and the applicability of models in real business environments.
ByteMLPerf focuses on a holistic evaluation, including power consumption, cost-effectiveness, and cross-platform compatibility.
ByteMLPerf encourages contributions from developers and researchers worldwide, making it not just an assessment tool but also a platform for innovation.
It maintains a high level of transparency and openness, with all testing methodologies, datasets, and evaluation criteria publicly accessible.
The table below provides accurate chip data for user reference and comparison.
Vendor | Name | Purpose | Process Size (nm) | Board Size | Bus Interface | TDP (W) | Memory Type | Memory Size (GB) | Memory Bandwidth (GB/s) | L1 Cache / On-chip Buffer | L1 Capacity (MB) | L1 Bandwidth | L2 Capacity (MB) | Compute Architecture | Parallelism Mode
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Habana | Gaudi2 | Training/Inference | - | - | OAM 1.1 | 600 | HBM2e | 96 | 2450 | United Buffer | 48 | 11.2 | - | Heterogeneous multi-core | -
AWS | Trainium | Training/Inference | - | FHFL, Dual Slot Card | PCIe 5.0x16 | - | - | 32 | 820 | - | - | - | - | - | -
AWS | Inferentia | Inference | - | - | - | - | - | - | - | - | - | - | - | - | -
AWS | Inferentia2 | Inference | - | FHFL, Dual Slot Card | PCIe 5.0x16 | - | - | 32 | 820 | - | - | - | - | - | -
QUALCOMM | AIC100 | Inference | 7 | HHHL, Single Slot Card | PCIe 4.0x8 | 75 | LPDDR4x | 32 | 137 | - | - | - | - | - | -
Stream Computing | STC920 | Inference | 12 | FHFL, Dual Slot Card | PCIe 4.0x16 | 150 | LPDDR4x | 16 | 119.4 | - | - | - | - | - | -
Moffett | S30 | Inference | 12 | FHFL, Dual Slot Card | PCIe 4.0x16 | 250 | LPDDR4x | 60 | 246 | Distributed Buffer (x12) | - | - | - | Heterogeneous multi-core | SIMD/MIMT
Moffett | S4 | Inference | 12 | FHFL, Single Slot Card | PCIe 3.0x16 | 70 | LPDDR4x | 20 | 82 | Distributed Buffer (x4) | 1.8 | 82 | 11.7 | Heterogeneous multi-core | SIMD/MIMT
Moffett | S10 | Inference | 12 | FHFL, Single Slot Card | PCIe 4.0x16 | 165 | LPDDR4x | 40 | 164 | Distributed Buffer (x8) | 3.7 | - | - | Heterogeneous multi-core | SIMD/MIMT
Graphcore | IPU C600 | Inference | 7 | FHFL, Dual Slot Card | PCIe 4.0x16 | 180 | No on-chip DDR | - | - | Distributed Buffer | 900 | 65 | - | Homogeneous many-core | MIMD
NVIDIA | T4 | Training/Inference | 12 | HHHL, Single Slot Card | PCIe 3.0x16 | 70 | GDDR6 | 16 | 320 | Cache (x40) | 2.56 | - | 4 | Homogeneous many-core | SIMT
NVIDIA | A100 PCIe | Training/Inference | 7 | FHFL, Dual Slot Card | PCIe 4.0x16 | 300 | HBM2e | 80 | 1935 | Cache (x108) | 20.736 | - | 40 | Homogeneous many-core | SIMT
NVIDIA | H100 PCIe | Training/Inference | 4 | FHFL, Dual Slot Card | PCIe 5.0x16 | 350 | HBM3 | 80 | 2039 | Cache (x114) | 29.184 | - | 50 | Homogeneous many-core | SIMT
NVIDIA | A30 PCIe | Training/Inference | 7 | FHFL, Dual Slot Card | PCIe 4.0x16 | 165 | HBM2e | 24 | 1223 | Cache (x56) | 10.752 | - | 24 | Homogeneous many-core | SIMT
NVIDIA | A100 SXM4 | Training/Inference | 7 | N/A | SXM | 400 | HBM2e | 80 | 2039 | Cache (x108) | 20.736 | - | 40 | Homogeneous many-core | SIMT
NVIDIA | A10 PCIe | Training/Inference | 8 | FHFL, Single Slot Card | PCIe 4.0x16 | 150 | GDDR6 | 24 | 600.2 | Cache (x72) | 9.126 | - | 6 | Homogeneous many-core | SIMT
NVIDIA | H100 SXM5 | Training/Inference | 4 | N/A | SXM | 700 | HBM3 | 80 | 3350 | Cache (x132) | 33.792 | - | 50 | Homogeneous many-core | SIMT
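Because the specifications are published as plain data, they can also be held in code for filtering or for computing derived metrics. The sketch below is a minimal, hypothetical data model: the `ChipSpec` fields and the `bandwidth_per_watt` helper are illustrative choices of mine, while the two sample entries copy values straight from the table above.

```python
# Minimal, hypothetical data model for rows of the spec table above.
# Field names are illustrative; the two entries mirror values from the table.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChipSpec:
    vendor: str
    name: str
    purpose: str
    tdp_w: Optional[float]            # board TDP in watts
    memory_type: Optional[str]
    memory_size_gb: Optional[float]
    memory_bw_gbps: Optional[float]   # memory bandwidth in GB/s

    def bandwidth_per_watt(self) -> Optional[float]:
        """Memory bandwidth per watt of TDP (GB/s per W), if both are known."""
        if self.tdp_w and self.memory_bw_gbps:
            return self.memory_bw_gbps / self.tdp_w
        return None

CHIPS = [
    ChipSpec("NVIDIA", "A100 SXM4", "Training/Inference", 400, "HBM2e", 80, 2039),
    ChipSpec("Habana", "Gaudi2", "Training/Inference", 600, "HBM2e", 96, 2450),
]

for chip in CHIPS:
    # e.g. "A100 SXM4 5.1" and "Gaudi2 4.08"
    print(chip.name, round(chip.bandwidth_per_watt(), 2))
```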
Compare details between different chips.
Feature | A100-SXM4 | Gaudi2
---|---|---
Vendor | NVIDIA | Habana
Model | A100 SXM4 | Gaudi2
Purpose | Training/Inference | Training/Inference
Interface | SXM | OAM 1.1
Memory Type | HBM2e | HBM2e
Cache Type | Cache (x108) | United Buffer
Cache Capacity (MB) | 20.736 | 48
Compute Architecture | Homogeneous many-core | Heterogeneous multi-core
Interconnect | NVLink | RoCE v2
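A feature-by-feature view like the one above is also easy to produce programmatically. The helper below is a hypothetical sketch, not part of ByteMLPerf: the two dictionaries simply restate the rows of the comparison table, and `compare` just prints them side by side.

```python
# Hypothetical helper that prints a side-by-side feature comparison;
# the two dictionaries restate the rows of the comparison table above.
def compare(left: dict, right: dict) -> None:
    """Print each feature with the two chips' values aligned in columns."""
    for feature in left:
        print(f"{feature:<25} {str(left[feature]):<22} {right.get(feature, '-')}")

A100_SXM4 = {
    "Vendor": "NVIDIA", "Model": "A100 SXM4", "Purpose": "Training/Inference",
    "Interface": "SXM", "Memory Type": "HBM2e", "Cache Type": "Cache (x108)",
    "Cache Capacity (MB)": 20.736, "Compute Architecture": "Homogeneous many-core",
    "Interconnect": "NVLink",
}
GAUDI2 = {
    "Vendor": "Habana", "Model": "Gaudi2", "Purpose": "Training/Inference",
    "Interface": "OAM 1.1", "Memory Type": "HBM2e", "Cache Type": "United Buffer",
    "Cache Capacity (MB)": 48, "Compute Architecture": "Heterogeneous multi-core",
    "Interconnect": "RoCE v2",
}

compare(A100_SXM4, GAUDI2)
```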