| Performance per dollar is getting faster and cheaper(wafer.ai) | |
| 281 points by latchkey 15 hours ago | 100 comments | |
tl;dr: Wafer benchmarked GLM-5.2 on AMD's MI355X (roughly 2.75x cheaper than NVIDIA's B300) and hit 213 tok/s single-stream and 2626 tok/s/node aggregate throughput—about 80% of B200 performance at less than half the cost. Getting there required MXFP4 quantization via AMD Quark, switching to sglang, and fixing two ROCm bugs blocking speculative decode plus tuning the MoE kernel selection, but notably no custom kernels. The takeaway: NVIDIA's CUDA moat is increasingly about day-0 model support rather than fundamental software superiority. | |
HN Discussion:
| |