Comments on: NVIDIA GTC 2022 Keynote Coverage Crazy New Data Center Gear
https://www.servethehome.com/nvidia-gtc-2022-keynote-coverage-crazy-new-data-center-gear/

By: Honcho | Wed, 20 Apr 2022 06:23:50 +0000
Actually, the A100 does 9.7 TF of non-tensor FP64, so the 30 TF figure (which was in the whitepaper at the Hopper presentation) is 3x the A100.
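A quick check of that ratio (a minimal sketch; the 9.7 TF and 30 TF values are the spec-sheet and whitepaper numbers quoted above):

```python
# Minimal check of the "3x from A100" claim using the numbers quoted above.
a100_fp64_vector_tf = 9.7    # A100 non-tensor (vector) FP64, per its spec sheet
h100_fp64_vector_tf = 30.0   # H100 vector FP64, per the Hopper whitepaper

ratio = h100_fp64_vector_tf / a100_fp64_vector_tf
print(f"H100 / A100 vector FP64: {ratio:.2f}x")  # ~3.09x, i.e. roughly 3x
```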

By: Patrick Kennedy | Wed, 23 Mar 2022 15:31:59 +0000
TS – updated the article with how NVIDIA came up with the 4.9 TB/s figure.

By: TS | Wed, 23 Mar 2022 15:14:59 +0000
Moore’s Law Is Dead just leaked the DP numbers:

*Vector* double precision = 30 TF! That is 50% higher than the A100, which is now the right number, corresponding exactly to the 50% higher HBM3 bandwidth.

*Matrix* double precision = 60 TF, basically the same kind of fake tensor teraflops you should ignore.

It’s sad that NVIDIA is adding up 3 TB/s + 0.9 TB/s + 0.9 TB/s + 0.128 TB/s to get to the 4.9 TB/s figure; it looks like the H100 will be a transformer-only solution for now.
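For anyone checking the math, here is how those components add up (values as quoted above; which links NVIDIA actually counts in the sum is my assumption from the arithmetic):

```python
# Reconstructing the 4.9 TB/s headline from the components quoted above.
hbm3   = 3.0    # TB/s, usable HBM3 memory bandwidth
nvlink = 0.9    # TB/s, NVLink (counted twice in the sum above)
pcie   = 0.128  # TB/s, PCIe Gen5

total = hbm3 + nvlink + nvlink + pcie
print(f"Headline total: {total:.3f} TB/s")  # 4.928 TB/s, rounded to 4.9
```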

By: TS | Wed, 23 Mar 2022 04:44:46 +0000
@Patrick:

Thought about it for about half a day, and realized the following:

The chip was probably designed for a 4.9 TB/s maximum (6 stacks at the 819 GB/s HBM3 specification), but NVIDIA probably couldn’t fit 6 stacks of 819 GB/s HBM3 into the 700 W SXM5 socket TDP limit, so NVIDIA took the easy way out and runs 6 stacks at 600 GB/s, with one stack either disabled or reserved for “RAID5”-style redundancy, for a total of 3 TB/s usable.
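The stack math behind that guess (the 6-stack/5-active configuration is speculation, not something NVIDIA has confirmed):

```python
# Speculative stack math: design maximum vs. what the shipping part does.
design_max = 6 * 0.819   # 6 stacks at the 819 GB/s HBM3 spec -> ~4.9 TB/s
shipping   = 5 * 0.600   # 5 active stacks at 600 GB/s        -> 3.0 TB/s usable

print(f"Design max: {design_max:.2f} TB/s, shipping: {shipping:.2f} TB/s")
```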

The real question is this: with only 50% more memory bandwidth than the A100, how did NVIDIA manage to fit in 60 TF of double precision (a 3x boost over the A100, with the rest of the specs also 3x across the board)? The 3x DP boost is questionable, because FP64 can’t be cheated the way the TF32 format reduces 32 bits to 19 bits.
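To put numbers on why 3x compute looks odd next to 50% more bandwidth, here is the compute-to-bandwidth ratio (a sketch using the A100’s published tensor FP64 and the claimed H100 figures):

```python
# FLOPS-per-bandwidth comparison; H100 values are the claims being questioned.
gpus = {
    "A100": {"fp64_tensor_tf": 19.5, "bw_tb_s": 2.0},  # published A100 SXM specs
    "H100": {"fp64_tensor_tf": 60.0, "bw_tb_s": 3.0},  # claimed keynote figures
}
for name, g in gpus.items():
    print(f"{name}: {g['fp64_tensor_tf'] / g['bw_tb_s']:.1f} TF per TB/s")
# A100: ~9.8, H100: 20.0 -- the ratio roughly doubles, which is why a 3x
# compute jump on only 50% more memory bandwidth raises eyebrows.
```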

Another thing: if the 700 W SXM5 can only do 3 TB/s, the 350 W PCIe card will probably do 2 TB/s, which is the same memory bandwidth as the SXM4 A100, and how much of a performance cut will that incur?

I think NVIDIA’s H100 chip is design-ready at 4.9 TB/s but HBM3 wasn’t, and we will probably have to wait for an HBM3 die shrink, for both capacity and TDP improvements, before we see a 4.9 TB/s H100 SXM5 land. I mean, 3 TB/s versus 4.9 TB/s is a generational leap.

By: Patrick Kennedy | Tue, 22 Mar 2022 19:10:31 +0000
In reply to TS.

Have to wait a bit to see what the actual parts come in at.

By: TS | Tue, 22 Mar 2022 18:37:43 +0000
Conflicting information during the GTC presentation:

Is the H100’s memory bandwidth 4.9 TB/s or 3 TB/s (3 x 8 = 24 TB/s for the 8-GPU system)? According to AnandTech, it is 3 TB/s, which is a bummer if true: only 50% higher than the A100.
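(Reading the keynote’s 24 TB/s as the 8-GPU aggregate rather than a single card is my assumption from the arithmetic:)

```python
# Assumed reading of the keynote's 24 TB/s: per-GPU bandwidth x 8 GPUs.
per_gpu_tb_s = 3.0
print(f"8-GPU aggregate: {per_gpu_tb_s * 8} TB/s")  # 24.0 TB/s
```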
