
How is LLaMa.cpp possible?

Recently, a project rewrote the LLaMa inference code in raw C++. With some optimizations and weight quantization, this makes it possible to run an LLM locally on a wide variety of hardware.
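To make the appeal concrete, here is a rough back-of-envelope sketch (my numbers, not the article's) of how quantization shrinks the weights of a 7B-parameter model:

```python
# Back-of-envelope sketch: bytes needed to hold the weights of a
# 7B-parameter model at different precisions. Figures are illustrative.
PARAMS = 7e9  # LLaMa-7B, the smallest of the original LLaMa models

for name, bits in [("fp32", 32), ("fp16", 16), ("int8", 8), ("int4", 4)]:
    gb = PARAMS * bits / 8 / 1e9
    print(f"{name}: {gb:.1f} GB")
# fp16 needs ~14 GB, beyond most consumer GPUs and laptops, while
# 4-bit quantization shrinks the weights to roughly 3.5 GB.
```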

Read the original at finbarr.ca

Hasnain says:

“Memory bandwidth is the limiting factor in almost everything to do with sampling from transformers. Anything that reduces the memory requirements for these models makes them much easier to serve, like quantization! This is yet another reason why distillation, or just training smaller models for longer, is really important.”
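A minimal sketch of that bandwidth argument (the bandwidth figures are approximate, and the bound ignores compute and the KV cache): generating each token requires streaming every weight through the processor, so throughput is capped at roughly memory bandwidth divided by model size.

```python
# Hedged sketch of the memory-bandwidth ceiling on token generation.
# Each token reads all weights once, so tokens/s <= bandwidth / model size.
MODEL_GB = 3.5  # a 7B model quantized to 4 bits (see the sketch above)

for chip, bw_gbps in [("Apple M1 (~68 GB/s)", 68), ("Apple M2 Pro (~200 GB/s)", 200)]:
    print(f"{chip}: ~{bw_gbps / MODEL_GB:.0f} tokens/s upper bound")
```

Shrinking the model from 14 GB (fp16) to 3.5 GB (4-bit) raises this ceiling by 4x on the same hardware, which is why quantization matters so much for local inference.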

Posted on 2023-08-16T03:56:31+0000