[PDG 485] TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
Details
Link to article: https://arxiv.org/pdf/2504.19874
Title: TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
Content: TurboQuant is a data-oblivious vector quantization method. It applies a random rotation to each input vector and then quantizes every coordinate with a near-optimal scalar quantizer, achieving MSE distortion close to the information-theoretic lower bound across bit-widths and dimensions. For inner products, it adds a 1-bit Quantized Johnson-Lindenstrauss (QJL) stage on the residual to remove the bias of the MSE quantizer. Experiments show strong practical results in KV-cache quantization and nearest-neighbor search with essentially no indexing overhead.
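For readers who want a feel for the core idea before the discussion, here is a minimal sketch of "random rotation plus per-coordinate scalar quantization". It is not the paper's implementation: the uniform grid and the 4/sqrt(d) clipping range are simplifying assumptions standing in for the optimized scalar codebook, the residual 1-bit QJL stage for inner products is left out, and all function names are made up for illustration.

```python
import numpy as np

def random_rotation(d, rng):
    """Draw a random orthogonal d x d matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    # Flip column signs so the result is uniformly (Haar) distributed.
    return q * np.sign(np.diag(r))

def grid(d, bits):
    """Assumed uniform grid on [-4/sqrt(d), 4/sqrt(d)] with 2**bits cells."""
    limit = 4.0 / np.sqrt(d)
    step = 2.0 * limit / (2 ** bits)
    return limit, step

def quantize(x, rotation, bits):
    """Return integer codes for the rotated unit vector, plus the norm of x."""
    norm = np.linalg.norm(x)  # assumes x is non-zero
    y = rotation @ (x / norm)  # coordinates now have variance about 1/d
    limit, step = grid(x.shape[0], bits)
    codes = np.floor((y + limit) / step)
    codes = np.clip(codes, 0, 2 ** bits - 1).astype(np.uint8)
    return codes, norm

def dequantize(codes, norm, rotation, bits):
    """Map codes to cell midpoints, undo the rotation, and restore the norm."""
    limit, step = grid(codes.shape[0], bits)
    y_hat = (codes + 0.5) * step - limit
    return norm * (rotation.T @ y_hat)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, bits = 256, 4
    rotation = random_rotation(d, rng)
    x = rng.standard_normal(d)
    codes, norm = quantize(x, rotation, bits)
    x_hat = dequantize(codes, norm, rotation, bits)
    rel_mse = np.sum((x - x_hat) ** 2) / np.sum(x ** 2)
    print(f"relative squared error at {bits} bits: {rel_mse:.4f}")
```

The rotation is data-oblivious: it is drawn once, independently of the vectors, so nothing has to be trained or indexed before quantizing. After rotation, every coordinate follows roughly the same distribution, which is why one fixed scalar quantizer can be reused for all coordinates.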
Slack link: ml-ka.slack.com, channel: #pdg. Please join us. If you cannot join, please message us here or email mlpaperdiscussiongroupka@gmail.com.
In the Paper Discussion Group (PDG) we discuss recent and fundamental papers in the area of machine learning on a weekly basis. If you are interested, please read the paper beforehand and join us for the discussion. If you have not fully understood the paper, you can still participate – everyone is welcome! You can join the discussion or simply listen in. The discussion is in German or English depending on the participants.
