Details

Link to article: https://arxiv.org/pdf/2504.19874
Title: TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
Content: TurboQuant is a data-oblivious vector quantization method: it applies a random rotation to the input and then quantizes each coordinate with a near-optimal scalar quantizer, achieving near-information-theoretic MSE distortion rates across bit-widths and dimensions. For inner products, it adds a 1-bit Quantized Johnson-Lindenstrauss (QJL) stage on the residual to remove bias. Experiments show strong practical results in KV-cache quantization and nearest-neighbor search with essentially no indexing overhead.
Slack link: ml-ka.slack.com, channel: #pdg. Please join us there. If you cannot join the Slack workspace, message us here or email mlpaperdiscussiongroupka@gmail.com.
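
For anyone preparing for the discussion, the rotate-then-quantize idea from the summary above can be tried out in a few lines of NumPy. This is only a rough sketch under our own assumptions, not the paper's algorithm: it uses a plain uniform quantizer with a clipping range of three standard deviations instead of the near-optimal scalar quantizer, it leaves out the QJL residual stage, and the function names `random_rotation`, `quantize`, and `dequantize` are made up for illustration.

```python
import numpy as np


def random_rotation(d, rng):
    """Draw a random orthogonal d x d matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    # Fix column signs so the matrix is uniformly (Haar) distributed.
    return q * np.sign(np.diag(r))


def quantize(x, rot, bits):
    """Rotate a nonzero vector x and quantize each coordinate to `bits` bits.

    Assumption: uniform grid on [-clip, clip]; the paper uses a
    near-optimal scalar quantizer instead. Requires bits <= 8.
    """
    d = x.shape[0]
    norm = np.linalg.norm(x)
    y = rot @ (x / norm)          # rotated unit vector, coords roughly N(0, 1/d)
    clip = 3.0 / np.sqrt(d)       # assumed range: three standard deviations
    levels = 2 ** bits
    step = 2 * clip / levels
    codes = np.clip(np.floor((y + clip) / step), 0, levels - 1)
    return codes.astype(np.uint8), norm


def dequantize(codes, norm, rot, bits):
    """Map codes back to cell midpoints, undo the rotation, restore the norm."""
    d = codes.shape[0]
    clip = 3.0 / np.sqrt(d)
    step = 2 * clip / 2 ** bits
    y_hat = (codes.astype(np.float64) + 0.5) * step - clip
    return norm * (rot.T @ y_hat)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(128)
    rot = random_rotation(128, rng)
    for bits in (2, 4):
        codes, norm = quantize(x, rot, bits)
        x_hat = dequantize(codes, norm, rot, bits)
        rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
        print(f"{bits} bits: relative error {rel_err:.3f}")
```

The rotation is what makes this data-oblivious: after rotating, every coordinate of a unit vector has roughly the same distribution, so one fixed scalar grid can be used for all inputs without looking at the data first.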

In the Paper Discussion Group (PDG) we discuss recent and fundamental papers in the area of machine learning on a weekly basis. If you are interested, please read the paper beforehand and join us for the discussion. If you have not fully understood the paper, you can still participate – everyone is welcome! You can join the discussion or simply listen in. The discussion is in German or English depending on the participants.

Related topics

Artificial Intelligence
Deep Learning
Machine Learning
Natural Language Processing
Neural Networks
