1 article
A team slashed their AI inference bill by more than half using a new routing layer, but these savings were tied to an unmeasured loss in quality.