Request: EVA models suitable for speculative decoding.

#2
by SabinStargem - opened

I tested EVA 72b with an EVA 14b as the draft model. Unfortunately, it was slower than running the 72b alone. Odds are we will need a smaller EVA if we want to use speculative decoding with the big models. The 7b v0.1 of EVA took too much memory, so any potential speed gain was lost. The experimental EVA-D doesn't have a matching vocab count, so I assume KoboldCPP would refuse to use it for speculative decoding.
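For anyone wondering why the size of the draft matters so much, here is a rough sketch of the usual back-of-the-envelope estimate for speculative decoding speedup. It assumes each drafted token is accepted independently with probability `alpha`, that the draft proposes `gamma` tokens per round, and that one draft step costs `cost_ratio` times one step of the big model. All three names and the numbers in the usage note are illustrative assumptions, not measurements from EVA or KoboldCPP.

```python
def expected_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Estimate speculative decoding speedup over plain decoding.

    alpha:      assumed probability that a drafted token is accepted (0 <= alpha < 1)
    gamma:      number of tokens the draft model proposes per round
    cost_ratio: assumed time of one draft step divided by one target step
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    if cost_ratio < 0.0:
        raise ValueError("cost_ratio must be non-negative")

    # Expected tokens produced per round: accepted draft tokens plus the one
    # token the target model always contributes.
    expected_tokens = (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)

    # Cost of one round, measured in target-model steps:
    # gamma draft steps plus one verification pass by the target.
    round_cost = gamma * cost_ratio + 1.0

    return expected_tokens / round_cost
```

Under these assumptions, `expected_speedup(0.7, 4, 0.05)` comes out above 1 (a gain), while `expected_speedup(0.7, 4, 0.5)` comes out below 1 (a loss). The effective cost ratio can be far worse than the parameter ratio suggests if the draft pushes layers of the big model out of VRAM, which would be consistent with what I saw.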

With that said, thank you for reading this, and for the models that have already been made. :)

I would also like to see this.
