Log-Linear Attention

URL: arxiv.org

I think it would be very good if they can make this work. I suspect we do something not entirely unlike this, and that is why spaced repetition is so good for stuffing things into our long-term memories.

> Log-linear attention replaces the fixed-size hidden state with a logarithmically growing set of hidden states

Does this mean the models can be smaller too (on top of the primary benefit of being faster)?

Reduced memory consumption for context, perhaps, but the hidden state is different from the weights. I don't think this would improve the model's capability per parameter (though, as with everything in ML, I wouldn't bet against it until it's been tested).
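
If I'm reading the abstract right, the logarithmic growth comes from partitioning the prefix into power-of-two sized chunks (Fenwick-tree style), with one fixed-size recurrent state per chunk, so at most about log2(T) + 1 states are live at once. Here's a rough back-of-the-envelope sketch of what that means for context memory; the bucket layout, the d x d matrix-valued states, and d = 1024 are my illustrative assumptions, not the paper's exact construction:

```python
def num_states(t: int) -> int:
    # In a Fenwick-style partition, the prefix [1..t] is covered by one
    # power-of-two chunk per set bit of t, so the number of live hidden
    # states is popcount(t) <= floor(log2(t)) + 1.
    return bin(t).count("1")

def context_memory(t: int, d: int = 1024) -> dict:
    """Rough context-memory footprints in floats after t tokens, assuming
    a d-dim key/value pair per token for the softmax KV cache and a
    d x d matrix-valued state for the recurrent variants (illustrative)."""
    return {
        "softmax KV cache, O(t)": 2 * t * d,
        "linear attention, O(1)": d * d,
        "log-linear attention, O(log t)": num_states(t) * d * d,
    }

for t in (1_000, 100_000, 1_000_000):
    print(t, {k: f"{v:,}" for k, v in context_memory(t).items()})
```

So the context memory sits between linear attention's constant state and the softmax KV cache's linear growth, which is the "reduced memory consumption for context" point above; none of it changes the parameter count.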