Log-Linear Attention

URL: arxiv.org

I think it would be very good if they can make this work. I suspect we do something not entirely unlike this, and that is why spaced repetition is so good for stuffing things into our long-term memories.

> Log-linear attention replaces the fixed-size hidden state with a logarithmically growing set of hidden states

Does this mean the models can be smaller too (on top of the primary benefit of being faster)?

Reduced memory consumption for context, perhaps, but the hidden state is different from the weights. I don't think this would improve the model's capability per parameter (though, as with everything in ML, I wouldn't bet against it until it's been tested).
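
If I'm reading the abstract right, the logarithmic growth comes from partitioning the prefix into power-of-two sized chunks (Fenwick-tree style), with one fixed-size recurrent state per chunk, so at most about log2(T) + 1 states are live at once. Here's a rough back-of-the-envelope sketch of what that means for context memory; the bucket layout, the d x d matrix-valued states, and d = 1024 are my illustrative assumptions, not the paper's exact construction:

```python
def num_states(t: int) -> int:
    # In a Fenwick-style partition, the prefix [1..t] is covered by one
    # power-of-two chunk per set bit of t, so the number of live hidden
    # states is popcount(t) <= floor(log2(t)) + 1.
    return bin(t).count("1")

def context_memory(t: int, d: int = 1024) -> dict:
    """Rough context-memory footprints in floats after t tokens, assuming
    a d-dim key/value pair per token for the softmax KV cache and a
    d x d matrix-valued state for the recurrent variants (illustrative)."""
    return {
        "softmax KV cache, O(t)": 2 * t * d,
        "linear attention, O(1)": d * d,
        "log-linear attention, O(log t)": num_states(t) * d * d,
    }

for t in (1_000, 100_000, 1_000_000):
    print(t, {k: f"{v:,}" for k, v in context_memory(t).items()})
```

So the context memory sits between linear attention's constant state and the softmax KV cache's linear growth, which is the "reduced memory consumption for context" point above; none of it changes the parameter count.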