We actually recently did an independent implementation of this paper in our open-source optimizing llm proxy optillm - https://github.com/codelion/optillm/blob/main/optillm/plugins/memory_plugin.py
We were able to use it as a basis for the memory plugin in optillm that gives LLMs short term memory. It helps improve accuracy on long context retrieval and even enables LLMs to have unbounded context if needed.
We were able to match SOTA on a recent benchmark from Google Frames benchmark (https://huggingface.co/datasets/google/frames-benchmark) with only gpt-4o-mini v/s Gemini 1.5 Flash which has a context length that is 10x more.