* There's no mention of model performance on recall tasks. Models with attention do well on recall tasks, but models without it, like this one, tend to do poorly. What is the performance of this model on recall tasks?
* As others here have pointed out, the GitHub link is dead. In addition, the pretrained weights don't seem to be available on HF or anywhere else. It's hard to check any claims if we have neither code nor weights!
---
For what it's worth, RWKV's website (another "transformer-less" architecture) mentions on that matter that yes, it's bad on recall, but for the vast majority of tasks you can just ask the question *before* the content, and it'll handle the task just fine. (I'm just reporting; I haven't spent time trying it myself.)
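To make the ordering trick concrete, here's a minimal sketch (the helper names are mine, not from RWKV's docs): a model with a fixed-size recurrent state can "know what to look for" if the question streams in before the context, instead of having to remember everything on the off chance it's asked about later.

```python
# Sketch of the two prompt orderings for a recurrent ("transformer-less") model.
# Helper names are illustrative, not from any paper or library.

def question_first(question: str, context: str) -> str:
    # Question first: the recurrent state can filter the context as it streams in.
    return f"Q: {question}\n{context}\nA:"

def context_first(question: str, context: str) -> str:
    # Context first: the model must retain everything, hoping it's relevant later.
    return f"{context}\nQ: {question}\nA:"
```

Same tokens either way; the claim is just that recall-style tasks work much better with the first ordering for models without attention.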
* Section 4.3, in both the paper you linked and the paper we're commenting on. Why did I notice that? It took me several minutes to understand "how the paper changed": it was two papers all along, and I just switched tabs without realizing. And it's only Tuesday.
See "Long-Context QA tasks in Scrolls". I don't want to copy and paste the whole thing, so I'll elide the words in between: "...long-context open-book question answering (QA), we use a simple prompt {CONTEXT} Q: {QUESTION} A"
n.b. It's literally the same eval in both papers :) I know, it's buried and non-obvious; it's taken me 20 minutes so far today to double-check it 3x, even after reading it yesterday.
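For anyone who wants to sanity-check the claim, the quoted template is trivial to reproduce; here's a sketch (the function name is mine, not from either paper):

```python
# The shared long-context QA prompt format quoted above: "{CONTEXT} Q: {QUESTION} A".
# Helper name is illustrative; both papers describe only the template itself.

def long_context_qa_prompt(context: str, question: str) -> str:
    # Context first, then the question, then a bare "A" cue for the answer.
    return f"{context} Q: {question} A"
```

Note this is the context-first ordering, which is exactly the setup recall-weak models are said to struggle with.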
It would make more sense to create a new context every day and integrate it into the model at night, or each day a new context aggregating the last several days. That would give it time to "sleep on it" every day, and it could use that knowledge the next day without it needing to be passed in the context again.
Now the repo has been re-opened at https://github.com/XuezheMax/megalodon
The model checkpoint is still under Meta legal review. We will release it once we get approval.
2. Find GitHub
3. Read Source
https://github.com/XuezheMax/megalodon (dead link)
I have stopped reading the papers; I only care about working code that can be used. arXiv LLM papers have in general reached the level of academic masturbation.
Fix the bad link and then we have something to talk about.
XuezheMax's GitHub profile shows that his contributions in April were all to private repositories.
It's only got 3 commits, and 2 are from 18 hours ago (one of which fiddled the repo-made-public date and added the arXiv link), so perhaps there's a private working repo and they decided to make a ~clean public repo at the last minute (i.e., maybe the authors are self-conscious about the commit log).
> We are sorry for that.
> It’s been a while since we’ve released a model months ago, so we’re unfamiliar with the new release process now: We accidentally missed an item required in the model release process - toxicity testing.
> We are currently completing this test quickly and then will re-release our model as soon as possible.
It took a bunch of detective work for what should have just been a NOTE in the README of the repo.
Is this why "science communicators" are needed?
https://xkcd.com/1254/ <<< very relevant
This made me laugh so hard I hate it. I hate that it feels just like something I would do/have done.