167 points by amichail 13 days ago | 6 comments
cs702 13 days ago
The two issues I see with this work:

* There's no mention of model performance on recall tasks. Models with attention do well on recall tasks, but models without it, like this one, tend to do poorly.[a] What is the performance of this model on recall tasks?

* As others here have pointed out, the GitHub link is dead. In addition, the pretrained weights don't seem to be available on HF or anywhere else. It's hard to check any claims if we have neither code nor weights!

---

[a] https://arxiv.org/abs/2402.01032

phh 13 days ago
> * There's no mention of model performance on recall tasks. Models with attention do well on recall tasks, but models without it, like this one, tend to do poorly.[a] What is the performance of this model on recall tasks?

For what it's worth, the website for RWKV (another "transformer-less" model) says on that matter that yes, it's bad on recall, but for the vast majority of tasks you can just ask the question *before* the content, and it'll handle the task just fine. (I'm just reporting; I haven't spent time trying it myself.)
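
Roughly, the ordering trick as I understand it (an illustrative sketch only; the strings are made up and this isn't from RWKV's docs verbatim):

    # For a recurrent model, putting the question first lets the state
    # "know what to look for" while it reads the document; question-last
    # is the usual transformer-style layout.
    question = "What colour was the lost umbrella?"
    document = "...long document text..."

    question_first = f"Q: {question}\n\n{document}\n\nA:"
    question_last = f"{document}\n\nQ: {question}\nA:"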

a2128 13 days ago
I thought the recommendation for long contexts with RWKV was to put the question after the content, otherwise it can forget the question
ronsor 13 days ago
This is no longer the case for RWKV-5/6
refulgentis 13 days ago
Section 4.3* addresses this and runs 3 benchmarks. tl;dr: the 7B roundly beats LLaMA 2 7B and almost matches LLaMA 2 7B-L, which got an extra 500K of training tokens specifically at long context length.

* Section 4.3 in both the paper you linked and the paper we're commenting on. Why did I notice that? It took me several minutes to understand "how the paper changed": it was 2 papers all along, and I had just switched tabs without realizing. And it's only Tuesday.

cs702 13 days ago
Those benchmarks are about long context. The question isn't about long context; it's about recall, i.e. the ability to fetch and repeat parts of the input context. There's nothing about recall in section 4.3.
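
For concreteness, a toy recall probe of the kind I mean (made-up example, not from either paper):

    # Associative-recall style probe: the model must fetch and repeat a value
    # it saw earlier in its own input context.
    pairs = {"apple": "7", "river": "3", "stone": "9"}
    context = " ".join(f"{k} {v}" for k, v in pairs.items())
    prompt = f"{context} river ->"  # a model with good recall should answer "3"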
refulgentis 13 days ago
Do I have the wrong paper open again? :)

See "Long-Context QA tasks in Scrolls", I don't want to copy and paste the whole thing, I'll elide the words in between: "...long-context open-book question answering (QA), we use a simple prompt {CONTEXT} Q: {QUESTION} A"

n.b. It's literally the same eval in both papers :) I know, it's buried and non-obvious; it has taken me 20 minutes so far today to double-check it 3x, even after reading it yesterday.

euclaise 13 days ago
This one does have attention; it's just chunked into segments of 4096.
cs702 12 days ago
Yes, but the claim is about "unlimited context length." I doubt attention over each segment can be as good at recall as attention over the full input context.
qwertox 13 days ago
I was just chatting with ChatGPT about unlimited context length, and even if you could theoretically achieve a personal assistant this way, one which would know all your chat history, an unlimited context length doesn't seem efficient enough.

It would make more sense to create a new context every day and integrate it into the model at night. Or, every day, a new context aggregating the last several days. That gives it time to sleep on it every day, and it can use that knowledge the next day without it needing to be passed in via the context again.

zingelshuher 13 days ago
If we could keep unlimited memory but use only a selected, relevant subset in each chat session, that should help. Of course the key is 'selected'; that's another big problem, like short-term memory. We could probably build summaries from different perspectives during idle or 'sleep' time. Training knowledge into the model is very expensive and can only be done from time to time, so it's better to add only the most important or most-used fragments. It's likely impossible to do on a mobile robot, a sort of 'thin agent'. If done on a supercomputer, we could aggregate new knowledge collected by all the agents and then push the new model back to them. All of this is an engineering approach of sorts.
maxma1987 12 days ago
We are sorry that we temporarily closed the repo because we were unfamiliar with the code release policy from Meta. We had to re-organize a small part of the code.

Now the repo has been re-opened at https://github.com/XuezheMax/megalodon

The model checkpoint is still under Meta legal review. We will release it once we get approval.

YetAnotherNick 13 days ago
This model has attention; the sequence is just broken into chunks of length 4096 and attention is only applied within each chunk. Llama 2 was trained on chunks of length 4096, so this model has the same quadratic complexity for any sequence that fits within Llama 2's context size.
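
A rough sketch of what that chunk-wise attention means (my reading of the setup, not the authors' code; causal masking and multiple heads omitted):

    import torch

    def chunked_attention(q, k, v, chunk_size=4096):
        # q, k, v: (batch, seq_len, dim); assumes seq_len is a multiple of chunk_size.
        b, n, d = q.shape
        c = n // chunk_size
        q, k, v = (t.view(b, c, chunk_size, d) for t in (q, k, v))
        # Full quadratic attention, but only inside each chunk, so the cost is
        # O(num_chunks * chunk_size^2) rather than O(seq_len^2).
        scores = torch.einsum("bcqd,bckd->bcqk", q, k) / d ** 0.5
        out = torch.einsum("bcqk,bckd->bcqd", scores.softmax(dim=-1), v)
        return out.reshape(b, n, d)

Within a single 4096-token chunk this is exactly standard attention, which is why the complexity matches Llama 2 for sequences that fit in one chunk.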
patrickhogan1 13 days ago
Show me a working request/response that does better than the state of the art.
zer00eyz 13 days ago
1. Open paper

2. Find GitHub

3. Read Source

https://github.com/XuezheMax/megalodon (dead link)

I have stopped reading the papers; I only care about working code that can be used. arXiv LLM papers have reached the level of academic masturbation in general.

Fix the bad link and then we have something to talk about.

abathur 13 days ago
bitvoid 13 days ago
I think that is a fork from before the actual repo was made private.

XuezheMax's GitHub profile shows that his contributions in April were all to private repositories.

abathur 13 days ago
Seems like a reasonable reading.

It's only got 3 commits, and 2 are from 18 hours ago (one of which fiddled the repo-made-public date and added the arXiv link), so perhaps there's a private working repo and they decided to make a ~clean public repo at the last minute (i.e., maybe the authors are self-conscious about the commit log).

elvircrn 13 days ago
Perhaps it was made private as part of a conference submission process?
amitport 13 days ago
Unlikely. Having code somewhere is OK, you just don't refer to it in the submitted version.
maleldil 12 days ago
If you link to a private repository, your GitHub profile can still be seen, which violates anonymity.
dartos 13 days ago
I wonder what happened
lumost 13 days ago
This happened to WizardLM2 yesterday as well: https://wizardlm.github.io/WizardLM2/
viksit 13 days ago
They released an announcement on this.

> We are sorry for that.

> It’s been a while since we’ve released a model months ago, so we’re unfamiliar with the new release process now: We accidentally missed an item required in the model release process - toxicity testing.

> We are currently completing this test quickly and then will re-release our model as soon as possible.

https://x.com/wizardlm_ai/status/1780101465950105775?s=46

kristjansson 13 days ago
very happy my first step with any model with big claims is 'huggingface-cli download ...'
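
For anyone who'd rather do it from Python, the rough equivalent (the repo id below is a placeholder, since no checkpoint has been published yet):

    from huggingface_hub import snapshot_download

    # Downloads the whole model repo into the local HF cache and returns its path.
    # "some-org/megalodon-7b" is hypothetical; no official checkpoint exists yet.
    local_dir = snapshot_download(repo_id="some-org/megalodon-7b")
    print(local_dir)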
zer00eyz 13 days ago
Well that's interesting.

It took a bunch of detective work for what should have just been a NOTE in the README of the repo.

Is this why "science communicators" are needed?

https://xkcd.com/1254/ <<< very relevant.

marci 12 days ago
> https://xkcd.com/1254/ <<< very relevant.

This made me laugh so hard I hate it. I hate that it feels just like something I would do/have done.

padthai 12 days ago
It is up again; as for why it was down: https://news.ycombinator.com/item?id=40061362