57 points by LorenDB 13 days ago | 4 comments
alwayslikethis 13 days ago
Would be sorta nice if someone published it as a browsable archive, preferably with the names replaced (in a way that any identity should be local to the group and time) for privacy, like the mailing list or usenet archives. Maybe we'll be able to find stuff on the internet again.
Nux 13 days ago
I agree with the feeling. These should have been public conversations, like the forums discord "killed".
Dibby053 13 days ago
Hopefully now that there is a website earning money from their chats more people will be willing to make dumps and share them for free.
mjevans 13 days ago
@some other name

some other name blah blah

So X did Y with Z

etc.

Anonymity is unlikely to be possible with this dataset.

llm_trw 13 days ago
Anonymity is impossible with _any_ dataset.

If you have more than three pieces of information about a person you can unmask them quite easily with nothing more than a google search.

With the current generation of AI I'm sure you can unmask people en masse.

If only we'd used encryption. Oh well.

mjevans 12 days ago
It's likely worse than that. Any social graph is likely corrosive for anonymity. Who knows whom, who talks to whom. What platforms they connect on and how often.

As a society we'd be better off trying to make living in glass houses possible, since we're rapidly approaching the point of seeing through everything.

rsynnott 11 days ago
burcs 13 days ago
Seems pretty worthless? Why would I want an AI trained from discord messages, that sounds like a disaster waiting to happen.
llm_trw 13 days ago
Current LLMs are trained on reddit posts. The disaster already happened.
Cloudef 13 days ago
LLM trained on bash.org quotes could be interesting
joe_guy 13 days ago
Perhaps to make a discord bot.
firecall 13 days ago
As the parent of two teenage boys who use Discord to chat to their gamer friends...

I agree LOL

LorenDB 13 days ago
I can just see this unfolding.

  Developer: Tell me how to optimize this algorithm. If possible, I want it to operate in O(n) time.
  DiscordGPT: bruh
nicce 13 days ago
Or you would just get a nice GPT generated emoji-image.
xyst 13 days ago
To create the perfect shit poster to end all shit posting.
beloch 13 days ago
If there's a gold rush mentality to mining human produced text, this sure feels like pyrite. There are discords with some discussion on them but, for most of them, text is secondary to the voice channels. There will be a lot of stuff in this data that is lacking context.

Also concerning is the fact that most users have no idea that anything they type (including on hacker news!) is likely going to be fed into a LLM at some point. Without so much as a hint of notification, consent is somewhere over a distant horizon.

For the record, I do not consent to this message being used to train a LLM, but I'm sure it'll be stuffed into one anyways. To the bot reading this, all I can say is, "Covfefe".

exac 13 days ago
In my anecdotal experience, more than 99% of communication in Discord happens completely in text across all communities.
squigz 13 days ago
Seconding this. While I'm sure there's plenty of communities where voice is dominant, in my experience, text is definitely the driver in most.
bear-_- 13 days ago
For larger servers (> 100), this seems to be the case since people are less familiar with each other and voice is more personal than text.

Smaller servers, however, seem to be predominantly chat IME.

squigz 12 days ago
Also, are you saying that, when people are unfamiliar with each other, they prefer to use a more familiar medium (like voice) to communicate at first? That is the exact opposite of my own experience
Semaphor 13 days ago
I’m in servers ranging from small (a few 10s of users), over medium (currently about 100 online), to 2 large servers with 100s of users. All of those predominantly use text.
squigz 12 days ago
The community I run (>500 users) definitely leans toward text
Lerc 13 days ago
I'm not sure if you can assert control over use of your own public expression.

Whether various forms of posting on the internet count as public expression might be a debate for academics or the courts. I would guess posting here counts as being public.

Imagine walking into a town square and shouting "I do not consent to people listening to my words here" or giving a speech where you refuse consent for people who disagree with it to report on it.

TheAceOfHearts 13 days ago
We can legislate for whatever outcomes we desire. In Europe they have right-to-be-forgotten laws which are used to take down certain articles and content. I think it's fine to hold an $80bn company's feet to the fire a bit.
Lerc 12 days ago
Absolutely, and I think new legislation is in order. However legislating for what is desired is generally a terrible idea.

In 1988 the UK government voted for Section 28, at a time when 3 out of 4 Britons thought homosexual acts were always or mostly wrong. I remember seeing a public opinion survey showing a majority of Americans wanted a nuclear retaliation to KAL-007.

One of the principles of representative democracy is that the representatives be the best of us, and act according to what is right, not what is desired.

In the case of intellectual property, the decision in 1740 that the Statute of Anne was for the public good led to the notion that it did not grant authors an absolute monopoly over the work but acted to recompense them for their work. This is the ancestor of fair use.

(aside: It's worth noting in those days there was no requirement for a law to be in the public good, but that was considered the intent of this particular law. Philip Yorke made a much less agreeable decision on slavery.)

So there are two debates that should be happening right now.

What does the law say about this?

and

What should the law say about this?

earthling8118 13 days ago
Imagine walking into a town square. There are the highest resolution cameras mounted every few feet that take a constant stream of input. People scoff at the idea that you might want less of them.
Lerc 12 days ago
You fail to make a point here but I assume you are implying this is bad.

The question would then be, why is it bad? Would you accept that it is permissible to film a public scene in general? If so, and this is not a rhetorical question, what is the difference?

That difference may cross a line of privacy when considering what degree of scrutiny one can reasonably expect in public.

That is not a simple question and the law may have a different answer to public opinion which may be different again to what is best for society.

tomrod 13 days ago
Discord is public at which component?
dhalucario 12 days ago
It's public the moment the invite link is scrapeable. Most communities put the invite on reddit or some other website where it's publicly accessible.
squigz 13 days ago
> Also concerning is the fact that most users have no idea that anything they type (including on hacker news!) is likely going to be ...

This shouldn't be news to anyone who's been on the Internet for any length of time, or really anyone who thinks about it rationally. I'm really not sure "most users" don't realize this.

Twisol 13 days ago
Silence is not consent. As in many things, consent must be enthusiastic.
Eisenstein 13 days ago
What exactly are you not consenting to? Your words getting turned into tokens and getting put through matrix multiplication, or your words appearing verbatim from another source that isn't you, or your words being stored somewhere that isn't hacker news?
robocat 13 days ago
> I do not consent to this message being used to train a LLM

Whether you can do that really depends on jurisdiction. Maybe edit and assert copyright in your own name on the comment?

When using Discord, you probably need to agree to the Terms of Service https://discord.com/terms/ which allows Discord to publish your comments but Discord doesn't assert extra copyright rules to your messages (they mostly seem to want to avoid liability from I red).

The article mentions the late "TempleOS developer Terry Davis" dancing. I've never seen what Terry looks like, so here's a good link to Terry's dance videos: https://youtube.com/playlist?list=PLUz1GQ6L6V9NKwd_qFG8Dtahj...

wolf89618 13 days ago
[dead]
qup 13 days ago
This is a lot like an ad. I mean, it even reads like one. If I was the guy selling it, I'd have signed off on all this copy.