On a related note, probably a similar percentage of people claim on their car insurance. If only the rest realised they had "crap insurance" and were paying for nothing, they could save so much money!
This is obviously sarcasm, but I think it's important to remember that much of the data is stored because we don't know what we will need later. Photos of kids? Maybe that one will be The One that we end up framing? Miscellaneous business records? Maybe those will be the ones we have to dig out for a tax audit? Web pages on government sites? Maybe there will suddenly be an interest in obscure pages on public health policy if a global pandemic happens.
Complaining that data is mostly junk isn't a particularly interesting conclusion without acknowledging this. Is there wastage? Yeah, sure, but accuracy about what needs storing trades off directly against the time spent figuring that out, and often it's cheaper to just store the data.
But if those orders aren't there, shit hits the fan. PCI compliance audits fail. The ability for customers to reconcile their charges with their purchases breaks. In that 0.01% of cases where the order was fraudulent, placed by mistake, or just didn't have what the customer thought it had in it, not having that data makes the order processor read as, if not malicious, at least incompetent.
The real question is, how much data do we need to store inefficiently, in a way that uses a lot of power and space?
This is indeed the critical question, and it's far from being trivial.
One issue we all hit is moving data from the higher tier of storage to the cheaper, more efficient one. That usually requires syncing and paying for the transfer, but also maintaining two separate access and authorization processes, plus backup and recovery systems, for data that absolutely must stay accessible during its few years of legal retention and can (or must) completely disappear afterwards.
In most orgs I've seen, the cost of all that complexity just isn't worth it compared to "just" paying for the higher tier storage for the few-year lifetime of the data.
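For what it's worth, object stores can absorb some of that mechanical complexity with lifecycle rules. A minimal sketch with boto3, assuming an S3 bucket (the bucket name, prefix, and seven-year retention are made up for illustration):

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket, prefix, and retention period, purely for illustration.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-records-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "cold-then-gone",
                "Status": "Enabled",
                "Filter": {"Prefix": "records/"},
                # After 90 days, demote to the cheap, slow tier.
                "Transitions": [{"Days": 90, "StorageClass": "DEEP_ARCHIVE"}],
                # After ~7 years of legal retention, delete outright.
                "Expiration": {"Days": 7 * 365},
            }
        ]
    },
)
```

Even then, you've only automated the demotion and the deletion; the second set of access controls and the hours-long restore latency are still on you.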
> probably a similar percentage of people claim on their car insurance.
> In that 0.01% of cases where the order was fraudulent, placed by mistake,
> how much data do we need to store inefficiently, in a way that uses a lot of power and space?
I feel the real question, as sci-fi as it gets, is: is the winning-ticket data even data, or is it more like a thumbnail of "the whole data" that is 98%+ worthless, rather than a standalone piece of data? The winning ticket ID, e.g. "0x-636e-7461-4e7b", only makes sense in context, as one among the entire cohort of contestants; I can make one up like I just did, but I can't walk out with the payout unless the rest of the lottery exists.
Statistically, philosophically, technically, and all sorts of *-cally speaking, is the 2% data, the winning ticket datum, even data?
Cloud services wrap deletion in difficult, sometimes even byzantine processes, and it's often impossible to operate en masse to clean up swaths of stuff quickly and efficiently. It's in their interest to retain everything at all costs, because more used storage can mean more profits. Cloud services also profit from unused storage: if they're charging $20/year to 100,000 users who each use 2% of their storage space, ka-ching!
It irks me to this day that standard or even advanced filesystems don't include "expiration dates" or "purge dates" on file objects. Wouldn't it be logical, if an organization has a "data-retention policy" that mandates destruction after X date, that the filesystem simply purges it automatically? Why does this always get delegated to handmade userland cron jobs? Moreover, to my knowledge, nobody is really interested in devising a way to comb through backup media in order to [selectively] destroy data that shouldn't exist anymore. Not even for read-write media!
Google is now auto-deleting stuff like OTP SMS messages. I'd love it if auto-delete could be configurable account-wide, for more than just our web histories and Maps Timeline and stuff. Unfortunately, to "delete" cloud data means it still exists on backups anyway. But without deleting data in your cloud account, it becomes a juicier hacker target as it ages and accumulates personal stuff that shouldn't fall into the wrong hands. Likewise for any business, it behooves them to delete and destroy data that shouldn't be stolen. At least move it offline so that only physical access can restore it?
I will say that modern encryption techniques can make it easy to "destroy" data, simply by destroying the encryption keys. You can quickly render entire SSDs unreadable in the firmware itself with such a command. Bonus: sometimes it's even done on purpose!
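For anyone who hasn't seen the trick, here's a toy sketch of the principle (crypto-shredding), using the `cryptography` package; this shows the idea only, not the SSD firmware mechanism:

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # the only thing that ever needs destroying
ciphertext = Fernet(key).encrypt(b"seven years of business records")

del key  # "destroy" the key (a real system would shred it from the keystore/HSM)

# Without the key, the ciphertext is indistinguishable from noise.
# There is nothing left to "delete"; the data is effectively already gone.
```

The same trick is what makes near-instant drive "secure erase" commands possible: the drive encrypts everything internally, and the command just discards and regenerates the key.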
But even deleting data presents a maintenance cost. So if 90% of an org's data is indeed crap, then 90% or more of your processing resources are going to be wasted on sifting through it at some later date. Imagine when your file formats and storage devices are obsolete, and some grunt needs to retrieve some record that's 30 years old, and 90% of your data was always crap. That grunt is hopefully paid by the hour. We really had this happen at a few of my jobs, where we had old reel-to-reel backup tapes and it was difficult enough to load the data into a modern SunOS machine.
Google or Apple could put a big "delete everything" button in their phones/accounts, but then some prankster will press it on a family member's device, and that makes for bad PR. Let's be pragmatic.
> It's in their interest to retain everything at all costs, because more used storage can mean more profits.
As a user, I experience the reverse. When I had a local NAS, I dumped anything and everything, assuming it cost next to nothing and I'd clean up later. Once I moved to the cloud, that changed to: if I store crap, it costs me money! Keep it clean.
Once upon a time Google gave generous unlimited storage to all Education and Workspace accounts. They stopped that in the last two years, and you can see most educational institutions and companies now running a tight ship.
Due to backup costs, our organisation forces people to a maximum of 100 GB for email.
> advanced filesystems don't include "expiration dates" or "purge dates" on file objects. Wouldn't it be logical, if an organization
Totally agree.
The Outlook email service has some kind of "keep only the latest newsletter from this sender" feature.
> auto-delete could be configurable account-wide, for more than just our web histories
- Features are designed with the majority in mind; most users have some sort of nostalgia for re-reading or keeping old emails, SMS, etc.
Except when the app writes files outside of its own space, which is meant for stuff that should stay (like pictures).
Of course, some apps store the pictures in their private space, and you lose the pictures when you remove the app. And some apps write crap in the shared space. But that seems like a fundamental limitation (even if it was done at the filesystem): how do you make sure that the apps/software you use is doing it right?
"File-type strictly associated with app" is good for security and it's good for people's mental organization. Unfortunately you're referring strictly to local file storage, I assume, and practically nothing on my phone is in local storage but "in the cloud". The only local storage I use is for ringtones and certain audiobooks/prayers which I enjoy listening whether or not I'm connected to a network.
It would seem that data centers and cloud services are a long way from "file types strictly associated with app" and also the sort of behavior you describe would be really horrible and undesirable in a typical data-center setting. I mean, I can't even imagine how it'd be implemented, given that data storage is often remote or at least decoupled from the "application server", and it's definitely never a 1:1 relationship between those abstracted components.
Can you imagine needing to "delete the /bin/tar application" in order to purge every file ending in ".tar" or ".tgz"?
But it's true that the new mobile OS paradigm has advantages for cleanup. If you've got a 3rd-party app and you've created some big crap files with it, and you uninstall that app, then it's a clean sweep when it takes along all the crap and you don't need to worry about lingering loose ends, or files you simply can't open, because their handler's gone.
Scoped storage is way more secure; it's also wonderful on my Androids that I can selectively grant access to files and media, rather than giving "the keys to the kingdom" to all these Social Media apps. I definitely never want to post my bank screenshots to Facebook and the possibility no longer looms!
These days I am sort of nonplussed when a web app demands an all-access pass to my Google Drive, because permissions can be appropriately scoped now, and the app can play in its own Drive "Sandbox" without having access to my spreadsheets and medical records. I just wish that there were some really good "cleanup" apps. Remember those "cleanup" programs that would come with antivirus for your PC, or Regclean, or whatever freeware utility would purport to liberate your 10MB HDD? Those were good times.
It's the same with social media posts--I often wish to really scrub my account, delete all posts and all comments and basically all activity, and while that's possible on Facebook it's arduous and it takes me weeks to scrub the major stuff after 4-5 years. But I only think of it on Shrovetide Monday or Fat Tuesday, and then I spend 3 weeks in Lent doing scrub, scrub, scrub. Sometimes the mouse/keyboard gestures are more repetetive & physically demanding than cleaning the toilet. Way too difficult, and that's by design.
For the expiration dates, most modern filesystems have the concept of arbitrary extended attributes per file. It's quite easy to add metadata like this yourself.
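A minimal Linux sketch of that, plus the userland sweep that (as noted below) you still can't avoid; the attribute name, file path, and retention period are my own invention:

```python
import os, time

# Tag a file with a purge date via a user extended attribute (Linux-only API).
path = "report-2018.pdf"  # hypothetical file
os.setxattr(path, "user.purge_after",
            str(time.time() + 7 * 365 * 86400).encode())  # keep ~7 years

# ...but no filesystem acts on the tag, so you still need a sweep like this:
def sweep(root: str) -> None:
    for dirpath, _, files in os.walk(root):
        for name in files:
            p = os.path.join(dirpath, name)
            try:
                purge_after = float(os.getxattr(p, "user.purge_after").decode())
            except OSError:
                continue  # untagged file, leave it alone
            if time.time() > purge_after:
                os.remove(p)
```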
It's like gym memberships, ISP/telco service bundles, and amenities at your apartment complex. Anyone not using every possible service is wasting money, but it's impossible to purchase a bespoke service, so essentially everyone will waste money because they're chipping in money for services that someone else uses more than they do.
Here at home I don't ever use the gym, the racquetball courts, the doggy-doo supplies, the laundry room, or a parking space, and yet my rent (everyone's rent) includes upkeep for all of those things. I'm subsidizing all my neighbors and all the wear-and-tear damage they put on those common amenities. Likewise, everyone who's paying $24/year for storage, or any business that purchases big multi-terabyte storage media, is paying for unused storage space and handing over pure profit. It's practically impossible to right-size your storage media: you never want it undersized, and you can't simply shrink it and reclaim the resources you invested; you just keep adding new units and replacing the malfunctioning ones. So nearly everyone always owns or rents more space than they can realistically utilize.
Furthermore, you'll notice that I specified "automatic" destruction of data by expiration date. Of course it is trivial to tag any arbitrary file with arbitrary metadata, but the challenge is to create a filesystem that executes automatic data purges on schedule, rather than pushing it into a rickety old handmade cronjob in userland. I've never ever seen a filesystem with such a feature, nor does it seem that anyone's interested in doing so.
And here I thought that computers were useful for automating business logic and easing the burden on human effort. And this is me, manually sifting through emails and photos in order to manually delete each one with 3 dialog boxes intervening. It takes hours, days, weeks.
For personal data the concept would be simpler, but there are still requirements: tax records, say, need to be kept for 7 years.
There is no clear path forwards. Perhaps regulation could be a solution but I doubt it would turn out nice.
>The Cloud is what happens when the cost of storing data is less than the cost of figuring out what to do with the crap
But I think that's wrong. The actual issue is that you often can't figure out "what to do with the crap" because the difference between useful data and crap data is determined at the point in time when you need it, not when you store it.
I'm relatively careful with deleting data, but even so, there were countless instances where I thought something was no longer needed and deleted it, only to figure out that, a month later, I needed it after all.
Also, the article has a few anecdotes like this:
>Scottish Enterprise had 753 pages on its website, with 47 pages getting 80% of visits
But that is completely orthogonal to the question of what is "crap." The fact that some data is rarely needed does not mean it's crap. If anything, the fact that it is rarely needed might make it more valuable because it might make it less likely that the same data can be found elsewhere.
It’s a trade off especially with digital where there’s not that much cost associated with holding onto things relative to the physical world. My whole house is basically in storage because of a kitchen fire and I’m planning to be pretty selective about what I move back in.
That's true, but things get easier to find over time. I have thousands of digital photographs from the 1990s that I burned on CDs and then copied to a NAS later. Today, for the first time, they're actually searchable because they're now indexed in Immich. So they're sorted by the person in them and searchable by their contents.
If I had culled the photos back then, I would have lost many photos that have become searchable (and thus valuable) only recently.
With that said, I’m surprised Apple hasn’t implemented a feature to group/bundle successive photos beneath what is determined to be “best.”
[*] Note: successive photos being separate shutter taps/actuations (potentially several seconds apart), where bursts are continuous (generally as fast as hardware allows, ms apart).
Removing duplicates isn't that complex already, and the tools exist, so anyone can try it. It's just a truly grueling process.
But there's an important piece there about data that should not have been stored in the first place. All the big-data bullshit made people believe that every single piece of data you could possibly collect from a user is data you should store, and this is both a huge waste of resources and a huge liability, because now a data leak involving PII that was useless to you could completely bankrupt your business.
Perhaps I am missing something, but these examples all sound like candidates for _offline_ storage, for which no third party custodian or data center is required.
The price of large capacity NVMe SSDs continues to fall.
The amount of energy, the resource requirements, not to mention the environmental and community impact, of running a car insurance company are minuscule in comparison to those of running a data center.
In between is a vast gulf where your only good options are disks that you have to occasionally spin up to check integrity, have the disk trigger a data refresh if it's an SSD, and replace any disks that failed and rebuild their data from redundancy (raid or whatever you prefer). HDDs die even when off [1] and flash storage in SSDs is only rated to hold data for three years without power.
Sure, you can roll this yourself, but it can easily go wrong. Easier to either keep the disks spinning or pay someone with a tape archive (e.g. AWS Glacier Deep Archive).
1 https://www.tomshardware.com/pc-components/storage/twenty-pe...
Data that ends up in that sea of crap is very often poorly labeled.
Data that you cannot find again is useless.
If you take more than 3 minutes to find a picture you wanted to show me, it doesn't deserve to be shown anymore.
Storage is cheap, very cheap.
This is the basic argument that every hoarder uses to justify their hoarding.
Photos of kids are obviously not part of that "crap" (even if we have too many of them and it would be worth triaging for our own sake).
The question is: what makes for most of that data? Is that all business records, or is that storing stuff because we can? I've worked in multiple startups, all were tracking users as much as they could. Adding tools that collect data is easy, and storing that data is cheap. "We may need it later". Never needed any of that crap, and it was invading our users' privacy.
In any case, I think the article raises a good point: it's so cheap to store crap that we don't even think about it. And it's bad for the environment. Just like it's so cheap to take a plane that we don't think about it, even when taking the train takes almost the same time.
Look through the pictures of your kids (and with your kids) and pick out the best ones. Delete the rest.
That random photo of your messy living room might be trash right now, but beloved in 20 years when you want to remember your old home.
1. Bulk favorite photos in Google Photos for long term archiving
2. Set a retention policy of months/years for photos based on metadata (a rough sketch of this follows after the list).
3. Have a UX to quickly sort the week or months photos via a swipe left/right UX.
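A rough sketch of what (2) could look like as a local script; Google Photos doesn't expose this, so assume a synced folder, and note that the retention period, the favorites list, and the use of file mtime (instead of the EXIF date a real tool would read) are all stand-ins:

```python
import os, time

RETENTION_DAYS = 365          # assumed policy: keep non-favorites one year
FAVORITES = {"IMG_0042.jpg"}  # stand-in for the bulk-favorite step in (1)

def apply_retention(photo_dir: str) -> None:
    cutoff = time.time() - RETENTION_DAYS * 86400
    for name in os.listdir(photo_dir):
        path = os.path.join(photo_dir, name)
        if name in FAVORITES or not os.path.isfile(path):
            continue  # favorites survive; skip subdirectories
        if os.path.getmtime(path) < cutoff:  # mtime as a proxy for photo date
            os.remove(path)
```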
As we were gearing up to declare victory and start turning down the several dozen legacy storage clusters someone mused that given some users were subject to litigation holds -- not allowed to delete any data -- that at least some of the leftover data on the old system might be subject to litigation hold, and we'd need to figure that out before we could delete it or incur legal risk. IIRC the leftover 'junk' data amounted to a few dozen petabytes spread across multiple clusters around the world, in different jurisdictions. We spent several months talking with the lawyers figuring that out. It was an interesting dance, because on the one hand we were quite confident that there was unlikely to be anything in the leftovers which was both meaningful and not migrated to the new platform, while on the other hand explaining that it wasn't practical to just "go and look" through a few dozen PB of data. I recall we ended up somewhere in between, coming up with ways to distinguish categories of data like caches and working data from various pipelines. It added over six months to the project, but was quite an interesting problem to work through that hadn't occurred to any of us earlier on, as we were thinking entirely in technical terms about infrastructure migration.
If 90% of this data is "crap" and could be cut down, it would still be just a drop in the bucket compared to worldwide energy use.
What really bloats things out is surveillance (video and online behavioral) and logging/tracking/tracing data. Some of this ends up cold, but a lot of it is also warm, for analytics. It bloats CPU/RAM/network, which is pretty resource intensive.
The cost is justified because the margins of big tech companies are so wildly large. I'd argue those profits are mostly because of network effects and rentier behavior, not the actual value in the data being stored. If there was more competition pressure, these systems could be orders of magnitude more efficient without any significant different in value/quality/outcome, or really even productivity.
"I need an email when this happens.. and when this happens."
The requests are endless, and I'm convinced there are people who, if they could, would do their entire job from their inbox, getting everything and anything an application can do via email.
The insidious problem is that it never solves anything. "I didn't get the email!" is a constant refrain (yes they did, they always did). "Oh someone didn't do the thing so can you add this to the email too." and so on.
It is such an abused tool.
There must be thousands of copies of that email sitting in inboxes saying: Job X ran successfully @ 04:30.
That sounds like a reasonable goal for a whole lot of job duties. And yes some entire office jobs. (Excluding some direct human communication but a lot of jobs already have too much of that in situations that could have been an email.)
> "I didn't get the email!" is a constant refrain (yes they did, they always did).
Well having to manually check wouldn't improve that, would it?
There is really no sense at all in the article's claims that "we are destroying the environment" to do x y and z thing the author whines about. We are destroying the environment to drive a Dodge Ram to the Circle K to buy a 52oz Polar Pop. Information sector doesn't even show up on top ten lists of things that are destroying the environment.
I would think that email providers tend to not offer E2EE just because it fundamentally isn't practical with email. Providers like Proton try to do it, but it works only if you talk to someone else on Proton (at which point you may as well do it on Signal, which has better encryption).
Back in the day Exchange offered single-instance storage (SIS), but in 2010 they ditched it. It's plainly not effective any more. Even the OP's example of "the jpg signature logo" is part of the message's multipart body, not a separate file.
And one more thing: you can't just turn on dedup and be dandy. Now you need to check every incoming chunk against your hashes to determine whether it's unique or already stored, and with TBs of data that's an enormous hash index to maintain and query. Until you hit something like 99% dedup efficiency, i.e. 99% of your incoming data literally being data you already have, it isn't worth it.
https://techcommunity.microsoft.com/blog/exchange/dude-where...
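The bookkeeping cost is easy to see in a sketch of fixed-size-chunk dedup (4 KiB chunks and 32-byte SHA-256 digests assumed): every incoming chunk pays an index lookup, and the index itself runs to roughly 1/128th the size of the unique data.

```python
import hashlib

CHUNK = 4096
store: dict[bytes, bytes] = {}  # digest -> chunk: the index you must maintain

def ingest(data: bytes) -> list[bytes]:
    """Store only unseen chunks; return the recipe to rebuild the data."""
    recipe = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        digest = hashlib.sha256(chunk).digest()
        store.setdefault(digest, chunk)  # a lookup on every single chunk
        recipe.append(digest)
    return recipe

def restore(recipe: list[bytes]) -> bytes:
    return b"".join(store[d] for d in recipe)
```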
Or if everything in the year 20XX gets pushed into using E2E encryption, since that's pretty much antithetical to deduplication.
I have cleaned up dozens of product databases in cost management efforts and have found anywhere from 50-99% of data stored in product databases is crap, because they are not well managed and any single mistake can lead to a huge outsized impact on storage.
Want to log all those HTTP requests for just a day? Might as well turn that on for all time...
No we're not. I really dislike this "environmental" anti-technologist angle. A single steel plant in China has ten times the "environmental impact" of all the photos stored on platters everywhere.
Would you prefer the photos are a cocktail of weird chemicals on a negative and printed on glossy photo paper?
Digital data is the most ephemeral we are able to make it through vast effort.
Storing "useless" data makes financial sense.
This, by the way, has implications for storage system design. You want something that's cheap yet dense to encode, potentially at the slight expense of decode speed. Normally people lose sleep over decode speed first and foremost, which, while important, does not minimize the overall resource bill.
So the question isn't simply whether storage is wasted; it's how much waste there is relative to the environmental impact. Granted, books and photographs don't need to be continuously fed energy to make the information available. However, the cost of storage is now so cheap that even with 90% waste, it's economically viable to keep it online. So the problem, if you can call it one, is that energy is too cheap, and externalities are not accounted for in the cost.
I'm reasonably certain that this statistic is completely made up. The best number I can find for the proportion of library books that are never borrowed was from a university library, and was 25%.
Citation required. But don't bother because it's a meaningless statistic, or at least one designed to make it look like there's a lot more wastage in libraries than there actually is.
The statistic could be true, and yet still be the case that the vast majority of library books are well utilized.
> The vast majority of them have been read once and then left on a shelf
So opened at least once, with once being higher than never.
I'd guess that 75% of all new books sold here are variations on "Someone is murdered in a brutal fashion. An old drunken cop from somewhere in Scandinavia is assigned the case. He's helped by a young woman, who may be his daughter or with whom he'll develop a father-daughter relationship. They solve the case, maybe. The end." You just tweak the details a little, but it's the same bloody story over and over.
That seems like such a waste of paper in my mind.
This article mainly focuses on the unused data of websites and enterprise databases; only toward the end does it barely touch upon "the elephant in the room": data in the cloud.
Now data centers are being built at breakneck speed everywhere in the world to cater for AI data modeling, training, and serving. Most of that AI-bound data is kept in data lakes as raw data that will probably never see the light of day, i.e. never be processed.
Bill Inmon warned us about these potential data swamps arising from the increasing popularity of the data lake [1].
Hopefully open table formats like Apache Iceberg can rectify this unused-raw-data epidemic, but time will tell [2].
[1] Lakehouses Prevent Data Swamps, Bill Inmon Says
https://www.datanami.com/2021/06/01/lakehouses-prevent-data-...
[2] What Are Apache Iceberg Tables and How Are They Useful?
https://www.snowflake.com/guides/what-are-apache-iceberg-tab...
We are using 4-6 times as much storage as we need to, and these are often not small files (on the order of 100 MB - 5 GB, several dozen times a day) but fixing this overuse is so far down the priority list that I don't think it survived the great Jira purge of mid-2024.
I think another way of phrasing that is usage is correctly incentivised. In the example you give, the value to debugging is more than the cost of storage — and even if that’s not the case it’s so low-priority that it might not even be on your list of priorities anymore!
That literally means that it’s worth your limited, valuable time to do something else.
For a fun example, every time I'm on the Las Vegas strip I see dozens of people taking videos of the Bellagio Water Show.
There are 30 shows per night, if 50 people take videos in 4K 60fps (default on new iPhones), that's around 60 GB of data per show or ~600 TB per year of just videos of the Bellagio Fountain Show!
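Back-of-the-envelope, assuming ~3-minute shows and the ~400 MB/min that 4K60 HEVC records at (both are assumptions, but in the right ballpark):

```python
MB_PER_MIN = 400       # approx. 4K60 HEVC bitrate on a recent iPhone
SHOW_MINUTES = 3       # typical fountain show length
PEOPLE = 50
SHOWS_PER_NIGHT = 30

per_show_gb = PEOPLE * SHOW_MINUTES * MB_PER_MIN / 1000
per_year_tb = per_show_gb * SHOWS_PER_NIGHT * 365 / 1000
print(per_show_gb, per_year_tb)  # 60.0 GB per show, 657.0 TB per year
```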
Data centers used ~4% of USA electricity last year, which works out to roughly 0.6 quads, while the chemical industry used 10 quads, virtually all in the form of fossil fuel. If you are going to start reflecting the externality of energy consumption in the price of goods, the information sector will probably not need to adjust anything, while the chemical sector will be fundamentally impaired.
I agree that most of the stuff in data centers is probably crap, but that's because most of everything people do is crap. That's not for me to decide though; people save things because they find value in them. Most of what has value to another person won't have value to you. Most of what people treasure in their life is thrown away after they die because nobody wants it, even their closest family members. Who gets to tell everyone the bad news, that objectively their memories are trash and they don't have a right to keep them anymore? Gerry Fuckin' McGovern?
Secondly, we aren't destroying the environment for any of this. Data centers use like 5% or less of the overall electricity use. It's a lot, but we don't have to put datacenters in random locations, we can (and do) put them where electricity is cheap. That generally means that the 5% of electricity used for data centers, kwh for kwh isn't as impactful as an average kwh of end use. Large companies like Meta and Google claim to have zero net carbon by obtaining offsets. So in general we aren't "destroying the environment" to store copies of photos.
I mean, sure, there is some impact. Storage media has to be produced. But there's a reason storage is cheap, it's not a whole lot of resources going into it. And hard drives that are idle in some data center without being accessed don't consume a lot of electricity.
There are very real and concerning problems with the environmental impact of IT. But they are primarily found in other areas. Energy consumption is mostly a function of "how much you compute with data", not "how much data you have".
In other words: be concerned about so-called "AI", be concerned about Bitcoin. Don't worry about unused data too much.
published July 3, 2024:
https://www.tomshardware.com/tech-industry/google-reveals-48...
One short video can equal a year worth of emails for someone. Similarly those many webpages that don't get viewed often probably require only a negligible amount of resources to keep online and might help someone who'd otherwise be faced with linkrot.
Best to focus on the low hanging fruit.
Or the cost of figuring out that it's not worth saving...
At another site I found a mix of old-version Windows disk images with data in them. With more crap inside.
In the end: storage may be cheap, but storing piles of disorganized crap is very costly if you ever want to find anything.
Proof of work. Look at all this data I/we created.
And the article didn't talk about logs and other operational data yet
Didn't Facebook start to move most of their least-used data onto optical arrays a long time ago?
Not saying that it's infallible, but yeah, we need some auditable chain, at least 1 or 2 layers deep.
Regardless of what you think about the article, this rings so true at many Fortune 500 companies.
The number of times I have seen teams work through pointless bullshit to push some meaningless objective for the company, just so the middle manager (aka "Director of SVP of X product of Y branch") can get a bullet point or two in the quarterly "all hands".
Oh and those 10 developers/off shore people that were just hired? It was all to pump his/her “head count” number to get to the promotion to next grade/level.
Then when that person gets promoted, those people get scattered throughout the firm or just let go.
It’s truly just weaponized incompetence.
citation needed
Whoever sent this dude made a mistake. People who don't share your worldview need to be persuaded, not insulted! Some dude stomps in, thinks all the snaps in the cloud are crap, thinks the big bosses are stupid for not instantly deleting the pictures they saved into the cloud... and then what? Download Lisp? Thought we got over this, pal.
WORSE IS BETTER.
P.S. do not erase our porn. WORSE IS BETTER.
We need to think about the data we need to store before we store it, only store the data that we need to store, and only store it for as long as needed.
It reminds me of CIs. It's now so easy to throw 40 jobs on GitHub Actions that people don't think about them. I was in a startup where people would debug in CI: they wouldn't have e.g. Windows on their machine (maybe they should have, given that their product was supposed to run there) and were fixing compilation issues by sending patch after patch to the CI. Every single time, that would trigger all 40 jobs. Sometimes you could see a patch sent every 5 minutes for 3 days (when reproducing the issue locally would actually take 3 seconds, not 5 minutes). They did not even bother disabling the 39 uninteresting jobs.
For open source projects, it's just wasted energy, for private repos it was costing the company a lot. This was just malpractice. But nobody cared. The finance person would say "GitHub is expensive", the CEO that "well we need it" and the engineers that "I don't want that Windows crap on my computer", I suppose.
Which I believe is not uninteresting, given the number of answers here where people say "Is that data useless? I don't know, I could imagine that it's not; I think it's a hard problem." Well, here we have one person saying "I have experience with that, and I can tell you that most of it is useless." Just a data point, but that's still interesting.
Now it's clear that the new deal can only be implemented in homes/sheds with domestic PV and storage. Smart cities keep failing, from the ancient Fordlandia onward; see Neom, Songdo, Masdar, PlanIT Valley, Lavasa, Ordos, Santander, Toronto Quayside (Google Sidewalk Labs), Amazon HQ2, Egypt's still-nameless new Cairo, Modi's 100-smart-city program in India, Arkadag, Innopolis, Nusantara, Proton City, ... and they can't be powered by a smart grid at such scale.
So: new, well-insulated buildings, with ventilation of course, with PV and storage, with room for a domestic rack (or several), with FTTH. Anyone in such a settlement could have his/her own "datacenter" at home, following the same trend as medical devices getting cheaper and smaller. A LOM? Well, a NanoKVM PCIe or an external JetKVM costs MUCH less than a classic LOM and does much more. We have all the gear to build such "datacenter at home" assemblies, with everyone holding their own preferred crap and participating in distributed computing networks to pay back at least a bit of the gear and bandwidth.
It's not for everyone, of course: some will stay trapped in dense cities while some large owners dream of an obviously impossible conversion of offices into apartments and datacenters, like https://finance.yahoo.com/news/southern-californias-hottest-... or https://www.euronews.com/next/2024/02/29/madrid-to-convert-u... and https://www.theguardian.com/society/2025/jan/05/office-to-ho... or https://czechdaily.cz/half-of-pragues-office-buildings-are-a... etc., all over the developed world. That's while we admit (https://doi.org/10.1073/pnas.2304099120) that we need a full-remote, DISTRIBUTED shift.
Food, meds, and general retail distributed by a single integrated logistics platform for maximum efficiency in a spread-out society: the IT evolution makes Distributism possible.
Doing so erases datacenters' big problems of concentrated energy, dense networking, heat handling, and water, and it also cuts down the crap, because everyone keeps their own personal data, and since keeping it isn't free, they'll learn to be storage-conscious.
Storing the files for Mr. McGovern’s website requires plastics, metals, power and physical space, yet I assume he believes that environmental effect is worthwhile. Who is he to decide for others that their choice to pay for the storage of data is not equally worthwhile to them?
That’s the beauty of a price system: each of us gets to decide what we will buy, and what we will not buy.
Now, perhaps his argument should be that the price of storing digital data does not adequately reflect the true cost. Perhaps there are unaccounted-for externalities. If so, then he should make that argument, perhaps arguing for a tax to align prices with costs.
Someone else might argue that data is a liability as well as an asset. That’s another argument he could make.
But haranguing folks for spending their money in ways he doesn’t like doesn’t seem likely to produce the outcome he appears to wish for.