This post helps frontend developers understand some basic DB pitfalls, but I wouldn't use D1 regardless. If you can figure out how to use D1 as a frontend dev, you can use a hosted Postgres solution and get much more power and performance.
[1] https://speedvitals.com/ttfb-test?url=https://northwind.d1sq...
I was surprised because Cloudflare’s central messaging is that their network is fast, and disappointed because I’m a happy user of their other products (domains, DNS, pages, and R2).
It'll be interesting to see where D1's performance falls after these reach general availability.
Also I've had really good luck with the CF discord for support, certainly better than CF tickets or the forums. I tend to only go to support with really weird/novel scenarios, so on tickets I end up banging my head against the wall with a tier 1 support staff for a couple of weeks before I have any chance of a real answer. But on the discord I often get an answer from an actual expert within a day.
I've glanced through D1's docs and I immediately noticed system traits like:
- database stored in a single primary region where all writes need to go,
- cold starts involve opening a connection to D1,
- cache misses in local replicas involve fetching data back from the primary region,
- D1 is built on SQLite, which I think doesn't support write concurrency well,
- D1 doesn't actively cache results from the primary region to the edge, so you'll have cache misses pretty frequently.
Etc.
These traits don't scream performance.
When I asked on Discord, someone from Cloudflare confirmed that DO is indeed the only way to do tenancy-based sharding (you give the DO a name to obtain a handle to the specific DO instance to talk to), but the DX gap between DO and D1 is quite stark; D1 has better DX in many ways but can’t scale, while DO can scale but has terrible DX.
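For context, that named-handle lookup looks roughly like this inside a Worker (TENANT_DO and the tenant query parameter are placeholder names, not anything from the thread):

    // Route each tenant to its own Durable Object instance by name.
    export default {
        async fetch(request: Request, env: { TENANT_DO: DurableObjectNamespace }) {
            const tenantId = new URL(request.url).searchParams.get("tenant") ?? "default";
            // The same name always maps to the same DO instance, which is what
            // gives you per-tenant sharding.
            const id = env.TENANT_DO.idFromName(tenantId);
            const stub = env.TENANT_DO.get(id);
            // Forward the request to that tenant's object.
            return stub.fetch(request);
        },
    };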
That being said, the pipe dream statement is accurate IMO. I do think they'll get there, but like you said - a lot of the ideas put forward around a DB per customer just aren't present at all in the DX.
If I were to hazard a guess, Durable Objects will slowly gain many of the features and DX of regular workers. Once they reach parity, workers will begin end-of-life (though I'd guess the workers name will be kept, since 'Durable Objects' is terribly named IMO).
This is kind of what happened (is happening) with pages right now. Workers gained pretty much all of their features and are now the recommended way to deliver static sites too.
For me, this can't come quickly enough. The DX of workers with the capability of DO is game changing and one of the most unique cloud offerings around.
It'll bring a few new challenges - getting visibility across that many databases for example, but it completely removes a pretty big chunk of scaling pain.
Read replicas for lower latency, and sessions for per-user partitions, like you mention.
I don't think your take makes sense. I'll explain why.
Cloudflare's doc on D1's service limits states that paid plans have a hard limit of 50k databases per paid account. That's roomy for sharding, but you still end up with a database service that is hosted in a single data center whose clients are served from one of the >300 data centers, and whose cache misses still require pulling data from the primary region. Hypothetically, sharding does buy you less write contention, but even in read-heavy applications you still end up with all >300 data centers having to pull data from the primary region whenever a single worker does a write.
1. https://developers.cloudflare.com/workers/configuration/smar...
After a cursory glance at Cloudflare D1's description and service limits, I was left with the impression that the use cases are not those of a generic RDBMS. I mean, the limit for each database is 10GB, and apparently a D1 database resides in a single location. Of course this means >200ms to even hit the server when requests don't come from the same data center.
Perhaps this is one of those things where an underperforming implementation is released to minimize time-to-market, and then they work on performance. However, for a distributed service this suggests to me that this solution only works as an ad-hoc store of complex relational data for high-read, low-write scenarios, where you have Cloudflare Workers query D1 but either aggressively cache the response or store query results in Cloudflare KV to avoid the roundtrip cost.
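A rough sketch of that read-heavy pattern, assuming a DB (D1) binding, a CACHE_KV (Workers KV) binding, and an arbitrary 60-second TTL:

    // Serve read-heavy queries from KV at the edge and fall back to D1 on a miss.
    interface Env {
        DB: D1Database;
        CACHE_KV: KVNamespace;
    }

    export default {
        async fetch(_request: Request, env: Env): Promise<Response> {
            const cacheKey = "products:all";

            // KV reads are served locally once the key has propagated to the edge.
            const cached = await env.CACHE_KV.get(cacheKey);
            if (cached !== null) {
                return new Response(cached, { headers: { "content-type": "application/json" } });
            }

            // Cache miss: pay the round trip to D1's primary location once.
            const { results } = await env.DB.prepare("SELECT * FROM products").all();
            const body = JSON.stringify(results);

            // Keep the result at the edge for a minute so subsequent reads stay local.
            await env.CACHE_KV.put(cacheKey, body, { expirationTtl: 60 });
            return new Response(body, { headers: { "content-type": "application/json" } });
        },
    };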
What world are you living in?
When using a European server from Europe I expect < 10ms ping. Less than 100ms for East coast server, less than 150ms for west coast ones. Only cases I'd expect more than 200ms (and not a lot more) is when hitting Australian servers.
I'm living in reality, where a globally available application needs to serve people all over the globe from Cloudflare's >300 data centers. This means a worker running in the same data center hosting your D1 databases might get ~10ms requests, but workers running in the other 300+ data centers have to pay the round trip to the primary data center whenever there is a cache miss.
The question you need to ask yourself is how come you failed to notice that. Do you have a good grasp on what you are doing?
This is already waaaaaay too much. Same datacenter I expect sub-millisecond round trips. Datacenter to a datacenter on the other side of the world? Maximum 300ms. That's what you expected of game servers two decades ago. I don't see what could eat more than a couple dozen ms on the server with current hardware to make a 10ms round trip become "> 200ms", and consider this a service you'd pay for.
I read a blog post about this but can't find it now. I did find this [release note](https://developers.cloudflare.com/d1/platform/release-notes/...) though.
They describe removing _at least_ two round trips back to the DC hosting the D1 instance.
If your application server and your hosted Postgres server live in different buildings, you'll also be disappointed in performance and pay 10x the price of D1.
Your persistent servers may have a larger fixed cost up-front, but you can save on engineering hours otherwise spent handling serverless foot-guns.
If you introduce a cartesian explosion of database queries on your persistent instance, it'll fairly quickly just crash and give you downtime. On your serverless version, it'll just keep going for as long as it can and you won't find out until an eye-watering bill lands in your inbox at the end of the month.
Hard downtime is not desirable, but that 10x initial cost can be dwarfed by an errant commit that leads to a practically endless execution loop on your API or DB.
Using CF Workers + DigitalOcean Postgres, I was seeing query responses in the 50-100ms range.
Using CF Workers + CF D1, I was seeing query responses in the 300-3000ms range.
Both workers had Smart Placement enabled.
I was actually very impressed by how slow and bad it is.
AFAIK when you call D1 it's Worker > D1 (Durable Object) > local storage.
So using DO directly would just be DO > local storage.
I believe the connection between worker and D1 is the painful part, requiring multiple round trips.
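To make the "DO > local storage" step concrete, here's a minimal sketch of a Durable Object using its own storage API directly (the Counter class and the "count" key are just illustrative):

    // A Durable Object that reads and writes its own co-located, transactional
    // storage, with no extra hop to another service.
    export class Counter {
        private state: DurableObjectState;

        constructor(state: DurableObjectState) {
            this.state = state;
        }

        async fetch(_request: Request): Promise<Response> {
            // get/put go to storage attached to this specific object instance.
            const current = (await this.state.storage.get<number>("count")) ?? 0;
            const next = current + 1;
            await this.state.storage.put("count", next);
            return new Response(String(next));
        }
    }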
I used Mongo serverless a few years ago when it was first released. I didn't know how the pricing worked, so I wasn't aware how much those full table scans would cost me, even on a small collection with 100k records...
For example in logdy.dev[1] I'm using D1 to collect all of the things listed above and it works like a charm with Cloudflare Workers. Just last week I published a post on how to export D1 and analyze it with Meatabase[2], for the next post I think I'm going to describe the whole stack.
[1]https://logdy.dev/ [2]https://logdy.dev/blog/post/how-to-connect-cloudflare-d1-sql...
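For anyone curious what the collection side looks like, a minimal sketch of a Worker writing an event row into D1 (the events table and column names are assumptions, not logdy's actual schema):

    // Record a page-view style event into a D1 table from a Worker.
    interface Env {
        DB: D1Database;
    }

    export default {
        async fetch(request: Request, env: Env): Promise<Response> {
            const url = new URL(request.url);
            await env.DB
                .prepare("INSERT INTO events (path, user_agent, created_at) VALUES (?1, ?2, ?3)")
                .bind(url.pathname, request.headers.get("user-agent") ?? "", Date.now())
                .run();
            return new Response("ok");
        },
    };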
They mean Metabase, though Meatabase could get interesting as a product
When I needed to ensure atomicity in such a multi-part "transaction", I ended up making a batch request, where the first statement in the batch checks a precondition and forces a JSON parsing error if the precondition is not met, aborting the rest of the batch statements.
    SELECT
      IIF(<precondition>, 1, json_extract('inconsistent', '$')) AS consistent
    FROM ...
I was lucky here. For anything more complex, one would probably need to create tables to store temporary values, and translate a lot of application logic into SQL statements to achieve atomicity.
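Wired into a Worker, that trick looks roughly like this; the accounts table, the debit scenario, and the DB binding are made-up placeholders, the real point being that an error in the first statement makes batch() roll the whole group back:

    // Sketch: debit an account only if the balance is sufficient, using a D1 batch
    // as a single implicit transaction.
    async function debitIfSufficient(env: { DB: D1Database }, accountId: string, amount: number) {
        const statements = [
            // Statement 1: precondition check. If the balance is too low, IIF falls
            // through to json_extract('inconsistent', '$'), which throws a malformed-JSON
            // error and aborts the batch before the UPDATE runs.
            env.DB.prepare(
                "SELECT IIF(balance >= ?1, 1, json_extract('inconsistent', '$')) AS consistent " +
                "FROM accounts WHERE id = ?2"
            ).bind(amount, accountId),
            // Statement 2: only committed if the precondition above did not blow up.
            env.DB.prepare("UPDATE accounts SET balance = balance - ?1 WHERE id = ?2")
                .bind(amount, accountId),
        ];

        try {
            await env.DB.batch(statements);
            return true;
        } catch {
            // Precondition failed (or another error): D1 rolls the batch back, nothing is committed.
            return false;
        }
    }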