Fresh Hacker News | SQL nulls are weird

214 points by subomi 294 days ago | 36 comments

SQL NULLs aren't weird, they're just based off of Kleene's TRUE-FALSE-UNKNOWN logic!

If you want you can read NULL as UNKNOWN and suddenly a whole bunch of operations involving them become a lot more intuitive:

1. TRUE OR UNKNOWN = TRUE, because you know you have at least one TRUE already.

2. TRUE AND UNKNOWN = UNKNOWN, because you don't know whether you have two TRUEs or not. It's just out there.

3. UNKNOWN XOR UNKNOWN = UNKNOWN, because it could darn near be anything: TRUE XOR TRUE, TRUE XOR FALSE, FALSE XOR FALSE, FALSE XOR TRUE... Internalizing this is where SQL's use of NULL / UNKNOWN really becomes intuitive.

4. (TRUE AND FALSE) XOR (TRUE OR UNKNOWN) = (FALSE) XOR (TRUE) per #1 = TRUE. See, it's consistent, you just need to keep in mind that if you have a lot of known UNKNOWNs they're quite parasitic and your final outcome is likely to be, itself, an UNKNOWN. Just like in real life!

▲crazygringo 293 days ago

That doesn't address anything in the second half of the post though, starting with this pull quote:

> The fact that NULLs are distinct for UNIQUE columns but are indistinct for SELECT DISTINCT and UNION continues to be puzzling. It seems that NULLs should be either distinct everywhere or nowhere. And the SQL standards documents suggest that NULLs should be distinct everywhere. Yet as of this writing, no SQL engine tested treats NULLs as distinct in a SELECT DISTINCT statement or in a UNION.

▲sitharus 293 days ago

This is confusing when you know that NULLs are not comparable, but it makes some sense if you consider the result of distinct/union as the output of a GROUP BY. You can consider everything that's NULL to be part of the same group, all the values are unknown.

So NULLs are not comparable but they are part of the same set.

▲adammarples 293 days ago

If nulls are distinct then group by should not group them together, this just ignores the problem. Why does group by treat them as equal?

▲getnormality 293 days ago

It doesn't treat them as equal, it treats them as one group. It does this because the result is more useful.

It is not the case that nulls are always the same as one another. It is also not the case that nulls are always distinct from each other. Thus, the normal rule of grouping, that same values are combined and distinct values make different groups, does not apply. Another principle is needed to determine the result of grouping with nulls.

Logic which allows for an unknown value can't be shoehorned into always giving definite true or false answers.

▲rawgabbit 293 days ago

Nulls are not necessarily distinct.

I believe this confusion is confusing the tool with the thing being measured. For simplicity, I will use the analogy of a record (stored as a row in the database) as an observation in a scientific experiment. If the tool was able to record a value, I enter a value like 579.13. If the tool was not able to record a value, the tool will enter NULL. I make a total of one hundred observations. Of one hundred rows, some have values and some are NULL.

Are NULLs distinct values? No, they are simply a failure in measurement; it is like asking if all errors are distinct or the same. Are NULLS part of the same dataset? Yes, because they are all observations for the same scientific experiment. What does it mean when "select distinct ... " returns several rows for known/measurable values and but only one row for NULL? If this is confusing, the scientist can update the rows and substitute "UNKNOWN/ERROR" for every NULL. When you do "select distinct ...", you will get the same thing. It will return several rows for known/measurable values and but only one row for "UNKNOWN/ERROR".

▲sitharus 292 days ago

This is where the foundation of a relational database semantics in set theory shows through. You can’t model the behaviour with pure boolean logic.

In the SQL spec by default unique indexes consider nulls distinct because you’re adding tuple to the relation, and this is done by equality.

When doing a select distinct or group by you’re not doing a boolean grouping, you’re doing a set projection. NULLs are considered part of the set of “unknown values”, so NULLs are grouped together but they’re still not equal to each other.

The behaviour is mathematically well defined, but it doesn’t match boolean logic.

I’ve been dealing with databases for well over 20 years now and I can only think of a couple of times when this behaviour wasn’t wanted, so I think it’s the right choice.

▲GuB-42 293 days ago

That's because "different" and "distinct" don't mean the same thing.

Two unknown values are assumed to be different, but they are not distinct from each other.

For example, take two boxes, in each box is a die, the value of the box is the value shown on the die inside. You don't know the value since you don't see the die, it may even change as you manipulate the box, so it is unknown, NULL in SQL. Because of that, you assume the two boxes have different values. They are, however, indistinguable, so, not distinct. All you need to know is that you hold two boxes with dices in it, which one you hold doesn't matter, and that's what "SELECT DISTINCT" tells you: that you have two boxes and it doesn't matter which is which, even though they have different values.

▲SigmundA 293 days ago

>That's because "different" and "distinct" don't mean the same thing.

The literal definition distinct is:

>recognizably different in nature from something else of a similar type.

If you want to get down to it nothing is "equal" or the same.

Is a temperature measurement 25C the same as another of 25C? No these measurements are an approximation of the actual values which are actually not equal to each other they are distinct they have just been lumped into the same 25C group due to the resolution of measurement yet equality works just fine on that value in sql.

I have used SQL for a long time null handling is weird and inconsistent and a waste of time. For all the language bugs due to the existence of null at least I can count on null=null and not write garbage like value=param or (param is null and value is null)

▲buttercraft 293 days ago

> The literal definition distinct is

Irrelevant. What matters is the meaning in the context of SQL.

> weird and inconsistent and a waste of time. For all the language bugs due to the existence of null

There are necessary, semantic cases that need to be dealt with. How else would you do it?

Also, it's really weird to use "bugs" to refer to well defined and well documented behavior.

▲munk-a 293 days ago

I wanted to briefly reinforce this point with the fact that SQL has multiple equality operators - there is both `=` and `IS NOT DISTINCT FROM`. The later operator will treat null values as equal in alignment with the `DISTINCT` and `DISTINCT ON` operators.

It is extremely easy using partial uniques and the proper operators to treat nulls as non-distinct values and the approach we have allows some very important functionality that aligns with standard statistical data treatment which was influential to how SQL logic was originally designed.

▲dragonwriter 293 days ago

> Is a temperature measurement 25C the same as another of 25C?

Yes, the measurements are the same.

The actual temperatures probably are not, but measurements are not the same as the thing measured.

▲SigmundA 293 days ago

>Yes, the measurements are the same.

By the logic two unknown (null) measurements are the same regardless of the actual value which I agree with.

▲dragonwriter 293 days ago

An unknown measurement isn't a measurement value its a statement of (lack of) knowledge about a measurement, that doesn't tell you what the measurement is. Knowledge about a measurement is as different from the measurement as the measurement itself is from the thing measured.

Whether two unknown measurements are the same is unknown.

▲SigmundA 293 days ago

Whether two measurements of 25C are the same is unknown, these are just values recorded in a database. 25 is a value, null is a value.

The values in the db are the same in both cases which is what I would like my db language to deal with and not make assumptions about what that value actually means.

I see no value in treating null special when in comes to equality in a sql db, in fact it is a hinderance that it does so in my experience.

▲shawnz 293 days ago

The SQL null is a database-specific keyword and not something that's part of the domain of your measurements. If you want some kind of sentinel value for your measurements that doesn't have the "unknown" meaning of SQL null, then you should use your own sentinel value and not reuse null for that purpose

▲SigmundA 293 days ago

Sentinel values suck especially when the language already has a perfectly good one built in.

Is 0 for a temp measurement unknown sentinel or an actual measurement, how about 2,147,483,647 great probably not a measurement now its always included in greater than queries same with max negative with less than.

Null separates the value into its own distinct group and prevents it from being including in range queries due to it not being an actual numeric value while most languages still allow you to compare equality using standard operators.

Sum types would be great in sql but currently we get a union of the sql type and null, so null for sentinel values it is except for the having to using weird syntax to compare it.

▲Dylan16807 293 days ago

Null is not your value that the database is making assumptions about, it's the database's value that you are making assumptions about.

A real sum type would be nice, but when you're using null then you need to accept that null was not designed with your specific use case in mind.

▲dragonwriter 293 days ago

> Null separates the value

NULL is not a value.

NULL is a statement that a value is not available or unspecified reasons.

If you want a particular value where a query would return NULL, it's your job to replace the NULLs with the contextually-appropriate value, e.g., using COALESCE(), to provide it.

It's a convenience shortcut to allow more complicated data models to be rpersented in simpler table structures than a fully normalized NULL-free data model would require, and to provide information about missing data (which can be used with things like COALESCE, where appropriate) when a more complex data model is simplified into a resultset via a query with JOINS, etc.

▲SigmundA 292 days ago

>NULL is not a value.

I do not agree it is most certainly a value that is stored in the database.

>NULL is a statement that a value is not available or unspecified reasons.

Again I disagree, it is a value that denotes its value is of a different type than specified in the columns data type. The meaning of that value is for the user to decide. The system is literally storing a value that can be compared against using special equality syntax (IS NULL, IS DISTINCT FROM, etc).

The actual column definition is a sum type defining possible values in the column it is a constraint on the values:

columnA int null columnB int not null

If someone asks you what the value of a column in a result is when null do you say "I don't know" or do you say "null"?

For all these statements about what null means philosophically and the history about why it is treated the way it is in SQL there is little compelling argument to what value having the the equals operator always returns false when comparing nulls and instead one must use a separate syntax to properly compare null values for equality other than its for historical reasons and it changing it would be difficult.

This adds no value over typical programming languages where the normal equality operator can be used, its is weird and the source of confusion and even more bugs than your typical null handling creates.

I would prefer a database with fully fleshed out sum types rather than marking a column nullable, then a column could be marked as say a number + string + special sentinel type or whatever combination of types makes sense for my application, and if it that db did exist I am sure its equality operator would properly compare type and value to give a consistent binary result rather than the nonsense that is SQL null equality.

Your statements about using coalesce don't seem compelling to me, maybe I am a misunderstanding, to efficiently search for a value with a index you must use that value, I should be searching for with column IS NULL not Coalesce(column,[special value]) = [special value] which would be extremely inefficient.

Many languages have null coalescing operators and still use the standard equality operators for null. Coalesce to a special numerical value for a numerical column to represent a sentinel value is again a waste of time that again leads to strange greater than less than issues. Given a type system that allows nulls I would rather use IS NULL than coalesce that would be a further step backward but even better would be = NULL and get rid of the IS operator and its variants.

▲GuB-42 293 days ago

Changing the emphasis.

> recognizably different in nature from something else of a similar type.

But anyways, the point wasn't to justify the choices of SQL but rather as a way to make intuitive sense of its logic. SQL is one of the oldest and most successful programming languages in existence, we are not going to change it, and it is not going to disappear anytime soon, so we have to go with it, like it or not. There have been some attempts at alternatives, both at changing the paradigm (NoSQL) and at cleaning up the language, which, to be fair, would be a good thing, but without much success. The relational paradigm just works, and SQL is usable enough to make the cost of switching not worth it.

Edit:

And writing things like "value=param or (param is null and value is null)" is usually the sign of a poor understanding of the NULL logic. You are using it wrong basically. Sometimes, it is the right thing to do, but if that pattern starts appearing all over the place, it is usually a result of thinking "NULL is broken, I have to use this pattern to handle NULL properly". That's cargo culting, don't fix problems you don't understand by copy-pasting code you don't understand.

Note: this is not addressed to "you" in particular, there can be good reasons, no offense intended. But I think that in general, it is a code smell.

▲SigmundA 293 days ago

If it is not recognizably different than it is the same in that context correct?

Two measurements of 25C are not recognizably different therefore they are equal, correct, regardless if the actual temperatures are not the same?

Two measurements of unknown are not recognizably different therefore they are equal in the context of the database.

Having null!=null has never been intuitive to me especially since every other programming language treats them equal. I am not hoping this gets changed, I know SQL is to far along for that, I can still complain about it and agree its wierd.

>And writing things like "value=param or (param is null and value is null)" is usually the sign of a poor understanding of the NULL logic.

It's needed with parametrized sql when your db doesn't support "is not distinct from" which is itself a silly way to just write '=' or '==' like a normal programming language. The distinct predict exist for this very reason to have yet another way to express equality that includes nulls: https://modern-sql.com/caniuse/T151

▲Aloisius 293 days ago

MS SQL Server treats NULLs as indistinct for UNIQUE constraints, SELECT DISTINCT and for UNION.

Indeed, the sqlite page the pull quote is from says as much.

▲hans_castorp 293 days ago

> MS SQL Server treats NULLs as indistinct for UNIQUE constraints

Postgres lets you control that behaviour when creating the constraint (or index)

▲swiftcoder 293 days ago

Although only in relatively recent versions. I had to hack around this in TypeOrm, because their Postgres backend hasn't exposed the option yet.

▲marcosdumay 293 days ago

Yes, ergonomics dictates some weird behavior for nulls.

Luckily, Postgres nowadays lets you declare the behavior of each null in unique constraints, like it should be. We can expect this to creep down to the other DBMSs with time.

Making nulls distinct on a "select distinct" or a "group by" would be completely useless.

▲Pxtl 293 days ago

If you're compromising on your high-minded and unorthodox purist concept for ergonomics, you may as well finish the job of ergonomics and just use normal nulls where X=X is true.

▲AlienRobot 293 days ago

It's possible that this is due to the underlying implementation.

In a unique column normally you'll have an index, so NULL becomes a special value in an index, but in SELECT DISTINCT you probably won't have an index, which means a full scan is performed, then every row has to be compared with every other row.

▲SoftTalker 293 days ago

If you're including possibly NULL columns in a distinct or group by and you want to treat them in a particular way, use the COALESCE() or NVL() or whatever similar function to give a real value to the NULL for that purpose.

▲nilamo 293 days ago

Isn't "select distinct" wildly frowned upon anyway? It's the same as "group by", but with less options...

▲mulmen 293 days ago

It’s not and it’s not, respectively.

▲tanelpoder 293 days ago

Yeah nothing wrong with “select distinct” itself if it’s used correctly for its intended reasons.

But when I see select distinct at the start of a big and complex query, I do immediately suspect that the developer might have missed some join condition down the line and “got too many rows” back from the query. And since the rows look like duplicates due to the missing join predicate, for a junior (or careless) developer, the quick “solution” is to just apply distinct in the top level query to get rid of these pesky duplicates, declare success and move on.

▲tessierashpool9 293 days ago

there is a pattern starting to emerge here on hackernews of highly voted posts by people who present themselves as experts and thought leaders who shamelessly put their lack of understanding at display. it's frightening.

▲daswerth 293 days ago

The idea that someone should refrain from publishing a blog post about _anything_ unless they are a certifiable expert is not reasonable. Many people (correctly) write to learn, and even if they are publishing just to "present themselves as experts", it's on the reader to determine value.

▲izacus 293 days ago

In a world filled with false bullshit, crating more false unchecked writing instead of educating yourself is not a benefit to anyone.

▲DangitBobby 292 days ago

What was false in the article?

▲ozim 293 days ago

In a world where there was less false bullshit people believed smoking is fine and sugar is healthy.

Amount of false bullshit doesn’t make qualitative difference.

Only difference to make is that people should not take something as truth just because it is written in a book or in a blog post or if person has a degree or not.

▲hobs 293 days ago

I think that's more a pattern of your understanding growing over time.

Most technical writing is actually at the start of the technical journey, there's more people there and its more interesting to talk about strategy and ideas when its not "messy" with internal details and trade offs you make.

I encourage folks at any technical level to write, and I encourage folks to check authors work with a critical eye no matter how long the source has been in the industry, no amount of experience can protect you from being wrong.

▲ozim 293 days ago

I see it the other way around.

People think if someone wrote blog post with technical details and it got upvoted - somehow it has to be an expert.

▲tessierashpool9 293 days ago

go the extra mile and click on about and then check out the linkedin profile.

i quote:

"I graduated top of my class with a BSc in Computer Science [...]. I have a strong background in software engineering and technical leadership"

▲zie 293 days ago

Who doesn't think of themselves as an expert? That doesn't mean they are one.

▲bunderbunder 293 days ago

Why not both? In my career, I have met countless people who are experts in programming in general, but with relatively modest skills in database systems.

Which is fine! It's really hard to be truly expert in both. There's a reason why "programmer" and "database administrator" used to be two different professions. I'd like to think that I'm better than your average developer at flogging RDBMSes, but most DBAs I've worked with can still run circles around me when it comes to information modeling and database & query optimization.

▲Tostino 293 days ago

At a lot of companies, there are still full teams of people slinging t-sql or pl/SQL all day long to support their organization. Not DBAs, just developers who primarily work inside the database system their entire life.

▲hiAndrewQuinn 293 days ago

I keep my old SQL Server Anki cards alive for just such a use case. It's been a minute since I had to jump into a 3-digit-LOC SQL script that does some arcane financial processing or what have you, but there's a nice steady niche there in case I ever want to throw my hat back into the ring.

▲finnthehuman 293 days ago

> starting to emerge here on hackernews

It's not getting worse, you're getting better.

HN has for a long time been where I go for whatever you call the tech equivalent of watching stoners think they're having a deep conversation.

▲Pxtl 293 days ago

Or maybe people understand but still think it's dumb and hideously inconvenient?

Ergonomics matter.

▲pwdisswordfishz 293 days ago

Starting?

▲lynguist 293 days ago

Like on Reddit etc, which I deliberately avoid for this reason. The hiding of the vote count and the heavy moderation still help a lot that HN is still a massively better platform than any of its alternatives.

▲noduerme 293 days ago

eh. This probably shouldn't have gotten so many votes, but it's a little interesting from a logic standpoint. It falls somewhere in the region of a StackOverflow question that makes some people scratch their heads and functions as nerdbait for everyone who knows the answer. These things don't rank for that long on HN, (and I agree that the self-important "expert" blog posture is silly), but I do find them to be a better daily checkin for my brain than actually going on S.O. anymore...

▲ 292 days ago

▲gigatexal 293 days ago

This is the correct way of thinking about things. Null is one of the hardest things for traditional software engineers in my experience as a guy who came up as a data admin.

▲drpossum 293 days ago

Null in not-SQL (which is most things) usually isn't this tortured and isn't hard.

▲mathw 293 days ago

That's because null in not-SQL is a rather different concept, and while it's pretty easy to understand it's absolutely is hard to actually work with. Hoare didn't call it a "billion dollar mistake" on a whim.

▲swiftcoder 293 days ago

I don't know about that. null/undefined in javascript gives it a pretty good run for it's money

▲rad_gruchalski 293 days ago

Because they're two orthogonal problems. It's not like you do select distinct from program variables group by scope in your frontend programming language.

▲gigatexal 293 days ago

Yeah the 3 valued logic of SQL trips people up, me too from time to time

▲mulmen 293 days ago

SQL is not three valued. Neither is NULL. BOOLEAN is accused of being three-valued but it has two values and like all values they can be unknown. Similarly a SMALLINT has 65,536 possible values not 65,537.

▲gigatexal 293 days ago

It’s not? https://modern-sql.com/concept/three-valued-logic

▲mulmen 293 days ago

It’s not.

Your link makes the same mistake I already addressed. It conflates nullable booleans with tri-state logic.

Null is not a value. It is the absence of a value.

> The SQL null value basically means “could be anything”.

This is wrong. Null means it could be any valid value but that value is unknown. If the datatype is DATE then the value cannot be boolean TRUE or the string ‘purple’.

▲gigatexal 293 days ago

How is that different than “anything”?

If I’m comparing a value of type date to a null I still think it works as it should if value is “unknown”. What greater insight or context do we have if it’s a small-int null?

▲mulmen 293 days ago

> How is that different than “anything”?

Because the possible values are known.

> What greater insight or context do we have if it’s a small-int null?

The insight is that null is not a value. It’s not a smallint or a boolean or a date. It’s the absence of a possible value. The only way to see a null as tri-state is to conflate it with a nullable boolean. This is an incorrect mental model which leads to confusion.

▲rawgabbit 292 days ago

Suppose you have table with two columns to represent a classroom’s exam scores. It has the columns student_id (varchar) and score (int).

If a student is sick and has not taken the exam, yes you could enter -99 to represent they did not take the test. But if you want to find the class average, you would have to do something like this:

select average(case when score =-99 then null else score end) as class_avg from …

Or you could have entered null to begin with.

▲ 293 days ago

▲Pxtl 293 days ago

> Null is one of the hardest things for traditional software engineers

Making them harder is not better.

▲gigatexal 293 days ago

Never said it was

▲maest 293 days ago

The Maybe monad is really well understood at this point.

▲mrkeen 294 days ago

It's unfortunate to namesquat on 'boolean' if your elements have three possible values. Just call it 'ternary logic' and let individuals decide which system to use.

▲foobarchu 294 days ago

Who's name squatting boolean? The bool column is exactly what it claims to be, you just have the option of introducing unknowability if you define it allow nulls.

▲bunderbunder 294 days ago

The distinction is that not all formal logic systems are Boolean. Meaning that it is nonsensical and confusing to use "Boolean" as a generic synonym for "truth value" in the same way that it's nonsensical to use "Pantone" as a generic synonym for "color value", including when the specific kind of color value you're talking about is CMYK or HSV and definitely not Pantone.

▲wodenokoto 293 days ago

There are two values, TRUE and FALSE. Null is not a value, it the the lack of a value.

You have a list of people and you ask if they own a car. You didn't get around to asking George, so that, somehow means he owns a car because you are using boolean logic? Or does it mean he doesn't own a car, because you are using boolean logic?

No, it means you haven't gathered this data point and don't know.

▲bunderbunder 293 days ago

If there are exactly two possible values, TRUE and FALSE, you're working with Boolean logic.

If there are three possible values, TRUE, FALSE and NULL (unknown), then you're probably working with something like Kleene logic. You can't truly be working with Boolean logic, though, any more than you can be doing integer arithmetic when 15.37 is a member of your domain.

To put it another way, if we're talking about the mathematical definition of boolean algebra and not just some programming language's quirky implementation that happens to be called "bool", then boolean values would by definition be non-nullable. That logic that allows nulls has been implemented using the same unified set of keywords and operator names is a pragmatic decision that simplifies the language implementation and spec, not a principled one that tries to be pedantic about mathematical terminology.

▲mrkeen 292 days ago

> That logic that allows nulls has been implemented using the same unified set of keywords and operator names is a pragmatic decision

That's why it's name-squatting. Rather than introduce a 'kleene' datatype & operations, and let the user opt-in, they decided that in our 'bool' world, the following is not a truism:

  a = a or a <> a

▲mrkeen 293 days ago

> No, it means you haven't gathered this data point and don't know.

This is how it should be.

> Somehow means he owns a car because you are using boolean logic?

This is how it unfortunately is. There are 3 people, and there are 3 people who don't have a NULL car. Therefore George has a car.

  CREATE TABLE people(name text, carId uuid);
  INSERT INTO people values('Bill', '40c8a2d7-1eb9-40a9-b064-da358d6cee2b');
  INSERT INTO people values('Fred', '3446364a-e4a5-400f-bb67-cbcac5dc2254');
  INSERT INTO people values('George', NULL);

  SELECT Count(*) FROM people WHERE name NOT IN (
    SELECT name FROM people WHERE carId = NULL
  );

Elsewhere people have argued that NULL propagates, so that your small unknowns infect larger queries. I could get behind that line of thinking, but the query above confidently returns 3.

▲mrighele 293 days ago

> There are 3 people, and there are 3 people who don't have a NULL car.

This is not what you are asking with your query: as someone else stated, NULL is meant to be "UNKNOWN", or "it could be any valid value".

So nothing is ever equal to something that can be anything, because even another NULL (i.e. unknown) value is in general different.

So in the line

    SELECT name FROM people WHERE carId = NULL

the condition will always be false. Now if instead if meant to search for the rows where carId is actually unknown you have to write

    SELECT name FROM people WHERE carId is NULL

And your query will return as one may expect 2.

▲mrkeen 292 days ago

> So nothing is ever equal to something that can be anything

It's worse than that. It's something is neither equal nor not-equal to something else.

Whether you can implement something differently as a workaround is immaterial. It's weird, per the title.

▲Spivak 293 days ago

But it is a boolean value, there's only two possible values TRUE and FALSE. But because it's SQL you can define any column as TYPE | NULL.

You could say that a boolean column with a NULL value is FALSE like how a lot of programming languages coerce it but if you wanted that you would just make a default of FALSE. The meaning of NULL in general being "value not specified" lends itself pretty nicely to "either true or false."

▲lmm 293 days ago

What I want is for e.g. "x OR y" where y is NULL (and/or of nullable type) to be an error rather than silently giving surprising results. Just like in a decent programming language I can't do x || y where x and y are of type boolean?, I have to explicitly handle the case where one or other of them is null (or e.g. write x!! || y!! - and that will at still error if either is null rather than silently evaluating to a funny result).

▲bdangubic 293 days ago

The meaning of NULL in general being "value not specified" lends itself pretty nicely to "either true or false."

You mean neither true or false?

▲Spivak 293 days ago

I think I mean either. So yes NULL is a distinct value from true and false so I think it's also right to say it's neither true nor false. But the value NULL does represent is roughly "could be true or false, we don't know yet."

▲lmm 293 days ago

It could mean anything or nothing depending on context, which is part of the problem. (Plenty of people think the meaning of NULL is clear and obvious and consistent, unfortunately they give different answers about what that meaning is)

▲8n4vidtmkvmk 293 days ago

It could also be neither. It's whatever you define it to be. Null could mean you don't know if it's true or if it's false, or it could mean you know it's neither true nor false.

▲nyrikki 293 days ago

This is the map territory relation problem.

We bring in the 'Trinity of thought' a priori and forgot about the advantages and costs.

You have the principal of excluded middle, principal of identity, and the principal of non-contradiction.

If your problems territory fits things are convenient.

Once you introduce known, unknown, or unknowable unknowns, the classic model falls down.

Unfortunately you have to choose what to sacrifice based on the context of the problem at hand.

This is exactly where Rice's theorm, the halting problem etc.. arise.

AC from ZF(C) and IID from statistics bring PEM, which gives or forces the classical behavior.

The 'non-trivial' properties of Rice's theorm are exactly those properties where PEM doesn't hold.

The frame problem in machine learning is also related. As is Gödels (in) completeness theories.

Sometimes you can use failure as negation, other times you can use methods like accepting that NULL is an unknown.

But you have to decide what you can sacrifice and still solve real problems.

I think thinking of a problem and it's complement is better.

E.G P is the truthy T and co-P is the truthy F. Obviously P=co-P means that the traditional method works, but jump to NP vs co-NP and ot doesn't.

NP chained ORs, co-NP is chained ANDs is another way of thinking about it.

▲solumunus 293 days ago

It could be true or false, but it’s unknown. For example. a user doing a survey is yet to fill in the answer. That doesn’t mean there is no answer, it’s just unrecorded.

▲chuckadams 294 days ago

Maybe GP was edited, but it doesn't use the word "boolean" anywhere.

▲hiAndrewQuinn 294 days ago

Correct, I edited "boolean" out prior to ^^P's comment. My apologies.

▲ 294 days ago

▲ 293 days ago

▲dominicrose 293 days ago

A null boolean good very well mean true, if the default value is true (in the code if not in the DB).

▲nilamo 293 days ago

Is this just being pedantic wrt uninitialized values?

▲thayne 293 days ago

If only it had a name that was more indicative of that, like UNKNOWN, or UNDEFINED or INDERTIMINATE or something.

▲dominicrose 293 days ago

Javascript has both null and undefined and I'm not sure that's a good idea. At least in SQL we only have one of them, but it can mean unknown or it can mean N/A or even false. It's like a joker, what it means depends on how you use it.

▲hobs 293 days ago

No, it's not those other things, that's just using the tool incorrectly. A NULL is definitely "we dont know", not false, not N/A, especially not any known value.

▲swiftcoder 293 days ago

Except in every other programming language with a null, null is the definite absence of something

▲dominicrose 290 days ago

Yes and I think that Clojure handles nil pretty well, and it's a functional programming language like SQL. It's also interesting to see that Typescript has added an unknown type. So something that's a value in SQL (null being unknown) is a type in TS.

▲hobs 293 days ago

Sure, and we're talking about SQL nulls in this context, which is why I am strict in my definition.

▲hun3 293 days ago

Or VBA, which has Empty, Null, and Nothing:

https://excelbaby.com/learn/the-difference-between-empty-nul...

(and sometimes Missing)

▲salawat 293 days ago

Null is shorter, and given the nightmarish queries I've had to read/write, I'll take any mercy that comes my way.

▲atombender 292 days ago

E. F. Codd originally suggested two types of values: "unknown" and "missing". Somehow we got NULL, which represents both.

ANSI SQL:1991 provides an optional feature that introduces a special value UNKNOWN to boolean expressions [1]. But few databases support it. MSSQL [2] is one of the few that do. As I understand it, it's not a data type that be used in tables, but is only a potential result of boolean operations.

[1] https://modern-sql.com/concept/three-valued-logic

[2] https://learn.microsoft.com/en-us/sql/t-sql/language-element...

▲Pxtl 293 days ago

Honestly, at this point I just wish SQL servers supported proper discriminated union types and nullable columns were implemented as some kind of MaybeKnown<T> and I could just have a normal Maybe<T> with normal equality semantics if I wanted.

SQL needs to join 21st century type systems... or needs to get replaced altogether. SQL is the FORTRAN of relational programming languages, but hangs around because every time somebody tries to replace it they throw the relational-theory baby out with the bath-water.

▲johnthescott 292 days ago

> SQL is the FORTRAN of relational programming languages

and what is an alternative to sql ... quel?

▲Pxtl 293 days ago

> SQL NULLs aren't weird, they're just based off of Kleene's TRUE-FALSE-UNKNOWN logic!

Kleene's TRUE-FALSE-UNKNOWN logic is weird.

SQL nulls effectively violate the reflexive property of equality, because X=X does not result in a value of TRUE.

And in many contexts in SQL, NULL is treated as equivalent to false, such as within a WHERE clause.

So that means that X=X is effectively FALSE in SQL*.

That is a clown language.

*(yes, I have the same opinion about ANSI NaN, but that wouldn't come up so often if Javascript didn't love NaN for some stupid reason.)

▲dragonwriter 293 days ago

> And in many contexts in SQL, NULL is treated as equivalent to false, such as within a WHERE clause.

NULL is not equivalent to FALSE, it is neither FALSE nor TRUE. It has the same effect as FALSE as the final result of evaluating a WHERE clause condition only because WHERE clause conditions allow a row to be included only when they evaluate strictly to TRUE. But if NULL were equivalent to FALSE in a WHERE clause, than a WHERE clause condition which would evaluate to NULL that was instead negated would be equivalent to TRUE but instead it remains NULL which remains not TRUE.

▲Pxtl 293 days ago

No, because NOT NULL/UNKNOWN is still NULL/UNKNOWN.

I realized earlier I was using the term NULL, but going forwards let's use the ANSI SQL concept where the null state of a Boolean is called UNKNOWN. You'll have to forgive me for using the term NULL this far, but in my defense the concept of NULL column-values and UNKNOWN boolean expression results are pretty intertwingled.

    SELECT * FROM foo WHERE NOT (1 = NULL)

resolves to

    SELECT * FROM foo WHERE UNKNOWN

which is the same value that

    SELECT * FROM foo WHERE 1 = NULL

resolves to.

So the WHERE clause is treating UNKNOWN/NULL as equivalent to false. The rest of the Boolean algebra is not.

SQL likes to have it both ways. Sometimes UNKNOWN/NULL is equivalent to false sometimes it's not.

It does this because UNKNOWN/NULL Booleans are incredibly inconvenient and tedious and sometimes they'd rather not think about them.

I'd like to do that as well, but this hoary old language won't give me the same permission it gives itself.

▲tpmoney 293 days ago

> And in many contexts in SQL, NULL is treated as equivalent to false, such as within a WHERE clause.

I don't think any databases treat `NULL` as `FALSE` in the WHERE clause. `SELECT * FROM foo WHERE bar = NULL` doesn't return rows with a NULL in the bar column. `SELECT * FROM foo WHERE bar != NULL` doesn't return rows without NULL in the bar column. `SELECT * FROM foo WHERE (bar = 'a') = NULL;` doesn't return rows where bar is not equal to `a`[1]. As far as I know every DB treats NULL as what it is, an unknown value.

It also doesn't to my mind violate the reflexive property because NULL is not equal to anything. It is a marker for an unknown value, not a value in and of itself. If you have a database of every person in a room and what color shirt they're wearing, and in your database, Alice and Bob both have NULL in their "shirt_color" column, that does not mean that Alice and Bob have the same color shirt. Nor does it mean that they don't have the same color shirt. Nor does it mean that someone with a green colored shirt has the same color shirt as Bob or Alice. It doesn't mean they don't have a shirt either. It means you don't have a record of/don't know what color their shirts are. You can't violate the reflexive property because you can't say what color shirt they have. You're not doing `X = X -> false`, you're doing `X = UNKNOWN -> UNKNOWN`

[1]: https://www.db-fiddle.com/f/iVDDRJos1pUqxnuy1jTEEe/0

▲Pxtl 293 days ago

It treats the NULL/unknown value of the boolean as false

1 <> NULL => Boolean UNKNOWN,

so SELECT * FROM foo WHERE 1 <> NULL returns nothing.

1 = NULL => Boolean UNKNOWN,

so SELECT * FROM foo WHERE 1 = NULL returns nothing.

That's the thing that's being treated as FALSE. That UNKNOWN. Not the value of NULL itself.

> You're not doing `X = X -> false`, you're doing `X = UNKNOWN -> UNKNOWN`

That's not how "=" works. If you want a relationship for testing equality than handles unknown, don't call it equality.

Basic properties of equality, from Wikipedia

https://en.wikipedia.org/wiki/Equality_(mathematics)

- Reflexivity: for every a, one has a = a.

- Symmetry: for every a and b, if a = b, then b = a.

- Transitivity: for every a, b, and c, if a = b and b = c, then a = c.

edit:

We can also see the incoherence of this concept when we look at set theory.

Because UNKONWN booleans are neither true or false, if you use them in a WHERE clause you get the ugly result that the set of

    X ⋃ Xᶜ

is not everything.

▲tpmoney 292 days ago

> That's the thing that's being treated as FALSE. That UNKNOWN. Not the value of NULL itself.

I think the difference between how we're looking at this is for me there is no "value" of NULL. NULL has NO value which is why you can't do `1 = NULL` or `1 <> NULL` and have to specifically use `1 IS NOT NULL` or `1 IS DISTINCT FROM NULL`

>That's not how "=" works. If you want a relationship for testing equality >[that] handles unknown, don't call it equality. >Basic properties of equality, from Wikipedia >https://en.wikipedia.org/wiki/Equality_(mathematics) >- Reflexivity: for every a, one has a = a. >- Symmetry: for every a and b, if a = b, then b = a. >- Transitivity: for every a, b, and c, if a = b and b = c, then a = c.

Sure, that's all well and good, but equality only can work IF you know what the values are on either side. NULL means you don't know what a value is and therefore you can't answer whether or not a = NULL because NULL isn't known yet.

Or let me put it another way, is the launch price of the iPhone 17 != $799? The answer is neither true nor false, because the launch price of the iPhone 17 is unknown. We can make reasonable guesses, but if I give you a database with entries for iPhones from the first model and placeholder rows for the 17, 18, 19 and 20 models, none of them belong in the list when someone asks "which iPhones cost $799 at launch?" But equally true that none of them belong in the list when someone asks "which iPhones did not cost $799 at launch?"

▲empthought 293 days ago

> That's the thing that's being treated as FALSE. That UNKNOWN. Not the value of NULL itself.

No, it's being treated as UNKNOWN, and the semantics of SELECT...WHERE only returns rows where the value of the condition is TRUE.

I think you need to look into https://en.wikipedia.org/wiki/Negation_as_failure

▲zigzag312 293 days ago

They are weird because they are inconsistent compared to nearly everything else.

Being based on someone's logic is not sufficient. Most weird things are based on some (weird) logic.

▲tshaddox 293 days ago

Even simpler, note that the first two examples are essentially just two of the most basic boolean algebra identities. Replace UNKNOWN with a variable and change the notation to the more conventional boolean algebra notation and you've got:

1. x ∧ 1 = 1 (identity law for conjunction)

2. x ∨ 0 = 1 (identity law for disjunction)

▲btown 293 days ago

> quite parasitic

This is the mental model that I use and teach: a NULL is a virus that will likely make everything it touches NULL... unless cordoned off by a COALESCE or IFNULL or something else that explicitly speaks about its NULL-sensitive behavior in its documentation.

So if you see a bug where a WHERE or FILTER clause is unexpectedly failing, and you're in a time crunch, rather than pounding your head against the wall, start sprinkling COALESCE statements around (while being thoughtful about the domain meaning of what you're doing) and it's more likely than not to solve your problem!

▲ 292 days ago

▲remywang 293 days ago

Except that NULL is not the same as UNKNOWN! NULL is a data value (like integers) that can appear in data expressions (like NULL + 1) and comparisons (like NULL = 1) whereas UNKNOWN is a truth value that can appear in boolean/logical expressions constructed from logical connectives like AND, OR, NOT.

A data expression always evaluates to a data value, and usually whenever any part of the expression is NULL, the entire expression evaluates to NULL.

A comparison evaluates to a truth value, and usually when a comparison invovles a NULL it returns UNKNOWN. This leads to weird behaviors where both `SELECT 3 WHERE NULL = NULL;` and `SELECT 3 WHERE NULL <> NULL;` returns nothing (because the query engine does not output a row if the predicate returns UNKNOWN on it).

What you listed above only comes into play for boolean/logical connectives like AND, OR, NOT, and in that case we follow 3-valued logic.

And there's more annoying corner cases when you deal with DISTINCT. The situation is so hopeless that SQLite has a whole table documenting divergent behaviors of NULL in different systems: https://www.sqlite.org/nulls.html

▲hiAndrewQuinn 293 days ago

Indeed, they're not identical - that's why I just said "based on", and that's likely why the word UNKNOWN itself isn't used in SQL.

Nevertheless I find it a useful intuition pump. I wager that most people reading `UNKNOWN = UNKNOWN` or `UNKNOWN <> UNKNOWN` and thinking about the examples above would stop and say, "Wait, I actually don't know the value of that statement for sure either, since the LHS and the RHS could be completely different things," and would then double check what their SQL dialect would actually do in this situation.

▲remywang 293 days ago

> the word UNKNOWN itself isn't used in SQL

That leads to an even more confusing point, that some systems (at least SQLite) overloads NULL to mean UNKNOWN, for example `SELECT 1 WHERE NULL = (NULL = NULL)`.

And the dangerous thing about NULLs is not when they are explicitly used in the query as a literal (as I did for brevity), but when they appear in tables. It's perfectly reasonable to assume `SELECT COUNT( * ) FROM t;` should be the same as `SELECT COUNT( * ) from t WHERE t.x = t.x OR t.x <> t.x`, but they are not the same because the latter does not return NULL rows. This has lead to real query optimizer bugs. For more examples see this paper https://dl.acm.org/doi/10.14778/3551793.3551818

▲atombender 292 days ago

ANSI SQL has had IS [NOT] UNKNOWN since SQL:1999 [1]. It's an optional feature that some databases don't support. Postgres, MySQL, and MSSQL do support it.

[1] https://modern-sql.com/concept/three-valued-logic#compatibil...

▲demurgos 294 days ago

> select null = null; returns NULL, because each NULL is basically a placeholder representing any “unknown value”. Two unknown values are not necessarily the same value; we can’t say that they are equal, because we don’t know the value of either of them.

Agreed with all of this, it would probably have been better if they were named `unknown` instead of reusing the `null` keyword.

Note also that since Postgresql 15, you can use `NULLS NOT DISTINCT` when creating a unique index [0]. I'm less familiar with other databases.

0: https://www.postgresql.org/docs/15/sql-createtable.html

▲layer8 294 days ago

SQL NULL doesn’t behave like “unknown” in all contexts. That’s one issue of NULL, that it doesn’t really have consistent semantics.

▲masklinn 294 days ago

Furthermore if null only means unknown then we need a value for “known absent”, there’s a reason why null is so often used as that.

▲int_19h 293 days ago

From a purely relational perspective, if some piece of data can be absent, it's a 1:N relation where N<=1, and ought to be encoded as such.

(Of course, this is rather awkward in practice, and when NULL is there, it's inevitably going to be used for that instead.)

▲feoren 293 days ago

It is encoded as such. That's why most columns are made nullable. It's crazy to say you need to use the full power of a 1:N relation with some child table when you know N cannot be greater than 1, when a nullable column already exactly encodes a 1:(0..1) relation. I'm not trying to shill for null here: one of null's great problems is exactly the fact that null can represent "unknown", "known absent", "not applicable", or even some sentinel-ish "other" escape condition, each with their own subtle differences in how they should be handled.

Null has tons of problems, of course. But it's patently absurd to claim that you "ought to be" making a different 1-column table with a unique foreign key or shared primary key for every column that might be absent, because of some vague appeal to the fact that you can write it as a 1:N relation with N<=1. You can just as easily claim that every non-nullable column is a 1:N relation where N==1 and "should be encoded as such". It is encoded as such! That's what a column is!

▲int_19h 290 days ago

I don't know if it's "patently absurd" given that it's a well-known C.J. Date take on NULLs in SQL. You may disagree with him on this - I do - but the very fact that the father of relational algebra has this take should be sufficient evidence that it's not a trivial question to be dismissed without consideration.

▲ 293 days ago

▲tpmoney 293 days ago

But the "known absent" value is going to be different for different domains. For example, in EEOC databases the "known absent" value for a race would be "declined to answer". In a database of test scores, it might be "Didn't complete", but it could also be "was absent from class on exam day" so SQL can't specify what that is. On the other hand "this value is unknown" can use the same marker in all domains, and SQL chose NULL as that marker. To be completely strict about it, "have a value/don't have a value" is one piece of data if that's something you care about and "what is that value" is another one. So in an ideal system, you should have a column for "value is [present | known absent | unknown]" and a separate column for the actual value when "value is present"

Most of the time it's not that important and people can and do shortcut "null" to mean "not present" but then the issues with using null in equality statements is a result of taking the short cut, not necessarily with the logic around null.

▲thaumasiotes 293 days ago

Do you actually need that in a Boolean context? It would only be useful for evaluating self-referent claims like "this sentence is false".

▲masklinn 293 days ago

Your questions might be relevant if null were limited to boolean contexts.

It’s not.

▲marcosdumay 293 days ago

We could really use some kind of "polymorphic table" that can represent a sum type in a convenient way.

▲cm2187 293 days ago

The problem is that in practice in a database NULL is a placeholder for a missing value, not an unknown value.

▲SoftTalker 293 days ago

Best way to think of NULL is "no value." Not "unknown value," as that implies that it is a value you just don't know what it is. Not "missing" value as that even the notion of a value being "missing" tells you something. NULL is no value. It's like a black hole, it consumes anything it touches. Any expression involving a NULL term becomes NULL.

▲bballer 293 days ago

And to further apply semantics to this just to be snide, the value is only "missing" if it could possibly ever be defined for that tuple. There are cases where you expect the value to be "missing", and it not being "missing" would be considered a data integrity issue. Fun.

▲cm2187 293 days ago

Yes I should have rather written “an absence of data”. But still not “unknown”.

▲magicalhippo 294 days ago

The result of comparisons involving NULL values can result[1][2] in UNKNOWN, and in PostgreSQL for example you can test[3] for this using IS UNKNOWN.

That said, as someone self-taught in SQL, I agree NULL was not a good choice.

Replacing NULL with UNKNOWN and the third boolean value as INDETERMINATE for example would have been better.

[1]: https://stackoverflow.com/a/79270181

[2]: https://learn.microsoft.com/en-us/sql/t-sql/language-element...

[3]: https://www.postgresql.org/docs/current/functions-comparison...

▲labster 293 days ago

SQL was developed in the 1970s, there’s no way they’d waste all those bytes to spell out UNKNOWN and INDETERMINATE.

▲jandrese 293 days ago

Have you ever seen SQL? Blowing lots of bytes on a query was not a concern. They could have made a much more compact syntax with little effort, but it wouldn't have looked kind of like English the way SQL does.

You could imagine a world where instead of:

SELECT ( email, name, outstanding_balance ) FROM accounts WHERE outstanding_balance > 0 AND last_payment > 60 ORDER BY name

the queries looked more like:

accounts: outstanding_balance > 0 & last_payment > 60 => email, ^name, outstanding_balance

There were plenty of contemporary languages that looked like the latter, but few survive to this day. SQL's relative verbosity was not seen as a problem then and is definitely not one today.

Besides, if the verbosity was a problem they could easily shorten it to UNK. That would have been perfectly normal.

▲ChoHag 293 days ago

[dead]

▲otteromkram 293 days ago

Also self-taught SQLer and I don't have an issue with NULL.

I also don't use UNIQUE constraints, so maybe that has something to do with it.

▲magicalhippo 293 days ago

I don't have an issue as such, I was a fairly experienced developer first time I had to dabble with SQL, but sometimes it can still surprise.

For example I learned the hard way that the DB we use at work does not index NULL values.

And once in a while if I'm tired or stressed I might forget about UNKNOWN and thus that "Col <> 42" does not return rows where Col is NULL.

Not that better naming would prevent such surprises, but I still think the current naming is less than optimal from a pedagogical perspective. At least I see this at times when teaching our support folks SQL (many have domain background and not a technical background).

▲ 293 days ago

▲chongli 293 days ago

UNKNOWN isn’t always correct though. Let’s say your data is input by users filling out a form and some of the fields are allowed to be left blank. NULL captures both the case where the user intentionally left the field blank but also the case where they accidentally skipped that field.

So NULL can capture multiple distinct concepts: unknown values (say, as a result of a calculation), not applicable (where the data has been deliberately left out), and missing data (possibly due to instrumentation or user error).

▲jandrese 293 days ago

Is it really desirable to combine "unknown" and "no value" into the same identifier? They seems like very distinct concepts. Using your form example you might have a name field that includes a spot for a suffix (John Doe III) for example that many people would leave blank because they don't have a suffix. In that case you should encode it as a NULL, but if they fail to fill in their family name then that's UNKNOWN.

If you do it this way you can avoid some confusion as to what NULL represents.

John Doe NULL IS EQUAL John Doe NULL

John Doe NULL NOT EQUAL John Doe UNKNOWN

John Doe UNKNOWN NOT EQUAL John Doe UNKNOWN

Determining if any particular input is NULL or UNKNOWN is a tricky problem, but at least this gets the programmer thinking about it up front and structuring their code to behave in the sanest possible manner.

▲NoMoreNicksLeft 294 days ago

>also that since Postgresql 15, you can use `NULLS NOT DISTINCT` when creating a unique index [0]. I'm less familiar with other databases.

Why would anyone want to use another database?

▲stronglikedan 294 days ago

Simplicity. PG is often overkill for simple apps, where MySQL/Maria/et al is easier to maintain, or even SQLite for very simple apps where zero DB maintenance is preferable.

▲lcnPylGDnU4H9OF 294 days ago

Why would you say MySQL/Maria/et al are easier to maintain for simple apps than PG?

▲demurgos 294 days ago

The main pain point for smaller apps is that every major Postgres version requires an explicit migration of the underlying data representation. I get why it's there, but for simpler apps I would appreciate a flag to do it transparently.

▲throwaway10235 294 days ago

I'm not sure what you mean. I have migrated versions without having to update any applications that connects to it?

Maybe it is a driver specific issue? I have used Python/Java, and haven't updated any of my code or dependencies because of a major Postgre update

▲Volundr 294 days ago

It's not client side, it's server side. The binary format between Postgres major versions is generally not compatible so you need to do a pg_dump from the old database and do a pg_restore into the new one. pg_upgrade [1] can let you bypass this by migrating the binary data directly, but still requires having both the new and old postgres versions installed. There's also things you can do with replication, but since we're talking simplicity I don't think that really applies.

Personally I think upgrades are the one thing MySQL has on Postgres at this point.

[1] https://www.postgresql.org/docs/current/pgupgrade.html

▲homebrewer 294 days ago

It requires manual interventions because the upgrade process is basically dump + restore. MySQL and MariaDB upgrade between major versions automatically — you simply install the next version (or change the version tag in your container definition) and restart the server.

Usually it takes almost no time, altought might be just as slow as PG when major changes to the data format are introduced. The only example I can remember is 8.0 when oracle completely rewrote the data format (making things like atomic ddl possible).

▲marcosdumay 293 days ago

The GP is complaining about the need to run the upgrade cluster command, and reconfigure your server when you want to use a different version of postgres itself. MySql does it transparently if you just open the database in the new server.

Of course, lots of people also think it's a feature. But those aren't very vocal. Anyway, this is a main reason why people keep using old postgres versions, they have to think about upgrading, and they often don't.

▲panzi 294 days ago

You need to dump the database on the old PostgreSQL version and then load the dump on the new PostgreSQL version. Some other database servers can just use the old table data or migrate the data files on the fly transparently.

▲wruza 293 days ago

Do you even have to upgrade to the latest major version? What does that do, on average?

▲phplovesong 294 days ago

Usually its a skill issue

▲ 293 days ago

▲badlibrarian 294 days ago

VACUUM

▲kstrauser 294 days ago

If your data's large and changing enough that you have to care about vacuuming, any reasonable database is going to require some tuning, tending and management.

I'd posit that only a tiny fraction of PostgreSQL uses have to know or care that vacuuming is a thing because the autovacuum default handle it for them.

▲mardifoufs 294 days ago

Sure, it's never going to be plug and play, but it doesn't mean that all the issues will be equivalent. Vacuuming doesn't really have an equivalent in say, MySQL. It's something you don't have to worry about if you use the latter.

For example, HA and clustering will always be challenging to deploy/maintain, but you will still have a harder time doing that with postgres than with MySQL. Postgres also has a lot of benefits obviously, though.

▲homebrewer 294 days ago

Also the reverse — MySQL et al support much more complex replication topologies out of the box, including multi master for the very rare use case when you need it.

It's also much easier to tune, most database instances require setting innodb_buffer_pool_size, and that's basically it. Newer versions can even set it automatically if you're fine with consuming all memory on that machine, thus requiring no tuning at all.

▲poincaredisk 294 days ago

* legacy applications or vendor lock-in

* use of a cloud provider that favours another database (like SQL server)

* some people claim mysql/maria is faster for them, simpler, or has a better support for replication

* use of sqlite for lightweight or portable apps

* domain specific databases that still use sql as their query language

* someone may want to use another database for fun or to learn something new

▲hot_gril 293 days ago

SQLite has its own use cases. And MySQL was all-around better than Postgres in the past, so it still has more usage in some areas. Nowadays I'll take Postgres over MySQL by default, but it's not a huge difference.

▲password4321 294 days ago

Replication

▲duncan-donuts 294 days ago

Introducing “unknown” feels like another kind of hell like undefined in JavaScript.

▲jampekka 293 days ago

JavaScript's undefined is great. It's sort of similar to a maybe monad. Or IEEE 754 NaN. JS could have nicer mechanisms to handle undefined though.

▲demurgos 294 days ago

Just to clarify, I'm not advocating to introduce a new `unknown` keyword. I'm saying that the existing `null` in SQL was not named properly and that the name `unknown` would have been more fitting. SQL's `null` already has the semantics of `unknown` as explained in the part of the article that I quoted.

▲wvenable 294 days ago

SQL's use of "null" is probably one of the oldest instances of that concept in computing. It's exactly equivalent to unknown. That is its definition.

▲int_19h 293 days ago

SQL NULL is not "exactly equivalent" to unknown. E.g. in an outer join, there's nothing unknown about the result that is missing a row from one side, yet SQL semantics is to fill it with nulls.

In practice, it behaves as "unknown" in some contexts, as "missing value" in other contexts, and sometimes it's just plain WTF like SUM() returning NULL rather than 0 if there are no rows.

▲demurgos 294 days ago

Really? I know that SQL is old but I would have expected `null` to refer to pointers at first.

Going by Wikipedia, I see that SQL is from 1974 and C from 1972. Were there earlier uses/drafts where `null` is "unknown" instead of "unset"?

▲adrian_b 293 days ago

The term "null" comes from C. A. R. Hoare, who has proposed in November 1965 various additions to the programming language ALGOL 60, including pointers (called references by Hoare, "pointer" comes from IBM PL/I, in July 1966), and including a special value "null" for references a.k.a. pointers, for denoting unknown or not applicable values.

C. A. R. Hoare, 1965-11:

"In order to enable references to represent partial functional relationships, i.e. ones which do not necessarily yield a value, a special reference value null is introduced. This value fails to refer to a record, and any attempt to use it to refer to a record leads to an undefined result."

In his proposal, Hoare has followed the COBOL 60 terminology for some of the additions to ALGOL, i.e. "record" instead of the word "structure", introduced by IBM PL/I in 1964-12, and "record class" instead of "structure type", hence the "class" of SIMULA 67, from where the word "class" has spread into all OOP languages.

In Hoare's proposal, references a.k.a. pointers could point only to values belonging to user-defined types, i.e. records a.k.a. structures, not to primitive types.

▲wvenable 294 days ago

I wouldn't necessarily define `null` as "unknown" -- it's just "no value" -- which is really the same thing and also somewhat equivalent to "unset". But null pointers aren't unset as pointers aren't initialized to null in C and you can explicitly set a pointer to null.

E.F. Codd added nulls to relational model in 1970 so that does pre-date C. The concept is even older than that I imagine.

▲recursive 293 days ago

In nth normal form, you can't have 'no value'. That would mean your model is wrong. In academic relational data books, null does mean "unknown". There is a value, we just don't know what it is (yet).

If there might actually not be such a value, you're supposed to change your schema to reflect that.

▲wvenable 293 days ago

> There is a value, we just don't know what it is (yet).

In all my years, I've never used null for that. If I don't have a value yet then generally I'm not writing any part of the record. I only ever use nulls for the absence of a value. Creating a new table for every potential optional column is the "academic relational way" and also insane. :)

▲chongli 293 days ago

What happens if your data is produced by some automated process such as a sensor reading and occasionally the sensor fails to return a value? NULL seems exactly the appropriate value to use.

▲recursive 293 days ago

Then you're supposed to use another table with a foreign key to canonical measurement record. This is the concept of fully normalized schemas.

What you're describing is closer to how people do it in practice.

▲chongli 293 days ago

I'm still a bit confused. Suppose you have another table, call it temperatures with columns id and temperature, where every row contains only a valid temperature (no NULL records), and you have a main logging table with date and temperature_id so that you can join on temperature_id = temperatures.id. This seems to be what you mean, with a canonical measurement record table related via the temperature_id foreign key.

But then if your sensor fails to record a measurement don't you end up with NULL for that row's temperature_id?

▲recursive 292 days ago

Foreign key would probably go the other way:

LogEntry(LogEntryId, Date)

Temperature(TemperatureId, LogEntryId, DegreesF)

If there is no temperature measured, then you don't create a record in Temperature.

▲wvenable 293 days ago

Ah but then how do you record that the measurement actually happened but did not produce a value? I want a record of that failure.

I mean sure, you could do yet another table. But honestly that level of normalization is much more work than it's worth. Just because it's some academic definition doesn't make it right.

▲recursive 292 days ago

You put a record in the Measurement table, and none in the Value table.

> But honestly that level of normalization is much more work than it's worth

Yes. I question whether it's worth anything to begin with.

▲rawgabbit 292 days ago

     Select M.*, V.* from Measurement as M left outer join Value_table as V…

You end up with nulls again.

▲recursive 292 days ago

Yes. That's part of the semantics of outer joins.

But there is no stored null representing 'no value'.

▲floating-io 293 days ago

You'd also have to ask when NULL came into common use in C (to which I do not know the answer). AFAIK NULL was not considered to be standard until C89. As far as I'm aware, all C compilers implement it as a #define in a standard header somewhere; it's not actually part of the core language itself.

I wonder who first added that macro? Was it there from the beginning?

Just random thoughts...

▲bunderbunder 294 days ago

> ... and this is even less obvious if you’re used to using ORMs.

Which is why I continue to be such an ORM skeptic. I agree that they're convenient. But I do worry that we've now got an entire generation of engineers who regularly interact with relational databases, but have largely been spared the effort of learning how they actually work.

As another commenter pointed out, if you've learned basic relational algebra then the way SQL nulls behave seems obvious and logically consistent. The logic is the same as the logic behind the comparison rules for NaN in IEEE floats. It's the behavior of C-style nulls that is, always and forever, a billion-dollar mistake.

▲feoren 293 days ago

Stop thinking of ORMs as trying to hide the details of SQL and you'll stop hating them. Instead think of them as a way to compose relational queries dynamically, with the full power of your primary language, instead of inside of database stored procedures in a language totally devoid of any support for basic software engineering best practices. They shouldn't be hiding SQL from your primary language, they should be exposing the relational model to it! SQL is not the only possible implementation of the relational model, and it's not even a particularly good one. Even SQL's founders don't think it implements EF Codd's relational model very faithfully. Good ORMs act as a domain-specific language for the relational model embedded inside the parent language.

▲rqtwteye 293 days ago

"compose relational queries dynamically"

That's an important one. It would be super nice to have a SQL dialect that works more like LINQ where you can compose your queries easily. I always hate it when I have to write SQL directly. It's super powerful but the syntax just isn't designed well. To me it feels like a throwback to the good old FORTRAN or COBOL days: you can get stuff done but modern languages are so much better.

▲fauigerzigerk 293 days ago

>They shouldn't be hiding SQL from your primary language, they should be exposing the relational model to it!

But this has never been their primary purpose and it's not what they are good at. ORMs are supposed to map the relational model into an object oriented model so that you can work with objects rather than sets of tuples. And that's exactly how people use them.

ORMs incentivise people to replace simple and declarative set operations with complex procedural code operating on individual objects.

ORMs are just a terrible idea - conceptually messy, hard to debug and optimise, full of needless complexity.

▲feoren 293 days ago

Exposing the relational model to be manipulated dynamically within the parent language is exactly what LINQ in C# is. That was its primary purpose. True, LINQ is not itself an ORM -- it was built to support other ORMs like LINQ to SQL and Entity Framework, which aren't as "pure" on this subject. I don't actually like the LINQ syntax that much since it's not as extensible, but its existence is proof that the C# team did in fact intend to expose the relational model to C#.

Entity Framework did try to cater to the "SQL is scary, let me use objects" crowd, and that is the majority of how it's used, and that is a mistake in my opinion. But it is also very good at supporting relational algebra within C# and composing queries dynamically; ironically, it's best at it if you disable or avoid many of its features (dynamic subclassing, linked objects). Pass IQueryables around and compose them together and life is good. Updating needs work, but updates have always been a weakness for SQL too.

▲zzzeek 293 days ago

> ORMs are just a terrible idea - conceptually messy, hard to debug and optimise, full of needless complexity.

and that's why ORMs are so unpopular and entirely absent from successful production applications for the past 30 years

▲fauigerzigerk 293 days ago

All ideas that were popular for a while are basically guaranteed to be in some successful applications. That includes bad ideas.

▲rqtwteye 293 days ago

Are you sure about this?

▲taco_emoji 293 days ago

yeah I mean, isn't all of this true for ANY abstraction? Once you're off the beaten path, they're all hard to debug and optimize, they introduce extra complexity, etc. BECAUSE they are attempting to abstract away certain details

This is true for an HTTP library as much as it is an ORM.

▲fauigerzigerk 293 days ago

>yeah I mean, isn't all of this true for ANY abstraction?

No, it is only true for bad abstractions.

▲thrance 294 days ago

My experience with ORMs is that most of the time you end up needing to write some queries by hand, in raw SQL. Usually these are the most complex, that you can't express in your ORM's DSL. My point being, I don't think using an ORM really shields you from having to learn how it works behind the scenes.

▲globular-toast 293 days ago

It's not even about having to write SQL by hand. In an ORM like Django that's exceedingly rare. But you still need to understand what's going on underneath. In other words, it's the most leaky abstraction there is. I think the popularity is mostly aesthetic and convenience. Most people into ORMs like Django don't really know about layered architecture and that you can keep all your SQL in one place in the data access layer. They just scatter that stuff everywhere in the codebase.

▲feoren 293 days ago

I don't know Django specifically but I'm always floored by how people talk about ORMs. They're only a leaky abstraction if you believe their point is to shield terrified junior devs of the inner workings of the scary relational database. That's an awful way to use ORMs, and the source of most of the flak they get. To be fair, some are designed that way, or at least strongly push you toward it.

Stop thinking of ORMs as trying to hide the details of SQL and you'll stop hating them. Instead think of them as a way to compose SQL dynamically, with the full power of your language. SQL is an awful language to write application logic in, because it has horrible support for abstraction, composition, encapsulation, dependency injection, etc. The ORM gives you a way to produce SQL in an environment that actually supports basic software engineering principles. Scattering ORM logic everywhere in the codebase is the point: putting all your SQL in one data access layer is like putting all your arithmetic in one calculation layer. Why would you ever do that? What's wrong with seeing a plus sign in more than one file? What's wrong with seeing language-encoded relational logic in more than one file?

I can guarantee you the popularity is not "aesthetic". And convenience is a real thing that actually does reduce costs. People complain about ORMs, but have you seen the absolute horse-shit-level code that people jam into SQL functions and procedures to do the utterly most basic things? The standard for what ends up in SQL Stored procedures is the most unmaintainable garbage in the entire software engineering ecosystem.

▲globular-toast 293 days ago

Are you thinking of something like LINQ or SQLAlchemy Core? You do not need to use an ORM to interface with a SQL database with basic software engineering principles.

The reason for a data layer is because the underlying data representation might change. For example, you might change the schema to handle some new performance requirement. Now you have to hunt down and change everywhere you've queried that table in your whole code base. Every time you directly call SQL you are coupling your code strongly to the database schema which is ultimately an implementation detail.

> And convenience is a real thing that actually does reduce costs.

I know convenience is a real thing. I also know that it very often increases costs in the long run.

▲feoren 292 days ago

> Are you thinking of something like LINQ or SQLAlchemy Core? You do not need to use an ORM to interface with a SQL database with basic software engineering principles.

Well you certainly can't do it all in SQL, because SQL doesn't support basic software engineering principles at all. That means you're picking a parent language that does support the principles you want, and attempting to build a library that allows you to build, pass around, and compose relational queries in a way that leverages the benefits of the parent language. To do this I'd argue you need these things:

1. a structured representation of the (possibly incomplete) relational query as it's being built, e.g. SQLAlchemy's internal representation or IQueryable in C#

2. a method of building those representations in a way that conforms to the norms of the parent language and leverages its benefits as much as possible, e.g. LINQ or direct use of the various extension methods on IQueryable. It seems like you could pick either SQLAlchemy or SQLAlchemy Core depending on whether you want to think in terms of objects or queries (I'd usually pick queries), but I'm not that familiar with SQLAlchemy.

3. a method of translating the query representation into a particular SQL dialect.

I don't know what exactly your definition of "ORM" is, but I'd argue that if any of those steps involve an object-level representation of the schema of your tables (e.g. "class Employee { int ID }" type of thing) then it's an ORM. Do you need that? Well, no, probably not, but why not? C# is strongly typed, so why wouldn't I tell its type system what the schema of my table is? That's part of leveraging the benefits of the parent language.

> The reason for a data layer is because the underlying data representation might change. For example, you might change the schema to handle some new performance requirement. Now you have to hunt down and change everywhere you've queried that table in your whole code base.

This is an argument for strong typing, not for putting all your code that has the word "select" in it in the same file. And building queries with a strongly typed representation of your schema in the parent language is the point of the ORM!

Without an ORM, you still have to hunt down tons of spots where things change when you change your schema. How do you ever not have to do this?

> Every time you directly call SQL you are coupling your code strongly to the database schema which is ultimately an implementation detail.

There exists some canonical relational model of your domain. This model is not an implementation detail, it is the foundation of most of your code. I choose to keep my SQL schema as close as possible to this canonical relational model as I can. The fact that a Monitoring Instrument has an Operating Status is a fundamental fact about my model; whether that column is called "operating_status" or "OperatingStatus" or "operating_status_ID" is an implementation detail. It's just hard to agree with you that "the database schema" is merely implementation detail -- clearly parts of it are not, unless it's completely untethered from your canonical domain model (which would be a nightmare, of course). Of course your code is strongly coupled to the fundamental domain model upon which it is operating. I'd certainly agree that it'd be nice to not have to worry about implementation details like the exact naming convention of the columns -- which, of course, ORMs let you do.

>> And convenience is a real thing that actually does reduce costs.

> I know convenience is a real thing. I also know that it very often increases costs in the long run.

Code that is easier to write is generally also easier to read and maintain. That's what I mean when I say "convenience". Simplicity and convenience go together.

▲Izkata 293 days ago

> I don't know Django specifically but I'm always floored by how people talk about ORMs.

Django seems to be an outlier on the side of "an ORM that's actually good". Whenever people have specific technical complaints about ORMs, it's generally not a problem or there's already a solution in Django you just have to use. It's usually only when you get to the more conceptual stuff like object-relational mismatch that such complaints tend to apply to Django.

▲hot_gril 293 days ago

If you can avoid learning SQL or the underlying DBMS then great, I have no problem with people ignoring things that are ignore-able. Problem is you can't. You will need to learn the DBMS, and the ORM will end up just getting in the way of what you really want to do.

▲DangitBobby 292 days ago

I've used ORMs extensively in my career, and I've seen it trotted out as a Boogeyman here a million times. Why do I still prefer writing queries with a good ORM over awkwardly composing SQL by conditionally concatenating strings? Is buggy, error prone string concatenation with a bunch of if statements the true way? What am I missing?

▲hot_gril 292 days ago

Conditional concat should be pretty rare. When are you doing that?

▲DangitBobby 292 days ago

When I need a new where clause or a new order by, or I have to stop using an IN filter because you can't use an empty tuple. Stuff like that.

▲hot_gril 290 days ago

Yeah, if you're going to conditionally add filters, this is mostly the only way with SQL. There are a few tricks depending on the dialect, like =ANY() is more convenient than IN, or COALESCE can avoid some dynamic WHEREs.

You're not really missing anything other than query builders, which are worse for other reasons.

▲zzzeek 293 days ago

I was actually going to comment on how breathtakingly ignorant the blog post author was for making this statement. What on earth ORM would make a UNIQUE constraint against multiple columns without an explicit instruction, much less where some of the columns are NULLable and then leave the user entirely unaware that anything like that was happening? ORMs do nothing like this and the developer would always have built this constraint explicitly and therefore is working at the DDL layer. ORMs have nothing to do with whether or not someone knows SQL just like cars have nothing to do with whether or not someone knows how to walk.

▲irrational 294 days ago

I expected the article to mention how in Oracle NULLs are equal to empty strings. Now that is weird.

▲hyperman1 294 days ago

Oh man. I had a talk with a DBA about how oracle could not deal with an adress with no street name - literally a tiny village with 10 houses on 1 nameless town square. It was unsearchable in parts of the app because street='' was interpreted as street is null. DBA kept claiming oracle was right and the town should adapt their naming to our software.

This attitude was so prevalent at the time, I sometimes wonder if the rise of noSQL was simply people sick of dealing with Oracle DBAs

▲aidenn0 293 days ago

> This attitude was so prevalent at the time, I sometimes wonder if the rise of noSQL was simply people sick of dealing with Oracle DBAs

That was definitely one part; another part was sharp corners in MySQL (at least as of 20 years ago; I would be surprised if many of them haven't been rounded off in the meantime). The last part was places with no DBA with developers unaware of how to handle schema migrations.

▲zo1 293 days ago

It's weirder. If you insert an empty string into a VARCHAR field in Oracle, it returns Null back to you when you query that same field. At the very least, I'd expect a software system to behave in a deterministic way. I.e. either throw an error because you're not doing something right (whatever Oracle deems right in this case), or give you back what you gave it, especially for database software who's entire role of existence is to persist data without side-effects.

▲svieira 294 days ago

Domain-embedded nulls are the bane of my existence.

▲datadrivenangel 294 days ago

SQL NULLs are not weird once you consider how you want relational logic to work when they is a record with non-existent values.

▲grahamlee 294 days ago

Exactly this. SQL is based on the relational algebra and that's well-defined, NULL along with other features of SQL work in an entirely regular and predictable way. The only time it's weird is when a developer decides that it should work the way Javascript (or whatever) NULLs work because that's the last time they saw the same word used in a programming language, in which case it's the assumption that's weird.

▲tzs 294 days ago

That's not the only time it is weird. There's even a whole book by one of the pioneers of the relational DB model, Date's "Database Technology: Nulls Considered Harmful" [1], covering many of the ways it is weird.

[1] https://www.amazon.com/Database-Technology-Nulls-Considered-...

▲setr 294 days ago

The part that’s weird with nulls is that it’s a trinary logic stuffed into a boolean algebra. The use of x = NULL instead of x IS NULL is pretty much always a mistake.

More importantly, x = value instead of (x = value and x IS NOT NULL) is almost always a mistake, and a stupidly subtle one at that. And for this curse, we get… nothing particularly useful from these semantics.

Also the x != NULL case is completely cursed

▲grahamlee 290 days ago

> The part that’s weird with nulls is that it’s a trinary logic stuffed into a boolean algebra.

It's a three-valued logic (though not trinary, which would use a base-3 number system) in a three-valued algebra: specifically, the relational algebra. The outcome of a logical test has three values: true, false, or NULL; this is distinct from Boole's algebra where outcomes have a continuous value between 0 and 1 inclusive.

▲dataflow 292 days ago

> SQL NULLs are not weird once you consider how you want relational logic to work when they is a record with non-existent values.

Could you explain how this makes sense then?

  SELECT ...
  WHERE NULL

If NULL is just "unknown" then shouldn't this be a type error?

Moreover, could you explain why the heck this ought to be empty?

  WITH T AS (SELECT 1 AS C1,
             NULL AS C2)
    SELECT C1, C2
    FROM T
    INNER JOIN T
    USING (C1, C2);

As a human this looks insane to me, "relational algebra" be damned. You find a row, then you look it up again, and it's not there? What the hell?

▲whstl 294 days ago

Agreed.

I will die on the hill that regular C-like nulls are the actual thing that's weird.

The real billion dollar mistake [1] was the damage it made on the minds of developers.

[1] https://en.wikipedia.org/wiki/Tony_Hoare

▲chuckadams 294 days ago

Even null in programming languages isn't so bad if it's a distinct type. The problem with null in languages like Java is that null is part of every reference type (C's pointers are another world of broken, null being basically just another unsafe invalid address).

Most languages nowadays do get nulls right, even PHP of all things.

▲PhilipRoman 293 days ago

>unsafe invalid address

Ironically NULL is probably the safest pointer value in C, as any dereferences from it (and thousands of pages that follow it) are guaranteed to crash.

▲bregma 293 days ago

Well, that's a problem on many of the devices I've seen: zero is a valid memory address and dereferencing it does not cause any kind of crash. In fact some hardware requires reading or maybe even writing to that address. In an age of virtual memory there's no reason why zero should cause a crash and it wastes an entire page of memory for every application to make that happen, if it does.

▲PhilipRoman 293 days ago

I haven't had the chance to work on any MMU-less devices, but I don't quite follow your remark about wasting a page. Crashing is just the default behaviour in the absence of a mapping and consumes no resources.

▲bregma 293 days ago

The granularity of virtual address mapping is usually a page. On many systems, that's 4 kilobytes of address space. In order to trigger a fault when the address 0x0000000000000000 is dereferenced, it's necessary to map the entire address range from 0x0000000000000000 to 0x0000000000000fff to the same faulting behaviour.

That's a waste of a page.

▲PhilipRoman 293 days ago

Yeah but there is no "page" there. Assuming a process starts with an empty page table (meaning every single address will segfault), you don't have to do anything else to get the crashing behaviour for null.

Unless you're talking about some kind of device which supports virtual memory, but also by default sets up mappings, including at 0 which seems weird to me.

▲bregma 292 days ago

So if I have an empty page table 0x00000001 can be referenced as a valid address?

Memory mapping on all hardware with which I am familiar (which is not a small integer) is done by pages. If you have to fault on one address in a page (eg. zero page, stack canaries, heap guards) you have to fault on every single address in that page. But "memory is cheap". Until it's not.

▲chuckadams 291 days ago

Any system where losing one page would be too dear is likely not using a MMU in the first place.

Besides, memory access is twice as fast on the zero page! Wait, you're telling me the 6502 isn't the standard anymore? ;p

▲poincaredisk 294 days ago

We should start adjusting that billion for inflation.

▲iefbr14 293 days ago

When the null concept was introduced to me in the seventies, the only thing I could say was that it would be causing a lot of unnecessary confusion in the future. If you have missing values in your datarecord then that datarecord belongs in an exception-queue. And now some 45 years later people are still discussing it like we did then..

▲acuozzo 293 days ago

> If you have missing values in your datarecord then that datarecord belongs in an exception-queue.

This depends on the context, no? I doubt there exists someone with a contact list on their phone which has every single field for every single contact populated.

There needs to be some way to codify that a field in a datarecord is unpopulated. Using the "zero value" for the type of the field (e.g., the empty string) is reasonable, but is this necessarily better than NULL? I reckon an argument can be made that this approach is just as likely to lead to bugs.

I'm not necessarily in favor of NULL, for what it's worth, but I can't think of an adequate replacement which doesn't reduce to "NULL in sheep's clothing".

▲thfuran 289 days ago

I did last week, before I added the first contact.

▲DangitBobby 292 days ago

Missing values are not always an exception. There's a reason modern languages almost universally include an Option type which may contain a Null and allow you to pass it around as a first class value. Good data representations allow you to express missing values.

▲jfb 293 days ago

Sometimes you want UNKNOWN, sometimes you want MISSING.

▲iefbr14 292 days ago

Just give the data item a status field, don't fix it by medling with a designated (non)value. And while you are at it you can add some valid-from and valid-to fields for the item. That's how you do it proper.

▲zokier 294 days ago

SQL nulls in some ways behave in similar to floating point nans. Of course nans are also weird in their own way, but it is a bit comforting that its not so completely singularly weird.

▲giraffe_lady 294 days ago

NaN is cool because it's almost like a type that constrains uncertainty. What do we know about this entity? not much! but it's definitely not a number. Calling it anything else would have been a mistake.

Null is more confusing because it means different things in different languages. Sometimes it's a more constrained uncertainty, eg this definitely doesn't exist. But in sql it's a less constrained uncertainty, like "undefined" in math. The value of this thing couldn't make sense in this context, but we can't make assertions about its existence.

▲mplanchard 294 days ago

There's another comment in here that talks about thinking of NULL as UNKNOWN, and I quite like that. It makes a lot more sense, and I think it would've been a better choice to standardize on. UNDEFINED would also be an improvement.

▲reshlo 293 days ago

UNDEFINED would not be accurate. If your signup form has an optional field for a full name which I don’t fill in, I still have a name. Just because a value is not known by your database doesn’t mean it isn’t defined.

E. F. Codd thought about this issue.[0]

> Codd indicated in his 1990 book The Relational Model for Database Management, Version 2 that the single Null mandated by the SQL standard was inadequate, and should be replaced by two separate Null-type markers to indicate why data is missing. In Codd's book, these two Null-type markers are referred to as 'A-Values' and 'I-Values', representing 'Missing But Applicable' and 'Missing But Inapplicable', respectively. Codd's recommendation would have required SQL's logic system be expanded to accommodate a four-valued logic system. Because of this additional complexity, the idea of multiple Nulls with different definitions has not gained widespread acceptance in the database practitioners' domain. It remains an active field of research though, with numerous papers still being published.

[0] https://en.wikipedia.org/wiki/Null_(SQL)

▲masklinn 293 days ago

> UNDEFINED would not be accurate. If your signup form has an optional field for a full name which I don’t fill in, I still have a name.

If your signup form has an optional field for middle name which I don’t fill, it can absolutely be because I don’t have a middle name. It’s undefined and known to be so.

> E. F. Codd thought about this issue.[0]

And because four value logic was left out, nulls have to fulfil multiple incompatible roles, and ends up being weird.

▲reshlo 293 days ago

> If your signup form has an optional field for middle name which I don’t fill, it can absolutely be because I don’t have a middle name.

The fact that some NULLs are indeed undefined doesn’t contradict my point that it would be inaccurate to refer to all NULLs as UNDEFINED.

> It’s undefined and known to be so… And because four value logic was left out, nulls have to fulfil multiple incompatible roles, and ends up being weird.

You know it to be undefined, but if you just leave that field blank on the form, the database still can’t know whether that’s because you don’t have one or because you didn’t provide the one you have, unless there was a field on the form that allowed you to explicitly indicate that you don’t have one.

▲benzayb 293 days ago

> If your signup form has an optional field for a full name which I don’t fill in, I still have a name. Just because a value is not known by your database doesn’t mean it isn’t defined.

There's the Closed World Assumption in a database.

▲reshlo 280 days ago

It’s often a bad idea to make that assumption.

▲adhamsalama 294 days ago

Except it is of type number, at least in JavaScript.

▲dunham 294 days ago

Also similar to the bottom value in haskell and exceptions in other languages.

▲galaxyLogic 293 days ago

In Object Oriented Context "null" is useful to indicate that some object doesn't have value for that property.

What's interesting is, do we mean that in our data that attribute has no value? Or do we mean the real-world object represented by the data does not have that attribute?

Does null mean

a) We don't know the value of this attribute for this object, or

b) We do know that there is no value for this attribute in the real-world object represented by our data.

In JavaScript because there is both null and undefined it is easy to assume that undefined means we don't know the value and null means we do know it has no value.

EXAMPLE: The attribute 'spouse'. Some people have a spouse some don't. So what does it mean if the value of the field 'spouse' is null? That we know there is no spouse, or that we don't know who the spouse is if any.

In practical terms we can say null means "We don't know" which includes the case that there is no spouse.

▲Izkata 293 days ago

> In JavaScript because there is both null and undefined it is easy to assume that undefined means we don't know the value and null means we do know it has no value.

Javascript objects have two kinds of undefined that are both represented by the same value. You have to use another method to see which it is, and I've seen "foo" in this example used for the same thing as "null" in your example:

  >> z = {foo: undefined}
  Object { foo: undefined }

  >> z.foo
  undefined

  >> z.bar
  undefined

  >> z.hasOwnProperty('foo')
  true

  >> z.hasOwnProperty('bar')
  false

This is something you have to account for because the key is still there if you try to remove a value by just setting it to undefined:

  >> Object.keys(z)
  Array [ "foo" ]

  >> for (let k in z) { console.info(k); }
  foo

This is the right way to remove the key:

  >> delete z.foo
  true

  >> z
  Object {  }

  >> Object.keys(z)

▲mnsc 293 days ago

But in a _relational_ database lack of spouse would not be modeled with a nullable column "spouse" but rather an absence of a spouse row/relation. Which is very real-world-like.

▲afiori 293 days ago

As a sort of challenge I had an idea of building an app using SQL as a pseudo ECS system where every table was either a 1 column table with only an id or a 2 column table with an id and a value.

▲DangitBobby 292 days ago

And yet when you do a join because you need to actually use that data, the resulting table will have a column with nulls in it. Any way you squeeze it, you need a way to represent empty values for your database to be useful.

▲mnsc 289 days ago

An inner join? Then there wouldn't be any nulls.

▲DangitBobby 287 days ago

IDK, do you want to exclude people without a middle name or a second address line when examining a product order list? I'll leave that one for you to decide.

▲andai 293 days ago

I remember from my databases course at university that NULL means that the database doesn't contain that data, and empty string means that it is known to be empty.

▲zo1 293 days ago

Let's also all be reminded about how Oracle DB doesn't let you insert empty strings, and instead treats them as NULLS even if you gave it an empty string initially.

https://stackoverflow.com/questions/203493/why-does-oracle-9...

That was a fun bug to find out, after having dealt with quite a few other DBs over the years. It was one of those "No, but surely" and "This can't be! This is Oracle!" moments. Found it while porting some old code that needed to store an empty string as being distinct from a NULL in that same column.

▲feoren 293 days ago

That's your professor's opinion, and probably one that does not come from industry experience. Look in 4 different databases and you'll see 9 different conventions. A common one is to have all strings non-null with a default value of empty string. And not all columns are strings; there is no "obviously empty" integer or boolean.

▲ 292 days ago

▲niij 293 days ago

What is the type is something other than a string?

age: null? married: null?

▲bogeholm 293 days ago

How about Option<bool> for that? In SQL you could have a nullable Boolean column

▲ 293 days ago

▲kopirgan 293 days ago

Recall this really funny dialogue in one of the Blackadder episodes.

The princess eyes are as blue as the stone of Galveston

Have you seen the princess eyes?

No!

Have you seen the blue stone of Galveston?

No!

So you're comparing something you've never seen with something else you've never seen!

That's NULL comparison

▲ludwik 293 days ago

Even if we set three-value logic aside for a moment, this behavior of NULL still makes sense intuitively.

The value of NULL in a particular table cell is simply a way to indicate 'no value'. If you want the values in a column to be unique, cases where there are no values shouldn't be considered.

This plays out similarly in practice. For example, you may want to allow users to optionally reserve a username, and if they do, those usernames should be unique. It's hard to imagine a use case where by wanting a field to be both optional (nullable) and unique, you mean that the field should be optional for a single record (!) and required for all the rest. Of course, you mean that IF there is a value, THEN it should be unique.

▲Recursing 293 days ago

Surprised that this doesn't mention "IS DISTINCT FROM" ( https://modern-sql.com/caniuse/is-distinct-from )

(Although in rare cases that is even weirder: https://stackoverflow.com/a/58998043 )

▲ziml77 293 days ago

I'm glad SQL Server finally got this, but I wish the syntax was nicer. It's a multi-word infix operator that gets tough to read. I've been using Snowflake SQL recently and I like that they just made it a function called EQUAL_NULL

▲lolpanda 293 days ago

I actually like how NULLs behave in SQL. They mean "I don't know" In the modern programming language we all care about Null safety. But no matter how you model your data, you will always run into the situations when you don't know everything. So I believe NOT NULL is not very practical. NULLs in SQL handle these case very well - when the input is unknown your output is unknown

▲int_19h 293 days ago

Except they don't consistently behave that way. If NULL means "unknown", why do they show up in outer joins, or when you SUM an empty table?

▲cglace 293 days ago

The most annoying is having to order by DESC NULLS LAST to get the largest value from an aggregation.

▲dalton_zk 293 days ago

I feel like the same, Null equal null is null is totally right

▲afiori 293 days ago

I feel like a select for:

- col1 = 1 should not return NULLS

- !(col1 = 1) should return NULLS

- col1 <> 1 should not return NULLS

▲kijin 293 days ago

Agreed. If SQL didn't have NULL, we'd have other special values meaning "I don't know" or "no data" all over the place.

Too many newbies hear that NULL is bad, so they declare all columns as NOT NULL and end up inserting ad hoc values like 0, -1, '', or {} when they inevitably come across cases where they don't have data. Which is even worse than NULL.

▲branko_d 293 days ago

NULLs are weird because they are basically two different types under the same name. The 3-value logic type is useful for representing "missing" foreign keys, but 2-value logic type is arguably more useful when searching/sorting/aggregating.

I think we would have been better-off by treating FKs (and maybe outer JOINs) as a special case, and using 2-value logic everywhere else.

▲al2o3cr 294 days ago

FWIW, you can explicitly change this behavior in Postgres as of version 15 - include "NULLS NOT DISTINCT" when creating the unique index.

▲kurtbuilds 294 days ago

If you want equality testing with nulls, you want to use `is (not) distinct from` instead of `=` and `<>` / `!=`.

`1 is not distinct from NULL` => false

`NULL is not distinct from NULL` => true

`0 is not distinct from 1` => false

▲blast 293 days ago

Having that is much better than not having it, but man is it verbose and confusing.

▲getnormality 293 days ago

I don't see why this is weird. Unique means no duplicates. Nulls can't be duplicates of each other because they're not equal to each other.

If you don't like null semantics, you're free to use sentinel values. You can make all the sentinel values the same, or you can make them all different. Either way, you or someone who has to use your system will be back here tomorrow complaining about how weird it is.

▲otteromkram 293 days ago

There's another good, technical write-up on NULL behavior in SQL at modern-sql.com

https://modern-sql.com/concept/null

(Note: I am not affiliated with that bloh/website in any way, shape, or form.)

▲gxt 293 days ago

This has always made queries unpredictable in many scenarios and it should be a feature to turn nulls off entirely and swap them out with Option<T> instead.

▲solumunus 293 days ago

How would you handle unmatched outer joins?

▲benzayb 293 days ago

By having a default value (non-null) for each declared type of those columns.

Or, the user must define a default value in the query itself.

Yes, tedious; but, precise and forces the programmer to really prepare for the "unknown" scenario.

▲masklinn 293 days ago

a left outer join b yields tuples of (A, Option<B>), a full outer join b yields tuples of (Option<A>, Option<B>)

▲hot_gril 293 days ago

Weird as they seem at first, SQL null handling ends up being convenient the way it is. Part of this is because left/right join give you nulls.

▲tzury 293 days ago

For Postgres specific approach, you may refer to

https://blog.rustprooflabs.com/2022/07/postgres-15-unique-im...

Practically speaking, I go with not null, and always set default value.

▲whartung 293 days ago

This reminds me back in the day when I was writing a DSL for a project.

Since the data we were getting was sourced from an RDBMS, I wanted NULL to be a first class concept in the DSL, with similar traits.

Early on, I simply made any expression that involved a NULL result in NULL. Naively this was all well and good, but it failed spectacularly in condition statements.

Instead of A = NULL == false, I had A = NULL == NULL. And, as you can imagine, a single NULL in the expression would just pollute the entire thing, and since NULL was considered as FALSE for conditionals, any NULL in an expression made the entire thing, eventually, FALSE.

Naturally I went back and made the comparison operators always return booleans. But it was a fun little side effect at the time.

▲at_a_remove 293 days ago

I have deep but vague thoughts around the concept. My first intuition is that we have put too many things under NULL and None and such.

Partially, we use ... and I'll be very broad here ... "variables" as boxes we look in for answers. Answers to questions, answers to "Hey I put something in there for the time being to reference later." If I went into programming terms rather than just SQL, sometimes we get meta-answers.

You haven't made the box yet (declared the variable).

You haven't decided how the box is structured (picked a type or a length or something).

Okay, you did those but the box is virgin (nothing has been placed in the box yet).

That kind of thing. An empty set for "yes, you asked but nothing meets those criteria."

▲criloz2 293 days ago

It is not supposed that null is the bottom value in the universe of all the values that your program can recognize? Why people need to complicate it?, and yeah in that definition `null == null`, but a `null_pointer != null` because null pointer is at the bottom of all the possible pointer value, and null by itself is not a pointer. The same for (0,null), (false, null) and ("", null). null should only be equal to itself.

And lastly undefined != null, because undefined is related with structures indicating that a field was not defined when the structure was created

▲jmyeet 294 days ago

NULL is the absence of a value. If you try and treat it as a value, you're going to have a bad time. So an attempted UNIQUE(email_address, deleted_at) constraint is fundamentally flawed. If you treated NULL as a value that could be unique, you're going to break foreign keys.

But let's continue the logic of deleted_at being NULL indicating an active account, which seems to the intent here. You end up doing things like:

    SELECT /* ... */
    FROM accounts
    WHERE email_address = '...'
    AND deleted_at IS NOT NULL

Depending on your database, that may or may not index well. More problematic, you may end up with privacy leaks if someone forgets the last conditional.

If anything, you want to reverse this so someone has to go out of their way to explicitly select deleted accounts. There are multiple strategies for this eg using an active_accounts view or table.

Lastly, there are lots of potential reasons for an account to be disabled or otherwise not visible/accessible. Takedowns, court orders, site safety, hacked accounts and so on.

Overloading deleted_at to have a semantic meaning for an active account is just fundamentally bad design.

▲chuckadams 294 days ago

> Overloading deleted_at to have a semantic meaning for an active account is just fundamentally bad design.

Then don't do that. It's kind of a leap to say soft deletes are categorically bad because someone might confuse "deleted" with "inactive". My users table does the super-advanced thing of having both columns. The ORM also doesn't forget to add the not-null criterion. There's also zero databases in active use where it poses a problem to indexing.

Soft deletes suck in their own way, but none of the alternatives are perfect either.

▲giraffe_lady 294 days ago

You put the "is not null" on the index itself and then simply don't use it for the much rarer queries that are on deleted accounts. Or just use a view for active accounts.

Overloading timestamps to carry a boolean on null is awesome as long as you decide that's what you're doing and use one of the several standard techniques to dodge the easily avoided potential downside.

This isn't a valid security concern, more than any other incorrect sql query would be anyway. A dev can always write a bad Q, you need another way to address that it's not more likely here because of the null.

▲indeed30 294 days ago

That's interesting - I believe this is exactly how Sequelize implements soft-deletion.

▲qwertydog 293 days ago

SQL NULL is also coerced to different boolean values depending on context e.g. in a WHERE clause NULL is coerced to false, whereas in a CHECK constraint NULL is coerced to true

https://dbfiddle.uk/C5JqMP8O

▲jakubmazanec 292 days ago

That's why I like EdgeDB and EdgeQL: no null [1]

[1] https://www.edgedb.com/blog/we-can-do-better-than-sql

▲trollbridge 293 days ago

And SQL null shares the feature with many other languages that any type can be a NULL (although a column can be set NOT NULL). Much like Java, it is no end of grief that a type that claims to be, say, a “NUMBER” is actually “NUMBER | NULLType”.

▲boxed 293 days ago

All of this would be avoided if NULL in sql was just called "UNKNOWN". Which is what it is. Terrible name :/

Imo, SQL should add "NOTHING", add "UNKNOWN" as a synonym for "NULL", and deprecate "NULL".

▲exabrial 293 days ago

null != null is pretty bizarre at first, until you understand the reason the did it was to try to make sense of null-able indexed columns. Not sure why we couldnt have our cake and eat it, but instead we got IS NOT NULL is not the same as != NULL

▲xd 293 days ago

To me, "Unknown" almost implies the possiblity of a value, whereas I've always thought of NULL as being an absence of a value.

edit: an empty string, false, 0 are all values.

▲anonnon 293 days ago

The simplest end-run around this is to avoid NULLs entirely, which normalization (even just the first normal form) requires.

▲osigurdson 293 days ago

Weirder still are floating point numbers in SQL.

▲wruza 293 days ago

I think (blasphemous hot take ahead) that the standards of implementation of relational models are wrong.

NULLs still have their (rare) place, but the foremost issue with query results is that they are tabular rather than hierarchical. The main culprits being (1) outer joins that represent or induce nonsensical operations and (2) lack of non-null “zero” values for types like date. Of course hierarchies can make querying more complex, but mostly in cases where the relational logic goes crazy itself and you had to go tabular anyway.

If you think of it, distinct, group by and windowing feel like workarounds in tabular mode but would be natural to hierarchies, because everything is naturally distinct and grouped-by by design and windows are basically subtables in these rows.

Bonus points you could fetch “SELECT FROM a, b_rows LEFT JOIN b AS b_rows …” in a single query without duplicating `a`s and nullifying `b`s when N <> 1. And when you aggregate through a column in `b`, there’s no headache what to do with join-produced NULLs (unless `b` columns are nullable by your design, then it’s on you). And when it all arrives to a client, it’s already well-shaped for ui, processing, etc. No more:

  last_a_id = undefined

  for (row of rows) {
    if (row.id != last_a_id) {
      …
      last_a_id = row.id
    }
    …
  }

I’m pretty sure you recognize this programming idiom immediately.

Before you criticize, I’m not talking about hierarchical/OO tables. Only about ways of getting and handling query results. You still can reshape a relation like you want. The difference is that a database engine doesn’t have to put it all onto a (N x M x …) table and instead creates sort of a subset of relations which is efficient in space and natural to walk through. It already does that when walking through indexes, selects are naturally hierarchical. All it has to do is to track relations it went through rather than just dumping rows from a set of cursors that it knows the start and end points of, but loses this knowledge by writing into a plain table.

▲drzaiusx11 293 days ago

Ah yes, someone discovering the existence of three value logic in SQL and expecting 2VL behavior. Classic. We've all been there, right?

Personally I wish more languages were like python or ruby and had chosen None or Nil over Null which alleviates the confusion a bit, as those names better indicates that it's NOT an "unknown" (1 unknown value != 1 other unknown, which intuitively makes sense.) In ruby or python it's more obvious that None and Nil are "nothing" types and therefore equivalence makes sense (nil == nil, None == None are both true)

▲ungut 294 days ago

The NULLs in unique constraints quirk actually works differently in ORACLE databases, which is infuriating to say the least. Apparently this comes from some ambiguity in some sql standard, anyone know more about this?

▲MathMonkeyMan 292 days ago

All I know is from this SQLite article: <https://www.sqlite.org/nulls.html>

▲khana 293 days ago

[dead]