Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I don't know if this explanation is a good one, but I'll try using Haskell syntax. In Haskell, you can have product types like:

  data CartesianCoordinate = Coord Float Float
where an element of this type is expressed as `Coord x y` where x and y are both floats. Examples of elements of this type are `Coord 1.1 0.9` or `Coord -2.9 10.0`, etc. Product types are equivalent to structs in C, if you're familiar with C.

But you can also have sum types. Instead of starting with the general idea, I'll point out that C enums are a special case of sum types:

  data Day = Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday
and then point out linked lists are more representative of the general idea:

  data ListOfFloats = Node Float ListOfFloats | End
where an element of `ListOfFloats` is for instance `Node 1.0 (Node 2.0 End)`, or `Node 0.3 End`, or just `End`.

The pipe symbol | is what makes a sum type a sum type. It means that an element of the type is either one possibility or another possibility or another possibility.

One final example is a type consisting of all possible mathematical expressions. This is also a sum type:

  data Expr = Add Expr Expr | Times Expr Expr | Negate Expr | Inverse Expr | Const Float
An element of this type can be something like `Add (Const 1.5) (Times (Const 0.2) (Const 2.8))`, which is supposed to represent the expression "1.5 + 0.2*0.8". Interestingly, you can't easily express this type in most OOP languages.

In simple set theory parlance, product types refer to Cartesian products, and sum types are set-theoretic unions.

The relevance to relational databases is that each row of a table corresponds to an element of some product type. Each row of the same table has the same product type. But there is no means defining a "table" whose elements belong to a sum type as opposed to a product type. Why is that?



As the parent mentioned, you can encode your sum type by providing all the columns and constraining exactly one to be non-null.


Yes, you can encode it, but you shouldn't have to invent and apply this encoding manually. It's error-prone, it's less efficient, and you lose information.

It's like saying you don't need foreign key constraints as built-in concept as long as you have triggers, because you can encode such integrity checks as triggers. Technically true, but no one is going to buy that is a legitimate argument against foreign key constraints.


I guess what I wanted to say is that there is nothing in the relational model per se that prevents encoding sum types. Of course actual implementations (i.e. SQL) might lack the required syntactic sugar.


I agree, I'm just pointing out that when you have some abstraction X you almost always want to also provide its categorical dual ~X, or you end up having to write awkward encodings to simulate it.

Databases provide product types, and the dual of product types are sum types. As you point out, you can encode this in various ways, but it's not natural and very error-prone.


That’s the most sensible solution. AFAIK however there neither a cross-dialect way to specify that specific constraint, nor a simple way to SELECT the column name and the value of the unique non-null value.


but wouldn't that open up possibility of having 2 columns checked? (when in a proper sum type you can't be 2 values at the same time)


No, the constraint can require exactly one column. PostgreSQL even has an optimized builtin function for this, `num_nonnulls`.

In a recent feature I had an object field modelled as:

      type Destination = Customer of Customer | Supplier of Supplier | Warehouse of Warehouse
and the table representation was

      customer_id uuid null
    , supplier_id uuid null
    , warehouse_id uuid null
    , constraint unique_destination check (num_nonnulls(customer_id, supplier_id, warehouse_id) = 1)
It's not first class support, but manageable enough.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: