This is something we saw at Chartio very early on. It's why we built the concept of layers directly into our product to support cross-DB networked joins:
What if the limit on the lateral subqueries were 2 instead of 1, and they were doing select * instead of select sum() in the outer query? How would you recreate that with correlated SCALAR subqueries? There's no such thing as a non-scalar correlated subquery, is there?
SELECT unnest(ar).*
FROM (SELECT ARRAY(SELECT tbl FROM tbl
                   WHERE .. ORDER BY .. LIMIT 2) AS ar
      FROM .. OFFSET 0) ss;
If you want a specific set of columns instead of *, you'd need to create a custom composite type to create an array of, since it's not possible to "unpack" anonymous records.
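For reference, that might look something like this (the type and column names here are hypothetical, just to show the shape):

-- hypothetical composite type holding only the columns we want
CREATE TYPE two_cols AS (col1 int, col2 text);

SELECT unnest(ar).*
FROM (SELECT ARRAY(SELECT ROW(t.col1, t.col2)::two_cols
                   FROM tbl t
                   WHERE .. ORDER BY .. LIMIT 2) AS ar
      FROM .. OFFSET 0) ss;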
The `sum(1)` and `order by...limit` approach really isn't the best way to build the funnel, and if you take a different approach, this could easily have been built with normal left joins.
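For example, a two-step funnel can usually be expressed with a plain left join plus aggregation, along these lines (the event table and column names here are my guesses, not the post's actual schema):

SELECT count(DISTINCT v.user_id) AS viewed_homepage,
       count(DISTINCT d.user_id) AS used_demo
FROM event v
LEFT JOIN event d
       ON d.user_id = v.user_id
      AND d.type = 'use_demo'
      AND d.time BETWEEN v.time AND v.time + interval '1 week'
WHERE v.type = 'view_homepage';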
Also, you should probably show some explain plans before making this claim: "Without lateral joins, we would need to resort to PL/pgSQL to do this analysis. Or, if our data set were small, we could get away with complex, inefficient queries."
Here's a comparison of the explain plans for your query without the `sum(1)` and `order by...limit` business and for a query using only left joins (no use of lateral): [link redacted]. Note that I ran this against an empty copy of your exact table (no data, no statistics); even so, the explain plans are the same.
My understanding is that LATERAL was really meant for set-returning functions like generate_series, as others have already mentioned.
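For example, to expand each row into one row per generated value (tbl and its cnt column are hypothetical here):

-- each row of tbl produces cnt rows, one per generated number
SELECT t.id, g.n
FROM tbl t,
     LATERAL generate_series(1, t.cnt) AS g(n);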
Edit: I should mention I know you were just trying to demonstrate how lateral works and that it is always good to see people writing about new Postgres features!
Looking at this a little further, the outer nested loop could cause issues in the left join vs. left join lateral comparison, depending on how many use_demo events there are in the week following the user's first view_homepage event. I added another query that uses CTEs, which allow for intermediate aggregation and should make the size of the nested loops similar between both versions. However, I wouldn't be surprised if the CTEs take more memory than the lateral joins, since CTEs are essentially temp tables that last only for the duration of the query. Lateral may indeed be the best option, but ideally I would populate this table with real data, gather statistics, and then run explain analyze on each query.
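Roughly, the CTE version pre-aggregates each step before joining, something like this (names are hypothetical, following the view_homepage/use_demo example):

WITH first_views AS (
    SELECT user_id, min(time) AS first_view
    FROM event
    WHERE type = 'view_homepage'
    GROUP BY user_id
),
demos AS (
    SELECT v.user_id
    FROM first_views v
    JOIN event d
      ON d.user_id = v.user_id
     AND d.type = 'use_demo'
     AND d.time BETWEEN v.first_view AND v.first_view + interval '1 week'
    GROUP BY v.user_id
)
SELECT count(*) AS viewed_homepage,
       count(d.user_id) AS used_demo
FROM first_views v
LEFT JOIN demos d ON d.user_id = v.user_id;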
Yet they say their revenue is only "well into the 6-figures"? How is this possible? If only 5,000 of those customers were paying the lowest rate of $25 per rep per month, that alone would amount to $1.5 million over the course of a year. And I have to believe a lot of their customers have more than one rep. Churn couldn't be making that much of a difference, could it? Knowing Dan from his writing, I'm sure he goes out of his way to please customers.
Not trying to call them out. I really like Dan and have learned a lot from his writing. I hope they do actually have that many customers paying $25-99 per rep per month. Just wondering how it adds up?
Along the same lines, this style makes it easier to comment out items when debugging, especially if you add in tricks like SELECT 0.
For example, commenting out the b references in this query causes multiple errors (a dangling comma before FROM, and a WHERE immediately followed by AND):
SELECT a.field1,
       a.field2,
       -- b.field1,
       -- b.field2,
       c.field1,
       -- b.field3
FROM a,
     -- b,
     c
WHERE --a.field1 = b.field1
      AND a.field2 = c.field1
This query would not cause an error (note that DUAL is a dummy table in Oracle that contains exactly one column and one row):
SELECT 0
     , a.field1
     , a.field2
     --, b.field1
     --, b.field2
     , c.field1
     --, b.field3
FROM dual
     , a
     --, b
     , c
WHERE 0=0
     -- AND a.field1 = b.field1
     AND a.field2 = c.field1
It does add a 0 column to the result set, but that can be dealt with or removed after development.