> you get the added benefit of writing queries in JSON instead of raw SQL. I’m s...

fhkatari · 2025-05-16T22:45:39 1747435539

You move all the tools for debugging and inspecting slow queries into a completely unsupported JSON environment, with prompts telling the model not to make up column names. And this is progress?

mritchie712 · 2025-05-17T01:33:08 1747445588

The JSON compiles to SQL. Have you used a semantic layer? You might have a different opinion if you tried one.

e3bc54b2 · 2025-05-17T05:18:07 1747459087

As someone who actually wrote a JSON to (limited) SQL transpiler at $DAYJOB, as much fun as I had designing and implementing that thing, and for as many problems as it solved immediately, 'tail wagging the dog' is the perfect description.

ljm · 2025-05-18T13:22:50 1747574570

    SELECT email FROM users WHERE deleted_at IS NOT NULL OR status = 'active'

seems more semantic to me at first glance than piping this into a JSON->SQL library

    {
      "_select": "email",
      "_table": "users",
      "_where": { 
        "deleted_at": { "_is": { "_not": SQL_NULL_VALUE } },
        "_or": [
          { "status": "active" }
        ]
      }
    }

which is usually how these things end up looking.

IncreasePosts · 2025-05-17T02:52:17 1747450337

You're right, it's a bit ridiculous. This is a perfect time to use xml instead of json.

meindnoch · 2025-05-17T10:02:19 1747476139

Clearly the right solution is to use XML Object Notation, aka XON™!

JSON:

  {"foo": ["bar", 42]}

XON:

  <Object>
    <Property>
      <Key>foo</Key>
      <Value>
        <Array>
          <String>bar</String>
          <Number>42</Number>
        </Array>
      </Value>
    </Property>
  </Object>

It gives you all the flexibility of JSON with the mature tooling of XML!

Edit: jesus christ, it actually exists https://sevenval.gitbook.io/flat/reference/templating/oxn

indymike · 2025-05-17T11:28:18 1747481298

We had an IT guy who once bought an XML<->JSON server for $12,000. Very proud of his rack of "data appliances". It turned JSON into XML like XON, and XML into JSON that was a soup of elements, attributes, and ___content___, thus giving you the complexity of XML in JSON. I don't think it got used once by our dev team, and I'm pretty sure it never processed a byte of anything of value.

indymike · 2025-05-17T00:43:13 1747442593

This may be the best comment on Hacker News ever.

sgarland · 2025-05-17T14:42:45 1747492965

I think that honor still belongs to "Did you win the Putnam?" [0] but this is definitely still in the top 5.

[0]: https://news.ycombinator.com/item?id=35079

mritchie712 · 2025-05-17T11:41:35 1747482095

LLMs are far more reliable at producing something like this:

    {
      "dimensions": [
        "users.state",
        "users.city",
        "orders.status"
      ],
      "measures": [
        "orders.count"
      ],
      "filters": [
        {
          "member": "users.state",
          "operator": "notEquals",
          "values": ["us-wa"]
        }
      ],
      "timeDimensions": [
        {
          "dimension": "orders.created_at",
          "dateRange": ["2020-01-01", "2021-01-01"]
        }
      ],
      "limit": 10
    }

than this:

    SELECT
      users.state,
      users.city,
      orders.status,
      sum(orders.count)
    FROM orders
    CROSS JOIN users
    WHERE
      users.state != 'us-wa'
      AND orders.created_at BETWEEN '2020-01-01' AND '2021-01-01'
    GROUP BY 1, 2, 3
    LIMIT 10;
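
To make "the JSON compiles to SQL" concrete, here is a minimal Python sketch of that step. It is a toy, not how any real semantic layer is implemented: `MODEL`, `OPERATORS`, and `compile_query` are hypothetical names, and the join path and the definition of `orders.count` as `COUNT(*)` are assumptions of the sketch. What it illustrates is that member names are checked against a known model before any SQL exists, so a made-up column is rejected instead of executed.

```python
# Toy model: which members exist and how each measure is computed.
# Every name here is hypothetical; a real semantic layer defines far more.
MODEL = {
    "dimensions": {"users.state", "users.city", "orders.status", "orders.created_at"},
    "measures": {"orders.count": "COUNT(*)"},  # assumed definition
    "from": "orders JOIN users ON users.id = orders.user_id",  # assumed join path
}
OPERATORS = {"equals": "IN", "notEquals": "NOT IN"}


def require_dimension(name):
    """Reject any dimension the model does not know about."""
    if name not in MODEL["dimensions"]:
        raise ValueError(f"unknown dimension: {name}")


def compile_query(query):
    """Return (sql, params) for a query dict, rejecting unknown members."""
    dims = query.get("dimensions", [])
    measures = query.get("measures", [])
    for name in dims:
        require_dimension(name)
    for name in measures:
        if name not in MODEL["measures"]:
            raise ValueError(f"unknown measure: {name}")
    select = dims + [MODEL["measures"][m] for m in measures]
    if not select:
        raise ValueError("query selects nothing")

    where, params = [], []
    for f in query.get("filters", []):
        require_dimension(f["member"])
        if f["operator"] not in OPERATORS:
            raise ValueError(f"unknown operator: {f['operator']}")
        placeholders = ", ".join("?" for _ in f["values"])
        where.append(f"{f['member']} {OPERATORS[f['operator']]} ({placeholders})")
        params.extend(f["values"])  # values travel as bound parameters
    for t in query.get("timeDimensions", []):
        require_dimension(t["dimension"])
        where.append(f"{t['dimension']} BETWEEN ? AND ?")
        params.extend(t["dateRange"])

    sql = f"SELECT {', '.join(select)} FROM {MODEL['from']}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    if dims and measures:
        sql += " GROUP BY " + ", ".join(dims)
    if "limit" in query:
        sql += f" LIMIT {int(query['limit'])}"
    return sql, params


query = {
    "dimensions": ["users.state", "users.city", "orders.status"],
    "measures": ["orders.count"],
    "filters": [{"member": "users.state", "operator": "notEquals", "values": ["us-wa"]}],
    "timeDimensions": [
        {"dimension": "orders.created_at", "dateRange": ["2020-01-01", "2021-01-01"]}
    ],
    "limit": 10,
}
sql, params = compile_query(query)
print(sql)
print(params)
```

Passing a dimension that is not in the model, say `users.zodiac_sign`, raises `ValueError` before anything reaches the database.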

sgarland · 2025-05-17T15:16:03 1747494963

This doesn't make sense.

From a schema standpoint, table `orders` presumably has a row per order, with columns like `user_id`, `status` (as you stated), `created_at` (same), etc. Why would there be a `count` column? What does that represent?

From a query standpoint, I'm not sure what this would accomplish. You want the cartesian product of `users` and `orders`, filtered to all states except Washington, and where the order was created in 2020? The only reason I can think of to use a CROSS JOIN would be if there is no logical link between the tables, but that doesn't make any sense for this, because users:orders should be a 1:M relationship. Orders don't place themselves.

I think what you might have meant would be:

    SELECT
      users.state,
      users.city,
      orders.status,
      COUNT(*)
    FROM users
    JOIN orders ON users.id = orders.user_id
    WHERE
      users.state != 'us-wa' AND
      orders.created_at BETWEEN '2020-01-01' AND '2021-01-01'
    GROUP BY 1, 2, 3
    LIMIT 10;

Though without an ORDER BY, this has no significant meaning, and is a random sampling at best.
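
For anyone who wants to see the difference, a small sketch using Python's built-in `sqlite3` and an in-memory database. The rows are made up for illustration, and the column layout is the one assumed above (one row per order with `user_id`, `status`, `created_at`). It counts the rows each join feeds into the aggregate, then runs the grouped query with an explicit ORDER BY so the LIMIT means something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, state TEXT, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER REFERENCES users(id),
                         status TEXT, created_at TEXT);
    -- Invented sample data: 3 users, 4 orders.
    INSERT INTO users VALUES
        (1, 'us-ca', 'Oakland'), (2, 'us-or', 'Portland'), (3, 'us-wa', 'Seattle');
    INSERT INTO orders VALUES
        (1, 1, 'shipped', '2020-03-01'),
        (2, 1, 'shipped', '2020-06-15'),
        (3, 2, 'pending', '2020-09-30'),
        (4, 3, 'shipped', '2020-11-11');
""")

WHERE = """
    WHERE users.state != 'us-wa'
      AND orders.created_at BETWEEN '2020-01-01' AND '2021-01-01'
"""

# CROSS JOIN pairs every order with every non-Washington user.
cross_rows = conn.execute(
    "SELECT COUNT(*) FROM orders CROSS JOIN users" + WHERE
).fetchone()[0]

# The keyed JOIN pairs each order only with the user who placed it.
inner_rows = conn.execute(
    "SELECT COUNT(*) FROM users JOIN orders ON users.id = orders.user_id" + WHERE
).fetchone()[0]

print(cross_rows, inner_rows)  # 8 3

# With an ORDER BY, LIMIT returns a defined "top N" instead of arbitrary rows.
grouped = conn.execute(
    """
    SELECT users.state, users.city, orders.status, COUNT(*) AS n
    FROM users JOIN orders ON users.id = orders.user_id
    """ + WHERE + """
    GROUP BY users.state, users.city, orders.status
    ORDER BY n DESC, users.state
    LIMIT 10
    """
).fetchall()
print(grouped)
```

The cross join reports 8 rows (4 orders times 2 remaining users) where only 3 orders actually qualify, so any count built on top of it is inflated.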

Also, if you or anyone else is creating a schema like this, _please_ don't make this denormalized mess. `orders.status` is going to be extremely low cardinality, as is `users.state` (to a lesser extent), and `users.city` (to an even lesser extent, but still). Make separate lookup tables for `city` and/or `state` (you don't even need to worry about pre-populating these, you can use GeoNames[0]). For `status`, you could do the same, or turn them into native ENUM [1] if you'd like to save a lookup.

[0]: https://www.geonames.org

[1]: https://www.postgresql.org/docs/current/datatype-enum.html
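
A rough sketch of the lookup-table version, using Python's built-in `sqlite3` as a stand-in. All table and column names (`states`, `cities`, `statuses`, `city_id`, `status_id`) are hypothetical, the rows are invented, and SQLite has no native ENUM, so `status` is a lookup table here rather than the PostgreSQL ENUM from [1].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE states   (id INTEGER PRIMARY KEY, code TEXT NOT NULL UNIQUE);
    CREATE TABLE cities   (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                           state_id INTEGER NOT NULL REFERENCES states(id),
                           UNIQUE (name, state_id));
    CREATE TABLE statuses (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE users    (id INTEGER PRIMARY KEY, email TEXT NOT NULL,
                           city_id INTEGER NOT NULL REFERENCES cities(id));
    CREATE TABLE orders   (id INTEGER PRIMARY KEY,
                           user_id INTEGER NOT NULL REFERENCES users(id),
                           status_id INTEGER NOT NULL REFERENCES statuses(id),
                           created_at TEXT NOT NULL);
    -- Invented sample rows.
    INSERT INTO states   VALUES (1, 'us-ca');
    INSERT INTO cities   VALUES (1, 'Oakland', 1);
    INSERT INTO statuses VALUES (1, 'pending'), (2, 'shipped');
    INSERT INTO users    VALUES (1, 'a@example.com', 1);
    INSERT INTO orders   VALUES (1, 1, 2, '2020-03-01');
""")

# A status that is not in the lookup table is rejected by the foreign key.
rejected = None
try:
    conn.execute("INSERT INTO orders VALUES (2, 1, 99, '2020-07-01')")
except sqlite3.IntegrityError as exc:
    rejected = str(exc)
print("rejected:", rejected)

# Reporting queries join back through the lookup tables.
rows = conn.execute("""
    SELECT states.code, cities.name, statuses.name, COUNT(*)
    FROM orders
    JOIN users    ON users.id    = orders.user_id
    JOIN cities   ON cities.id   = users.city_id
    JOIN states   ON states.id   = cities.state_id
    JOIN statuses ON statuses.id = orders.status_id
    GROUP BY states.code, cities.name, statuses.name
""").fetchall()
print(rows)
```

The trade-off is more joins at read time in exchange for small integer keys on the wide tables and a single place where the valid values live.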

Agraillo · 2025-05-17T12:39:52 1747485592

Programming languages are more predictable than human language, so their rules are much easier to "compress" once a model has picked them up from a large amount of data. Imho your two examples are easily interchangeable in a follow-up conversation with a decent LLM. I tested this with the following prompt, feeding it a C fragment and an SQL fragment, and in both cases got something like your first example:

> Please convert the following fragment of a programming language (auto-detect) into a json-like parsing information when language construct is represented like an object, fixed branches are represented like properties and iterative clauses (statement list for example) as array.
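
For comparison, the shape that prompt asks for can be produced deterministically by an ordinary parser. This is only an illustration using Python's own `ast` module on a Python fragment (not C or SQL, and no LLM involved); `to_jsonable` is a hypothetical helper name. Each construct becomes an object, its fixed branches become properties, and statement lists become arrays.

```python
import ast
import json


def to_jsonable(node):
    """Recursively turn an ast node into dicts, lists, and plain values."""
    if isinstance(node, ast.AST):
        out = {"type": type(node).__name__}   # the language construct
        for field, value in ast.iter_fields(node):
            out[field] = to_jsonable(value)   # fixed branches as properties
        return out
    if isinstance(node, list):
        return [to_jsonable(item) for item in node]  # iterative clauses as arrays
    return node  # leaf values such as names, numbers, or None


tree = ast.parse("if x > 1:\n    y = x\n")
doc = to_jsonable(tree.body[0])
print(json.dumps(doc, indent=2))
```

The top-level object has `"type": "If"`, a `test` property holding the comparison, and `body` and `orelse` arrays for the two statement lists.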