More

scottcodie · 2025-11-02T22:23:47 1762122227

One thing the other blog post missed and this post misses too is that you don't need Kafka to use Debezium with Postgres. This gives you a pretty seamless onramp to event streaming tools as you scale.

gunnarmorling · 2025-11-03T08:49:09 1762159749

Are you referring to using Debezium embedded as a library? If so, yes, it absolutely has its place; for instance, it's used by Flink CDC. There's pros and cons to either way of running Debezium. Seeing embedded Debezium a lot for in-app use cases, for instance cache invalidation. Going through Kafka allows for reply and setting up multiple independent consumers for the same change event stream.

scottcodie · 2025-10-19T03:25:27 1760844327

I've spent my entire career developing databases (oracle, cassandra, my own database startup). Knowing if your workload is read or write heavy is one of the first questions when evaluating database choice, and is critical for tuning options. I would give this article hate just because it feels partially written by AI and the title needs a possessive 'your' in it, but its core ideas are sound and frame the issue correctly.

EGreg · 2025-10-19T06:31:54 1760855514

What about asking whether it should be a row database (sqlite) or column database (duckdb)...

for a data lake or analytics prefer columns? / read heavy

what would a row based db be better for? OLTP / write heavy?

gigatexal · 2025-10-19T06:35:29 1760855729

General rule of thumb is OLAP (DuckDB, BigQuery, redshift, etc) db’s are better at reads (think analytics) and OLTP (Postgres and MySQL and salute) ones are better for writes (think order systems, point of sale).

Things get muddied when things like the HTAP stuff are bandied about where they promise the best of both worlds.

ownagefool · 2025-10-19T07:16:47 1760858207

Ordering System is a good example because you typically want both. Your base logic will probably exist in OLTP with joins and normalised data, and you'll generally have local on-device OLTP databases.

Reporting on your Ordering System is an OLAP problem though. Generally an OLAP database stores data on disk in a way that it only needs to read the selected columns and the performance is better with wider columns, i.e. lots of duplicated data ( JOINs are slow ).

So like, you select * from Customer, Order, Items, Device, Staff, stick it in your OLAP database that's where customers should generate reports. This both makes reporting more performant, but it also removes the problem from the critical path of your POS device syncing and working.

This has the added benefit that updating your product name won't update the historical log of what was done at the time, because what was done at the time was done at the time ( but you can still map on like productId if you think the data is relevant. )

At scale you want to pop the writes on a queue and design those devices to be as async as possible.

This is what happens when you just build it pure OLTP.

https://www.linkedin.com/pulse/nobody-expects-thundering-her...

This was an ~£19m ARR POS company dying because of architecture, now doing £150m+ ARR. ( the GTV of the workloads are multiple times that, but I can't remember them ).

gigatexal · 2025-10-19T19:05:11 1760900711

> Reporting on your Ordering System is an OLAP problem though. Generally an OLAP database stores data on disk in a way that it only needs to read the selected columns and the performance is better with wider columns, i.e. lots of duplicated data ( JOINs are slow ).

This sounds like the one big table approach. Which in my experience is very difficult to do right and only makes sense in the data mart sense.

Google’s Adsense data model I’m BigQuery is like this and works well but gets so wide it’s difficult. Then again when you imbed things like arrays and structs and can unnest as needed avoiding joins can be nice.

I’ve found star schemas to work out just fine in data marts. Just do them properly. Join as needed. And a good engine will handle the rest. We’ve has no issues with a similar model in Snowflake for example. Of course YMMV.

mike_hearn · 2025-10-19T09:29:46 1760866186

Right, you want both, which is why databases like Oracle can store data in both forms. You can enable columnar formats on tables for both on disk and in-memory modes, where those columns can then be processed at high speed with lots of SIMD operations, but the data is kept consistent between them.

https://www.oracle.com/database/in-memory/

https://www.oracle.com/database/technologies/exadata/softwar...

That eliminates the complexity of running two databases and keeping them in sync.

Disclosure: work part time for them, own stock

gigatexal · 2025-10-19T16:53:24 1760892804

Getting into bed with Oracle is like selling your soul to loan sharks. Probably good in the beginning and only if you have a lot of money.

gigatexal · 2025-10-19T15:07:06 1760886426

FWIW SQLserver can do the same with its column store tables. Idk though. I stopped using such when I moved to data Eng and we just use open things (clickhouse, DuckDB, etc) except for snowflake.

citrin_ru · 2025-10-19T08:33:54 1760862834

Column storage (e.g. ClickHouse) is well suited for storing logs and log-like data even if writes (insert-only) > reads.

biehl · 2025-10-19T09:06:50 1760864810

Do you happen to know similar queries for Oracle?

scottcodie · 2025-03-31T01:53:00 1743385980

Stream processing/materialization engine written in rust that can be compiled to wasm.

Graffiti art.

scottcodie · on Sept 5, 2023

Is there a definition of a "security update"? Software has an infinite number of bugs and it is cost infeasible to fix them all. If it's years down the road, the engineers that wrote the code may be long gone.

MarcoPerazaFCC · on Sept 5, 2023

I think you're right that it would be difficult for the FCC to precisely define exactly when security updates are required. This is a problem in law generally, one that is usually resolved by imposing a reasonableness standard. Maybe here, a vulnerability needs to be patched if it might reasonably be expected to allow an attacker to take control of a device, or to do so when combined with other known or unknown vulnerabilities. Or maybe a different standard. Then when enforcement/lawsuits come around, the judge/jury/regulator has to evaluate the reasonableness of the manufacturer's actions in light of that standard. We'd love to see commentary on the record as to what the right legal standard might be.

(originally posted at https://news.ycombinator.com/item?id=37394188)

scottcodie · on Aug 29, 2023

Paying poorly would bias you to candidates who will accept poor wages. Good candidates, even if they are young and inexperienced, can have a high market rate.

hiAndrewQuinn · on Aug 29, 2023

This isn't paying poorly for the long term. Firms would be willing to try out riskier hires on the margin, because there's less downside if someone turns out to be a bad fit and gets let go after a couple of weeks.

Percieved risk is higher for people with less industry experience. So if you believe, as I do, that people systematically overestimate the risk they take on bringing juniors on board, and hence systematically underhire them, then this would likely end up being good on net for juniors, as more of them would then be offered a chance to prove the bias wrong.

hotnfresh · on Aug 29, 2023

The main cost of juniors is the time it takes other people to get them going. Good onboarding and mentoring of juniors takes a ton of time creating structure and training material, then working with them once they’re on. If any of that’s messed up or gets derailed by execs pushing for MORE PRODUCTIVITY RIGHT NOW, and taking away time folks were using to train juniors, your onboarding will suck and more juniors will founder. Either way, we’re talking a few months, not weeks, and the junior’s salary isn’t the cost companies with any amount of sense are worried about.

scottcodie · on July 18, 2023

You could also use a temporal table.

scottcodie · on June 17, 2023

SQL in the context of a single query struggles with a composable features. But looking slightly outside the scope of queries with assignment statements and view/materialization dags then you start getting some of that composition back. I think SQRL is an good example of SQL with a bunch of composable feature bolt-ons.

https://www.datasqrl.com/blog/sqrl-high-level-data-language-...

scottcodie · on March 14, 2023

Most technology advances are not labor replacing but rather labor augmenting. For example, LLMs could make teachers much more productive in the classroom but it would be unlikely to replace teachers entirely.

scottcodie · on Feb 22, 2023

Price gouging is often seen as immoral because it involves high prices without a justifiable reason. It is often considered in absence of welfare analysis, even if the middle class can afford the price increase, it can still be considered price gouging. In a competitive market, prices are based on what the market can bear, not on moral considerations. In such markets, prices must remain low to remain in business. However, high prices without justifiable reasons could be indication of monopolistic behavior from too much consolidation, and should be called out and investigated.

scottcodie · on Jan 26, 2023

Usually you don't consider health insurance to be a moral hazard in this way. However, in healthcare, a moral hazard can occur when people use more healthcare services because the cost for them is low.