Thanks for clarifying my poorly worded description, that’s exactly what I meant. Like in the example given, the difference is 10-4=6, let’s call this the naive_greedy_miss_factor. Can we choose three other denominations so that NGMF is > 6?
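A brute-force sketch of how you could search for such denominations (the name `ngmf` is just my shorthand, and this checks every amount up to a bound rather than proving anything):

```python
def greedy_count(denoms, amount):
    # Greedy: repeatedly take the largest coin that still fits.
    count = 0
    for c in sorted(denoms, reverse=True):
        count += amount // c
        amount %= c
    return count if amount == 0 else float("inf")

def optimal_count(denoms, amount):
    # Classic DP for the true minimum number of coins.
    best = [0] + [float("inf")] * amount
    for a in range(1, amount + 1):
        for c in denoms:
            if c <= a and best[a - c] + 1 < best[a]:
                best[a] = best[a - c] + 1
    return best[amount]

def ngmf(denoms, max_amount):
    # Largest greedy-vs-optimal gap over all amounts up to max_amount.
    return max(greedy_count(denoms, a) - optimal_count(denoms, a)
               for a in range(1, max_amount + 1))
```

For example, `ngmf((1, 3, 4), 10)` is 1 (at amount 6, greedy takes 4+1+1 while the optimum is 3+3); to answer the question you'd sweep candidate triples and keep the max.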
I wouldn't trust a taxi driver's predictions about the future of economics and society, why would I trust some database developer's? Actually, I take that back. I might trust the taxi driver.
The point is that you don't have to "trust" me, you need to argue with me: we need to discuss the future. That way we can form ideas that help us judge whether one politician or another is right when we are called to vote. We can also form stronger ideas to try to influence people who right now have only a vague understanding of what AI is and what it could become. We are the ones who will vote and choose our future.
Life is too short to have philosophical debates with every self promoting dev. I'd rather chat about C style but that would hurt your feelings. Man I miss the days of why the lucky stiff, he was actually cool.
Sorry boss, I'm just tired of the debate itself. It assumes a certain level of optimism, while I'm skeptical that meaningfully productive applications of LLMs etc. will be found once hype settles, let alone ones that will reshape society like agriculture or the steam engine did.
Whether it is a taxi driver or a developer, when someone starts from flawed premises, I can either engage and debate or tune out and politely humor them. When the flawed premises are deeply ingrained political beliefs it is often better to simply say, "Okay buddy. If you say so..."
We've been over the topic of AI employment doom several times on this site. At this point it isn't a debate. It is simply the restating of these first principles.
You shouldn't care about the "who" at all; you should look at the arguments. If the taxi driver doesn't know anything real, it should be plainly obvious, and you can say so with arguments rather than attacking the person's background. In fact, your comment commits one of the most common logical fallacies (ad hominem), and combines several at once.
What you're really saying is that the database presented in OP is not useful because it only handles DQL.
1. SQL can be thought of as being composed of several smaller languages: DDL, DQL, DML, DCL.
2. columnq-cli is only a CLI to a query engine, not a database. As such, it only supports DQL by design.
3. I have the impression that outside of data engineering/DBA, people are rarely taught the distinction between OLTP and OLAP workloads [1]. The latter often utilizes immutable data structures (e.g. columnar storage with column compression), or provides limited DML support, see e.g. the limitations of the DELETE statement in ClickHouse [2], or the list of supported DML statements in Amazon Athena [3]. My point -- as much as this tool is useless for transactional workloads, it is perfectly capable of some analytical workloads.
I'm curious why you think of DAX as a virtue. My poor SQL-shaped peg brain has never really fit the DAX hole of MS software.
Also, it always struck me as something too complex for the non-technical folks, and not expressive enough for tech-literate analysts/data engineers &c.
Just options. VBA is there as well. Excel's virtue is not specializing in any specific task, but being versatile enough to express a multitude of business solutions. 'Excel is my database' wasn't always a punchline.
That's more than an equally domain-specific product like Qlik, and more than a specific vendor tool like Tableau. And anyway, if PowerBI didn't have a pain point it wouldn't be an MS product.
That's what I do at $dayjob whenever I have to do windowing &c. Figuring out this stuff in Pandas is a waste of time. Before I discovered DuckDB, I would re-learn the API every damn time.
I came up with a little utility function, which you can implement yourself :)
Years of unpicking others' use of R's sqldf (which by default used to copy the entire data frame to a SQLite db, run the query, then copy the result set back) whenever they complained their R code was too slow have taught me a visceral, negative reaction to the name and the pattern.
Glad to see DuckDB finally delivering on the promise of running SQL against in-memory dataframes.
I like interface-only packages in the Julia ecosystem: e.g., Tables.jl enables the development of several packages for querying tabular data that work across many concrete implementations, and Plots.jl separates the high-level plotting interface from the plotting backend.
It's true. I've spent a small but nontrivial amount of time learning and using Polars, but it's just a nonstarter for most work projects. Not only does no one else know it exists, let alone how to use it, but it doesn't integrate with (to my knowledge) any ETL or ML Python library. You have to convert to pandas or NumPy, which is costly and to some extent defeats the purpose.
The to-NumPy conversion is free if you don't have missing data, which is usually the case by the time you send it over to an ML library.
Even if it's not zero-copy, it's still not a big deal. Pandas makes a lot more copies internally. I truly wouldn't worry about that single copy if you're getting an order-of-magnitude speedup overall.
I stand corrected. The conversion felt relatively slow to me, but it was a large dataset and there were definitely missing values. Overall the benefits to speed and API cleanliness might be worth it, though it feels a bit gross to convert Spark to pandas to Polars to NumPy to DMatrix.
That said, it’s so much better than pandas for data manip that I’ll probably still try to use it.
Are you the author? If so, thanks for being so responsive on GitHub. You fixed basically every issue I had almost immediately back when I was learning Polars. It was awesome.
Yep, that's me. Glad to help. :) There's still room for parallelization when converting to a matrix; I'll take a look. I haven't given that conversion much effort yet because it's often a one-time conversion at the end of a pipeline.
I don't know DuckDB but polars could dethrone pandas. We're planning on using it to create our pipeline. Ibis-project is another solution if anyone wants to check it out.
I haven't touched pandas in months, but I also found it quite tiring to deal with.
Does your setup allow for an end-to-end solution? I mean, can I sink time into that setup and feel like I have everything I need for regular data wrangling?
I'm sure Pandas is amazing, but as a newbie I found myself writing a lot of transformation logic with plain Python data structures because it's just so much easier.
Maybe I'm dumb but going around the docs sometimes was like :/
Author of the post and siuba here. I'm pretty interested in exploring polars as a backend, and, if that works well, versions of the SQL backends that translate to SQL based on the polars method API :).
(I haven't really used it, but it looks promising)
Hey, I love siuba. Haven't had a chance to use it much but it scratches an itch for me. For years I've grumbled about how Python isn't flexible enough to accommodate tidyverse style libraries, as it lacks pipes and lazy evaluation (or macros), but siuba has managed to be very nice to use.