
> You can only use the first part of that index, the position, to jump to HR Manager resumes; then you would need to manually go through them one-by-one to grab only the ones starting with J for each years-of-experience subgroup (if any).

You're phrasing that as if this situation — relying on an index formed by (key 1 that you know, key 2 that you don't know, key 3 that you want to depend on) — necessarily implies a sequential walk of everything within the HR Manager subgroup.

But it doesn't; within each key-1 partition of the index, you can binary-search to find the boundaries of the key-2 partitions; and within those, you can binary-search to find all (or, more commonly, the first) of the last_name[0] = "J" entries. It's actually not overly slow (i.e. it's often still a win over a naive index-partial seq scan) if the cardinality ratio of key-2 : key-3 isn't too extreme (i.e. if you're saving useful amounts of time by eliminating enough key-3 entries from consideration per key-2 partition.)

(We use this approach quite a bit at $WORK — MySQL and Postgres people like to call this a "loose index scan": https://wiki.postgresql.org/wiki/Loose_indexscan. [See the "Making use of a non-leading column of an index" section.])
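A toy sketch of that skip-scan idea, with a sorted Python list standing in for the composite (position, years_of_experience, last_name) index (the data and helper names here are invented for illustration; a real engine does this over B-tree pages, not a list):

```python
from bisect import bisect_left

# Stand-in for a composite index: sorted (position, years_exp, last_name).
index = sorted([
    ("Engineer",   2, "Adams"),
    ("HR Manager", 1, "Baker"),
    ("HR Manager", 1, "Jones"),
    ("HR Manager", 3, "Ivers"),
    ("HR Manager", 3, "Jacobs"),
    ("HR Manager", 3, "James"),
    ("HR Manager", 5, "Johnson"),
    ("Sales",      4, "Jack"),
])

HIGH = "\U0010ffff"  # sorts after any realistic name

def skip_scan(index, key1, prefix):
    """key1 known, key2 unknown, key3 filtered by prefix: binary-search
    the boundaries of each key-2 partition instead of walking every row
    in the key-1 partition."""
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)   # "J" -> "K"
    out = []
    i = bisect_left(index, (key1,))                  # start of key-1 partition
    hi = bisect_left(index, (key1 + "\0",))          # end of key-1 partition
    while i < hi:
        key2 = index[i][1]                           # next distinct key-2 value
        start = bisect_left(index, (key1, key2, prefix), i, hi)
        end = bisect_left(index, (key1, key2, upper), i, hi)
        out.extend(index[start:end])                 # all the "J..." rows here
        i = bisect_left(index, (key1, key2, HIGH), i, hi)  # skip past key2
    return out

print(skip_scan(index, "HR Manager", "J"))
```

Each loop iteration costs a few O(log n) probes per distinct key-2 value, rather than touching every row in the key-1 partition — which is exactly why it stops winning when key-2 has nearly as many distinct values as there are rows.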



Looking at the documentation, MySQL definitely has this feature now.

I don't think it understands when to use it.

I have a table where the primary key (A,B) is two integers. The first one is a single byte and only has a dozen distinct values. Any query I do based on just B ends up doing a table scan and being extremely slow. But if I add "WHERE A IN (1,2,3,4,5,6,7,8,9,10,11) AND" suddenly it's super fast.

So I'm stuck with a redundant index on just B if I don't want to add that cruft all over.
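The manual workaround generalizes to any engine: enumerate the (few) distinct leading-key values yourself so a B-only predicate becomes one index seek per leading value. A minimal sketch using an in-memory SQLite table (the schema and values are invented, not the commenter's actual table):

```python
import sqlite3

# Toy stand-in: composite PK (a, b), where a has only a dozen distinct values.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (a INTEGER, b INTEGER, PRIMARY KEY (a, b)) WITHOUT ROWID"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(a, b) for a in range(1, 13) for b in range(1000)],
)

# B-only predicate: the (a, b) index can't be seeked directly, so many
# engines fall back to scanning it (or the whole table).
plain = conn.execute("SELECT a, b FROM t WHERE b = 42").fetchall()

# Pin the leading key to its distinct values; each (a, b = 42) pair is
# then a cheap point lookup in the composite index.
a_vals = [r[0] for r in conn.execute("SELECT DISTINCT a FROM t")]
marks = ", ".join("?" for _ in a_vals)
pinned = conn.execute(
    f"SELECT a, b FROM t WHERE a IN ({marks}) AND b = 42", a_vals
).fetchall()

assert sorted(plain) == sorted(pinned)  # same rows, different access path
```

Whether the planner actually turns the IN-list into seeks is engine- and version-specific, which is the whole complaint: the rewrite shouldn't be the application's job.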

Anyway, yes, optimization is often possible in this kind of situation but don't trust your database engine to figure it out.


Well, yes — I omitted anything that may or may not happen, to explain the general principle. Cardinality-based optimizations can and do take place, but they depend on the database being aware of the shape of your data (you need to analyze the tables often), and on internal factors and heuristics that are subject to change between releases, can't be assumed across different databases, and may or may not actually speed up the query (pathological cases certainly exist, even in the real world).



