I had a chat today with @xblasco about hash maps, and whether they really are O(1) on insertion, deletion and lookup. Has anyone else ever had the thought "no way it can actually be O(1)"? Or run into benchmarks where the O(1) doesn't seem to hold?

#algorithms #datastructures #programming

@nosherwan @xblasco This article for #Python #dicts is interesting: tenthousandmeters.com/blog/pyt

For one, it reiterates that the time complexity is O(1), but then the graph shows the benchmarks deviating from theory.

The article explains why: CPU caches.
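Here's a minimal benchmark sketch (my own, not from the article; the sizes and helper name are arbitrary) that shows the effect: the per-lookup time of a dict grows with size even though the algorithm is O(1), because larger tables spill out of the CPU caches.

```python
import random
import timeit

def avg_lookup_ns(n, samples=10_000):
    """Average time of one dict lookup, in nanoseconds, for a dict of size n."""
    d = {i: i for i in range(n)}
    keys = [random.randrange(n) for _ in range(samples)]
    # Time `samples` random lookups, repeated 10 times.
    total = timeit.timeit(lambda: [d[k] for k in keys], number=10)
    return total / (10 * samples) * 1e9

for n in (1_000, 100_000, 1_000_000):
    print(f"n={n:>9,}  ~{avg_lookup_ns(n):6.1f} ns/lookup")
```

On a typical machine the small dict fits in L1/L2 cache and the big one doesn't, so the last line prints a noticeably larger number than the first, despite the constant-time algorithm.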

But the underlying question is: should it be considered O(1) if, in practice, it doesn't behave like it?


@deavid @nosherwan @xblasco

The algorithm is O(1) in theory and practice.

If we say that an algorithm is O(1), we mean that for any number of items in the dictionary, we can find a constant C which is an upper bound on the time it takes to do the look-up.

As n grows we get more cache misses, until eventually 100% of look-ups are cache misses.

In that case I would expect the lookup time to be capped by the time it takes to fetch the key from RAM (a little below 500 ns).

For larger dictionaries you might even find a second jump. This will happen when the dictionary doesn't fit into RAM anymore. Most modern operating systems will start writing parts of the dictionary to the hard-drive. This is called swapping.

@erikdesmedt @nosherwan @xblasco I think you're right, but an upper bound on time alone doesn't seem like a good enough proof, because with a high enough upper bound you could claim O(log n) to be O(1). For example, we could compare a bad hashmap implementation against a specialized btree, in a scenario where the btree excels. If we apply the same upper bound, the btree looks like it is O(1) under the same rules. Which is wrong.

I'd like to hear more ideas on how to prove hashmaps are O(1) in practice

@erikdesmedt @nosherwan @xblasco wait a second... If we make a btree, several accesses would be out of cache, while hashmaps would need only one RAM access. There's no way to make a tree perform around a single memory access. I guess we would always get an order of magnitude higher timings for the right side of the graph.
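To make the access-count argument concrete, here's a small sketch (my own illustration; I use binary search over a sorted list as a stand-in for a tree lookup, since both do a chain of dependent accesses):

```python
import bisect
import math

n = 1_000_000
d = {i: i for i in range(n)}   # hashmap: ~1 probe into the table per lookup
keys = sorted(d)               # sorted list: stand-in for tree-style search

def dict_get(k):
    return d[k]                # a single hash + bucket access

def bsearch_get(k):
    # Each bisection step depends on the previous one, so each can be
    # a separate cache/RAM access: ~log2(n) dependent accesses total.
    i = bisect.bisect_left(keys, k)
    return keys[i]

assert dict_get(123_456) == bsearch_get(123_456) == 123_456
print(f"~{math.ceil(math.log2(n))} dependent accesses for search vs ~1 for dict")
```

For n = 1,000,000 that's about 20 dependent accesses per search-tree lookup versus roughly one for the hashmap, which is where the order-of-magnitude gap on the right side of the graph would come from.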

@erikdesmedt @nosherwan @xblasco And circling back, now it makes even more sense why B-Trees (not binary, just B) are used in #Postgres: en.wikipedia.org/wiki/B-tree

It seems to me that they won't perform that badly compared to hashmaps in real life, even though access is O(log n) instead of O(1).

enterprisedb.com/postgres-tuto

(couldn't find a better article for PostgreSQL comparing both indexes, client count is not the metric I want)
