I had a chat today with @xblasco about hash maps, and whether they really are O(1) on insertion, deletion and get value. Has anyone ever had this thought of "no way it can actually be O(1)"? Or run into benchmarks where the O(1) doesn't seem to hold?
@nosherwan @xblasco we are still discussing... It's not an easy topic.
The algorithm is O(1) in theory and practice.
If we say that an algorithm is O(1), it means that for any number of items in the dictionary, we can find a constant C which is an upper bound on the time it takes to do the look-up.
As n grows we get more cache misses, until essentially 100% of look-ups are cache misses.
In that case I would expect the look-up time to be capped by the time it takes to fetch the key from RAM (a little bit below 500 ns).
For larger dictionaries you might even find a second jump. This will happen when the dictionary doesn't fit into RAM anymore. Most modern operating systems will start writing parts of the dictionary to the hard-drive. This is called swapping.
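A rough sketch to see that curve yourself (my own benchmark code, not from the thread; the numbers and the exact plateau point will vary by machine):

```python
# Rough benchmark sketch: time dict look-ups at growing sizes and
# watch for the jump when the table outgrows the CPU caches.
import random
import time

def avg_lookup_ns(n, probes=100_000):
    """Average time of one dict look-up, in nanoseconds."""
    d = {i: i for i in range(n)}
    keys = [random.randrange(n) for _ in range(probes)]
    start = time.perf_counter()
    for k in keys:
        d[k]  # the look-up being measured
    return (time.perf_counter() - start) / probes * 1e9

for n in (1_000, 100_000, 1_000_000):
    print(f"n={n:>9,}  ~{avg_lookup_ns(n):.0f} ns per look-up")
```

The small sizes fit in cache, so they should look faster per look-up; past cache size the time should level off near the RAM latency described above.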
@erikdesmedt @nosherwan @xblasco wait a second... With a B-tree, several accesses per look-up would be out of cache, while a hash map needs only one RAM access. There's no way to make a tree perform close to a single memory access, so I'd guess we would always get an order of magnitude higher timings on the right side of the graph.
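A toy way to count those extra touches (a sketch of my own, using binary search over a sorted list as a stand-in for tree descent, not a real B-tree):

```python
# Binary search touches ~log2(n) positions per look-up;
# a hash map needs roughly one bucket probe.
def binary_search_steps(sorted_keys, key):
    """bisect_left-style search that also counts positions inspected."""
    lo, hi, steps = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_keys) and sorted_keys[lo] == key
    return found, steps

keys = list(range(1_000_000))
found, steps = binary_search_steps(keys, 765_432)
print(found, steps)  # roughly log2(1e6) ≈ 20 touches; a dict needs ~1
```

Each of those ~20 touches can be a separate cache miss, which is exactly why the right side of the graph should diverge. (B-trees soften this by packing many keys per node, so the number of *node* touches is much smaller than in a binary tree.)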
@erikdesmedt @nosherwan @xblasco And circling back, now it makes even more sense why B-trees (not binary, just B) are used in #Postgres https://en.wikipedia.org/wiki/B-tree
It seems to me that they won't perform that badly compared to hash maps in real life, even being O(log n) for access instead of O(1).
https://www.enterprisedb.com/postgres-tutorials/are-hash-indexes-faster-btree-indexes-postgres
(couldn't find a better article for PostgreSQL comparing both indexes, client count is not the metric I want)
@erikdesmedt @nosherwan @xblasco I think you're right, but a finite upper bound on time doesn't seem like a strong enough argument: with a high enough upper bound you could claim O(log n) to be O(1). For example, we could compare a bad hash map implementation against a specialized B-tree, in a scenario where the B-tree excels. Under the same upper bound, the B-tree looks like it is O(1) by the same rules. Which is wrong.
I'd like to hear more ideas on how to prove hashmaps are O(1) in practice
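One idea (my own sketch, not from the thread): separate the algorithm from the hardware by counting probes instead of nanoseconds. A toy open-addressing table with linear probing and a fixed load factor should show the average probes per look-up staying roughly constant as n grows, which is the O(1) claim stripped of cache effects:

```python
# Toy open-addressing hash table (linear probing, load factor 0.5).
# If look-ups are O(1) on average, avg probes should NOT grow with n.
import random

class ProbeCountingTable:
    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.capacity = capacity

    def _probe(self, key):
        """Yield (slot index, probes so far) in linear-probing order."""
        i = hash(key) % self.capacity
        probes = 0
        while True:
            probes += 1
            yield i, probes
            i = (i + 1) % self.capacity

    def insert(self, key):
        for i, _ in self._probe(key):
            if self.slots[i] is None or self.slots[i] == key:
                self.slots[i] = key
                return

    def lookup_probes(self, key):
        """Number of slots inspected to find a present key."""
        for i, probes in self._probe(key):
            if self.slots[i] == key:
                return probes

def avg_probes(n):
    t = ProbeCountingTable(capacity=2 * n)  # load factor fixed at 0.5
    keys = random.sample(range(10 * n), n)
    for k in keys:
        t.insert(k)
    return sum(t.lookup_probes(k) for k in keys) / n

for n in (1_000, 10_000, 100_000):
    print(f"n={n:>7,}  avg probes ≈ {avg_probes(n):.2f}")
```

If the average probe count stays flat while n grows 100x, the algorithm is doing O(1) work per look-up; any growth you then see in wall-clock benchmarks is the memory hierarchy, not the algorithm.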