When looking up a string, what is the fastest strategy?
A hash map, a sorted list or an unsorted list?
Of course it depends on how many strings you have, but where are the cutoff points?
Here is a quick test, and an interesting tidbit is uncovered…
Without further ado, here is the result chart.
Time is the vertical axis (lower is better), horizontally is the number of items, note that it is logarithmic.
Those are results for random, 12 characters strings, but the curves where almost unchanged if the strings were “less random” (with 6 identical characters at the start f.i.)
“TDictionary” and “Hash FNV1a” are both hash-maps, but “Hash FNV1a” uses a fast hash, while “TDictionary” uses the slower Bob-Jenkins Hash (which is also more prone to collisions [2], but these did not affect that particular test)
Observations
- avoid using an unsorted list š
- if you have a few dozen strings, a simple sorted list can be preferable
- computing a hash code is hardly innocuous, with a slow hash you need to have hundreds of items just to get even on hash-code computations
Tidbit of note
Sorted & Unsorted lists are both using TStringList, with just an overloaded CompareStrings method hardcoded to a case-insensitive comparison, and the difference is just whether Sorted is set to true or not… so why the difference even for the case where you have 1 or 2 elements?
In both cases IndexOf() was called, but IndexOf merely reroutes the execution to TStrings.IndexOf orĀ TStringList.Find depending on the value of Sorted, and that makes all the difference…
- TStringList.Find accesses theĀ FListĀ private field directly, and so involves no reference counting, and no exception frames.
- TStrings.IndexOf on the other hand goes through the Get method, which means a reference count and an implicit exception frame, which when you’re comparing a single string or two bites a lot.
When there are a couple elements, an unsorted search in TStringList is actually spent doing a fair share of Automatic Reference Counting tasks (there is also a bit of range checking, but it’s negligible in comparison).