When we think of data structures and performance, the primary metrics to keep in mind are the time complexities of various operations like insertion, deletion, and lookup. Amongst all .NET collections, HashSet stands out as particularly efficient when it comes to membership checks. Here’s why:

1. Underlying Data Structure: The Hash Table

The HashSet<T> in C# is implemented using a hash table. At a high level, a hash table uses a hash function to map keys (or, in the case of HashSet, values) to indices of an array.

For membership checks, this process boils down to:

  • Computing the hash code of the item.
  • Using this hash code to find an index in the underlying array.
  • Checking if the item exists at that location.

Because this process does not depend on the number of elements in the hash table, it has an average time complexity of O(1) thus making it excellent when storing large amounts of data in memory.

2. Collision Resolution

Collisions occur when two different elements have the same hash code. The .NET implementation of HashSet uses a technique called chaining to manage these collisions, which involves maintaining a list of items that hash to the same index.

Even with this, thanks to good hash functions and optimal load factors, HashSet usually keeps the average search time close to O(1).

3. No Extra Data

Unlike other collections like Dictionary or SortedSet, HashSet only deals with unique elements without any additional data like key-value pairs or sorted order. This reduces overhead during membership checks.

4. Dynamic Resizing

As elements are added to a HashSet, it doesn’t just keep growing uncontrollably. Once a certain load factor (ratio of the number of items to the size of the underlying array) is reached, the HashSet resizes itself, ensuring that the average search time remains O(1).

Caveats

It’s important to note that the O(1) time complexity is an average case. In the worst-case scenario (like when all elements collide to the same index), the performance can degrade to O(n). However, with a well-distributed hash function, this is a rare occurrence.

Let’s see a simple benchmark:

[MemoryDiagnoser]
public class ContainsSetVsListVsArray
{
    private HashSet<int> _set;
    private List<int> _list;
    private int[] _array;

    [Params(10, 100, 1000, 10000)]
    public int Number;

    public int ContainedNumber => Number / 2;

    [GlobalSetup]
    public void GlobalSetup()
    {
        _set = new HashSet<int>(Enumerable.Range(1, Number));
        _list = new List<int>(Enumerable.Range(1, Number));
        _array = Enumerable.Range(1, Number).ToArray();
    }

    [Benchmark]
    public bool ContainsSet()
    {
        return _set.Contains(ContainedNumber);
    }
    
    [Benchmark]
    public bool ContainsList()
    {
        return _list.Contains(ContainedNumber);
    }    
    
    [Benchmark]
    public bool ContainsArray()
    {
        return _array.Contains(ContainedNumber);
    }
}

MethodNumberMeanErrorStdDevAllocated
ContainsSet103.588 ns0.0559 ns0.0523 ns
ContainsList103.942 ns0.0884 ns0.0827 ns
ContainsArray104.536 ns0.0691 ns0.0646 ns
ContainsSet1003.667 ns0.0283 ns0.0236 ns
ContainsList1005.661 ns0.1316 ns0.1099 ns
ContainsArray1006.050 ns0.0553 ns0.0490 ns
ContainsSet10003.661 ns0.0357 ns0.0317 ns
ContainsList100021.461 ns0.4565 ns0.4885 ns
ContainsArray100023.524 ns0.4211 ns0.3733 ns
ContainsSet100003.465 ns0.0334 ns0.0313 ns
ContainsList10000178.155 ns3.3170 ns4.3131 ns
ContainsArray10000236.917 ns4.1383 ns3.8710 ns
Benchmark results in .NET 7

Conclusion

HashSet in C# is an excellent tool when you need a collection that supports rapid membership checks. Thanks to its underlying hash table structure, minimized data overhead, and dynamic resizing, it typically performs membership tests in constant time. However, as with any data structure, it’s essential to understand its strengths and limitations to use it effectively.

*Keep in mind that in .NET 8 FrozenSets are going the be the best option when you need to squeeze that extra performance. We ‘ll talk about those when the time comes.

**Don’t forget to set initial capacity for Hashsets since they use an array internally.

Leave a comment

Trending