The java.util.HashSet class is a collection that uses a Hash Table for storage. It is the most widely used implementation of the Set interface because it offers constant-time performance for basic operations like add, remove, and contains.
One of the most fascinating "secrets" of Java is that a HashSet isn't actually a brand-new data structure. Internally, every HashSet uses a HashMap to store its elements.
When you add an element E to a HashSet, Java actually does this: map.put(E, PRESENT). The element you add becomes a Key in the internal Map, and a dummy object called PRESENT is used as the value. Since Map keys must be unique, the HashSet remains unique!
Why is HashSet so fast? It uses a technique called Hashing. Instead of searching through a list from start to finish, Java calculates a "Hash Code" (a number) for your object. This number acts like a coordinate that points directly to a "Bucket" in memory.
Java calls obj.hashCode() to get a numeric representation of the object.
Java uses the formula index = hash & (n-1) to find the specific bucket in the internal array.
Sometimes, two different objects produce the same Hash Code or map to the same bucket. This is called a Collision. Java handles this by turning that bucket into a Linked List (and eventually a Balanced Tree if the list gets too long).
For almost every operation, HashSet is exceptionally efficient. Here is the breakdown:
| Operation | Time Complexity | Explanation |
|---|---|---|
| Add | $O(1)$ | Calculate hash and drop into bucket. |
| Remove | $O(1)$ | Direct access via hash. |
| Contains | $O(1)$ | The fastest way to check existence in Java. |
| Iteration | $O(n + h)$ | $n$ = elements, $h$ = bucket capacity. |
In this example, we use a HashSet to track unique IP addresses hitting a server. Notice how duplicates are automatically ignored without any if statements.
To keep the $O(1)$ performance, the HashSet must not get too crowded. It uses two parameters to manage this:
When the set is 75% full, it doubles its capacity and rehashes all existing elements. This is a heavy operation, so if you know you have 1,000 items, initialize it with a higher capacity!
If you use custom objects (like a User class) in a HashSet, you MUST override both hashCode() and equals(). If two objects are equal, they must return the same hash code. If they don't, your HashSet will allow "duplicates" because it will look in the wrong buckets.
Q: Is HashSet ordered?
A: No. It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time.
Q: What is the difference between HashSet and HashMap?
A: HashSet implements the Set interface and stores objects. HashMap implements the Map interface and stores Key-Value pairs. However, HashSet uses a HashMap internally.
Q: Can we store null in a HashSet?
A: Yes, HashSet allows a single null element.
Q: Why is HashSet faster than a List for searching?
A: A List requires an $O(n)$ linear scan (checking every item). A HashSet uses the object's hash to jump directly to the memory location ($O(1)$).
The HashSet is the ultimate tool for uniqueness and performance. By trading a bit of memory (for buckets and pointers) and accepting a lack of order, you gain the ability to process massive amounts of data with constant-time efficiency. Whether you are building a search engine or a simple data filter, HashSet is an essential part of your Java toolkit.