The Java™ Tutorial

Trail: Collections
Lesson: Interfaces

The Set Interface

A Set is a Collection that cannot contain duplicate elements. It models the mathematical set abstraction. The Set interface contains only methods inherited from Collection and adds the restriction that duplicate elements are prohibited. Set also adds a stronger contract on the behavior of the equals and hashCode operations, allowing Set instances to be compared meaningfully even if their implementation types differ. Two Set instances are equal if they contain the same elements.
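
As an illustration of the equals and hashCode contract described above, the following sketch compares a HashSet and a TreeSet that hold the same words. The class and method names (SetEqualityDemo, sameElementsAreEqual) are made up for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class; not part of the tutorial's own sources.
class SetEqualityDemo {
    // Builds two sets with the same elements but different implementation types.
    static boolean sameElementsAreEqual() {
        Set<String> hashed = new HashSet<String>(Arrays.asList("came", "saw", "left"));
        Set<String> sorted = new TreeSet<String>(Arrays.asList("left", "came", "saw"));
        // Sets with the same elements are equal and have equal hash codes,
        // regardless of implementation type or insertion order.
        return hashed.equals(sorted) && hashed.hashCode() == sorted.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(sameElementsAreEqual()); // prints "true"
    }
}
```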

The Set interface is:

public interface Set<E> extends Collection<E> {
    // Basic Operations
    int size();
    boolean isEmpty();
    boolean contains(Object element);
    boolean add(E element);         // Optional
    boolean remove(Object element); // Optional
    Iterator<E> iterator();

    // Bulk Operations
    boolean containsAll(Collection<?> c);
    boolean addAll(Collection<? extends E> c); // Optional
    boolean removeAll(Collection<?> c);        // Optional
    boolean retainAll(Collection<?> c);        // Optional
    void clear();                              // Optional

    // Array Operations
    Object[] toArray();
    <T> T[] toArray(T[] a);
}
The Java platform contains three general-purpose Set implementations: HashSet, TreeSet, and LinkedHashSet. HashSet, which stores its elements in a hash table, is the best-performing implementation; however, it makes no guarantees concerning the order of iteration. TreeSet, which stores its elements in a red-black tree, orders its elements based on their values; it is substantially slower than HashSet. LinkedHashSet, which is implemented as a hash table with a linked list running through it, orders its elements based on the order in which they were inserted into the set (insertion-order). LinkedHashSet spares its clients from the unspecified, generally chaotic ordering provided by HashSet at a cost that is only slightly higher.
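
To make the ordering differences concrete, here is a small sketch that loads the same words into each of the three implementations. The class name IterationOrderDemo and the helper words() are invented for this example; no expected output is shown for HashSet because its iteration order is unspecified.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class; not part of the tutorial's own sources.
class IterationOrderDemo {
    // Sample input containing the duplicate word "i".
    static List<String> words() {
        return Arrays.asList("i", "came", "i", "saw", "i", "left");
    }

    public static void main(String[] args) {
        Set<String> hash = new HashSet<String>(words());
        Set<String> tree = new TreeSet<String>(words());
        Set<String> linked = new LinkedHashSet<String>(words());

        System.out.println("HashSet:       " + hash);   // order is unspecified
        System.out.println("TreeSet:       " + tree);   // prints [came, i, left, saw]
        System.out.println("LinkedHashSet: " + linked); // prints [i, came, saw, left]
    }
}
```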

Here's a simple but useful Set idiom. Suppose you have a Collection, c, and you want to create another Collection containing the same elements but with all duplicates eliminated. The following one-liner does the trick:

Collection<Type> noDups = new HashSet<Type>(c);
It works by creating a Set (which, by definition, cannot contain duplicates), initially containing all the elements in c. It uses the standard conversion constructor described in the section The Collection Interface (in the Collections trail).

Here is a minor variant of this idiom that preserves the order of the original collection while removing duplicate elements:

Collection<Type> noDups = new LinkedHashSet<Type>(c);

Here is a generic method that encapsulates the above idiom, returning a set of the same generic type as the one passed in:

public static <E> Set<E> removeDups(Collection<E> c) {
    return new LinkedHashSet<E>(c);
}
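
A possible way to exercise the removeDups method is sketched below. The wrapper class RemoveDupsDemo and the sample word list are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical wrapper class around the removeDups method shown in the text.
class RemoveDupsDemo {
    // LinkedHashSet keeps each element at the position of its first occurrence.
    public static <E> Set<E> removeDups(Collection<E> c) {
        return new LinkedHashSet<E>(c);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("i", "came", "i", "saw", "i", "left");
        System.out.println(removeDups(words)); // prints [i, came, saw, left]
    }
}
```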

Set Interface Basic Operations

The size operation returns the number of elements in the Set (its cardinality). The isEmpty method returns true if the Set contains no elements. The add method adds the specified element to the Set if it's not already present and returns a boolean indicating whether the element was added. Similarly, the remove method removes the specified element from the Set if it's present and returns a boolean indicating whether the element was present. The iterator method returns an Iterator over the Set.
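
The return values of the basic operations can be observed with a short sketch like the following. The class BasicOpsDemo and its demo() helper are hypothetical; the helper simply records each result in a list so that the sequence can be printed at once.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class; not part of the tutorial's own sources.
class BasicOpsDemo {
    // Records the result of each basic operation, in call order.
    static List<Object> demo() {
        List<Object> results = new ArrayList<Object>();
        Set<String> s = new HashSet<String>();
        results.add(s.isEmpty());      // true: nothing added yet
        results.add(s.add("saw"));     // true: "saw" was not present
        results.add(s.add("saw"));     // false: duplicate is rejected
        results.add(s.size());         // 1: the duplicate did not grow the set
        results.add(s.remove("saw"));  // true: "saw" was present
        results.add(s.remove("saw"));  // false: already removed
        results.add(s.isEmpty());      // true: the set is empty again
        return results;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [true, true, false, 1, true, false, true]
    }
}
```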

Here's a program that takes the words in its argument list and prints out any duplicate words, the number of distinct words, and a list of the words with duplicates eliminated:

import java.util.*;
public class FindDups {
    public static void main(String[] args) {
        Set<String> s = new HashSet<String>();
        for (String a : args)
            if (!s.add(a))
                System.out.println("Duplicate: " + a);
        System.out.println(s.size()+" distinct words: "+s);
    }
}
Now let's run the program:
java FindDups i came i saw i left
The following output is produced:
Duplicate: i
Duplicate: i
4 distinct words: [i, left, saw, came]
Note that the code always refers to the collection by its interface type (Set), rather than by its implementation type (HashSet). This is a strongly recommended programming practice, as it gives you the flexibility to change implementations merely by changing the constructor. If either the variables used to store a collection or the parameters used to pass it around are declared to be of the collection's implementation type rather than its interface type, all such variables and parameters must be changed in order to change the collection's implementation type. Furthermore, there's no guarantee that the resulting program will work. If the program uses any nonstandard operations that are present in the original implementation type but not in the new one, the program will fail. Referring to collections only by their interface prevents you from using any nonstandard operations.

The implementation type of the Set in the preceding example is HashSet, which makes no guarantees as to the order of the elements in the Set. If you want the program to print the word list in alphabetical order, merely change the set's implementation type from HashSet to TreeSet. Making this trivial one-line change causes the command line in the previous example to generate the following output:

java FindDups i came i saw i left
Duplicate: i
Duplicate: i
4 distinct words: [came, i, left, saw]
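
The benefit of coding against the interface type can be sketched with a helper that accepts any Set. The class InterfaceTypeDemo and the method findDups are invented for this example; the method relies only on operations declared in Set, so the caller chooses the implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class; not part of the tutorial's own sources.
class InterfaceTypeDemo {
    // Adds every word to 'seen' and returns the words that were already present.
    // The parameter is declared as Set, so any implementation can be passed in.
    static List<String> findDups(Set<String> seen, String... words) {
        List<String> dups = new ArrayList<String>();
        for (String w : words)
            if (!seen.add(w))
                dups.add(w);
        return dups;
    }

    public static void main(String[] args) {
        String[] words = {"i", "came", "i", "saw", "i", "left"};

        Set<String> sorted = new TreeSet<String>();
        System.out.println(findDups(sorted, words)); // prints [i, i]
        System.out.println(sorted);                  // prints [came, i, left, saw]

        Set<String> ordered = new LinkedHashSet<String>();
        findDups(ordered, words);
        System.out.println(ordered);                 // prints [i, came, saw, left]
    }
}
```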

Set Interface Bulk Operations

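The bulk operations are particularly well suited to Sets; when applied to sets, they perform standard set-algebraic operations. Suppose s1 and s2 are Sets. Here's what the bulk operations do:

s1.containsAll(s2): Returns true if s2 is a subset of s1. (s2 is a subset of s1 if s1 contains all the elements in s2.)
s1.addAll(s2): Transforms s1 into the union of s1 and s2.
s1.retainAll(s2): Transforms s1 into the intersection of s1 and s2.
s1.removeAll(s2): Transforms s1 into the (asymmetric) set difference of s1 and s2.

To calculate the union, intersection, or set difference of two sets nondestructively (without modifying either set), the caller must copy one set before calling the appropriate bulk operation. The resulting idioms follow: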
Set<Type> union = new HashSet<Type>(s1);
union.addAll(s2);

Set<Type> intersection = new HashSet<Type>(s1);
intersection.retainAll(s2);

Set<Type> difference = new HashSet<Type>(s1);
difference.removeAll(s2);
The implementation type of the result Set in the preceding idioms is HashSet, which is, as already mentioned, the best all-around Set implementation in the Java platform. However, any general-purpose Set implementation could be substituted.
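
The idioms above could be wrapped in small generic helpers, as in the following sketch. The class SetAlgebraDemo and the method names union, intersection, and difference are assumptions made for this example; the results are copied into TreeSets before printing only so that the printed order is predictable.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class; not part of the tutorial's own sources.
class SetAlgebraDemo {
    static <T> Set<T> union(Set<T> s1, Set<T> s2) {
        Set<T> result = new HashSet<T>(s1); // copy, so s1 is left untouched
        result.addAll(s2);
        return result;
    }

    static <T> Set<T> intersection(Set<T> s1, Set<T> s2) {
        Set<T> result = new HashSet<T>(s1);
        result.retainAll(s2);
        return result;
    }

    static <T> Set<T> difference(Set<T> s1, Set<T> s2) {
        Set<T> result = new HashSet<T>(s1);
        result.removeAll(s2);
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> s1 = new TreeSet<Integer>(Arrays.asList(1, 2, 3));
        Set<Integer> s2 = new TreeSet<Integer>(Arrays.asList(3, 4));

        System.out.println(new TreeSet<Integer>(union(s1, s2)));        // prints [1, 2, 3, 4]
        System.out.println(new TreeSet<Integer>(intersection(s1, s2))); // prints [3]
        System.out.println(new TreeSet<Integer>(difference(s1, s2)));   // prints [1, 2]
        System.out.println(s1 + " " + s2); // prints [1, 2, 3] [3, 4] (inputs unchanged)
        System.out.println(s1.containsAll(s2)); // prints false: s2 is not a subset of s1
    }
}
```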

Let's revisit the FindDups program. Suppose that you want to know which words in the argument list occur only once and which occur more than once, but that you do not want any duplicates printed out repeatedly. This effect can be achieved by generating two sets, one containing every word in the argument list and the other containing only the duplicates. The words that occur only once are the set difference of these two sets, which we know how to compute. Here's how the resulting program looks:

import java.util.*;
public class FindDups2 {
    public static void main(String[] args) {
        Set<String> uniques = new HashSet<String>();
        Set<String> dups = new HashSet<String>();

        for (String a : args) 
            if (!uniques.add(a)) 
                dups.add(a);

        // Destructive set-difference
        uniques.removeAll(dups); 
        System.out.println("Unique words:    " + uniques);
        System.out.println("Duplicate words: " + dups);
    }
}
When run with the same argument list used earlier (i came i saw i left), the program yields the output:
Unique words:    [left, saw, came]
Duplicate words: [i]
A less common set-algebraic operation is the symmetric set difference: the set of elements contained in either of two specified sets but not in both. The following code calculates the symmetric set difference of two sets nondestructively:
Set<Type> symmetricDiff = new HashSet<Type>(s1);
symmetricDiff.addAll(s2);
Set<Type> tmp = new HashSet<Type>(s1);
tmp.retainAll(s2);
symmetricDiff.removeAll(tmp);
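
The symmetric set difference could also be packaged as a generic method, as sketched below. The class SymmetricDiffDemo and the method name symmetricDifference are hypothetical; the body follows the same steps as the idiom (union, then remove the intersection).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class; not part of the tutorial's own sources.
class SymmetricDiffDemo {
    // Returns the elements contained in exactly one of s1 and s2.
    // Neither argument is modified.
    static <T> Set<T> symmetricDifference(Set<T> s1, Set<T> s2) {
        Set<T> result = new HashSet<T>(s1);
        result.addAll(s2);               // result is now the union
        Set<T> tmp = new HashSet<T>(s1);
        tmp.retainAll(s2);               // tmp is now the intersection
        result.removeAll(tmp);           // union minus intersection
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> s1 = new HashSet<Integer>(Arrays.asList(1, 2, 3));
        Set<Integer> s2 = new HashSet<Integer>(Arrays.asList(3, 4));
        // Copied into a TreeSet only to make the printed order predictable.
        System.out.println(new TreeSet<Integer>(symmetricDifference(s1, s2))); // prints [1, 2, 4]
    }
}
```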

Set Interface Array Operations

The array operations don't do anything special for Sets beyond what they do for any other Collection. These operations are described in the section The Collection Interface (in the Collections trail).
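
For completeness, here is a brief sketch of the two toArray forms applied to a Set. The class ToArrayDemo and the helper toStringArray are invented for this example; a TreeSet is used only so that the resulting array order is predictable.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class; not part of the tutorial's own sources.
class ToArrayDemo {
    // Copies the set's elements into a new String array, in iteration order.
    static String[] toStringArray(Set<String> s) {
        return s.toArray(new String[0]);
    }

    public static void main(String[] args) {
        Set<String> s = new TreeSet<String>(Arrays.asList("saw", "came", "left"));

        Object[] objects = s.toArray();      // untyped form
        String[] strings = toStringArray(s); // typed form

        System.out.println(objects.length);           // prints 3
        System.out.println(Arrays.toString(strings)); // prints [came, left, saw]
    }
}
```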

Copyright 1995-2005 Sun Microsystems, Inc. All rights reserved.