1. 程式人生 > >深入java 的equals方法和hashCode方法

深入java 的equals方法和hashCode方法

Every Java object inherits a set of base methods from java.lang.Object that every client can use:

  • Creational methods
    Object()
    Default no-argument constructor
    clone()
    Returns a new instance of the class
  • Synchronizing methods
    notify()F
    Sends a signal to a waiting thread (on the current instance)
    notifyAll()F
    Sends a signal to all waiting threads (on the current instance)
    wait()F
    Forces the current thread to wait for a signal (on the current instance)
  • Equality methods
    equals(Object)
    Returns true if this instance is equal to the argument
    hashCode()
    Returns a hash code based on the instance data
  • Other methods
    toString()
    Returns a string representation of the object
    finalize()
    Performs garbage-collection duties
    getClass()F
    Returns the Class object associated with the instance


Each of these methods has a sensible default behavior that can be overridden in the subclasses (except for final

methods, marked above with F). This article discusses overriding the equals() and hashCode() methods for data objects.

So, what do the equals() and hashCode() methods do?

The purpose of the equals() method is to determine whether the argument is equal to the current instance. This method is used in virtually all of the java.util collections classes, and many other low-level libraries (such as RMI (Remote Method Invocation), JDBC (Java Database Connectivity), etc.) implicitly depend on the correct behavior. The method should return true if the two objects can be considered equal and false otherwise. Of course, what data is considered equal is up to each individual class to define.

Since computing an object's equality is a time-consuming task, Java also provides a quick way of determining if an object is equal or not, using hashCode(). This returns a small number based on the object's internal datastructure; if two objects have different hash codes, then they cannot be equal to each other. (Think of it like searching for two words in a dictionary; if they both begin with "A" then they may be equal; however, if one begins with "A" and the other begins with "B" then they cannot be equal.)

The purpose of computing a hash code is that the hash should be quicker to calculate and compare than computing full object equality. Datastructures such as the HashMap implicitly use the hash code to avoid computing equality of objects where possible. One of the reasons why a HashMap looks up data faster than a List is because the list has to search the entire datastructure for a match, whereas the HashMap only searches those that have the same hash value.

Importantly, it is an error for a class to have an equals() method without overriding the default hashCode() method. In an inheritance hierarchy, only the top class needs to provide a hashCode() method. This is discussed further below.

Implementing equals()

The equals() method's signature must be:

public boolean equals(Object)


Note: Regardless of which class contains the equals() method, the signature must always declare an Object argument type. Since Java's libraries look for one with an Object argument, if the signature is not correct, then the java.lang.Object method will be called instead, leading to incorrect behavior.

 

Fortunately, it is fairly easy to write a method that has this behavior. The equals method should compare:

  1. If the argument is this; if so, return true (reflexivity)
  2. If the argument is null; if so, return false (non-null)
  3. If the argument is of a different type; if so, return false (symmetry)
  4. Non-static and non-transient fields are equal, or both null in the case of reference types (symmetry, transitivity)


The reason why static fields are not compared is because they are the same for all class instances and thus clearly identical in an instance-based comparison. transient fields are not compared because the transient keyword's purpose is to prevent fields from being written during serialization; and these fields should therefore not play a part in equality testing. Otherwise, if an object is serialized and then de-serialized, then it will not be equal to itself.

Let's consider a simple point in 2D space to see how a simple equals() method looks; the comparison is shown graphically in Figure 1:

public class Point {
  private static double version = 1.0;
  private transient double distance;
  private int x, y;
  public boolean equals(Object other) {
    if (other == this) return true;
    if (other == null) return false;
    if (getClass() != other.getClass()) return false;
    Point point = (Point)other;
    return (x == point.x && y == point.y);
  }
}


Figure 1. Comparing fields in the Point class

The code above shows a simple example of an equals() method. Note that only the two non-static and non-transient fields (x and y) are being compared; the others (distance and version) are not relevant to instance-based comparison.

Note: In this case, the getClass() method is being used to determine whether the class is the correct type; this is the more correct choice than instanceof, as discussed below.

To compare whether two instances are of the same type, their getClass() method is invoked. Each instance has a single Class instance associated with it; classes of the same instance share exactly the same Class instance. In the VM, there is only a single Class instance for each class name (visible within a single ClassLoader, anyway). As a result, these Class instances can be compared with the identity test == instead of having to use an equals() on the Class instances.

Comparing reference types

So what happens when a reference type is declared within an object? The answer is that their equals() methods are used to compare the references directly; the only extra work involved is determining if the references are both null. The logic is:

  1. If the reference is null in this, then it must be null in the other
  2. If the reference is non-null in this, then it must be equals() in the other

Here's an example of a Person class that performs equality checking on two reference types, name and birth:

public class Person {
  private String name;
  private Date birth;
  public boolean equals(Object other) {
    if (other == this) return true;
    if (other == null) return false;
    if (getClass() != other.getClass()) return false;
    Person person = (Person)other;
    return (
      (name == person.name || 
        (name != null && name.equals(person.name))) &&
      (birth == person.birth || 
        (birth != null && birth.equals(person.birth)))
    );
  }
}

In this case, the identity check (name == person.name) captures whether both references are null. It also removes the equality test when the two fields are identical, thus not requiring a recursive call to equals() in the other instance.

Note: It is possible to ignore the test for null in the second part if you know the value can never be null; i.e., it is assigned non-null in the constructor. However, it is better to be safe than have numerous NullPointerExceptions coming out of datastructures because your equals() method does not handle null data correctly.

A similar implementation holds for other reference types, such as arrays. However, since arrays lack an equals() implementation, you either have to write the for loop manually to test for element equality or use the Arrays class to perform the equality check (see Resources for link to the Javadoc).

Implementing hashCode()

The hash code allows collections such as HashMap to organize and access data quicker than searching through the entire datastructure. It does this by partitioning the data into different buckets, then searching through the single bucket corresponding to the hash code required. As a result, if an object's hash code differs, it won't be searched; just like you wouldn't expect the word "Banana" to appear in a dictionary's "A" section. Figure 2 shows a simplified example of how a hash table is structured, using initial letters as the hash:

Figure 2. Simplified example of how a hash table works

The hash code is just an int calculated from the instance data. Importantly, if two instances are considered equal by equals(), then they must have the same hash code. As a consequence, the hash code can only be computed on those fields compared in the equals() method. It does not have to use all of the equality fields:

public class Point {
  private int x, y;
  public boolean equals(Object other) {
    ...
    Point point = (Point)other;
    return (x == point.x && y == point.y);
  }
  public int hashCode() {
    return x;
  }
}

The code above is a correct (if not optimal) implementation of hashCode(), since it only relies on fields compared by the equals() method. In this case, all points that have the same x value have the same hash code and are deposited into the same bucket for hash comparisons. Thus, when searching for a point in a Map, only the Points with the same x value will be compared.

Of course, it is desirable for hash codes to be distributed, which means that where possible, two different objects should have different hash codes. Additionally, if two objects are "close" to each other, they should have a very different hash code.

We can improve on our hash function by using other variables in the hash computation:

public class Point {
  private int x, y;
  public boolean equals(Object other) {
    ...
  }
  public int hashCode() {
    return x + y;
  }
}

Now, our hash function only returns the same for points that lie on a diagonal line. However, this is still not very distributed, because most likely, it may occur in some applications. We can improve the function by using some of the following options:

  • Use multiplication instead of addition
  • Use bitwise xor to combine values (the ^ operator) instead of addition
  • Multiply integral values with a prime number to distribute the values

We can now add a hashCode() method to the Point class:

public class Point {
  private int x, y;
  public boolean equals(Object other) {
    ...
  }
  public int hashCode() {
    return x*31 ^ y*37;
  }
}

Of course, the hash code's computation should be quick, which may often negate multiplication's use.

Using hashCode() with reference types

If the class uses a reference type, then the hashCode() method can delegate the work to the enclosed reference type. However, care must be taken when the reference may be null, because as with equals(), it would be undesirable to generate NullPointerExceptions. In the case of the reference type being null, a fixed number can return. (The number should be non-zero and non-negative, since these values may have special meanings in some hash map implementations.)

Although the method can be implemented with multiple if blocks, it is more compact using the ternary if operator:

public class Person {
  private String name;
  private Date birth;
  public boolean hashCode() {
    return
     (name  == null ? 17 : name.hashCode()) ^ 
     (birth == null ? 31 : name.hashCode());
  }
}

Default behavior of equals() and hashCode()

The default behavior for these two methods gives answers that work only for simple cases. The equals() method returns true only when it is being compared to itself (i.e., the identity check). The hashCode() method returns an int based on a unique instance hash (such as the object's location in memory and may be calculated differently for different VM implementations).

Because the default hashCode() gives an answer based on the instance's identity, it is incorrect to implement equals() without implementing a corresponding hashCode() method. Otherwise, two objects may possibly be equal, but have different hash codes—a violation of the equality contract, which will manifest itself in a number of odd ways. It is much better, if a suitable hash code cannot be calculated, to return a constant value (such as 7) rather than use the default implementation. Of course, using a constant value will degrade the performance of a Map into that of a List, so even a simple implementation that returns the value of an integer field will prove beneficial.   

Advanced strategies

The computation of both equals() and hashCode() must be as quick as possible, since they will be called repeatedly on objects. In a List with 1,000 elements, it is likely that 10,000 comparisons will be done in sorting the list. For each of these comparisons, equals() is called once. Thus, optimizing the comparison speed is a highly desirable goal. Although dynamic comparison utilities (such as the Apache EqualsBuilder) make writing equals() methods easier, because they are based on dynamic field comparisons using introspection, the execution is much slower and less beneficial than an implemented method.

Since some parts of the equals method run faster than others, it makes sense to order the statements so the quicker ones are tested before the slower ones. For example, comparing the values of a primitive is much faster than invoking the equals() method; so primitives are compared first. Similarly, if the object is of the wrong type, there is no point in comparing any of the fields first.

There are some situations when using floating-point data types that can cause some odd behavior; see the sidebar, "When All is Not Equal," for more information.

Read-only objects

If an object is read-only (immutable), then the hash code can be computed ahead of time. When the object is created, all of the values will be passed in via a constructor, and the hash code can be calculated on that data. The hashCode() method can then just return the pre-computed value:

public class Point {
  private final int x, y;
  private final int hashCode;
  public Point(int x, int y) {
    this.x = x;
    this.y = y;
    this.hashCode = x*31 ^ y*37;
  }
  public boolean equals(Object other) {
    ...
  }
  public int hashCode() {
    return hashCode;
  }
}

Of course, care must be taken to ensure that the variables are not changed after their initialization; the final keyword helps to ensure that cannot occur.

Using instanceof (or not)

In Joshua Bloch's otherwise excellent book Effective Java, he recommends using instanceof as the test to determine type of a class. Whilst on the surface, this may seem a good idea, in fact it has a fatal flaw in that instanceof is not symmetric. Bloch's book recommends:

public class BadPoint {
  private int x, y;
  public boolean equals(Object other) {
    if (other == this) return true;
    if (!(other instanceof BadPoint)) return false; // BAD
    BadPoint point = (BadPoint)other;
    return (x == point.x && y == point.y);
  }
  public int hashCode() {
    return x + y;
  }
}

Unfortunately, because this code is shorter (it combines the test for other == null and the other.getClass() into the same statement; null instanceof X is always false) and because it was recommended in Effective Java, it has become ingrained into many Java developers' style. This has become one of the most controversial points in the book, with many on-going discussions; see the interview at Artima for more.

The big problem with using instanceof is that it isn't symmetric. It surfaces when creating subclasses:

public class BadPoint3D extends BadPoint {
  private int z;
  public boolean equals(Object other) {
    if (!super.equals(other)) return false;
    if (!(other instanceof BadPoint3D)) return false; // BAD
    BadPoint3D point = (BadPoint3D)other;
    return (z == point.z);
  }
}

The problem occurs in that given instances badPoint and badPoint3D, we have point instanceof BadPoint3D == false, but point3D instanceof BadPoint == true.

This example shows a point in 3 dimensions, subclassing the 2D point we've already seen. By using the instanceof implementation, we can break symmetry:

BadPoint p1 = new BadPoint(1,1);
BadPoint p2 = new BadPoint3D(1,1,1);
// p1.equals(p2) == true;  // incorrect
// p2.equals(p1) == false; // correct

The assumption that all subclasses of BadPoint are equal if they have the same x and y values causes this problem. Using instanceof hides the fact that we no longer compare like-for-like; instead, we compare like-for-subtype.

Using the getClass() implementation gives the correct answer of false in both cases. It isn't possible in general cases to compare a 3D point with a 2D point, and this is the key factor in the test. Using the getClass() implementation ensures that you only compare like-for-like tests.

Note: The weak argument that hiding subtype information is desirable is based on the assumption that you would want to create a subclass of the data type for the sole purpose of adding/changing some methods (but not data) of the superclass. Whilst this happens in classes like Applet, this is a false argument; an Applet is not a data object—it's a code object. In fact, you never need to create a subclass of a datastructure for the purpose of embellishing any methods; this is a common mistake by those new to object-orientation, referred to as the "is-a/has-a" argument. In this case, you're not creating a subtype of the datastructure, you just want to add/modify its behavior. The correct solution is to write a separate class that delegates to the contained object. The rest of that discussion reaches beyond this article's scope.

Practical problems with symmetry

Does breaking the equals() method's reflexivity matter in practice? Well, as noted earlier, it is used in many of the low-level libraries that make up the collections classes, and they depend implicitly on this behavior. If we use the instanceof variant, many odd behaviors can occur during use:

Set data = new HashSet();
data.add(new BadPoint3D(1,1,1));
data.add(new BadPoint(1,1));
data.add(new BadPoint3D(1,1,2));
data.add(new BadPoint3D(1,2,3));
data.size(); // gives 3, not 4
data.contains(new BadPoint(1,2)); // returns true, not false

Of course, this behavior occurs when the BadPoint3D does not define its own hashCode(). However, it is not required to define one; provided the default hashCode() has been overridden, it fulfils the hashCode() method's contractual obligations.

Further, the order in which the BadPoint and BadPoint3D instances are added determines which ones are in the final set. So although it appears trivial that the symmetry is broken, it can actually cause some deep-rooted erroneous behavior. You can see this from the samples in Resources.

Total violation

In fact, it's not just symmetry that's broken in the comparisons. If the subclass also defines a hashCode() method, then you can end up with a situation in which two objects are equals() with each other, but give different hash codes—a total violation of the equality contact. Instead of just breaking one method, using instanceof can actually break two of them:

public class BadPoint3D extends BadPoint {
  private int z;
  public boolean equals(Object other) {
    if (!super.equals(other)) return false;
    if (!(other instanceof BadPoint3D)) return false; // BAD
    BadPoint3D point = (BadPoint3D)other;
    return (z == point.z);
  }
  public int hashCode() {
    return super.hashCode() + z;
  }
}
// BadPoint p1 = new BadPoint(1,2);
// BadPoint p2 = new BadPoint3D(1,2,3);
// p1.equals(p2) == true
// p1.hashCode() == 3;
// p2.hashCode() == 6;

Summary

Writing an implementation of equals() and hashCode() is usually the case of following a sample and adapting it to the fields defined in the data object. The correct implementation can provide performance benefits with many collections and other low-level libraries.

It has also been conclusively shown that using instanceof is bad practice because of the erroneous situations that can occur when using subclasses and any other objects that rely on the correct implementation of equals() and hashCode().

For completeness, the Point and Point3D classes and their full correct implementations are provided for comparison and convenience:

Point

public class Point {
  private static double version = 1.0;
  private transient double distance;
  private String name;
  private int x, y;
  public Point(String name, int x, int y) {
    this(x,y);
    this.name = name;
  }
  public Point(int x, int y) {
    this.x = x;
    this.y = y;
  }
  public boolean equals(Object other) {
    if (other == this) return true;
    if (other == null) return false;
    if (getClass() != other.getClass()) return false;
    Point point = (Point)other;
    return (
      x == point.x &&
      y == point.y &&
      (name == point.name || 
        (name != null && name.equals(point.name)))
    );
  }
  public int hashCode() {
    return x ^ y;
  }
}

Point3D

Author Bio

Alex Blewitt is CEO of International Object Solutions Limited and Chartered Engineer of the Institute of Electrical Engineers. He has been working with Java since 1995 and is currently working towards a PhD at the University of Edinburgh involving verification of design patterns in Java. He currently lives in Milton Keynes with his wife Amy and two dogs Milly and Kea. For more on Alex and Amy, see http://www.bandlem.com/.