You'd think that Math.abs would return a non-negative integer. You'd be wrong. In fact, Math.abs(Integer.MIN_VALUE) = Integer.MIN_VALUE. The problem is that Integer.MIN_VALUE is -2147483648, and +2147483648 cannot be represented as a 32-bit signed integer (Integer.MAX_VALUE = 2147483647).
How can this bite you? It is not uncommon to see code such as
Object x = bucket[ Math.abs(y.hashCode()) % bucket.length]
Why is the call to Math.abs there? Because essentially all processors implement integer division as rounding towards zero, rather than towards negative infinity (as most mathmeticians and computer scientists would prefer). Java defines integer division the same way. As a result, a remainder computation can give a negative result if the dividend is negative (e.g., (-3)%2 == -1).
If we had just written
Object x = bucket[ y.hashCode() % bucket.length]
then roughly half the time, the remainder would be negative and we'd get an array index out of bound exception. So people will call Math.abs to obtain a non-negative dividend.
But as we've discussed, Math.abs doesn't guarantee a non-negative dividend. if the hashCode is Integer.MIN_VALUE, then the result of the Math.abs will be negative and we will likely get a negative remainder and in array index out of bounds exception. The same pattern/problem comes up when using random 32 bit signed integers rather than hashCodes.
I'm not sure whether to be relieved or worried that this problem would occur only about one time out of 4 billion. An error that occurs half the time is a lot easier to find via testing. An error that comes up once out of 4 billion will be almost impossible to reproduce or find through testing.
FindBugs 1.1 searches for this bug pattern. How often do we find it?
- 7 times in eclipse-SDK-3.2-solaris-gtk
- 6 times in jboss-4.0.2
- 6 times in eclipse-SDK-3.2-solaris-gtk
- twice in tomcat-4.1.31
- 9 times in BEA Weblogic 9.0
- 10 times in IBM WebSphere 6.0.3
The (quick) recommended way to fix this simply use a integer bitwise and to clear the high bit, ensuring that the result is non-negative.
Object x = bucket[ (y.hashCode() & Integer.MAX_VALUE) % bucket.length]
If you know that the length of the array is a power of two, you can also use:
Object x = bucket[ y.hashCode() & ( bucket.length - 1) ]
One quick word of warning: you might not want to assume that the hashCode function gives uniformly distributed values. Sun's HashMap using the following method to stir the bits of a hashCode to try to ensure a more uniformly distributed hash value:
/**
* Returns a hash value for the specified object. In addition to
* the object's own hashCode, this method applies a "supplemental
* hash function," which defends against poor quality hash functions.
* This is critical because HashMap uses power-of two length
* hash tables.
*
* The shift distances in this function were chosen as the result
* of an automated search over the entire four-dimensional search space.
*/
static int hash(Object x) {
int h = x.hashCode();
h += ~(h << 9);
h ^= (h >>> 14);
h += (h << 4);
h ^= (h >>> 10);
return h;
}
So, back to my original question: Is Math.abs broken? I'm not sure how it could be any different. There isn't any value that could be returned for
Math.abs(Integer.MIN_VALUE)
that would be correct. And making
Math.abs(Integer.MIN_VALUE)
throw exception wouldn't really improve the situation. So I wouldn't say it is broken; just dangerous.
Bill Pugh