How would this value help in decreasing collisions?
Hash collision is primarily achieved by a good distribution across the whole hash range (here the integer type).
By defining 0 as the initial value for calculating the hash result, you have a somewhat restricted distribution in a small range. Objects that differ in a minor way - maybe in some field only - produce hash codes that are not far away from each other. This makes hash collisions more likely.
By defining a non-zero initial value, you simply increase the gaps between calculated hash codes for objects that differ only in a minor way. So you better utilize the hash range and effectively make hash collisions more unlikely.
What does he means by saying that the exact value returned by hashCode is a function of the instance value?
It simply means that you should calculate the hash code by using the object's value, i.e. the values of its fields. You already did it in your example, and I think that you already implicitly understood it.
But: Joshua Bloch intended to say something else with this paragraph: He wanted to warn you about not documenting the exact function how the hash code is calculated. If you do so, you restrict yourself to not being able anymore to change the implementation in future releases because some users might expect a specific implementation, and you would break some code depending on yours.
Answer from Seelenvirtuose on Stack OverflowCan someone ELI5 the hashcode() method for me?
Designing hashCode method Java - Stack Overflow
java - What is the hashCode for a custom class having just two int properties? - Stack Overflow
What is the use of hashCode in Java? - Stack Overflow
Videos
I get the equals() method, and have used it to specify a more specifc equality check on objects, but I still can't grasp why the need for hashcode!
Any good explanations appreciated.
How would this value help in decreasing collisions?
Hash collision is primarily achieved by a good distribution across the whole hash range (here the integer type).
By defining 0 as the initial value for calculating the hash result, you have a somewhat restricted distribution in a small range. Objects that differ in a minor way - maybe in some field only - produce hash codes that are not far away from each other. This makes hash collisions more likely.
By defining a non-zero initial value, you simply increase the gaps between calculated hash codes for objects that differ only in a minor way. So you better utilize the hash range and effectively make hash collisions more unlikely.
What does he means by saying that the exact value returned by hashCode is a function of the instance value?
It simply means that you should calculate the hash code by using the object's value, i.e. the values of its fields. You already did it in your example, and I think that you already implicitly understood it.
But: Joshua Bloch intended to say something else with this paragraph: He wanted to warn you about not documenting the exact function how the hash code is calculated. If you do so, you restrict yourself to not being able anymore to change the implementation in future releases because some users might expect a specific implementation, and you would break some code depending on yours.
See this example:
String a = "Abc";
String b = "Abc";
String c = "Pqr";
System.out.println(" "+a.hashCode()+" "+b.hashCode()+" "+c.hashCode());
Output: 65602 65602 80497
Which clearly shows that hashCode() of string depends on values.
Extract from hashCode() documentation:
int java.lang.String.hashCode()
Returns a hash code for this string. The hash code for a String object is computed as
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)
You can't change the type of hashCode, nor should you want to.
I'd just go with something like:
public int hashCode() {
return x * 31 + y;
}
Note that this means that (a, b) is different to (b, a) for most cases (unlike e.g. adding or XOR-ing). This can be useful if you often end up with keys for the "switched" values in real life.
It isn't unique - but hash codes don't have to be. They just have to be the same for equal values (for correctness), and (for efficiency) "usually" different for non-equal values, with a reasonable distribution.
In general, I usually follow the same kind of pattern as Josh Bloch suggests in Effective Java:
public int hashCode() {
int hash = 17;
hash = hash * 31 + field1Hash;
hash = hash * 31 + field2Hash;
hash = hash * 31 + field3Hash;
hash = hash * 31 + field4Hash;
...
return hash;
}
Where field1Hash would be the hash code for reference type fields (or 0 for a null reference), the int itself for int values, some sort of hash from 64 bits to 32 for long etc.
EDIT: I can't remember the details of why 31 and 17 work well together. The fact that they're both prime may be useful - but from what I remember, the maths behind why hashes like this are generally reasonable (though not as good as hashes where the distribution of likely values is known in advance) is either difficult or not well understood. I know that multiplying by 31 is cheap (shift left 5 and subtract the original value)...
I know that it is ok for non-equal objects to have the same hashcodes. However, the more collisions, the worse the performance will be (for example, in a hash table).
As far as I know, the best mapping from Zยฒ โ Z is the "elegant pairing function" (google it). Here is the implementation
// x,y must be non-negative
int elegant(int x, int y) {
return x < y ? y * y + x : x * x + x + y;
}
// returns a unique number for every x,y pair
int elegantSigned(int x, int y) {
if (x < 0) {
if (y < 0)
return 3 + 4 * elegant(-x - 1, -y - 1);
return 2 + 4 * elegant(-x - 1, y);
}
if (y < 0)
return 1 + 4 * elegant(x, -y - 1);
return 4 * elegant(x, y);
}
This will begin to overlap as soon as you get multiplication overflow. If the absolute value of x and y is less than about 46000, then this will have zero hash collisions.
hashCode() is used for bucketing in Hash implementations like HashMap, HashTable, HashSet, etc.
The value received from hashCode() is used as the bucket number for storing elements of the set/map. This bucket number is the address of the element inside the set/map.
When you do contains() it will take the hash code of the element, then look for the bucket where hash code points to. If more than 1 element is found in the same bucket (multiple objects can have the same hash code), then it uses the equals() method to evaluate if the objects are equal, and then decide if contains() is true or false, or decide if element could be added in the set or not.
From the Javadoc:
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by
java.util.Hashtable.The general contract of
hashCodeis:
Whenever it is invoked on the same object more than once during an execution of a Java application, the
hashCodemethod must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.If two objects are equal according to the
equals(Object)method, then calling thehashCodemethod on each of the two objects must produce the same integer result.It is not required that if two objects are unequal according to the
equals(java.lang.Object)method, then calling thehashCodemethod on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java programming language.)
You're breaking one of the hashCode() taboos; you're using its output as a key identifier. That's wrong.
You see, the output of hashCode() (in its default implementation) is a 32-bit unsigned integer, that's roughly 4 billion unique hashCodes. Sounds quite a lot? Well, not so much. Applying the birthday problem in this case shows as that with about 77000 objects, you have about 50% chance of collision. 50% chance of two objects having the the same hashCode.
Another issue is that the implementation of hashCode() can change from one Java version to the other. It's not meant to be permanent identifier of an object, so there's nothing forcing it to be consistent across versions.
If you insist on using hashes as object identifiers, then it's much better to have your own method instead of hashCode() to use for your key identifiers (for example, getMySpecialHashKey(). You can uses something like MessageDigest.getInstance("SHA-256") to digest the object into a nice 256-bit key.
My recommendation: Ditch the whole idea of hashing the object and rather generate a random identifier for your object when you construct it. Something along the lines of (in Java)
SecureRandom secRand = new SecureRandom();
byte[] objIdBytes = new byte[16]; //128-bit
secRand.nextBytes(objIdBytes);
String objId = Base64.encodeBase64String(objIdBytes); //Here's your ID
You also seem to bind access to an object only to knowledge of its key. That's also wrong. A proper permission-based model with proper authentication and authorization is needed.
Java hashCode() was never intended to be used like this. Don't do ever it! It is even legal (while not recommended) for all instances of a class to return the same hashCode. The contract in Java is "two objects that are considered equal must have same hashcode". No more, no less. It would for example be valid to return hashcode 1 for all uneven numbers and 0 for even ones.
hashCode is used for faster lookup in Collections such as HashMap and collisions are expected.
Many classes define their own version of hashCode(), and if you know the code, you can often easily tell or guess the hashcode.
So, this is absolutely insecure.