The absolute quantity of information that you can store in 64 bits is of course the same.
What changes is the meaning you assign to the bits.
In an integer or long variable, the encoding is the same positional notation you use for decimal numbers in everyday life, just in base 2. The one exception is that two's complement is used for negative values, but this doesn't change much: it's only a trick to gain one additional number (by storing a single zero instead of both a positive and a negative zero).
In a float or double variable, the bits are split into two kinds: the mantissa and the exponent (plus a sign bit). This means that every double is shaped like XXXXYYYY, where its numerical value is something like XXXX*2^YYYY. Basically you decide to encode the values in a different way; what you get is the same number of values, but distributed differently over the whole set of real numbers.
The fact that the largest/smallest value of a floating-point number is larger/smaller than the largest/smallest value of an integer doesn't imply anything about the amount of data actually stored.
Answer from Jack on Stack Overflow
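To make the "same bits, different meaning" point concrete, here is a minimal sketch. The class name SameBitsDemo and its helper methods are made up for illustration. It only shows that both types have 64 bits, that their ranges differ enormously, and that some long values do not survive a trip through double.

```java
// Hypothetical demo class; the names here are not from the original answer.
final class SameBitsDemo {
    // Width in bits of each type, as reported by the wrapper classes.
    static int longBits() { return Long.SIZE; }
    static int doubleBits() { return Double.SIZE; }

    // True when converting the long to a double and back changes the value.
    static boolean changesOnRoundTrip(long value) {
        return (long) (double) value != value;
    }

    public static void main(String[] args) {
        System.out.println("long bits:   " + longBits());
        System.out.println("double bits: " + doubleBits());
        System.out.println("Long.MAX_VALUE   = " + Long.MAX_VALUE);
        System.out.println("Double.MAX_VALUE = " + Double.MAX_VALUE);

        // 2^53 is exactly representable as a double, 2^53 + 1 is not.
        long exact = 1L << 53;
        long inexact = exact + 1;
        System.out.println(exact + " changes: " + changesOnRoundTrip(exact));
        System.out.println(inexact + " changes: " + changesOnRoundTrip(inexact));
    }
}
```

So the double reaches much further out, but it pays for that by skipping whole numbers that a long can hold.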
A double can store a larger number by having larger intervals between the numbers it can store, essentially. Not every integer in the range of a double is representable by that double.
More specifically, a double has one bit (S) to store sign, 11 bits to store an exponent E, and 52 bits of precision, in what is called the mantissa (M).
For most numbers (there are some special cases, such as zero, subnormals, infinities and NaN), a double stores the number (-1)^S * (1 + (M * 2^{-52})) * 2^{E - 1023}. As such, when E is large, changing M by one changes the resulting number by much more than one. These large gaps are what give doubles a larger range than longs.
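As a rough illustration of that formula, the sketch below pulls S, E and M out of a double and rebuilds the value from them. The class and method names (DoubleFields, rebuild, and so on) are invented for this example, and rebuild only handles normal numbers, not the special cases. Math.ulp is used to show the size of the gap to the next representable value.

```java
// Hypothetical helper for illustration; handles normal doubles only.
final class DoubleFields {
    // Sign bit S: the top bit of the 64-bit pattern.
    static int sign(double d) {
        return (int) (Double.doubleToLongBits(d) >>> 63);
    }

    // Biased exponent E: the next 11 bits.
    static int exponent(double d) {
        return (int) ((Double.doubleToLongBits(d) >>> 52) & 0x7FF);
    }

    // Mantissa M: the low 52 bits.
    static long mantissa(double d) {
        return Double.doubleToLongBits(d) & 0xFFFFFFFFFFFFFL;
    }

    // Computes (-1)^S * (1 + M * 2^-52) * 2^(E - 1023) using exact scaling.
    static double rebuild(int s, int e, long m) {
        double fraction = 1.0 + Math.scalb((double) m, -52);
        double magnitude = Math.scalb(fraction, e - 1023);
        return (s == 0) ? magnitude : -magnitude;
    }

    public static void main(String[] args) {
        double[] samples = {1.0, -6.5, 1e300};
        for (double d : samples) {
            int s = sign(d);
            int e = exponent(d);
            long m = mantissa(d);
            System.out.println(d + ": S=" + s + " E=" + e + " M=" + m
                    + " rebuilt=" + rebuild(s, e, m)
                    + " gap to next value=" + Math.ulp(d));
        }
    }
}
```

The gap printed for 1e300 is vastly larger than one, which is exactly the "larger intervals" trade-off described above.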
I don't understand how an int 63823, takes up less space than a double 1.0. Is there not more information stored in the int, in this particular instance?
Good question. What you're seeing when you see 63823 and 1.0 is a representation of the underlying data, you are not seeing the underlying data. It is specially formatted so that you can read it, but it is not how the machine sees it.
Java uses very special formats for representing int and double. You need to look at those representations to understand why 63823 takes thirty-two bits when represented as a Java int and 1.0 takes sixty-four bits when represented as a Java double.
In particular, 63823 as an int in Java is represented as:
00000000000000001111100101001111
and 1.0 as a double is represented in Java as:
0011111111110000000000000000000000000000000000000000000000000000
If you want to explore more, I recommend Two's Complement and What Every Computer Scientist Should Know About Floating-Point Arithmetic.
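If you'd like to reproduce those two bit strings yourself, something along these lines should do it. BitPatterns and its methods are names made up for this sketch; the only library calls are Integer.toBinaryString, Long.toBinaryString and Double.doubleToLongBits, with manual zero-padding because the toBinaryString methods drop leading zeros.

```java
// Hypothetical helper for printing fixed-width bit patterns.
final class BitPatterns {
    // Left-pads a binary string with zeros up to the given width.
    private static String pad(String bits, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = bits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(bits).toString();
    }

    // All 32 bits of an int, most significant bit first.
    static String intBits(int value) {
        return pad(Integer.toBinaryString(value), Integer.SIZE);
    }

    // All 64 bits of a double, most significant bit first.
    static String doubleBits(double value) {
        return pad(Long.toBinaryString(Double.doubleToLongBits(value)), Long.SIZE);
    }

    public static void main(String[] args) {
        System.out.println(intBits(63823));
        System.out.println(doubleBits(1.0));
    }
}
```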
Not exactly. The double 1.0 represents more information because, by the definition of a double as a 64 bit float, there are more values that it could be. To use your example, if you had a special data type that could only have two values, 63823 and 98321234213474932, then it would only take 1 bit to represent the number 63823, though it would be far less useful than an int.
In terms of implementation, it's often a lot easier and faster to work with fixed-size data types, so that you can allocate a fixed chunk of memory (that's what a variable is) without having to know its value and constantly reallocate space. Examples of types with a different approach would be String and BigInteger, which do allocate space to accommodate their values. Note that both are immutable in Java -- that's not a coincidence.
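For contrast with the fixed-size primitives, here is a small sketch of a type whose storage grows with its value. GrowingStorage and bytesFor are hypothetical names; the length of BigInteger.toByteArray() is used only as a stand-in for "how much data the value needs", not as a measurement of the object's real memory footprint.

```java
import java.math.BigInteger;

// Hypothetical demo: BigInteger needs more bytes as the value grows.
final class GrowingStorage {
    // Length of the minimal two's-complement encoding of the value.
    static int bytesFor(BigInteger value) {
        return value.toByteArray().length;
    }

    public static void main(String[] args) {
        BigInteger small = BigInteger.valueOf(63823);
        BigInteger large = new BigInteger("98321234213474932");
        BigInteger huge = BigInteger.ONE.shiftLeft(200); // 2^200

        System.out.println(small + " needs " + bytesFor(small) + " bytes");
        System.out.println(large + " needs " + bytesFor(large) + " bytes");
        System.out.println("2^200 needs " + bytesFor(huge) + " bytes");

        // An int is always Integer.BYTES wide, whatever it holds.
        System.out.println("any int needs " + Integer.BYTES + " bytes");
    }
}
```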
See @Frank Kusters' answer, below!
(My original answer here was for Java versions < 8.)
Since Java 8, all wrapper classes of primitive types (except Boolean) have a BYTES field. So in your case:
int size = numDouble * Double.BYTES + numInt * Integer.BYTES;
Documentation: http://docs.oracle.com/javase/8/docs/api/java/lang/Integer.html
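A quick way to see all of those constants at once, assuming Java 8 or later. PrimitiveSizes and payloadBytes are names invented here, and the result is the size of the raw values only, without any object or array overhead.

```java
// Hypothetical demo of the BYTES constants on the wrapper classes.
final class PrimitiveSizes {
    // Raw payload size for a given count of doubles and ints.
    static int payloadBytes(int numDouble, int numInt) {
        return numDouble * Double.BYTES + numInt * Integer.BYTES;
    }

    public static void main(String[] args) {
        System.out.println("byte:   " + Byte.BYTES);
        System.out.println("short:  " + Short.BYTES);
        System.out.println("char:   " + Character.BYTES);
        System.out.println("int:    " + Integer.BYTES);
        System.out.println("long:   " + Long.BYTES);
        System.out.println("float:  " + Float.BYTES);
        System.out.println("double: " + Double.BYTES);
        System.out.println("3 doubles + 5 ints: " + payloadBytes(3, 5) + " bytes");
    }
}
```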
Or even simpler,
import java.nio.ByteBuffer;

public static byte[] toByteArray(double value) {
    byte[] bytes = new byte[8];
    ByteBuffer.wrap(bytes).putDouble(value);
    return bytes;
}

public static double toDouble(byte[] bytes) {
    return ByteBuffer.wrap(bytes).getDouble();
}
long bits = Double.doubleToLongBits(myDouble);
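The two approaches should agree with each other. As a sketch (DoubleBitsRoundTrip and its method names are made up), the code below converts a double to its 64-bit pattern and back, and checks that a ByteBuffer in its default big-endian order holds the same bits.

```java
import java.nio.ByteBuffer;

// Hypothetical demo tying doubleToLongBits to the ByteBuffer approach.
final class DoubleBitsRoundTrip {
    // The 64-bit IEEE 754 pattern of the value.
    static long toBits(double value) {
        return Double.doubleToLongBits(value);
    }

    // The inverse conversion.
    static double fromBits(long bits) {
        return Double.longBitsToDouble(bits);
    }

    // Writes the double into an 8-byte buffer and reads those bytes as a long.
    static long bitsViaBuffer(double value) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(value).getLong(0);
    }

    public static void main(String[] args) {
        double myDouble = 1.0;
        long bits = toBits(myDouble);
        System.out.println(Long.toHexString(bits));
        System.out.println(fromBits(bits));
        System.out.println(bits == bitsViaBuffer(myDouble));
    }
}
```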
A UTF-8 character can be between 1 and 4 bytes long, so your code is printing the correct byte length for the input Japanese character.
I believe the code point for that character is 0x5927, which when represented as UTF-8 is the three bytes E5 A4 A7. (Not all non-ASCII characters take 3 bytes in UTF-8, only those with code points in the range of 0x0800 and 0xFFFF.)
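You can check this directly. The sketch below (Utf8Length is a made-up name) encodes a few strings as UTF-8 and reports the byte count and the hex bytes. The sample characters are written as Unicode escapes so that the source file's own encoding doesn't matter.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo of UTF-8 encoded lengths.
final class Utf8Length {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // The UTF-8 bytes as space-separated uppercase hex pairs.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ascii = "A";                                       // U+0041
        String accented = "\u00E9";                               // U+00E9
        String kanji = "\u5927";                                  // U+5927
        String emoji = new String(Character.toChars(0x1F600));    // U+1F600

        for (String s : new String[] {ascii, accented, kanji, emoji}) {
            System.out.println(utf8Length(s) + " byte(s): " + utf8Hex(s));
        }
    }
}
```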
With the possible exception of "short", which arguably is a bit of a waste of space-- sometimes literally, they're all horses for courses:
- Use an int when you don't need fractional numbers and you've no reason to use anything else; on most processors/OS configurations, this is the size of number that the machine can deal with most efficiently;
- Use a double when you need fractional numbers and you've no reason to use anything else;
- Use a char when you want to represent a character (or possibly rare cases where you need two-byte unsigned arithmetic);
- Use a byte if either you specifically need to manipulate a signed byte (rare!), or when you need to move around a block of bytes;
- Use a boolean when you need a simple "yes/no" flag;
- Use a long for those occasions where you need a whole number, but where the magnitude could exceed 2 billion (file sizes, time measurements in milliseconds/nanoseconds, in advanced uses for compacting several pieces of data into a single number);
- Use a float for those rare cases where you either (a) are storing a huge number of them and the memory saving is worthwhile, or (b) are performing a massive number of calculations, and can afford the loss in accuracy. For most applications, "float" offers very poor precision, but operations can be twice as fast -- it's worth testing this on your processor, though, to find that it's actually the case! [*]
- Use a short if you really need 2-byte signed arithmetic. There aren't so many cases...
[*] For example, in Hotspot on Pentium architectures, float and double operations generally take exactly the same time, except for division.
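To give a feel for what "very poor precision" means for float in the list above, here is a small sketch. FloatPrecision and its methods are invented for this example. It checks whether a whole number survives being stored in a float or a double.

```java
// Hypothetical demo of float versus double precision for whole numbers.
final class FloatPrecision {
    // True if the int can be stored in a float without changing.
    static boolean floatKeeps(int value) {
        return (long) (float) value == value;
    }

    // True if the int can be stored in a double without changing.
    static boolean doubleKeeps(int value) {
        return (long) (double) value == value;
    }

    public static void main(String[] args) {
        int limit = 1 << 24; // 16777216: a float has 24 bits of precision
        System.out.println(limit + " kept by float: " + floatKeeps(limit));
        System.out.println((limit + 1) + " kept by float: " + floatKeeps(limit + 1));
        System.out.println((limit + 1) + " kept by double: " + doubleKeeps(limit + 1));

        // The same decimal literal lands on different values in each type.
        System.out.println((double) 0.1f);
        System.out.println(0.1);
    }
}
```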
Don't get too bogged down in the memory usage of these types unless you really understand it. For example:
- object sizes in Hotspot are rounded up to an alignment boundary (8 bytes by default, on top of an object header of 8 or more bytes), so an object with a single byte field can take up precisely the same space as an object with a single long or double field;
- when passing parameters to a method, every type takes up 4 or 8 bytes on the stack: you won't save anything by changing a method parameter from, say, an int to a short! (I've seen people do this...)
Obviously, there are certain API calls (e.g. various calls for non-CPU intensive tasks that for some reason take floats) where you just have to pass it the type that it asks for...!
Note that String isn't a primitive type, so it doesn't really belong in this list.
A Java int is 32 bits, while a long is 64 bits, so when you need to represent integers outside the int range of -2^31 to 2^31 - 1, long is your friend. For a typical example of the use of long, see System.currentTimeMillis().
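A minimal sketch of what happens at that boundary (IntVersusLong and its methods are hypothetical names): adding in int arithmetic silently wraps around, while widening to long first gives the expected result. Math.addExact is a standard alternative that throws instead of wrapping.

```java
// Hypothetical demo of int overflow versus long arithmetic.
final class IntVersusLong {
    // Adds in 32-bit arithmetic; the result wraps around on overflow.
    static int addAsInt(int a, int b) {
        return a + b;
    }

    // Widens to 64 bits before adding, so the sum of two ints always fits.
    static long addAsLong(int a, int b) {
        return (long) a + b;
    }

    public static void main(String[] args) {
        System.out.println(addAsInt(Integer.MAX_VALUE, 1));
        System.out.println(addAsLong(Integer.MAX_VALUE, 1));
        try {
            Math.addExact(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("addExact refused: " + e.getMessage());
        }
        // Milliseconds since 1970 are already far beyond the int range.
        System.out.println(System.currentTimeMillis() > Integer.MAX_VALUE);
    }
}
```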
A byte is 8 bits, and the smallest addressable entity on most modern hardware, so it is needed when reading binary data from a file.
A double has twice the size of a float, so you would usually use a double rather than a float, unless you have some restrictions on size or speed and a float has sufficient capacity.
A short is two bytes, 16 bits. In my opinion, this is the least necessary datatype, and I haven't really seen it in actual code, but again, it might be useful for reading binary file formats or implementing low-level network protocols. For example, IP port numbers are 16 bits.
A char represents a single character, which is 16 bits. This is the same size as a short, but a short is signed (-32768 to 32767) while a char is unsigned (0 to 65535). (This means that an IP port number is probably more correctly represented as a char than as a short, but this seems to be outside the intended scope for chars...)
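The signed/unsigned difference is easy to see with a port number above 32767. This is only a sketch with made-up names (PortNumbers, asShort, asChar); in real code an int is the usual choice for ports.

```java
// Hypothetical demo of storing a 16-bit port number in a short versus a char.
final class PortNumbers {
    // Value seen after squeezing the port into a signed 16-bit short.
    static int asShort(int port) {
        return (short) port;
    }

    // Value seen after squeezing the port into an unsigned 16-bit char.
    static int asChar(int port) {
        return (char) port;
    }

    // Recovers the unsigned value from the 16 bits stored in a short.
    static int unsignedFromShort(short s) {
        return Short.toUnsignedInt(s);
    }

    public static void main(String[] args) {
        int port = 50000;
        System.out.println("as short: " + asShort(port));
        System.out.println("as char:  " + asChar(port));
        System.out.println("short reinterpreted as unsigned: "
                + unsignedFromShort((short) port));
    }
}
```

The bits stored are the same in both cases; only the interpretation differs, which is why Short.toUnsignedInt can recover the original value.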
For the really authoritative source on these details, see the Java Language Specification.
Okay, there's been a lot of discussion and not a lot of code :)
Here's a quick benchmark. It's got the normal caveats when it comes to this kind of thing - testing memory has oddities due to JITting etc, but with suitably large numbers it's useful anyway. It has two types, each with 80 members - LotsOfBytes has 80 bytes, LotsOfInts has 80 ints. We build lots of them, make sure they're not GC'd, and check memory usage:
class LotsOfBytes
{
    byte a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, aa, ab, ac, ad, ae, af;
    byte b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, bf;
    byte c0, c1, c2, c3, c4, c5, c6, c7, c8, c9, ca, cb, cc, cd, ce, cf;
    byte d0, d1, d2, d3, d4, d5, d6, d7, d8, d9, da, db, dc, dd, de, df;
    byte e0, e1, e2, e3, e4, e5, e6, e7, e8, e9, ea, eb, ec, ed, ee, ef;
}

class LotsOfInts
{
    int a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, aa, ab, ac, ad, ae, af;
    int b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, bf;
    int c0, c1, c2, c3, c4, c5, c6, c7, c8, c9, ca, cb, cc, cd, ce, cf;
    int d0, d1, d2, d3, d4, d5, d6, d7, d8, d9, da, db, dc, dd, de, df;
    int e0, e1, e2, e3, e4, e5, e6, e7, e8, e9, ea, eb, ec, ed, ee, ef;
}

public class Test
{
    private static final int SIZE = 1000000;

    public static void main(String[] args) throws Exception
    {
        LotsOfBytes[] first = new LotsOfBytes[SIZE];
        LotsOfInts[] second = new LotsOfInts[SIZE];

        System.gc();
        long startMem = getMemory();

        for (int i=0; i < SIZE; i++)
        {
            first[i] = new LotsOfBytes();
        }

        System.gc();
        long endMem = getMemory();

        System.out.println ("Size for LotsOfBytes: " + (endMem-startMem));
        System.out.println ("Average size: " + ((endMem-startMem) / ((double)SIZE)));

        System.gc();
        startMem = getMemory();
        for (int i=0; i < SIZE; i++)
        {
            second[i] = new LotsOfInts();
        }
        System.gc();
        endMem = getMemory();

        System.out.println ("Size for LotsOfInts: " + (endMem-startMem));
        System.out.println ("Average size: " + ((endMem-startMem) / ((double)SIZE)));

        // Make sure nothing gets collected
        long total = 0;
        for (int i=0; i < SIZE; i++)
        {
            total += first[i].a0 + second[i].a0;
        }
        System.out.println(total);
    }

    private static long getMemory()
    {
        Runtime runtime = Runtime.getRuntime();
        return runtime.totalMemory() - runtime.freeMemory();
    }
}
Output on my box:
Size for LotsOfBytes: 88811688
Average size: 88.811688
Size for LotsOfInts: 327076360
Average size: 327.07636
0
So obviously there's some overhead - 8 bytes by the looks of it, although somehow only 7 for LotsOfInts (? like I said, there are oddities here) - but the point is that the byte fields appear to be packed in for LotsOfBytes such that it takes (after overhead removal) only a quarter as much memory as LotsOfInts.
Yes, a byte variable in Java is in fact handled as 4 bytes in memory, at least when it is a local variable or a value on the operand stack. However this doesn't hold true for arrays. The storage of a byte array of 20 bytes is in fact only 20 bytes in memory.
That is because the Java Bytecode Language only knows two integer number types: ints and longs. So it must handle all numbers internally as either type and these types are 4 and 8 bytes in memory.
However, Java knows arrays with every integer number format. So the storage of short arrays is in fact two bytes per entry and one byte per entry for byte arrays.
The reason why I keep saying "the storage of" is that an array is also an object in Java and every object requires multiple bytes of storage on its own, regardless of the storage that instance variables or the array storage in case of arrays require.
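The point about bytecode only having int and long as integer types shows up in the language too: arithmetic on two bytes produces an int, and you need a cast to get back to a byte. A small sketch, with BytePromotion and its methods being names made up for illustration:

```java
// Hypothetical demo: byte operands are promoted to int before arithmetic.
final class BytePromotion {
    // a + b has type int, so the result can exceed the byte range.
    static int sum(byte a, byte b) {
        return a + b;
    }

    // Casting back to byte keeps only the low 8 bits of the int result.
    static byte wrappedSum(byte a, byte b) {
        return (byte) (a + b);
    }

    public static void main(String[] args) {
        byte a = 100;
        byte b = 100;
        // byte c = a + b;   // would not compile: a + b is an int
        System.out.println("int result:  " + sum(a, b));
        System.out.println("byte result: " + wrappedSum(a, b));

        // Arrays, by contrast, really do store one byte per element.
        byte[] data = new byte[20];
        System.out.println("elements: " + data.length);
    }
}
```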