Huge difference.
As the name implies, a double has 2x the precision of float[1]. In general a double has 15 decimal digits of precision, while float has 7.
Here's how the number of digits is calculated:
double has 52 mantissa bits + 1 hidden bit: log(2^53) ÷ log(10) = 15.95 digits
float has 23 mantissa bits + 1 hidden bit: log(2^24) ÷ log(10) = 7.22 digits
This precision loss could lead to greater truncation errors being accumulated when repeated calculations are done, e.g.
float a = 1.f / 81;
float b = 0;
for (int i = 0; i < 729; ++i)
    b += a;
printf("%.7g\n", b); // prints 9.000023
while
double a = 1.0 / 81;
double b = 0;
for (int i = 0; i < 729; ++i)
    b += a;
printf("%.15g\n", b); // prints 8.99999999999996
Also, the maximum value of float is about 3e38, but double is about 1.7e308, so using float can hit "infinity" (i.e. a special floating-point number) much more easily than double for something simple, e.g. computing the factorial of 60.
During testing, maybe a few test cases contain these huge numbers, which may cause your programs to fail if you use floats.
Of course, sometimes, even double isn't accurate enough, hence we sometimes have long double[1] (the above example gives 9.000000000000000066 on Mac), but all floating point types suffer from round-off errors, so if precision is very important (e.g. money processing) you should use int or a fraction class.
Furthermore, don't use += to sum lots of floating point numbers, as the errors accumulate quickly. If you're using Python, use fsum. Otherwise, try to implement the Kahan summation algorithm.
[1]: The C and C++ standards do not specify the representation of float, double and long double. It is possible that all three are implemented as IEEE double-precision. Nevertheless, for most architectures (gcc, MSVC; x86, x64, ARM) float is indeed an IEEE single-precision floating point number (binary32), and double is an IEEE double-precision floating point number (binary64).
Can someone provide examples in layman's terms?
The basic types in C are just there to standardize a single unit of memory (a block of bits) that can be used to store a value. We all know that a computer uses bits as its basic unit of information; however, a bit is too small to be useful on its own. By defining different types we define different blocks of bits and how to interpret them. When we say "int a", the compiler knows that we are dealing with a basic, signed binary number, and knows how much memory it needs to set aside to store that number.
The biggest issue is that the exact size definitions are system-dependent and not fixed by the specification.
-
A float and a double are both implementations of floating-point values in C. Floating-point numbers are a way to implement decimal and fractional values of any magnitude in binary, and to streamline their arithmetic. It is essentially a binary version of scientific notation. Imagine writing the number 3.14159 in decimal: the mantissa here would be 314159 and the exponent would be -5. You could define a simple function that takes the two numbers (stored in binary) and prints out their combined decimal value; your program could therefore store all decimal numbers as two separate binary numbers, and use its own functions to add them together, print them, and so on. However, C streamlines this process by giving you a simple container that does all that work for you behind the scenes. A floating-point standard describes how to encode the mantissa and exponent into a single binary number. In C, the only difference between a float and a double is the amount of memory set aside to store your number, and thus the range and precision of the numbers that you can store. A side benefit is that you can also describe large integer numbers compactly: for instance, you would need 32 bits to write the number "4 billion" explicitly in binary, but only about 8 to express the same magnitude in floating point (4 × 10^9). However, you would not be able to store the number 4,123,456,789 exactly in an 8-bit floating-point format or even a 32-bit one (as you still need bits to encode the exponent), yet it fits perfectly well in a 32-bit unsigned int.
-
A char and an int are just basic binary numbers, differing (potentially) only in length. A char is defined as the smallest unit of data necessary to hold a single text character for that architecture. That is a somewhat abstract definition, but for most computers today running x86 platforms a char is 8 bits (a wide character type may be 16 bits or more). An int is another basic binary number, but it is required to be at least 16 bits long (typically it is 32 or 64 on modern systems). So an 8-bit char is capable of encoding 256 unique characters, or a number between 0 and 255 (or between -128 and 127); the difference is only in how you interpret the collection of bits. A 32-bit int can store a number between 0 and slightly over 4 billion, or you could halve that range and use it to represent a number between (roughly) -2 billion and 2 billion.
On some systems, the size of an int may be the same as the size of a float; from the memory point of view they are both just collections of bits. However, the processor interprets the values differently, and will actually use different circuits to add two numbers together depending on whether they are floats or ints.
I don't know about examples, but they are simply different primitive types in C. Both double and float are for floating point numbers (e.g., numbers with fractional parts) and int and char are for whole numbers.
The reason there is more than one type for each class of number is that they take up different amounts of memory, and can therefore be bigger or more precise. There are actually quite a few more besides the four you've listed.
c++ - What is the difference between float and double? - Stack Overflow
Here is what the C99 (ISO/IEC 9899, 6.2.5 §10) and C++03 (ISO/IEC 14882:2003, 3.9.1 §8) standards say:
There are three floating point types:
float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double.
The C++ standard adds:
The value representation of floating-point types is implementation-defined.
I would suggest having a look at the excellent What Every Computer Scientist Should Know About Floating-Point Arithmetic that covers the IEEE floating-point standard in depth. You'll learn about the representation details and you'll realize there is a tradeoff between magnitude and precision. The precision of the floating point representation increases as the magnitude decreases, hence floating point numbers between -1 and 1 are those with the most precision.
Sorry if this seems like a basic question, but I've been learning C over the past several weeks. In some of the tutorials I've been using, I've noticed some of the instructors just use int/float while others tend to default to double when declaring floating point variables.
Is there any sort of best practice in terms of deciding what size variable to use?
float (the C# alias for System.Single) and double (the C# alias for System.Double) are floating binary point types. float is 32-bit; double is 64-bit. In other words, they represent a number like this:
10001.10010110011
The binary number and the location of the binary point are both encoded within the value.
decimal (the C# alias for System.Decimal) is a floating decimal point type. In other words, they represent a number like this:
12345.65789
Again, the number and the location of the decimal point are both encoded within the value – that's what makes decimal still a floating point type instead of a fixed point type.
The important thing to note is that humans are used to representing non-integers in a decimal form, and expect exact results in decimal representations; not all decimal numbers are exactly representable in binary floating point – 0.1, for example – so if you use a binary floating point value you'll actually get an approximation to 0.1. You'll still get approximations when using a floating decimal point as well – the result of dividing 1 by 3 can't be exactly represented, for example.
As for what to use when:
For values which are "naturally exact decimals" it's good to use decimal. This is usually suitable for any concepts invented by humans: financial values are the most obvious example, but there are others too. Consider the score given to divers or ice skaters, for example.
For values which are more artefacts of nature, which can't really be measured exactly anyway, float/double are more appropriate. For example, scientific data would usually be represented in this form. Here, the original values won't be "decimally accurate" to start with, so it's not important for the expected results to maintain "decimal accuracy". Floating binary point types are much faster to work with than decimals.
Precision is the main difference.
float - 7 digits (32 bit)
double - 15-16 digits (64 bit)
decimal - 28-29 significant digits (128 bit)
Decimals have much higher precision and are usually used within financial applications that require a high degree of accuracy. Decimals are much slower (up to 20x in some tests) than a double/float.
Decimals and floats/doubles cannot be compared without a cast, whereas floats and doubles can. Decimals also allow the encoding of trailing zeros.
float flt = 1F/3;
double dbl = 1D/3;
decimal dcm = 1M/3;
Console.WriteLine("float: {0} double: {1} decimal: {2}", flt, dbl, dcm);
Result :
float: 0.3333333
double: 0.333333333333333
decimal: 0.3333333333333333333333333333
New to programming and curious: what's the main difference between float and double? When should I use one over the other?
Thanks! 🙏😊
There are two reasons why you should be concerned with the different numerical data types.
1. Saving memory
for (long k = 0; k <= 10; k++)
{
    // stuff
}
Why use a long when it could just as easily be an integer, or even a byte? You would indeed save several bytes of memory by doing so.
2. Floating point numbers and integer numbers are stored differently in the computer
Suppose we have the number 22 stored in an integer. The computer stores this number in memory in binary as:
0000 0000 0000 0000 0000 0000 0001 0110
If you're not familiar with the binary number system, this can be represented in scientific notation as: 2^0*0 + 2^1*1 + 2^2*1 + 2^3*0 + 2^4*1 + 2^5*0 + ... + 2^30*0. The most significant bit may or may not be used to indicate whether the number is negative (depending on whether the data type is signed or unsigned).
Essentially, it's just a summation of 2^(bit place)*value.
This changes when you are referring to values involving a decimal point. Suppose you have the number 3.75 in decimal. This is 11.11 in binary. We can represent this in scientific notation as 2^1*1 + 2^0*1 + 2^-1*1 + 2^-2*1 or, normalized, as 1.111 * 2^1.
The computer can't store that as written, however: it has no explicit method of expressing the binary point (the binary number system's version of the decimal point). The computer can only store 1's and 0's. This is where the floating point data type comes in.
Assuming sizeof(float) is 4 bytes, you have a total of 32 bits. The first bit is the "sign bit". There are no unsigned floats or doubles. The next 8 bits are used for the "exponent" and the final 23 bits are used as the "significand" (sometimes referred to as the mantissa). Using our 3.75 example, our exponent would be 1 (for 2^1) and our significand would be 1.111.
If the first bit is 1, the number is negative. If not, positive. The exponent is modified by something called "the bias", so we can't simply store "0000 0001" as the exponent. The bias for a single-precision floating point number is 127, and the bias for a double-precision number (this is where the double datatype gets its name) is 1023. The final 23 bits are reserved for the significand. The significand is simply the values to the RIGHT of our binary point.
Our exponent would be the bias (127) + exponent (1) or represented in binary
1000 0000
Our significand would be:
111 0000 0000 0000 0000 0000
Therefore, 3.75 is represented as:
0100 0000 0111 0000 0000 0000 0000 0000
Now, let's look at the number 8 represented as a floating point number and as an integer number:
0100 0001 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 1000
How in the world is the computer going to add 8.0 and 8? Or even multiply them!? The computer (more specifically, the x86 CPU) has different portions that add floating point numbers and integer numbers.
Back before we had gigabyte systems (or on modern embedded systems like Arduino), memory was at a premium and so shorthand methods were implemented to specify how much memory a particular number would take up - BIT is straightforward - it would originally occupy only 1 bit of memory.
The other data sizes and names vary between systems. On a 32-bit system, INT (or MEDIUMINT) would generally be 2 bytes, LONGINT would be 4 bytes, and SMALLINT would be a single byte. 64-bit systems can have LONGINT set at 8 bytes.
Even now - especially in database applications, or programs that have multiple instances running on servers (like server-side scripts on websites) - you should be careful about what you choose. Picking a 2-, 4-, or 8-byte-wide integer to store values between 0 and 100 (which fit in one byte) is incredibly wasteful if you have a database table with millions of records.
More information: https://en.wikipedia.org/wiki/Integer_(computer_science)