Huge difference.
As the name implies, a double has 2x the precision of float[1]. In general a double has 15 decimal digits of precision, while float has 7.
Here's how the number of digits is calculated:
double has 52 mantissa bits + 1 hidden bit: $\log(2^{53}) \div \log(10) \approx 15.95$ digits
float has 23 mantissa bits + 1 hidden bit: $\log(2^{24}) \div \log(10) \approx 7.22$ digits
This lower precision means rounding errors accumulate faster when calculations are repeated, e.g.
float a = 1.f / 81;
float b = 0;
for (int i = 0; i < 729; ++i)
    b += a;
printf("%.7g\n", b); // prints 9.000023
while
double a = 1.0 / 81;
double b = 0;
for (int i = 0; i < 729; ++i)
    b += a;
printf("%.15g\n", b); // prints 8.99999999999996
Also, the maximum value of float is about 3.4e38, while double's is about 1.8e308, so float overflows to "infinity" (a special floating-point value) much more easily than double, even for something simple such as computing the factorial of 60.
If a few test cases happen to contain such huge numbers, a program that uses float may fail on them.
Of course, sometimes even double isn't accurate enough, which is why long double[1] exists (the above example gives 9.000000000000000066 on Mac). But all floating-point types suffer from round-off errors, so if exactness is very important (e.g. money processing), you should use int or a fraction class.
Furthermore, don't use += to sum lots of floating-point numbers, as the errors accumulate quickly. If you're using Python, use math.fsum. Otherwise, try implementing the Kahan summation algorithm.
[1]: The C and C++ standards do not specify the representation of float, double and long double. It is possible that all three are implemented as IEEE double precision. Nevertheless, on most compilers and architectures (gcc, MSVC; x86, x64, ARM), float is indeed an IEEE single-precision floating-point number (binary32), and double is an IEEE double-precision floating-point number (binary64).
Can someone provide examples in layman's terms?
The basic types in C are there to standardize a single unit of memory (a block of bits) that can be used to store a value. A computer uses bits as its basic unit of information, but a single bit is too small to be useful on its own; by defining different types, we define different blocks of bits and how to interpret them. When we write "int a", the compiler knows that we are dealing with a basic signed binary number and how much memory it needs to set aside to store it.
The biggest issue is that the exact sizes are system-dependent: the C standard only specifies minimum ranges for each type, not exact sizes.
-
float and double are both implementations of floating-point values in C. Floating-point numbers are a way to represent decimal and fractional values of almost any magnitude in binary, and to streamline their arithmetic. A floating-point number is essentially a binary version of scientific notation. Imagine writing the number 3.14159 in decimal: the mantissa here would be 314159 and the exponent would be -5 (3.14159 = 314159 × 10^-5). You could define a simple function that takes the two numbers (stored in binary) and prints out their combined decimal value; your program could therefore store every decimal number as two separate binary numbers and use its own functions to add them, print them, and so on. C streamlines this process by giving you a container that does all that work for you behind the scenes.
A floating-point standard describes how to encode the mantissa and exponent into a single binary number. In C, the only difference between a float and a double is the amount of memory set aside to store the number, and thus the range and precision of the numbers you can store. A benefit of this scheme is that you can also describe large integers compactly: you would need 32 bits to write the number "4 billion" explicitly in binary, but only about 8 bits to do the same in a simple floating-point scheme (4 × 10^9). However, you could not write the number 4,123,456,789 exactly as an 8-bit or even a 32-bit floating-point number (some of the bits are spent on the exponent), whereas it fits perfectly well in a 32-bit unsigned int.
-
A char and an int are just basic binary numbers; their only difference is (potentially) their length. A char is defined as the smallest addressable unit of data, large enough to hold a basic text character for that architecture. That is a bit abstract, but on virtually all modern computers a char is 8 bits (some DSPs use 16-bit chars). An int is another basic binary number, but it is required to be at least 16 bits long (typically 32 bits on modern systems). So an 8-bit char can encode 256 unique values: a number between 0 and 255, or a number between -128 and 127; the difference is only in how you interpret the collection of bits. A 32-bit int can store a number between 0 and slightly over 4 billion, or you can split that range and represent a number between (roughly) -2 billion and 2 billion.
On some systems, an int may be the same size as a float. From the memory point of view they are just collections of bits; however, the processor interprets the values differently and actually uses different circuits to add two floats than to add two ints.
I don't know about examples, but they are simply different primitive types in C. Both double and float are for floating-point numbers (i.e., numbers with fractional parts), and int and char are for whole numbers.
The reason there is more than one type for each class of number is that they take up different amounts of memory and can therefore be bigger or more precise. There are actually quite a few more besides the four you've listed.
Here is what the C99 (ISO/IEC 9899, 6.2.5 §10) and C++2003 (ISO/IEC 14882:2003, 3.9.1 §8) standards say:
There are three floating point types:
float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double.
The C++ standard adds:
The value representation of floating-point types is implementation-defined.
I would suggest having a look at the excellent "What Every Computer Scientist Should Know About Floating-Point Arithmetic", which covers the IEEE floating-point standard in depth. You'll learn about the representation details, and you'll realize there is a tradeoff between magnitude and precision: the absolute precision of a floating-point number (the gap to its nearest neighbor) gets finer as its magnitude decreases, so floating-point numbers between -1 and 1 are the ones with the most absolute precision.
float (the C# alias for System.Single) and double (the C# alias for System.Double) are floating binary point types. float is 32-bit; double is 64-bit. In other words, they represent a number like this:
10001.10010110011
The binary number and the location of the binary point are both encoded within the value.
decimal (the C# alias for System.Decimal) is a floating decimal point type. In other words, they represent a number like this:
12345.65789
Again, the number and the location of the decimal point are both encoded within the value – that's what makes decimal still a floating point type instead of a fixed point type.
The important thing to note is that humans are used to representing non-integers in a decimal form, and expect exact results in decimal representations; not all decimal numbers are exactly representable in binary floating point – 0.1, for example – so if you use a binary floating point value you'll actually get an approximation to 0.1. You'll still get approximations when using a floating decimal point as well – the result of dividing 1 by 3 can't be exactly represented, for example.
As for what to use when:
- For values which are "naturally exact decimals", it's good to use decimal. This is usually suitable for any concepts invented by humans: financial values are the most obvious example, but there are others too. Consider the score given to divers or ice skaters, for example.
- For values which are more artefacts of nature and can't really be measured exactly anyway, float/double are more appropriate. For example, scientific data would usually be represented in this form. Here, the original values won't be "decimally accurate" to start with, so it's not important for the expected results to maintain the "decimal accuracy". Floating binary point types are much faster to work with than decimals.
Precision is the main difference.
float: 7 digits (32-bit)
double: 15-16 digits (64-bit)
decimal: 28-29 significant digits (128-bit)
Decimals have much higher precision and are usually used in financial applications that require a high degree of accuracy. Decimals are much slower than double/float (up to 20 times slower in some tests).
decimal and float/double cannot be compared without a cast, whereas float and double can. decimal also allows the encoding of trailing zeros.
float flt = 1F/3;
double dbl = 1D/3;
decimal dcm = 1M/3;
Console.WriteLine("float: {0} double: {1} decimal: {2}", flt, dbl, dcm);
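Result (on the .NET Framework; .NET Core 3.0 and later print the shortest round-trippable string for float and double, so more digits may appear):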
float: 0.3333333
double: 0.333333333333333
decimal: 0.3333333333333333333333333333
Sorry if this seems like a basic question, but I've been learning C over the past several weeks. In some of the tutorials I've been using, I've noticed some of the instructors just use int/float while others tend to default to double when declaring floating point variables.
Is there any sort of best practice in terms of deciding what size variable to use?
New to programming and curious: what's the main difference between float and double? When should I use one over the other?
Thanks! 🙏😊