Can someone provide examples in Layman's terms?
The basic types in C are just there to standardize a single unit of memory (a block of bits) that can be used to store a value. We all know that a computer uses bits as its basic unit of information. However, a bit is too small to be useful in and of itself, so by defining different types we define different blocks of bits and how to interpret them. When we say "int a", the compiler knows that we are dealing with a basic signed binary number, and knows how much memory it needs to set aside to store that number.
The biggest issue is that the exact sizes are system dependent: the specification only sets minimum sizes, not exact ones.
-
A float and a double are both implementations of floating point values in C. Floating point numbers are a way to represent decimal and fractional values across a very wide range of magnitudes in binary, and also to streamline their arithmetic. It is essentially a binary version of scientific notation. Imagine writing the number 3.14159 in decimal: the mantissa here would be 314159 and the exponent would be -5 (314159 x 10^-5). You could define a simple function that takes the two numbers (stored in binary) and prints out their combined decimal value. Your program could therefore store every decimal number as two separate binary numbers, and use its own functions to add them together, print them, etc. However, C streamlines this process by giving you a simple container that does all that work for you behind the scenes. A floating point standard describes how you would encode the mantissa and exponent into a single binary number. In C, the main difference between a float and a double is the amount of memory set aside to store your number, and thus the range and precision of the numbers that you can store.
The benefit of this is that you can also describe large integer numbers. For instance, you would need 32 bits to write the number "4 billion" explicitly in binary, but you only need about 8 to do the same in floating point (4 x 10^9). However, you would not be able to write the number 4,123,456,789 exactly as an 8 bit floating point or even a 32 bit floating point number (as you still need to encode the exponent), but it will fit perfectly well in a 32 bit unsigned int.
-
A char and an int are just basic binary numbers, their only difference being (potentially) in length. A char is defined as the smallest unit of data necessary to hold a single text character for that architecture. This is a bit abstractly defined, but on most computers today, including x86 platforms, a char is 8 bits (the standard requires at least 8; text that needs more bits per character, such as Unicode, is normally handled with wider types like wchar_t rather than a wider char). An int is another basic binary number, but it is required to be at least 16 bits long (typically it is 32 on modern systems). So an 8 bit char is capable of encoding 256 unique characters, or a number between 0 and 255 (or a number between -128 and 127); the difference is only in how you interpret the collection of bits. A 32 bit int can store a number between 0 and slightly over 4 billion, or you could halve that range and use it to represent a number between (roughly) -2 billion and 2 billion.
On some systems, the size of an int may be the same as the size of a float. From the memory point of view they are just collections of bits; however, the processor will interpret the values differently, and will actually use different circuits to add two numbers together if they are floats vs. if they are ints.
I don't know about examples, but they are simply different primitive types in C. Both double and float are for floating point numbers (i.e., numbers with fractional parts) and int and char are for whole numbers.
The reason there is more than one type for each class of number is that they take up different amounts of memory, and can therefore be bigger / more precise. There are actually quite a few more besides the four you've listed.
Huge difference.
As the name implies, a double has 2x the precision of float[1]. In general a double has 15 decimal digits of precision, while float has 7.
Here's how the number of digits are calculated:
double has 52 mantissa bits + 1 hidden bit: log(2^53) ÷ log(10) = 15.95 digits
float has 23 mantissa bits + 1 hidden bit: log(2^24) ÷ log(10) = 7.22 digits
This precision loss could lead to greater truncation errors being accumulated when repeated calculations are done, e.g.
float a = 1.f / 81;
float b = 0;
for (int i = 0; i < 729; ++i)
    b += a;
printf("%.7g\n", b); // prints 9.000023
while
double a = 1.0 / 81;
double b = 0;
for (int i = 0; i < 729; ++i)
    b += a;
printf("%.15g\n", b); // prints 8.99999999999996
Also, the maximum value of float is about 3e38, but double is about 1.7e308, so using float can hit "infinity" (i.e. a special floating-point number) much more easily than double for something simple, e.g. computing the factorial of 60.
During testing, maybe a few test cases contain these huge numbers, which may cause your programs to fail if you use floats.
Of course, sometimes, even double isn't accurate enough, hence we sometimes have long double[1] (the above example gives 9.000000000000000066 on Mac), but all floating point types suffer from round-off errors, so if precision is very important (e.g. money processing) you should use int or a fraction class.
Furthermore, don't use += to sum lots of floating point numbers, as the errors accumulate quickly. If you're using Python, use fsum. Otherwise, try to implement the Kahan summation algorithm.
[1]: The C and C++ standards do not specify the representation of float, double and long double. It is possible that all three are implemented as IEEE double-precision. Nevertheless, for most architectures (gcc, MSVC; x86, x64, ARM) float is indeed an IEEE single-precision floating point number (binary32), and double is an IEEE double-precision floating point number (binary64).
Here is what the C99 (ISO-IEC 9899 6.2.5 §10) and C++2003 (ISO-IEC 14882-2003 3.9.1 §8) standards say:
There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double.
The C++ standard adds:
The value representation of floating-point types is implementation-defined.
I would suggest having a look at the excellent "What Every Computer Scientist Should Know About Floating-Point Arithmetic", which covers the IEEE floating-point standard in depth. You'll learn about the representation details and you'll realize there is a tradeoff between magnitude and precision. The absolute precision of the floating point representation increases as the magnitude decreases (representable values sit closer together near zero), hence floating point numbers between -1 and 1 are those with the most precision.
float stores floating-point values, that is, values that have potential decimal places
int only stores integral values, that is, whole numbers
So while both are 32 bits wide, their use (and representation) is quite different. You cannot store 3.141 in an integer, but you can in a float.
Dissecting them both a little further:
In an integer, all bits except the leftmost one are used to store the number value. This is (in Java and many computers too) done in the so-called two's complement, which supports negative values. Two's complement uses the leftmost bit to store the positive (0) or negative sign (1). This basically means that you can represent the values of −2^31 to 2^31 − 1.
In a float, those 32 bits are divided between three distinct parts: The sign bit, the exponent and the mantissa. They are laid out as follows:
S EEEEEEEE MMMMMMMMMMMMMMMMMMMMMMM
There is a single bit that determines whether the number is negative or non-negative (zero is neither positive nor negative, but has the sign bit set to zero). Then there are eight bits of an exponent and 23 bits of mantissa. To get a useful number from that, (roughly) the following calculation is performed:
M × 2^E
(There is more to it, but this should suffice for the purpose of this discussion)
The mantissa is in essence not much more than a 24-bit integer number. This gets multiplied by 2 to the power of the exponent part, which, roughly, is a number between −128 and 127.
Therefore you can accurately represent all numbers that would fit in a 24-bit integer but the numeric range is also much greater as larger exponents allow for larger values. For example, the maximum value for a float is around 3.4 × 10^38 whereas int only allows values up to 2.1 × 10^9.
But that also means, since 32 bits only have 4.2 × 10^9 different states (which are all used to represent the values int can store), that at the larger end of float's numeric range the numbers are spaced wider apart (since there cannot be more unique float numbers than there are unique int numbers). You cannot represent some numbers exactly, then. For example, the number 2 × 10^12 has a representation in float of 1,999,999,991,808. That might be close to 2,000,000,000,000 but it's not exact. Likewise, adding 1 to that number does not change it because 1 is too small to make a difference in the larger scales float is using there.
Similarly, you can also represent very small numbers (between 0 and 1) in a float but regardless of whether the numbers are very large or very small, float only has a precision of around 6 or 7 decimal digits. If you have large numbers those digits are at the start of the number (e.g. 4.51534 × 10^35, which is nothing more than 451534 followed by 30 zeroes – and float cannot tell anything useful about whether those 30 digits are actually zeroes or something else), for very small numbers (e.g. 3.14159 × 10^−27) they are at the far end of the number, way beyond the starting digits of 0.0000...
Floats are used to store a wider range of numbers than can fit in an integer. These include decimal numbers and scientific notation style numbers that can be bigger values than can fit in 32 bits. Here's the deep dive into them: http://en.wikipedia.org/wiki/Floating_point
Sorry if this seems like a basic question, but I've been learning C over the past several weeks. In some of the tutorials I've been using, I've noticed some of the instructors just use int/float while others tend to default to double when declaring floating point variables.
Is there any sort of best practice in terms of deciding what size variable to use?
The default choice for a floating-point type should be double. This is also the type that you get with floating-point literals without a suffix or (in C) standard functions that operate on floating point numbers (e.g. exp, sin, etc.).
float should only be used if you need to operate on a lot of floating-point numbers (think on the order of thousands or more) and analysis of the algorithm has shown that the reduced range and accuracy don't pose a problem.
long double can be used if you need more range or accuracy than double, and if it provides this on your target platform.
In summary, float and long double should be reserved for use by the specialists, with double for "every-day" use.
There is rarely cause to use float instead of double in code targeting modern computers. The extra precision reduces (but does not eliminate) the chance of rounding errors or other imprecision causing problems.
The main reasons I can think of to use float are:
- You are storing large arrays of numbers and need to reduce your program's memory consumption.
- You are targeting a system that doesn't natively support double-precision floating point. Until recently, many graphics cards only supported single precision floating points. I'm sure there are plenty of low-power and embedded processors that have limited floating point support too.
- You are targeting hardware where single-precision is faster than double-precision, and your application makes heavy use of floating point arithmetic. On modern Intel CPUs I believe all floating point calculations are done in double precision, so you don't gain anything here.
- You are doing low-level optimization, for example using special CPU instructions that operate on multiple numbers at a time.
So, basically, double is the way to go unless you have hardware limitations or unless analysis has shown that storing double precision numbers is contributing significantly to memory usage.
I am learning Java, and am confused about when to choose double or float for my real numbers, or int. It feels like it doesn’t matter, because from my limited experience (with Java) both of them deliver the same results, but I don’t want to go further down the learning curve with Java and pick up a bad habit that ends up messing up my code without my having a clue as to why. So, when should you use float and double?