Excerpt from the C99 standard, normative annex F (The C++-standard does not explicitly mention this annex, though it includes all affected functions without change per reference. Also, the types have to match for compatibility.):
IEC 60559 floating-point arithmetic
F.1 Introduction
1 This annex specifies C language support for the IEC 60559 floating-point standard. The IEC 60559 floating-point standard is specifically Binary floating-point arithmetic for microprocessor systems, second edition (IEC 60559:1989), previously designated IEC 559:1989 and as IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE 754−1985). IEEE Standard for Radix-Independent Floating-Point Arithmetic (ANSI/IEEE 854−1987) generalizes the binary standard to remove dependencies on radix and word length. IEC 60559 generally refers to the floating-point standard, as in IEC 60559 operation, IEC 60559 format, etc. An implementation that defines
__STDC_IEC_559__shall conform to the specifications in this annex.356) Where a binding between the C language and IEC 60559 is indicated, the IEC 60559-specified behavior is adopted by reference, unless stated otherwise. Since negative and positive infinity are representable in IEC 60559 formats, all real numbers lie within the range of representable values.
So, include <math.h> (or in C++ maybe <cmath>), and test for __STDC_IEC_559__.
If the macro is defined, not only are the types better specified (float being 32bits and double being 64bits among others), but also the behavior of builtin operators and standard-functions is more specified.
Lack of the macro does not give any guarantees.
For x86 and x86_64 (amd64), you can rely on the types float and double being IEC-60559-conformant, though functions using them and operations on them might not be.
Excerpt from the C99 standard, normative annex F (The C++-standard does not explicitly mention this annex, though it includes all affected functions without change per reference. Also, the types have to match for compatibility.):
IEC 60559 floating-point arithmetic
F.1 Introduction
1 This annex specifies C language support for the IEC 60559 floating-point standard. The IEC 60559 floating-point standard is specifically Binary floating-point arithmetic for microprocessor systems, second edition (IEC 60559:1989), previously designated IEC 559:1989 and as IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE 754−1985). IEEE Standard for Radix-Independent Floating-Point Arithmetic (ANSI/IEEE 854−1987) generalizes the binary standard to remove dependencies on radix and word length. IEC 60559 generally refers to the floating-point standard, as in IEC 60559 operation, IEC 60559 format, etc. An implementation that defines
__STDC_IEC_559__shall conform to the specifications in this annex.356) Where a binding between the C language and IEC 60559 is indicated, the IEC 60559-specified behavior is adopted by reference, unless stated otherwise. Since negative and positive infinity are representable in IEC 60559 formats, all real numbers lie within the range of representable values.
So, include <math.h> (or in C++ maybe <cmath>), and test for __STDC_IEC_559__.
If the macro is defined, not only are the types better specified (float being 32bits and double being 64bits among others), but also the behavior of builtin operators and standard-functions is more specified.
Lack of the macro does not give any guarantees.
For x86 and x86_64 (amd64), you can rely on the types float and double being IEC-60559-conformant, though functions using them and operations on them might not be.
Does not say anything about the size.
3.9.1.8
There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined. Integral and floating types are collectively called arithmetic types. Specializations of the standard template std::numeric_limits (18.3) shall specify the maximum and minimum values of each arithmetic type for an implementation.
What does it mean in C that data type "Double" is more "expensive" than data type "Float"
Can someone explain about double and why it is only 8 bytes in size?
float vs. double – What's the Key Difference?"
Difference between double and long double
Excerpt from the C99 standard, normative annex F (The C++-standard does not explicitly mention this annex, though it includes all affected functions without change per reference. Also, the types have to match for compatibility.):
IEC 60559 floating-point arithmetic
F.1 Introduction
1 This annex specifies C language support for the IEC 60559 floating-point standard. The IEC 60559 floating-point standard is specifically Binary floating-point arithmetic for microprocessor systems, second edition (IEC 60559:1989), previously designated IEC 559:1989 and as IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE 754−1985). IEEE Standard for Radix-Independent Floating-Point Arithmetic (ANSI/IEEE 854−1987) generalizes the binary standard to remove dependencies on radix and word length. IEC 60559 generally refers to the floating-point standard, as in IEC 60559 operation, IEC 60559 format, etc. An implementation that defines
__STDC_IEC_559__shall conform to the specifications in this annex.356) Where a binding between the C language and IEC 60559 is indicated, the IEC 60559-specified behavior is adopted by reference, unless stated otherwise. Since negative and positive infinity are representable in IEC 60559 formats, all real numbers lie within the range of representable values.
So, include <math.h> (or in C++ maybe <cmath>), and test for __STDC_IEC_559__.
If the macro is defined, not only are the types better specified (float being 32bits and double being 64bits among others), but also the behavior of builtin operators and standard-functions is more specified.
Lack of the macro does not give any guarantees.
For x86 and x86_64 (amd64), you can rely on the types float and double being IEC-60559-conformant, though functions using them and operations on them might not be.
Videos
I was wondering why you wouldn't just use a double all the time instead of a float. I googled and found this as an explanation:
Double is more precise than float and can store 64 bits; double the number of bits float can store. We prefer double over float if we need to do precision up to 15 or 16 decimal points; otherwise, we can stick to float in most applications, as double is more expensive.
Difference between float and double in C/C++ | Coding Ninjas Blog
Can anyone clarify what being "more expensive" means in this context?