Using the localization services
The default locale
The standard library locale module is Python's interface to C-based localization routines.
The basic usage is:
import locale
locale.atof('123,456')
In locales where the comma is a thousands separator, this would return 123456.0; in locales where it is the decimal point, it would return 123.456.
However, by default, this will not work:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.8/locale.py", line 326, in atof
return func(delocalize(string))
ValueError: could not convert string to float: '123,456'
This is because by default, the program is "in a locale" that has nothing to do with the platform the code is running on, but is instead defined by the POSIX standard. As the documentation explains:
Initially, when a program is started, the locale is the C locale, no matter what the user’s preferred locale is. There is one exception: the LC_CTYPE category is changed at startup to set the current locale encoding to the user’s preferred locale encoding. The program must explicitly say that it wants the user’s preferred locale settings for other categories by calling setlocale(LC_ALL, '').
That is: aside from noting the system's preferred character encoding for text files (nowadays most likely UTF-8), the locale module by default interprets data the same way Python itself does, via a locale named C (after the C programming language). locale.atof behaves exactly like float applied to a string, and locale.atoi likewise mimics int.
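A quick way to check this: the sketch below resets LC_NUMERIC to C explicitly (a fresh interpreter already starts that way) and compares locale.atof with float. The helper parse_or_none is hypothetical, introduced only for the illustration.

```python
import locale

# A fresh interpreter already uses "C" for LC_NUMERIC; setting it here just
# makes the example independent of any earlier setlocale() call.
locale.setlocale(locale.LC_NUMERIC, 'C')

def parse_or_none(text):
    """Hypothetical helper: locale.atof(text), or None if parsing fails."""
    try:
        return locale.atof(text)
    except ValueError:
        return None

print(parse_or_none('123456.789') == float('123456.789'))  # True
print(parse_or_none('123,456'))  # None: ',' has no meaning in the C locale
print(locale.atoi('42'))         # 42, same as int('42')
```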
Using a locale from the environment
Making the setlocale call mentioned in the above quote from the documentation will pull in locale settings from the user's environment. Thus:
>>> import locale
>>> # passing an empty string asks for a locale configured on the
>>> # local machine; the return value indicates what that locale is.
>>> locale.setlocale(locale.LC_ALL, '')
'en_CA.UTF-8'
>>> locale.atof('123,456.789')
123456.789
>>> locale.atof('123456.789')
123456.789
The locale does not check whether the thousands separators are in the right place; it simply removes them:
>>> locale.atof('12,34,56.789')
123456.789
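The filtering amounts to string substitution. The following is a simplified sketch of the idea, not the standard library's actual code; atof_sketch is a hypothetical name, and it takes the two separators as arguments instead of reading them from localeconv(), so the effect of different locales can be tried without installing them.

```python
def atof_sketch(text, thousands_sep, decimal_point):
    """Hypothetical, simplified model of locale.atof.

    Every thousands separator is deleted wherever it occurs (so misplaced
    grouping is silently accepted), then the decimal point becomes '.'.
    """
    if thousands_sep:
        text = text.replace(thousands_sep, '')
    if decimal_point and decimal_point != '.':
        text = text.replace(decimal_point, '.')
    return float(text)

# English-style separators, with deliberately odd grouping:
print(atof_sketch('12,34,56.789', ',', '.'))  # 123456.789
# Danish-style separators: '.' groups digits and ',' is the decimal point.
print(atof_sketch('123,456.789', '.', ','))   # 123.456789
```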
In Python 3.6 and up, underscores are also accepted, since they are handled separately by the built-in float and int conversions:
>>> locale.atof('12_34_56.789')
123456.789
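The underscore handling comes from PEP 515, which lets float() and int() accept single underscores between digits in strings. The sketch below shows which placements are accepted; the list of rejected strings is just a sample.

```python
# Python 3.6+ (PEP 515): single underscores between digits are accepted.
print(float('12_34_56.789'))  # 123456.789
print(int('1_000_000'))       # 1000000

# Leading, trailing, doubled, or decimal-point-adjacent underscores are not.
rejected = []
for bad in ['_123', '123_', '1__23', '12_.5']:
    try:
        float(bad)
    except ValueError:
        rejected.append(bad)
print(rejected)  # ['_123', '123_', '1__23', '12_.5']
```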
On the output side, str.format and f-strings are locale-aware when the n presentation type is used:
>>> f'{123456.789:.9n}' # `.9` specifies 9 significant figures
'123,456.789'
Without the previous setlocale call, the output would not have the comma.
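To see the difference without depending on which locales are installed, the sketch below formats the same number with n under the C locale and with the , option, which always inserts commas regardless of the locale.

```python
import locale

locale.setlocale(locale.LC_NUMERIC, 'C')

x = 123456.789
print(f'{x:.9n}')      # 123456.789 (the C locale defines no grouping)
print(f'{x:,.3f}')     # 123,456.789 (',' groups by thousands in every locale)
print(f'{1234567:n}')  # 1234567 (for integers, n is d plus locale grouping)
```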
Setting a locale explicitly
It is also possible to select a locale by name instead of taking it from the environment, and to apply it to only one aspect of localization. For example, to get localized parsing and formatting for numbers only, use LC_NUMERIC rather than LC_ALL in the setlocale call.
Here are some examples:
>>> # in Denmark, periods are thousands separators and commas are decimal points
>>> locale.setlocale(locale.LC_NUMERIC, 'en_DK.UTF-8')
'en_DK.UTF-8'
>>> locale.atof('123,456.789')
123.456789
>>> # Formatting a number according to the Indian lakh/crore system:
>>> locale.setlocale(locale.LC_NUMERIC, 'en_IN.UTF-8')
'en_IN.UTF-8'
>>> f'{123456.789:9.9n}'
'1,23,456.789'
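Locale names depend on the operating system, and a locale may have to be installed or generated before it can be used (on Debian and Ubuntu, for example, with locale-gen). If the requested locale is unavailable, setlocale raises locale.Error.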
To get back to how Python behaves by default, use the C locale described previously, thus: locale.setlocale(locale.LC_ALL, 'C').
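Because a missing locale makes setlocale raise locale.Error, a program can try a few candidate names and fall back to C. This is a sketch: set_numeric_locale is a hypothetical helper, and the candidate spellings are assumed examples whose availability depends on the system.

```python
import locale

def set_numeric_locale(candidates):
    """Hypothetical helper: apply the first usable locale name to LC_NUMERIC.

    Falls back to the C locale when none of the candidates is installed.
    Returns the name setlocale() reports for the locale actually chosen.
    """
    for name in candidates:
        try:
            return locale.setlocale(locale.LC_NUMERIC, name)
        except locale.Error:
            continue  # unknown or not installed on this system
    return locale.setlocale(locale.LC_NUMERIC, 'C')

# Assumed spellings; which (if any) exist depends on the platform.
print(set_numeric_locale(['en_IN.UTF-8', 'en_IN.utf8']))
```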
Caveats
Setting the locale affects program behaviour globally, and is not thread safe. If done at all, it should normally be done just once at the beginning of the program. Again quoting from the documentation:
It is generally a bad idea to call setlocale() in some library routine, since as a side effect it affects the entire program. Saving and restoring it is almost as bad: it is expensive and affects other threads that happen to run before the settings have been restored.
If, when coding a module for general use, you need a locale independent version of an operation that is affected by the locale (such as certain formats used with time.strftime()), you will have to find a way to do it without using the standard library routine. Even better is convincing yourself that using locale settings is okay. Only as a last resort should you document that your module is not compatible with non-C locale settings.
When the Python code is embedded within a C program, setting the locale can even affect the C code:
Extension modules should never call setlocale(), except to find out what the current locale is. But since the return value can only be used portably to restore it, that is not very useful (except perhaps to find out whether or not the locale is C).
(N.B.: when setlocale is called with only a category argument, or with None rather than an empty string as the locale name, it changes nothing and simply returns the name of the current locale.)
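For example, the following only queries the numeric category and leaves it unchanged; it sets C first purely so that the result is predictable.

```python
import locale

locale.setlocale(locale.LC_NUMERIC, 'C')  # known starting point for the demo

# Neither call changes anything; both just report the current setting.
print(locale.setlocale(locale.LC_NUMERIC))        # C
print(locale.setlocale(locale.LC_NUMERIC, None))  # C
```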
So, in production code, setlocale is not a tool for experimentally parsing or formatting data that was meant for different locales; the examples above only illustrate how the system works. For that purpose, use a third-party internationalization library.
However, if the data is all formatted according to a specific locale, specifying that locale ahead of time will make it possible to use locale.atoi and locale.atof as drop-in replacements for int and float calls on string input.
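A sketch of that pattern: the locale is chosen once, at start-up, and atoi/atof are then used wherever int/float would have been. parse_rows and the sample data are hypothetical. C is used only because it always exists, so the strings must look like Python literals; with an installed locale such as en_US.UTF-8, a quantity written as '1,000' should also parse.

```python
import locale

# Chosen once, at program start. Real data would call for the locale it
# was written in; 'C' is used here because it is always available.
locale.setlocale(locale.LC_NUMERIC, 'C')

def parse_rows(rows):
    """Hypothetical helper: parse (quantity, price) string pairs.

    atoi/atof stand in for int/float and follow whatever LC_NUMERIC
    the program selected at start-up.
    """
    return [(locale.atoi(qty), locale.atof(price)) for qty, price in rows]

print(parse_rows([('3', '2.50'), ('10', '1.25')]))  # [(3, 2.5), (10, 1.25)]
```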
Just remove the commas with replace():
float("123,456.908".replace(',',''))
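One thing to keep in mind with this approach: it assumes ',' is only ever a grouping separator. Wrapped in a hypothetical helper, the assumption becomes visible, including how a European-style string is silently misread instead of raising an error.

```python
def parse_grouped_number(text):
    """Hypothetical helper: parse text that uses ',' for grouping, '.' for decimals."""
    return float(text.replace(',', ''))

print(parse_grouped_number('123,456.908'))  # 123456.908
# A string written with ',' as the decimal point is misread, not rejected:
print(parse_grouped_number('123.456,908'))  # 123.456908
```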
>>> float("123,000.12")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for float(): 123,000.12
Is there a standard Pythonic way to convert a string like "123,000.12" to a float?
Regular float() and locale.atof() do not handle "," well.
As an alternative to beerbajay's excellent answer, simple string formatting works in 2.7+, without requiring an import:
>>> '{0:,.2f}'.format(24322.34)
'24,322.34'
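The same format specification works in f-strings (3.6+), and 3.6 also added _ as a grouping option. Since this formatting ignores the locale, it is the mirror image of the replace() approach: output written this way can be read back by stripping the commas.

```python
x = 24322.34
print(f'{x:,.2f}')        # 24,322.34
print(f'{1234567:_}')     # 1_234_567
print(format(x, ',.2f'))  # 24,322.34 (built-in format() takes the same spec)

# Round trip: remove the grouping commas again before calling float().
print(float(f'{x:,.2f}'.replace(',', '')) == x)  # True
```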
You can use the locale.format_string() function to do this (the older locale.format() was deprecated in Python 3.7 and removed in 3.12):
>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'en_US.utf8')
'en_US.utf8'
>>> locale.format_string("%.2f", 100028282.23, grouping=True)
'100,028,282.23'
Note that you have to give the precision: %.2f
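Unlike the removed locale.format(), which required exactly one % specifier, format_string() accepts a whole format string with a tuple of values. A small sketch under the C locale, chosen only because it is always available (so no grouping characters appear):

```python
import locale

locale.setlocale(locale.LC_NUMERIC, 'C')

# Several % specifiers, with values supplied as a tuple, as with the % operator.
print(locale.format_string('%d items at %.2f each', (3, 2.5)))  # 3 items at 2.50 each
# grouping=True is accepted here too; the C locale simply defines no grouping.
print(locale.format_string('%.2f', 100028282.23, grouping=True))  # 100028282.23
```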
Alternatively, you can use the locale.currency() function, which follows the LC_MONETARY settings:
>>> locale.currency(100028282.23)
'$100028282.23'
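currency() needs a real monetary locale: under the C locale it raises ValueError instead of guessing. The hypothetical wrapper below makes that case explicit; with an installed en_US locale, grouping=True should also add thousands separators to the result.

```python
import locale

def format_money(amount):
    """Hypothetical wrapper: locale.currency(), or None if no currency format exists."""
    try:
        return locale.currency(amount, grouping=True)
    except ValueError:
        # Raised when the LC_MONETARY locale (such as C) defines no currency format.
        return None

locale.setlocale(locale.LC_MONETARY, 'C')
print(format_money(100028282.23))  # None
```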
"because I don't know the locale settings"
You could look that up using the locale module:
>>> locale.nl_langinfo(locale.RADIXCHAR)
'.'
or
>>> locale.localeconv()['decimal_point']
'.'
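Both lookups read the current locale, so they only reflect a setlocale() call made beforehand. A sketch under the C locale, guarding nl_langinfo() because it is only available on Unix-like systems:

```python
import locale

locale.setlocale(locale.LC_NUMERIC, 'C')

conv = locale.localeconv()
print(repr(conv['decimal_point']))  # '.'
print(repr(conv['thousands_sep']))  # '' (the C locale has no grouping character)

# nl_langinfo() exists only on Unix-like systems.
if hasattr(locale, 'nl_langinfo'):
    print(locale.nl_langinfo(locale.RADIXCHAR) == conv['decimal_point'])  # True
```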
Using that, your code could become:
import locale
_locale_radix = locale.localeconv()['decimal_point']
def read_float_with_comma(num):
    if _locale_radix != '.':
        num = num.replace(_locale_radix, ".")
    return float(num)
Better still, the same module has a conversion function for you, called atof():
import locale
def read_float_with_comma(num):
    return locale.atof(num)
You can use locale.atof:
import locale
locale.atof('12.3')
http://docs.python.org/2/library/locale.html