You could use a function like this:

def nan_ints(df, convert_strings=False, subset=None):
    """Convert integer-like columns of df to the nullable Int64 dtype."""
    types = ["int64", "float64"]
    if subset is None:
        subset = list(df)  # default to all columns
    if convert_strings:
        types.append("object")
    for col in subset:
        if df[col].dtype in types:
            # Go through float first so strings like '1' parse; errors="ignore"
            # leaves the column unchanged if it can't be converted (note that
            # errors="ignore" is deprecated in recent pandas versions).
            df[col] = (
                df[col].astype(float, errors="ignore").astype("Int64", errors="ignore")
            )
    return df

It iterates through each column and converts it to Int64 if it is an int. If it's a float, it will convert to Int64 only if all of the values in the column, apart from the NaNs, can be converted to ints. I've given you the option to convert strings to Int64 as well with the convert_strings argument.

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1.1, 2, 3, 1],
                    'b': [1, 2, 3, np.nan],
                    'c': ['1', '2', '3', np.nan],
                    'd': [3, 2, 1, np.nan]})


nan_ints(df1, convert_strings=True, subset=['b', 'c'])
df1.info()

which prints the following:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
a    4 non-null float64
b    3 non-null Int64
c    3 non-null Int64
d    3 non-null float64
dtypes: Int64(2), float64(2)
memory usage: 216.0 bytes

If you are going to use this on every DataFrame, you could add the function to a module and import it whenever you use pandas: from my_module import nan_ints. Then apply it with something like nan_ints(pd.read_csv(path)).

Note: the nullable integer data type is new in pandas version 0.24.0; see the documentation (pandas.pydata.org › docs › user_guide › integer_na.html).

Answer from braintho on Stack Overflow
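As a side note, not part of the answer above: since pandas 1.0 the built-in DataFrame.convert_dtypes() does much of this automatically, converting each column to the best available nullable dtype (an integral float column with NaNs becomes Int64). A minimal sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.1, 2, 3, 1],      # non-integral floats
                   "b": [1, 2, 3, np.nan]})  # integral values plus a NaN

out = df.convert_dtypes()
# 'a' becomes the nullable Float64 (values aren't all integral),
# 'b' becomes the nullable Int64
print(out.dtypes)
```

Unlike nan_ints, this needs no column whitelist, though it also converts strings to the nullable string dtype unless you pass convert_string=False.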
Discussions

python - Making Int64 the default integer dtype instead of standard int64 in pandas - Stack Overflow
I would like all my dataframes, regardless of whether they're built up from any one of the constructor overloads, whether they're derived from .read_csv(), .read_xlsx(), .read_sql(), or any other method, to use the new nullable Int64 datatype as the default dtype for all integers, rather than int64. (stackoverflow.com)
What's up with uint64 in numpy?
OP, I was genuinely SHOOK when you revealed the int casting to floats. It seems like the maintainers consider this some unfixable quirk. Shō ga nai ¯\_(ツ)_/¯ (r/Python, May 17, 2022)
python - What is the difference between native int type and the numpy.int types? - Stack Overflow
Can you please help me understand the main differences (if any) between the native int type and the numpy.int32 or numpy.int64 types? (stackoverflow.com)
Keep getting TypeError: data type 'Int64' not understood
Are you sure it's that line? return dtype in ["Int64", "Float64", "boolean"] should simply return True or False, since it's testing whether whatever is in the variable dtype matches one of those strings. I mean, you didn't do from numpy import *, right? Anyway, numpy accepts the string "int64" but not "Int64" as an argument for its dtype function:

>>> import numpy as np
>>> np.dtype("int64")
dtype('int64')
>>> np.int64 == np.dtype('int64')
True
>>> np.dtype("Int64")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: data type 'Int64' not understood

(r/learnpython, February 14, 2024)
People also ask

When should I use float32 instead of float64?
In deep learning and GPU computing, float32 is standard — GPUs are optimised for it and it halves memory usage. For scientific computing where precision matters, stick with float64. The practical rule: if your data goes to a neural network, use float32.
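The memory halving is easy to verify (a minimal sketch):

```python
import numpy as np

a64 = np.ones(1_000_000)             # float64 by default: 8 bytes per element
a32 = a64.astype(np.float32)         # 4 bytes per element

print(a64.nbytes, a32.nbytes)        # 8000000 4000000
```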
What happens when you mix dtypes in an operation?
NumPy promotes the operands to a common type that can represent both: int32 + float32 gives float64 (float32 cannot represent every int32 exactly), and float32 + float64 gives float64. This is called type promotion. To avoid unexpected upcasting, be explicit: (a + b).astype(np.float32).
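A quick sketch of this promotion behaviour for NumPy arrays:

```python
import numpy as np

i32 = np.ones(3, dtype=np.int32)
f32 = np.ones(3, dtype=np.float32)

# mixing int32 and float32 promotes to float64
print((i32 + f32).dtype)                     # float64

# be explicit to keep the result in float32
print((i32 + f32).astype(np.float32).dtype)  # float32
```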
Nullable integer data type — pandas documentation (pandas.pydata.org › docs › user_guide › integer_na.html)
Operations such as df.sum() and groupby aggregations preserve the Int64 dtype. arrays.IntegerArray uses pandas.NA as its scalar missing value; slicing a single element that's missing returns pandas.NA:

In [29]: a = pd.array([1, None], dtype="Int64")
In [30]: a[1]
Out[30]: <NA>

Second answer (2 of 2, score 2):

I would put my money on monkey patching. The easiest way would be to monkey patch the DataFrame constructor. That should go something like this:

import pandas as pd

# save the original constructor before overriding it
pd.DataFrame.__old_init__ = pd.DataFrame.__init__

def new_init(self, data=None, index=None, columns=None, dtype=pd.Int64Dtype(), copy=False):
    # forward all arguments, defaulting dtype to the nullable Int64
    self.__old_init__(data=data, index=index, columns=columns, dtype=dtype, copy=copy)

pd.DataFrame.__init__ = new_init

Of course, you run the risk of breaking the world. Good luck!
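A less invasive alternative to patching the constructor (my sketch, not from the answer): pandas 2.0 added a dtype_backend argument to the readers, so read_csv and friends can return nullable dtypes directly, missing values and all:

```python
import io
import pandas as pd

csv = io.StringIO("a,b\n1,\n2,3\n")

# dtype_backend="numpy_nullable" (pandas >= 2.0) makes the reader infer
# nullable dtypes: integer columns come back as Int64 even with missing values
df = pd.read_csv(csv, dtype_backend="numpy_nullable")
print(df.dtypes)  # both 'a' and 'b' come back as Int64
```

This keeps the change scoped to the calls you actually make, instead of altering every DataFrame construction in the process.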

r/Python on Reddit: What's up with uint64 in numpy? (May 17, 2022)

The other day I was writing some code using Numpy uint64 arrays and -- this isn't a joke -- I encountered two bugs with uint64 arrays, one of which has been known for TEN YEARS. Consider the following scenario:

a = np.ones(10, dtype=np.uint64)
a[1] <<= 1

What do you think happens? If you guessed "the second element of a is left shifted from 1 to 2", you'd be wrong! What actually happens is that your program crashes, because << isn't implemented for the operand pair (np.uint64, int).

So that's a critical bug that has not been fixed in 10 years, which is utterly buck wild, but at least it's kind of understandable how such a bug could come to be, since <<= is probably implemented on top of << in this case and << probably isn't constrained by its output type, meaning it wouldn't be unreasonable for np.uint64 << int to return an int. Of course, this is being far too charitable for a number of reasons: np.uint64 << int does not in fact return an int since it crashes, we unequivocally know what the output type should be because we're using a compound assignment operator, and this bug has been known for ten years.

The next bug might be more recent, but I suspect both of these bugs have been present from the get go. This bug is also much more shocking. Consider the following:

np.uint64(1) + 1

What do you think the answer is? Remember, numpy wraps the C type system and in C, integers are promoted to the integer type with higher precision in arithmetic contexts. So we might expect this to yield 2 if numpy copies the C rules, or we might also reasonably expect it to yield np.uint64(2) because even though np.uint64 + int may overflow sometimes, this is usually what users will want.

Dear reader, numpy does neither of these things. It instead does something that is objectively wrong and no sane person would ever defend: it returns np.float64. This is completely bananas. 64 bit floating point numbers cannot represent all 64 bit integers exactly because they only have 53 bits of precision, with the rest used for sign and exponent. For this reason, languages tend to not randumbly convert ints into floats because it destroys information and forces the use of a slower type. The main exception to this is division, but numpy does this for addition and frankly as a C wrapper it should have the C behavior for dividing two integers.

It's just mind boggling to me that numpy, a project with 29619 commits, 1305 contributors, and millions of users, and which is a de facto part of the python standard library, could not only have fundamental bugs with how it handles wrapping C integer types, but that these bugs could be known for over a decade and not just remain unfixed but implicitly be something that will never be fixed. These bugs pretty much make np.uint64 unusable. While you can work around them by wrapping your other operand in a constructor call, np.uint64(b), this is pretty brittle, and it's easy to envision a scenario in which a custom function that expects a python int is given a np.uint64 and produces a floating point number or crashes.
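For context, the behaviour the post complains about is version-dependent: NumPy 2.0's NEP 50 promotion rules make np.uint64(1) + 1 return a uint64, where older releases produced a float64, and make uint64 << int work, where older releases raised TypeError. A defensive sketch that works either way:

```python
import numpy as np

a = np.ones(10, dtype=np.uint64)
try:
    a[1] = a[1] << 1                 # older NumPy: TypeError (uint64 vs int64)
except TypeError:
    a[1] = a[1] << np.uint64(1)      # workaround: keep both operands uint64
print(a[1])                          # 2 on either version

x = np.uint64(1) + 1
print(type(x))  # np.float64 under legacy promotion, np.uint64 under NEP 50
```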

Top answer (1 of 3, score 62):

There are several major differences. The first is that python integers are flexible-sized (at least in python 3.x). This means they can grow to accommodate any number of any size (within memory constraints, of course). The numpy integers, on the other hand, are fixed-sized. This means there is a maximum value they can hold. This is defined by the number of bytes in the integer (int32 vs. int64), with more bytes holding larger numbers, as well as whether the number is signed or unsigned (int32 vs. uint32), with unsigned being able to hold larger numbers but not able to hold negative number.

So, you might ask, why use the fixed-sized integers? The reason is that modern processors have built-in tools for doing math on fixed-size integers, so calculations on those are much, much, much faster. In fact, python uses fixed-sized integers behind-the-scenes when the number is small enough, only switching to the slower, flexible-sized integers when the number gets too large.

Another advantage of fixed-sized values is that they can be placed into consistently-sized adjacent memory blocks of the same type. This is the format that numpy arrays use to store data. The libraries that numpy relies on are able to do extremely fast computations on data in this format; in fact, modern CPUs have built-in features for accelerating this sort of computation. With the variable-sized python integers, this sort of computation is impossible because there is no way to say how big the blocks should be and no consistency in the data format.

That being said, numpy is actually able to make arrays of python integers. But rather than arrays containing the values, instead they are arrays containing references to other pieces of memory holding the actual python integers. This cannot be accelerated in the same way, so even if all the python integers fit within the fixed integer size, it still won't be accelerated.

None of this is the case with Python 2. In Python 2, Python integers are fixed integers and thus can be directly translated into numpy integers. For variable-length integers, Python 2 had the long type. But this was confusing and it was decided this confusion wasn't worth the performance gains, especially when people who need performance would be using numpy or something like it anyway.
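The fixed-vs-flexible size difference described above is easy to see directly (a small sketch):

```python
import numpy as np

big = 2 ** 100                   # Python 3 ints grow without bound
print(big.bit_length())          # 101 bits, no overflow

# numpy integers are fixed at their declared width
print(np.iinfo(np.int64).max)    # 9223372036854775807
print(np.iinfo(np.int64).min)    # -9223372036854775808
```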

Second answer (2 of 3, score 36):

Another way to look at the differences is to ask what methods do the 2 kinds of objects have.

In Ipython I can use tab complete to look at methods:

In [1277]: x=123; y=np.int32(123)

int methods and attributes:

In [1278]: x.<tab>
x.bit_length   x.denominator  x.imag         x.numerator    x.to_bytes
x.conjugate    x.from_bytes   x.real         

int 'operators'

In [1278]: x.__<tab>
x.__abs__           x.__init__          x.__rlshift__
x.__add__           x.__int__           x.__rmod__
x.__and__           x.__invert__        x.__rmul__
x.__bool__          x.__le__            x.__ror__
...
x.__gt__            x.__reduce_ex__     x.__xor__
x.__hash__          x.__repr__          
x.__index__         x.__rfloordiv__     

np.int32 methods and attributes (or properties). Some of the same, but a lot more, basically all the ndarray ones:

In [1278]: y.<tab>
y.T             y.denominator   y.ndim          y.size
y.all           y.diagonal      y.newbyteorder  y.sort
y.any           y.dtype         y.nonzero       y.squeeze   
...
y.cumsum        y.min           y.setflags      
y.data          y.nbytes        y.shape   

the y.__ methods look a lot like the int ones. They can do the same math.

In [1278]: y.__<tab>
y.__abs__              y.__getitem__          y.__reduce_ex__
y.__add__              y.__gt__               y.__repr__
...
y.__format__           y.__rand__             y.__subclasshook__
y.__ge__               y.__rdivmod__          y.__truediv__
y.__getattribute__     y.__reduce__           y.__xor__

y is in many ways the same as a 0d array. Not identical, but close.

In [1281]: z=np.array(123,dtype=np.int32)

np.int32 is what I get when I index an array of that type:

In [1300]: A=np.array([0,123,3])

In [1301]: A[1]
Out[1301]: 123

In [1302]: type(A[1])
Out[1302]: numpy.int32

I have to use item to remove all of the numpy wrapping.

In [1303]: type(A[1].item())
Out[1303]: int

As a numpy user, an np.int32 is an int with a numpy wrapper. Or conversely a single element of an ndarray. Usually I don't pay attention as to whether A[0] is giving me the 'native' int or the numpy equivalent. In contrast to some new users, I rarely use np.int32(123); I would use np.array(123) instead.

A = np.array([1,123,0], np.int32)

does not contain 3 np.int32 objects. Rather its data buffer is 3*4=12 bytes long. It's the array overhead that interprets it as 3 ints in a 1d. And view shows me the same databuffer with different interpretations:

In [1307]: A.view(np.int16)
Out[1307]: array([  1,   0, 123,   0,   0,   0], dtype=int16)

In [1310]: A.view('S4')
Out[1310]: array([b'\x01', b'{', b''],   dtype='|S4')

It's only when I index a single element that I get a np.int32 object.

The list L=[1, 123, 0] is different; it's a list of pointers - pointers to int objects elsewhere in memory. Similarly for a dtype=object array.

r/learnpython on Reddit: Keep getting TypeError: data type 'Int64' not understood (February 14, 2024)

I have a rather simple dataframe I want to pass to PyCaret.

Datetime as index, then TOTAL, which is a float64 according to df.dtypes.

from pycaret.time_series import *

# init setup
s = setup(df, target="TOTAL", fh=12, session_id=123)

It points to this part:

def _is_nullable_numeric(dtype):
----> 9     return dtype in ["Int64", "Float64", "boolean"]

TypeError: data type 'Int64' not understood

I assume it's related to NumPy. I have tried to set the dtype with .astype() etc., but I am still getting the same error. I have tried setting TOTAL as a float, int32; still the same issue.

What gives?
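The error is reproducible directly: NumPy's dtype constructor accepts the lowercase "int64" but not pandas' nullable "Int64", which pandas itself parses fine. A small sketch:

```python
import numpy as np
import pandas as pd

# pandas understands the nullable "Int64" spelling...
print(pd.api.types.pandas_dtype("Int64"))   # Int64

# ...but numpy's dtype constructor only accepts the lowercase "int64"
try:
    np.dtype("Int64")
except TypeError as e:
    print("numpy rejects it:", e)
```

So the library raising the error here is checking the dtype against numpy, not pandas, which is why changing the column's dtype on the pandas side doesn't help.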

pandas.Int64Dtype — pandas documentation (pandas.pydata.org › docs › reference › api › pandas.Int64Dtype.html)
Int64Dtype is the 64-bit nullable integer type: pd.Series([2, pd.NA], dtype=pd.Int64Dtype()) has dtype Int64Dtype(). Matching nullable types exist for Int8/Int16/Int32 and UInt8/UInt16/UInt32/UInt64.