df = df.replace({np.nan: None})
Note: For pandas versions <1.4, this changes the dtype of all affected columns to object.
To avoid that, use this syntax instead:
df = df.replace(np.nan, None)
Note 2: If you don't want to import numpy, np.nan can be replaced with native float('nan'):
df = df.replace({float('nan'): None})
Credit goes to the commenter on the related GitHub issue, Killian Huyghe's comment and Matt's answer.
Answer from EliadL on Stack Overflow
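As a rough illustration of the dict form above, here is a minimal sketch on a made-up frame. The column names `a` and `b` are assumptions for the example, not part of the original answer. Since the resulting dtypes depend on the pandas version, as noted above, the sketch inspects the cell values rather than assuming a dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame; the column names are for illustration only.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 2.0]})

# Dict form: map every NaN to None.
out = df.replace({np.nan: None})

print(out)
print(out.dtypes)

# Collect the (row, column) positions that now hold a literal None.
none_cells = [(r, c) for c in out.columns for r in out.index if out.at[r, c] is None]
print(none_cells)
```

Checking with `is None` rather than `pd.isna` matters here, because `pd.isna` treats NaN and None alike and would not show whether the replacement took effect.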
For pandas > 1.3.0 see this answer.
@bogatron has it right: you can use where. It's worth noting that you can do this natively in pandas:
df1 = df.where(pd.notnull(df), None)
Note: this changes the dtype of all columns to object.
Example:
In [1]: df = pd.DataFrame([1, np.nan])

In [2]: df
Out[2]:
    0
0   1
1 NaN

In [3]: df1 = df.where(pd.notnull(df), None)

In [4]: df1
Out[4]:
      0
0     1
1  None
Note: what you cannot do is recast the DataFrame's dtype with astype so that it allows all data types, and then use the DataFrame fillna method:
df1 = df.astype(object).replace(np.nan, 'None')
Unfortunately, neither this nor using replace works with None; see this (closed) issue.
As an aside, it's worth noting that for most use cases you don't need to replace NaN with None, see this question about the difference between NaN and None in pandas.
However, in this specific case it seems you do (at least at the time of this answer).
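On newer pandas versions the plain where call above may leave NaN in float columns. One commonly suggested variant casts to object first so the column can hold None. This is a sketch of that idea, not necessarily the approach of the answer referenced above, and its behavior may differ between pandas releases.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, np.nan])

# Cast to object first so the column can store None, then mask the missing cells.
df1 = df.astype(object).where(df.notna(), None)

print(df1)
print(df1.dtypes)
```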
Randomly replace values in a numpy array
# The dataset
import random
import numpy as np
import pandas as pd
data = pd.read_csv('iris.data')
mat = data.iloc[:, :4].to_numpy()  # as_matrix() was removed in pandas 1.0
Set the number of values to replace. For example 20%:
# Edit: changed len(mat) for mat.size
prop = int(mat.size * 0.2)
Randomly choose indices of the numpy array:
i = [random.choice(range(mat.shape[0])) for _ in range(prop)]
j = [random.choice(range(mat.shape[1])) for _ in range(prop)]
Replace the chosen values with NaN:
mat[i, j] = np.nan  # the np.NaN alias was removed in NumPy 2.0
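Because random.choice draws each row and column index independently, the same cell can be picked more than once, so slightly fewer than prop values may actually be replaced. A minimal sketch of one way around this is to sample flat positions without replacement. The random stand-in matrix and the seed are assumptions for the example; the real iris file is not read here.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

# Stand-in for the 150 x 4 iris feature matrix (random values, not the real data).
mat = rng.random((150, 4))
prop = int(mat.size * 0.2)

# Draw `prop` distinct flat positions, then convert them to (row, column) pairs.
flat = rng.choice(mat.size, size=prop, replace=False)
i, j = np.unravel_index(flat, mat.shape)

mat[i, j] = np.nan
print(np.isnan(mat).sum(), prop)
```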
Dropout for any array dimension
Another way to do that with an array of more than 2 dimensions would be to use the numpy.put() function:
import numpy as np
import random
from sklearn import datasets
data = datasets.load_iris()['data']
def dropout(a, percent):
    # create a copy
    mat = a.copy()
    # number of values to replace
    prop = int(mat.size * percent)
    # flat indices to mask, sampled without replacement
    mask = random.sample(range(mat.size), prop)
    # replace with NaN (np.put repeats the scalar as needed)
    np.put(mat, mask, np.nan)
    return mat
This function returns a modified array:
modified = dropout(data, 0.2)
We can verify that the correct number of values have been modified:
np.sum(np.isnan(modified))/float(data.size)
[out]:
0.2
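The dropout function above relies on a.copy(), so an integer input stays integer and cannot hold NaN. A hedged variant that converts to float first, shown on a small 3-D array to illustrate the "any dimension" point, could look like this. The name `dropout_nd` and the `seed` parameter are hypothetical additions for the example.

```python
import numpy as np

def dropout_nd(a, percent, seed=None):
    """Return a float copy of `a` with int(a.size * percent) entries set to NaN."""
    rng = np.random.default_rng(seed)
    mat = np.array(a, dtype=float)  # copy as float so NaN can be stored
    n = int(mat.size * percent)
    # distinct flat indices, valid for any number of dimensions
    idx = rng.choice(mat.size, size=n, replace=False)
    np.put(mat, idx, np.nan)
    return mat

cube = np.arange(24).reshape(2, 3, 4)  # integer 3-D array
masked = dropout_nd(cube, 0.25, seed=1)
print(masked.shape, np.isnan(masked).sum())
```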
Depending on the data structure that holds the values, there may be different solutions.
If you are using NumPy arrays, you can use the np.insert function:
import numpy as np
a = np.array([(122.0, 1.0, -47.0), (123.0, 1.0, -47.0), (125.0, 1.0, -44.0)])
np.insert(a, 2, np.nan, axis=0)
array([[ 122.,    1.,  -47.],
       [ 123.,    1.,  -47.],
       [  nan,   nan,   nan],
       [ 125.,    1.,  -44.]])
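One detail that is easy to miss is that np.insert returns a new array and leaves the input untouched, so the result has to be assigned. A short sketch using the same values:

```python
import numpy as np

a = np.array([(122.0, 1.0, -47.0), (123.0, 1.0, -47.0), (125.0, 1.0, -44.0)])

# np.insert returns a new array; `a` itself is not modified.
b = np.insert(a, 2, np.nan, axis=0)

print(a.shape)  # (3, 3)
print(b.shape)  # (4, 3)
print(np.isnan(b[2]).all())  # True
```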
If you are using pandas, you can use the replace method of the DataFrame:
In [106]:
df.replace('N/A', np.nan)
Out[106]:
    x    y
0  10   12
1  50   11
2  18  NaN
3  32   13
4  47   15
5  20  NaN
In the code above, the first argument is whatever value you want to replace.
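To make the snippet above reproducible, here is a sketch that builds a frame matching the displayed output. The input values are reconstructed from that output and are an assumption. A column that contained strings may not be numeric after the replacement, so the sketch converts it explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical input mirroring the output shown above.
df = pd.DataFrame({
    "x": [10, 50, 18, 32, 47, 20],
    "y": [12, 11, "N/A", 13, 15, "N/A"],
})

cleaned = df.replace("N/A", np.nan)

# The column held strings, so convert it explicitly to a numeric dtype.
cleaned["y"] = pd.to_numeric(cleaned["y"])

print(cleaned)
print(cleaned["y"].isna().sum())
```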
A[A==NDV]=numpy.nan
A==NDV produces a boolean array that can be used as an index into A.
You can also use np.where to replace a number with NaN.
arr = np.where(arr==NDV, np.nan, arr)
For example, the following replaces every 1 in a small array with NaN:
arr = np.array([[1, 1, 2], [2, 0, 1]])
arr = np.where(arr==1, np.nan, arr)

This creates a new copy (unlike A[A==NDV]=np.nan), but in some cases that can be useful. For example, if the array initially has an int dtype, it has to be converted into a float array anyway (because replacing values with NaN won't work otherwise), and np.where handles that.
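The dtype point can be checked directly. In the sketch below, `NDV = 1` is a hypothetical no-data value chosen for the example. np.where returns a float array, while the in-place form on the integer array is expected to raise, and works once the array has been cast to float.

```python
import numpy as np

NDV = 1  # hypothetical no-data value for this example
arr = np.array([[1, 1, 2], [2, 0, 1]])  # integer dtype

# np.where builds a new float array, since NaN cannot be stored in an int array.
out = np.where(arr == NDV, np.nan, arr)
print(out.dtype)  # float64

# In-place assignment on the integer array fails instead of converting.
try:
    arr[arr == NDV] = np.nan
    in_place_failed = False
except ValueError:
    in_place_failed = True
print(in_place_failed)

# Casting to float first makes the in-place form work.
A = arr.astype(float)
A[A == NDV] = np.nan
print(np.array_equal(out, A, equal_nan=True))
```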