You can replace this just for that column using replace:
df['workclass'].replace('?', np.nan)
or for the whole df:
df.replace('?', np.nan)
UPDATE
OK, I figured out your problem: by default, if you don't pass a separator character, read_csv uses a comma ',' as the separator.
Your data, and in particular one problematic line:
54, ?, 180211, Some-college, 10, Married-civ-spouse, ?, Husband, Asian-Pac-Islander, Male, 0, 0, 60, South, >50K
in fact uses a comma followed by a space as the separator, so when you passed na_values=["?"] it didn't match, because every value carries a leading space that is easy to miss.
If you change your line to this:
rawfile = pd.read_csv(filename, header=None, names=DataLabels, sep=r',\s', na_values=["?"])
then you should find that it all works:
27 54 NaN 180211 Some-college 10
(The answer above is from EdChum on Stack Overflow.)
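As an alternative to a regex separator, pandas' skipinitialspace parameter strips the space that follows each comma, so na_values can match the bare '?' token. A minimal sketch; the inline sample and column names below are hypothetical stand-ins for the real file and DataLabels list:

```python
import io
import pandas as pd

# Toy sample mimicking the adult dataset's ", "-separated layout.
csv_text = "54, ?, 180211, Some-college\n39, Private, 77516, Bachelors\n"
cols = ["age", "workclass", "fnlwgt", "education"]

# skipinitialspace=True removes the space after each comma, so the
# na_values match works without switching to the python regex engine.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=cols,
                 skipinitialspace=True, na_values=["?"])
print(df["workclass"].isna().sum())
```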
Use numpy.nan, as described in Numpy - Replace a number with NaN:
import numpy as np
df.applymap(lambda x: np.nan if x == '?' else x)  # in pandas >= 2.1, use df.map instead of df.applymap
DataFrame.fillna() or Series.fillna() will do this for you.
Example:
In [7]: df
Out[7]:
0 1
0 NaN NaN
1 -0.494375 0.570994
2 NaN NaN
3 1.876360 -0.229738
4 NaN NaN
In [8]: df.fillna(0)
Out[8]:
0 1
0 0.000000 0.000000
1 -0.494375 0.570994
2 0.000000 0.000000
3 1.876360 -0.229738
4 0.000000 0.000000
To fill the NaNs in only one column, select just that column.
In [12]: df[1] = df[1].fillna(0)
In [13]: df
Out[13]:
0 1
0 NaN 0.000000
1 -0.494375 0.570994
2 NaN 0.000000
3 1.876360 -0.229738
4 NaN 0.000000
Or you can use the built in column-specific functionality:
df = df.fillna({1: 0})
It is not guaranteed that the slicing returns a view or a copy. You can do
df['column'] = df['column'].fillna(value)
I've been working on learning Python and for something to code, I picked some VBA that I had.
In VBA:
If Cells(I, "C").Value <> "" And Cells(I, "B").Value = "" Then
    Cells(I, "B").Value = Cells(I, "C").Value
End If
It simply checks if colC is not Null and colB is Null, then replaces colB with the value from colC.
I can read in the csv file, and I was able to select and delete some rows I didn't want, but I can't seem to get the syntax right for this...
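One pandas equivalent of the VBA loop above is a boolean mask with .loc (or, more idiomatically, fillna with another column). A small sketch with an invented toy frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the spreadsheet: column B may be empty
# where column C holds a value.
df = pd.DataFrame({"B": [np.nan, "x", np.nan],
                   "C": ["c1", "c2", np.nan]})

# Equivalent of the VBA check: where B is null and C is not, copy C into B.
mask = df["B"].isna() & df["C"].notna()
df.loc[mask, "B"] = df.loc[mask, "C"]

# Or, in one line: df["B"] = df["B"].fillna(df["C"])
print(df["B"].tolist())
```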
Actually in later versions of pandas this will give a TypeError:
df.replace('-', None)
TypeError: If "to_replace" and "value" are both None then regex must be a mapping
You can do it by passing either a list or a dictionary:
In [11]: df.replace(['-'], [None])  # or .replace('-', {0: None})
Out[11]:
0
0 None
1 3
2 2
3 5
4 1
5 -5
6 -1
7 None
8 9
But I recommend using NaNs rather than None:
In [12]: df.replace('-', np.nan)
Out[12]:
0
0 NaN
1 3
2 2
3 5
4 1
5 -5
6 -1
7 NaN
8 9
I prefer the solution using replace with a dict because of its simplicity and elegance:
df.replace({'-': None})
You can also have more replacements:
df.replace({'-': None, 'None': None})
And even for larger replacements, it is always obvious and clear what is replaced by what - which is way harder for long lists, in my opinion.
dataframe
You can use pd.DataFrame.mask:
df.mask((df >= -200) & (df <= -100), inplace=True)
This method replaces elements identified by True values in a Boolean array with a specified value, defaulting to NaN if a value is not specified.
Equivalently, use pd.DataFrame.where with the reverse condition:
df.where((df < -200) | (df > -100), inplace=True)
series
As with many methods, Pandas helpfully includes versions which work with series rather than an entire dataframe. So, for a column df['A'], you can use pd.Series.mask with pd.Series.between:
df['A'].mask(df['A'].between(-200, -100), inplace=True)
For chaining, note that inplace defaults to False, so you can also use:
df['A'] = df['A'].mask(df['A'].between(-200, -100))
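To make the mask/between pattern above concrete, here is a small sketch on an invented column; note that between is inclusive of both endpoints by default:

```python
import numpy as np
import pandas as pd

# Toy data: only values inside the closed interval [-200, -100] are masked.
df = pd.DataFrame({"A": [-250, -150, -100, 0, 50]})
df["A"] = df["A"].mask(df["A"].between(-200, -100))
print(df["A"].isna().sum())  # -150 and -100 become NaN
```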
You can do it this way:
In [145]: df = pd.DataFrame(np.random.randint(-250, 50, (10, 3)), columns=list('abc'))
In [146]: df
Out[146]:
a b c
0 -188 -63 -228
1 -59 -70 -66
2 -110 39 -146
3 -67 -228 -232
4 -22 -180 -140
5 -191 -136 -188
6 -59 -30 -128
7 -201 -244 -195
8 -248 -30 -25
9 11 1 20
In [148]: df[(df>=-200) & (df<=-100)] = np.nan
In [149]: df
Out[149]:
a b c
0 NaN -63.0 -228.0
1 -59.0 -70.0 -66.0
2 NaN 39.0 NaN
3 -67.0 -228.0 -232.0
4 -22.0 NaN NaN
5 NaN NaN NaN
6 -59.0 -30.0 NaN
7 -201.0 -244.0 NaN
8 -248.0 -30.0 -25.0
9 11.0 1.0 20.0
Randomly replace values in a numpy array
import random
import numpy as np
import pandas as pd

# The dataset
data = pd.read_csv('iris.data')
mat = data.iloc[:, :4].to_numpy()  # .as_matrix() was removed in newer pandas
Set the number of values to replace. For example 20%:
# Edit: changed len(mat) for mat.size
prop = int(mat.size * 0.2)
Randomly choose indices of the numpy array:
i = [random.choice(range(mat.shape[0])) for _ in range(prop)]
j = [random.choice(range(mat.shape[1])) for _ in range(prop)]
Change values with NaN
mat[i, j] = np.nan
Dropout for any array dimension
Another way to do that with an array of more than 2 dimensions would be to use the numpy.put() function:
import numpy as np
import random
from sklearn import datasets
data = datasets.load_iris()['data']
def dropout(a, percent):
    # create a copy
    mat = a.copy()
    # number of values to replace
    prop = int(mat.size * percent)
    # indices to mask
    mask = random.sample(range(mat.size), prop)
    # replace with NaN
    np.put(mat, mask, [np.nan] * len(mask))
    return mat
This function returns a modified array:
modified = dropout(data, 0.2)
We can verify that the correct number of values have been modified:
np.sum(np.isnan(modified))/float(data.size)
[out]:
0.2
Depending on the data structure you are keeping the values there might be different solutions.
If you are using Numpy arrays, you can employ np.insert method which is referred here:
import numpy as np
a = np.array([(122.0, 1.0, -47.0), (123.0, 1.0, -47.0), (125.0, 1.0, -44.0)])
np.insert(a, 2, np.nan, axis=0)
array([[ 122., 1., -47.],
[ 123., 1., -47.],
[ nan, nan, nan],
[ 125., 1., -44.]])
If you are using Pandas you can use instance method replace on the objects of the DataFrames as referred here:
In [106]:
df.replace('N/A', np.nan)
Out[106]:
x y
0 10 12
1 50 11
2 18 NaN
3 32 13
4 47 15
5 20 NaN
In the code above, the first argument can be your arbitrary input which you want to change.
Hello, I'm currently attempting the Kaggle housing prices challenge: https://www.kaggle.com/c/house-prices-advanced-regression-techniques.
I have a concatenated table which combines the training and testing tables into one in order to handle all missing values at once.
combine_df = pd.concat([train, test], axis=0, sort=False)
combine_df.drop(['Id', 'SalePrice'], axis=1, inplace=True)
I then attempt to fill all NaN categorical values with the line below, where null_columns is a list of the columns whose NaN values I want to replace.
combine_df[null_columns] = combine_df[null_columns].fillna('0', inplace=True)
However, this line changes every value in those columns into NaN instead of replacing the NaN values with '0', as seen in the output below, which shows the number of NaN values in each column.
BsmtQual 2919 BsmtCond 2919 BsmtExposure 2919 BsmtFinType1 2919 BsmtFinType2 2919 GarageType 2919 GarageFinish 2919 GarageQual 2919 GarageCond 2919
I've tried using .replace, a lambda function, and also using .loc and all of them end up doing the same thing as the code above. What is going on with my code that causes this? I've also been unable to find anything regarding this on stack overflow. Any help would be greatly appreciated.
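For anyone hitting the same symptom: fillna(..., inplace=True) returns None, so assigning that return value back overwrites the columns. A minimal sketch with an invented toy frame and hypothetical column names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan, "x"], "b": ["y", np.nan]})

# fillna with inplace=True mutates its target and returns None, so
# assigning the result back would fill the columns with None/NaN.
result = df[["a", "b"]].fillna("0", inplace=True)
print(result)  # None

# Drop inplace and assign the returned frame instead:
df[["a", "b"]] = df[["a", "b"]].fillna("0")
print(df.isna().sum().sum())  # no NaNs remain
```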
There is no one size fits all. So you cannot assume that one technique will work the best for all the datasets.
That being said, the goal of imputing missing values is to ensure that the distribution of the column does not change after imputation. So if you have a feature that follows a left-skewed distribution, the distribution should not change much after imputation.
Following this logic, try multiple imputation techniques and see which one best retains the original distribution of the feature you are imputing.
Mean is suitable when you have a Gaussian distribution of continuous data. Mode is suitable when your column has categorical data and one category is clearly more likely to occur than the others. Median is better when your data has outliers, which can skew the mean. You can opt to remove rows with missing values if their number is very small compared to the total number of rows. There are other techniques that can be useful depending on the situation, such as training a model to predict missing values, MICE (for missing-at-random data), KNNImputer, and LOCF.
Alternatively, if you have a significant number of missing values, you can see how the results are different when you impute missing values and when you ignore rows with missing values.
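To make the mean/median/mode trade-off concrete, here is a small sketch on a toy series with an outlier (the values are invented for illustration): the mean fill is pulled far from the typical value, while the median and mode fills are not.

```python
import numpy as np
import pandas as pd

# Toy skewed data: one outlier (100.0) and one missing value.
s = pd.Series([1.0, 2.0, 2.0, 3.0, 100.0, np.nan])

mean_filled = s.fillna(s.mean())      # fills with 21.6, distorted by the outlier
median_filled = s.fillna(s.median())  # fills with 2.0
mode_filled = s.fillna(s.mode()[0])   # fills with 2.0

print(round(s.mean(), 1), s.median(), s.mode()[0])
```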