Brave Search

numpy corrcoef - compute correlation matrix while ignoring missing data

stackoverflow.com › questions › 31619578 › numpy-corrcoef-compute-correlation-matrix-while-ignoring-missing-data

One of the main features of pandas is being NaN friendly. To calculate the correlation matrix, simply call df_counties.corr(), which excludes missing values pair by pair. The example below demonstrates that df.corr() is NaN tolerant whereas np.corrcoef is not.

import pandas as pd
import numpy as np

# data
# ==============================
np.random.seed(0)
df = pd.DataFrame(np.random.randn(100,5), columns=list('ABCDE'))
df[df < 0] = np.nan
df

         A       B       C       D       E
0   1.7641  0.4002  0.9787  2.2409  1.8676
1      NaN  0.9501     NaN     NaN  0.4106
2   0.1440  1.4543  0.7610  0.1217  0.4439
3   0.3337  1.4941     NaN  0.3131     NaN
4      NaN  0.6536  0.8644     NaN  2.2698
5      NaN  0.0458     NaN  1.5328  1.4694
6   0.1549  0.3782     NaN     NaN     NaN
7   0.1563  1.2303  1.2024     NaN     NaN
8      NaN     NaN     NaN  1.9508     NaN
9      NaN     NaN  0.7775     NaN     NaN
..     ...     ...     ...     ...     ...
90     NaN  0.8202  0.4631  0.2791  0.3389
91  2.0210     NaN     NaN  0.1993     NaN
92     NaN     NaN     NaN  0.1813     NaN
93  2.4125     NaN     NaN     NaN  0.2515
94     NaN     NaN     NaN     NaN  1.7389
95  0.9944  1.3191     NaN  1.1286  0.4960
96  0.7714  1.0294     NaN     NaN  0.8626
97     NaN  1.5133  0.5531     NaN  0.2205
98     NaN     NaN  1.1003  1.2980  2.6962
99     NaN     NaN     NaN     NaN     NaN

[100 rows x 5 columns]

# calculations
# ================================
df.corr()

        A       B       C       D       E
A  1.0000  0.2718  0.2678  0.2822  0.1016
B  0.2718  1.0000 -0.0692  0.1736 -0.1432
C  0.2678 -0.0692  1.0000 -0.3392  0.0012
D  0.2822  0.1736 -0.3392  1.0000  0.1562
E  0.1016 -0.1432  0.0012  0.1562  1.0000


np.corrcoef(df, rowvar=False)

array([[ nan,  nan,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan]])

Answer from Jianxun Li on Stack Overflow
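
For readers who need to stay in NumPy, the pairwise exclusion that df.corr() applies can be approximated by hand. The following is a minimal sketch, not the pandas implementation: the helper name pairwise_corr is hypothetical, and the random data only imitates the example above (it uses a different random generator, so the numbers will not match the table).

```python
import numpy as np

def pairwise_corr(data):
    """Pearson correlation between columns, using for each pair of columns
    only the rows where both values are present (pairwise deletion)."""
    data = np.asarray(data, dtype=float)
    n_cols = data.shape[1]
    result = np.full((n_cols, n_cols), np.nan)
    for i in range(n_cols):
        for j in range(i, n_cols):
            # Rows where neither column i nor column j is NaN
            valid = ~np.isnan(data[:, i]) & ~np.isnan(data[:, j])
            if valid.sum() < 2:
                continue  # not enough overlap; leave NaN
            r = np.corrcoef(data[valid, i], data[valid, j])[0, 1]
            result[i, j] = result[j, i] = r
    return result

# Illustrative data: negative draws are treated as missing
rng = np.random.default_rng(0)
values = rng.standard_normal((100, 5))
values[values < 0] = np.nan

corr = pairwise_corr(values)
print(np.round(corr, 4))
```

Each entry is based on a different subset of rows, so the resulting matrix is not guaranteed to be positive semi-definite. That is a general property of pairwise deletion, not of this particular sketch.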

Stack Overflow

stackoverflow.com › questions › 31619578 › numpy-corrcoef-compute-correlation-matrix-while-ignoring-missing-data

python - numpy corrcoef - compute correlation matrix while ignoring missing data - Stack Overflow

Top answer

1 of 3

40

2 of 3

31

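This works using NumPy's masked-array module, numpy.ma:

import numpy as np
import numpy.ma as ma

# np.nan is the portable spelling; the np.NaN alias was removed in NumPy 2.0
A = [1, 2, 3, 4, 5, np.nan]
B = [2, 3, 4, 5.25, np.nan, 100]

# masked_invalid hides NaN/inf entries; corrcoef then uses only positions valid in both inputs
print(ma.corrcoef(ma.masked_invalid(A), ma.masked_invalid(B)))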

It outputs:

[[1.0 0.99838143945703]
 [0.99838143945703 1.0]]

Read more here: https://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html
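
As a quick cross-check, the masked-array result for two 1-D inputs should agree with simply dropping every position where either input is missing. The sketch below assumes the same A and B as the answer above (renamed a and b); the variable names are illustrative only.

```python
import numpy as np
import numpy.ma as ma

a = np.array([1, 2, 3, 4, 5, np.nan])
b = np.array([2, 3, 4, 5.25, np.nan, 100])

# Approach 1: masked arrays, as in the answer above
r_masked = ma.corrcoef(ma.masked_invalid(a), ma.masked_invalid(b))[0, 1]

# Approach 2: keep only the positions where both inputs are finite
both_valid = np.isfinite(a) & np.isfinite(b)
r_manual = np.corrcoef(a[both_valid], b[both_valid])[0, 1]

print(r_masked, r_manual)
```

Both approaches use only the first four pairs here, because position 5 is missing in b and position 6 is missing in a.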

NumPy

numpy.org › doc › 2.1 › reference › generated › numpy.ma.corrcoef.html

numpy.ma.corrcoef — NumPy v2.1 Manual

These arguments had no effect on the return values of the function and can be safely ignored in this and previous versions of numpy. ...

>>> import numpy as np
>>> x = np.ma.array([[0, 1], [1, 1]], mask=[0, 1, 0, 1])
>>> np.ma.corrcoef(x)
masked_array(
  data=[[--, --],
        [--, --]],
  mask=[[ True, ...

Discussions

numpy.corrcoef RuntimeWarning and NaN (wrong output)

I have found a weird behaviour for numpy.corrcoef . I reproduce with debian's squeeze python 2.6, on a compiled 2.7 python and in anaconda's 2.7 and 3.3 pythons on MacOSX. The bug is shown ... More on github.com

github.com

5

September 18, 2014

Why nan when calculating correlation?

List 2 is comprised of completely identical elements. Its standard deviation is therefore zero. My stats is rusty, but according to Wikipedia, the correlation coefficient is calculated by dividing by the SDs, and you can't divide by zero. More on reddit.com

r/learnpython

2

January 29, 2023

How to ignore NaN values in the CORR Function?

Hi Guys, My problem is the opposite of the other problems reported here between NAN values in CORR function. If I have a matrix A = [1;2;3;4] and a matrix B = [3;5;7;8], the correlation corr(... More on mathworks.com

mathworks.com

1

January 26, 2018

getting a NaN in correlation coefficient

Hi, i have a simple problem which unfortunately i am unable to understand. I have matrices and i am trying to calculate correlation coefficient between two variables. A simple example from my code... More on nl.mathworks.com

nl.mathworks.com

2

1

February 19, 2020

GitHub

github.com › numpy › numpy › issues › 14414

[Feature Request]: Nan Values for correlation and cross correlation · Issue #14414 · numpy/numpy

September 3, 2019 - In the case of corrcoef it is straight forward and can be solved by ignoring the nan values of both arrays, however in the convolution setting, it might have different lag on the two series which would create unwanted results.

Author numpy

NumPy

numpy.org › doc › 2.2 › reference › generated › numpy.corrcoef.html

numpy.corrcoef — NumPy v2.2 Manual

>>> R3 = np.corrcoef(xarr, yarr, rowvar=False)
>>> R3
array([[ 1.        ,  0.77598074, -0.47458546, -0.75078643, -0.9665554 ,  0.22423734],
       [ 0.77598074,  1.        , -0.92346708, -0.99923895, -0.58826587, -0.44069024],
       [-0.47458546, -0.92346708,  1.        ,  0.93773029,  0.23297648,  0.75137473],
       [-0.75078643, -0.99923895,  0.93773029,  1. ...

Medium

medium.com › @amit25173 › understanding-pearson-correlation-in-numpy-step-by-step-guide-d8073425b5dd

Understanding Pearson Correlation in NumPy (Step-by-Step Guide) | by Amit Yadav | Medium

February 8, 2025 -

import numpy as np

# Sample data with NaN values
x = np.array([10, 20, np.nan, 40, 50])  # One value is missing
y = np.array([5, 15, 25, 35, 45])

# Remove NaN values before computing correlation
mask = ~np.isnan(x) & ~np.isnan(y)  # Create a mask for valid values
correlation_matrix = np.corrcoef(x[mask], y[mask])

print("Pearson Correlation (after handling NaN values):")
print(correlation_matrix)

What's happening here?

Dontusethiscode

dontusethiscode.com › blog › 2023-06-28_pandas_slow_corr.html

Why is DataFrame.corr() so much slower than numpy.corrcoef?

from numpy import allclose
assert allclose(df.corr(), corrcoef(df.to_numpy(), rowvar=False))

Seems that our outputs match up! Let's take a slightly deeper dive by profiling the code we ran. ...

# 175 function calls (169 primitive calls) in 0.121 seconds
# Ordered by: internal time
# List reduced from 91 to 9 due to restriction <0.1>
# ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
#      1    0.095    0.095    0.095    0.095  {pandas._libs.algos.nancorr}
#      1    0.023    0.023    0.023    0.023  {method 'copy' of 'numpy.ndarray' objects}
#      1    0.002    0.002    0.002    0.002  missing.py:268(_isna_array)
#      1    0.000    0.00

Real Python

realpython.com › numpy-scipy-pandas-correlation-python

NumPy, SciPy, and pandas: Correlation With Python – Real Python

October 21, 2023 - In this tutorial, you'll learn what correlation is and how you can calculate it with Python. You'll use SciPy, NumPy, and pandas correlation methods to calculate three different correlation coefficients. You'll also see how to visualize data, regression lines, and correlation matrices with ...

Spark Code Hub

sparkcodehub.com › numpy › data-analysis › correlation-coefficients

Mastering Correlation Coefficients with NumPy Arrays: A Comprehensive Guide

np.corrcoef() does not natively handle np.nan, producing nan outputs if missing values are present. To address this, you can preprocess the data using ...

GitHub

github.com › numpy › numpy › issues › 5080

numpy.corrcoef RuntimeWarning and NaN (wrong output) · Issue #5080 · numpy/numpy

September 18, 2014 -

wk=np.ones((400,))*0.00282490517428
print("\nCorrect output for values of ", wk[0])
print(" corrcoef=",np.corrcoef(wk,wk)[0,1])
wk2=wk*1.e13
print("\nCorrect output for values of ", wk2[0])
print(" corrcoef=",np.corrcoef(wk2,wk2)[0,1])
wk2=wk*1.e14
print("\nIncorrect output for values of ", wk2[0])
...

Incorrect output for values of 282490517428.0
/Users/nino/anaconda/envs/py3/lib/python3.3/site-packages/numpy/lib/function_base.py:1823: RuntimeWarning: invalid value encountered in true_divide
  return c/sqrt(multiply.outer(d, d))
corrcoef= nan

Author numpy

Oreate AI

oreateai.com › blog › handling-nan-values-in-numpys-corrcoef-a-practical-guide › 59fc6bc2b72657bdb9ff929f057afdbe

Handling NaN Values in NumPy's Corrcoef: A Practical Guide - Oreate AI Blog

January 8, 2026 - Remove Rows Containing NaNs: One straightforward approach is simply to drop any rows where one or both of your variables contain a NaN value before calculating the correlation coefficient.

MathWorks

de.mathworks.com › matlabcentral › answers › 379071-how-to-ignore-nan-values-in-the-corr-function

How to ignore NaN values in the CORR Function? - MATLAB Answers - MATLAB Central

January 25, 2018 - But, If there is a NaN value in B, such as: B = [3;5;7;NaN], the correlation corr(A,B) will be NaN instead of 1.0000 (that is the correlation of the not NaN values of A (1;2;3) and B(3;5;7). What can I do to make it calculate the corr function ignoring this NaN values making it give me answers different of "NaN"?

CopyProgramming

copyprogramming.com › howto › numpy-corrcoef-compute-correlation-matrix-while-ignoring-missing-data

Python: Calculate correlation matrix using Numpy corrcoef, with the ability to disregard missing information

August 5, 2023 - Compute correlation matrix with omission of missing data using Numpy corrcoef, NaN values are returned by Pandas df.corr() function instead of correlating coefficients when the dataset contains missing values, Output of Corrcoef results in NaN, Calculating Correlation Coefficient of Two Numpy Arrays with Missing Values

Nickmccullum

nickmccullum.com › python-correlation-statistics

A Guide to Python Correlation Statistics with NumPy, SciPy, & Pandas | Nick McCullum

At this point, you know how to use the corrcoef() and pearsonr() functions to calculate the Pearson correlation coefficient. ... Run the above command then access the values of r and p by typing them on the terminal. ... Note that if you pass an array with a nan value to the pearsonr() function, it will return a ValueError. There are a number of details that you should consider. First, remember that the np.corrcoef() function can take two NumPy arrays as arguments.

reddit.com › r/learnpython › why nan when calculating correlation?

r/learnpython on Reddit: Why nan when calculating correlation?

January 29, 2023 -

import numpy as np

list1 = [0.0007290244102478027, 0.12133669853210449, 0.0005068778991699219, 0.18646371364593506, 0.001188039779663086]
list2 = [0.001188039779663086, 0.001188039779663086, 0.001188039779663086, 0.001188039779663086, 0.001188039779663086]
l = np.corrcoef(list1, list2)

It returns nan, how to calculate correlation between two floating values in python?
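
The NaN here comes from the data rather than from missing values: every element of list2 is the same, so its standard deviation is zero and the Pearson formula divides by zero. One way to make that case explicit is to test for a constant input before calling np.corrcoef. This is only a sketch; safe_corr is a hypothetical helper, not part of NumPy.

```python
import math
import numpy as np

def safe_corr(x, y):
    """Pearson r, or NaN when either input is constant (zero standard deviation)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # A constant input has max == min, so its standard deviation is zero
    # and the correlation is undefined.
    if x.max() == x.min() or y.max() == y.min():
        return math.nan
    return float(np.corrcoef(x, y)[0, 1])

list1 = [0.0007290244102478027, 0.12133669853210449, 0.0005068778991699219,
         0.18646371364593506, 0.001188039779663086]
list2 = [0.001188039779663086] * 5   # constant, as in the question

print(safe_corr(list1, list2))        # nan: list2 never varies
print(safe_corr(list1, list1[::-1]))  # defined: both inputs vary
```

Returning NaN explicitly avoids the RuntimeWarning and makes the reason visible in the code; whether NaN, 0, or an exception is the right response depends on the analysis.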

Top answer

1 of 1

6

When NaNs appear in the output but are not present in the inputs

Notice that all of the values in y are identical:

y = [-0.414; -0.414; -0.414];

If you look at the equations for corr2() or Pearson's corr(), you'll notice that both have a term in the denominator that subtracts the mean of y from each y-value. When each value of y is identical, the result is a vector of 0s. When you divide by zero, you get NaN. Put another way, the standard deviation of x or y cannot be 0, and a vector of identical values has a std of 0. The NaN, in this case, is interpreted as no correlation between the two variables. The correlation describes how much one variable changes as the other variable changes, which requires both variables to change.

NaN values in the inputs spreading to the outputs

For r=corr2(x,y): When there are one or more NaN values in the inputs to corr2(x,y), the output will be NaN. Fill in the missing data before computing the 2D correlation coefficient.

For r=corr(x): A single NaN value in position (i,j) of input matrix x will result in a full row of NaN values at row j and a full column of NaN values in column j of the output matrix r (see explanation).

x = [ 6   5   1
      3 NaN   9
      5   3   7
      9   5   5 ];
r = corr(x)

       1       NaN  -0.52699
     NaN       NaN       NaN
-0.52699       NaN         1

For r=corr(x,y): A single NaN value in position (i,j) of y will result in a column of NaN values in column j of the output matrix r (a NaN in x affects row j instead).

x = [ 9   5   1
      1   4   4
      2   6   4
      2   5   9 ];
y = [ 6   5   1
      3 NaN   9
      5   3   7
      9   5   5 ];
r = corr(x,y)

 0.1623       NaN  -0.92394
 0.3266       NaN  -0.23905
0.62312       NaN   0.32367

Ignoring NaNs in corr() inputs

The rows option in corr() can be set to complete or pairwise, which ignore NaN values using different methods.

'rows','complete' removes the entire row if the row contains a NaN. In other words, it will remove row 2 from both x and y input matrices. Using the same inputs above,

r = corr(x,y,'rows','complete')

-0.27735       0.5  -0.94491
-0.69338        -1   0.75593
 0.81224   0.14286   0.53995

r2 = corr(x,y) % for comparison

 0.1623       NaN  -0.92394
 0.3266       NaN  -0.23905
0.62312       NaN   0.32367

Notice that this changes all of the correlation values since the entire row #2 was removed from both inputs x and y. To confirm that, we can remove those rows and recompute the correlation matrix.

% Remove row 2 which contains a NaN in y
r3 = corr(x([1,3,4],:) ,y([1,3,4],:));

-0.27735       0.5  -0.94491
-0.69338        -1   0.75593
 0.81224   0.14286   0.53995

Voila! Outputs r and r3 match.

'rows','pairwise' removes rows only if a NaN appears in the pairing of two columns. For the same x, y inputs as above, the correlations of the columns in x paired with the 2nd column in y will omit the NaN and will be based on the remaining 3 values. All other column-paired correlations will use all 4 rows of values.

r = corr(x,y,'rows','pairwise')

 0.1623       0.5  -0.92394
 0.3266        -1  -0.23905
0.62312   0.14286   0.32367

r2 = corr(x,y) % for comparison

 0.1623       NaN  -0.92394
 0.3266       NaN  -0.23905
0.62312       NaN   0.32367

Notice that values in columns 1 and 3 haven't changed since they do not involve column #2 in y. To confirm the correlation values in column 2 of r,

% Remove row 2 which contains a NaN in y
r3 = corr(x([1,3,4],:) ,y([1,3,4],:));
% Replace NaN column in r2 with new r values
r2(:,2) = r3(:,2)

 0.1623       0.5  -0.92394
 0.3266        -1  -0.23905
0.62312   0.14286   0.32367

Voila! Updated output r2 matches r.
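
The same two strategies can be sketched in NumPy. The function below is a hypothetical helper (cross_corr is not a NumPy or MATLAB API) that correlates each column of x with each column of y. Its rows argument only imitates the 'complete' and 'pairwise' options described in the answer, using the 4-by-3 x and y matrices from that answer.

```python
import numpy as np

def cross_corr(x, y, rows="pairwise"):
    """Correlate every column of x with every column of y, skipping NaNs.

    rows="complete": drop any row that has a NaN anywhere in x or y.
    rows="pairwise": for each column pair, drop only rows where that pair has a NaN.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if rows == "complete":
        keep = ~(np.isnan(x).any(axis=1) | np.isnan(y).any(axis=1))
        x, y = x[keep], y[keep]
    out = np.full((x.shape[1], y.shape[1]), np.nan)
    for i in range(x.shape[1]):
        for j in range(y.shape[1]):
            valid = ~np.isnan(x[:, i]) & ~np.isnan(y[:, j])
            if valid.sum() >= 2:
                out[i, j] = np.corrcoef(x[valid, i], y[valid, j])[0, 1]
    return out

x = np.array([[9, 5, 1],
              [1, 4, 4],
              [2, 6, 4],
              [2, 5, 9]], dtype=float)
y = np.array([[6, 5, 1],
              [3, np.nan, 9],
              [5, 3, 7],
              [9, 5, 5]], dtype=float)

r_complete = cross_corr(x, y, rows="complete")
r_pairwise = cross_corr(x, y, rows="pairwise")
print(np.round(r_complete, 5))
print(np.round(r_pairwise, 5))
```

With these inputs, only the second column differs between the two results, because the single NaN sits in the second column of y. Complete-row deletion discards row 2 for every pair, while pairwise deletion discards it only for pairs involving that column.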

NumPy

numpy.org › doc › 2.0 › reference › generated › numpy.ma.corrcoef.html

numpy.ma.corrcoef — NumPy v2.0 Manual

These arguments had no effect on the return values of the function and can be safely ignored in this and previous versions of numpy. ...

>>> x = np.ma.array([[0, 1], [1, 1]], mask=[0, 1, 0, 1])
>>> np.ma.corrcoef(x)
masked_array(
  data=[[--, --],
        [--, --]],
  mask=[[ True,  True],
        [ True,  True]],
  ...

Itdaan

itdaan.com › tw › ac11d3052e7963c0e9e703a00c240352

numpy corrcoef - compute correlation matrix while ignoring missing data - Developer Knowledge Base

July 24, 2015 - One of the main features of pandas is being NaN friendly. To calculate the correlation matrix, simply call df_counties.corr(). Below is an example to demonstrate that df.corr() is NaN tolerant whereas np.corrcoef is not.

import pandas as pd
import numpy as np

# data
# ==============================
np.random.seed(0)
df = pd.DataFrame(np.random.randn(100,5), columns=list('ABCDE'))
df[df < 0] = np.nan
df

         A       B       C       D       E
0   1.7641  0.4002  0.9787  2.2409  1.8676
1      NaN  0.9501     NaN     NaN  0.4106
2   0.1440  1.4543  0.7610  0.1217  0.4439
3   0.3337  1.4941     NaN  0.3131     NaN
4      NaN  0.6536  0.8644     NaN  2.2698
5      NaN  0.0458     NaN  1.5328  1.4694
6   0.1549  0.3782     NaN     NaN     NaN
7   0.1563  1.2303  1.2024     NaN     NaN
8      NaN     NaN     NaN  1.9508     NaN
9      NaN     NaN  0.7775     NaN     NaN
..