One of the main features of pandas is that it is NaN-friendly. To calculate the correlation matrix, simply call df_counties.corr(). The example below demonstrates that df.corr() is NaN-tolerant whereas np.corrcoef is not.
import pandas as pd
import numpy as np
# data
# ==============================
np.random.seed(0)
df = pd.DataFrame(np.random.randn(100,5), columns=list('ABCDE'))
df[df < 0] = np.nan
df
A B C D E
0 1.7641 0.4002 0.9787 2.2409 1.8676
1 NaN 0.9501 NaN NaN 0.4106
2 0.1440 1.4543 0.7610 0.1217 0.4439
3 0.3337 1.4941 NaN 0.3131 NaN
4 NaN 0.6536 0.8644 NaN 2.2698
5 NaN 0.0458 NaN 1.5328 1.4694
6 0.1549 0.3782 NaN NaN NaN
7 0.1563 1.2303 1.2024 NaN NaN
8 NaN NaN NaN 1.9508 NaN
9 NaN NaN 0.7775 NaN NaN
.. ... ... ... ... ...
90 NaN 0.8202 0.4631 0.2791 0.3389
91 2.0210 NaN NaN 0.1993 NaN
92 NaN NaN NaN 0.1813 NaN
93 2.4125 NaN NaN NaN 0.2515
94 NaN NaN NaN NaN 1.7389
95 0.9944 1.3191 NaN 1.1286 0.4960
96 0.7714 1.0294 NaN NaN 0.8626
97 NaN 1.5133 0.5531 NaN 0.2205
98 NaN NaN 1.1003 1.2980 2.6962
99 NaN NaN NaN NaN NaN
[100 rows x 5 columns]
# calculations
# ================================
df.corr()
A B C D E
A 1.0000 0.2718 0.2678 0.2822 0.1016
B 0.2718 1.0000 -0.0692 0.1736 -0.1432
C 0.2678 -0.0692 1.0000 -0.3392 0.0012
D 0.2822 0.1736 -0.3392 1.0000 0.1562
E 0.1016 -0.1432 0.0012 0.1562 1.0000
np.corrcoef(df, rowvar=False)
array([[ nan, nan, nan, nan, nan],
[ nan, nan, nan, nan, nan],
[ nan, nan, nan, nan, nan],
[ nan, nan, nan, nan, nan],
[ nan, nan, nan, nan, nan]])
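One way to see why the two calls differ: pandas documents DataFrame.corr() as computing pairwise correlations while excluding NA/null values, so each coefficient is based only on the rows where both columns are present. The sketch below rebuilds the same frame and checks a single entry (A versus B) by hand. The names both_present, r_manual and r_pandas are illustrative and not part of the answer.

```python
import numpy as np
import pandas as pd

# Same data as in the example above
np.random.seed(0)
df = pd.DataFrame(np.random.randn(100, 5), columns=list('ABCDE'))
df[df < 0] = np.nan

# Keep only the rows where both A and B are observed
both_present = df['A'].notna() & df['B'].notna()

# np.corrcoef works once the NaN rows are dropped for this pair
r_manual = np.corrcoef(df.loc[both_present, 'A'],
                       df.loc[both_present, 'B'])[0, 1]
r_pandas = df.corr().loc['A', 'B']

print(r_manual, r_pandas)  # the two values should agree
```

Because the rows are dropped per pair, different entries of the matrix can be based on different numbers of observations.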
Answer from Jianxun Li on Stack Overflow Top answer 1 of 3
40
2 of 3
31
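This will work, using NumPy's masked array module (numpy.ma):
import numpy as np
import numpy.ma as ma
# np.nan is the supported spelling; the np.NaN alias was removed in NumPy 2.0
A = [1, 2, 3, 4, 5, np.nan]
B = [2, 3, 4, 5.25, np.nan, 100]
# mask the NaNs, then correlate only the positions that are valid in both arrays
print(ma.corrcoef(ma.masked_invalid(A), ma.masked_invalid(B)))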
It outputs:
[[1.0 0.99838143945703]
[0.99838143945703 1.0]]
Read more here: https://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html
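As a cross-check of the masked-array answer, the same coefficient can be obtained with a plain boolean mask that keeps only the positions where both inputs are present. This is a minimal sketch under that assumption; the names a, b, keep, r_bool and r_masked are illustrative.

```python
import numpy as np
import numpy.ma as ma

a = np.array([1, 2, 3, 4, 5, np.nan])
b = np.array([2, 3, 4, 5.25, np.nan, 100])

# Keep only the positions where both values are present
keep = ~np.isnan(a) & ~np.isnan(b)
r_bool = np.corrcoef(a[keep], b[keep])[0, 1]

# Masked-array route from the answer above
r_masked = ma.corrcoef(ma.masked_invalid(a), ma.masked_invalid(b))[0, 1]

print(r_bool, r_masked)
```

Only the first four pairs survive the mask here, since each array has its NaN at a different position.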
NumPy
numpy.org › doc › 2.1 › reference › generated › numpy.ma.corrcoef.html
numpy.ma.corrcoef — NumPy v2.1 Manual
These arguments had no effect on the return values of the function and can be safely ignored in this and previous versions of numpy. ... >>> import numpy as np >>> x = np.ma.array([[0, 1], [1, 1]], mask=[0, 1, 0, 1]) >>> np.ma.corrcoef(x) masked_array( data=[[--, --], [--, --]], mask=[[ True, ...
numpy.corrcoef RuntimeWarning and NaN (wrong output)
I have found a weird behaviour for numpy.corrcoef . I reproduce with debian's squeeze python 2.6, on a compiled 2.7 python and in anaconda's 2.7 and 3.3 pythons on MacOSX. The bug is shown ... More on github.com
Why nan when calculating correlation?
List 2 is comprised of completely identical elements. Its standard deviation is therefore zero. My stats is rusty, but according to Wikipedia, the correlation coefficient is calculated by dividing by the SDs, and you can't divide by zero. More on reddit.com
How to ignore NaN values in the CORR Function?
Hi Guys, My problem is the opposite of the other problems reported here between NAN values in CORR function. If I have a matrix A = [1;2;3;4] and a matrix B = [3;5;7;8], the correlation corr(... More on mathworks.com
getting a NaN in correlation coefficient
Hi, i have a simple problem which unfortunately i am unable to understand. I have matrices and i am trying to calculate correlation coefficient between two variables. A simple example from my code... More on nl.mathworks.com
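The zero-standard-deviation explanation in the Reddit reply above can be reproduced without any missing data at all. The sketch below uses made-up values (x and y are hypothetical) in which y is constant, so the normalisation step divides zero by zero.

```python
import numpy as np

x = np.array([0.1, 0.4, 0.2, 0.9, 0.5])
y = np.full(5, 2.0)          # constant series: standard deviation is 0

# Silence the expected "invalid value" RuntimeWarning from the 0/0 division
with np.errstate(invalid='ignore', divide='ignore'):
    r = np.corrcoef(x, y)

print(np.std(y))    # 0.0
print(r[0, 1])      # nan
```

So a NaN from np.corrcoef does not always mean the input contains NaN; a constant input produces one too.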
GitHub
github.com › numpy › numpy › issues › 14414
[Feature Request]: Nan Values for correlation and cross correlation · Issue #14414 · numpy/numpy
September 3, 2019 - In the case of corrcoef it is straight forward and can be solved by ignoring the nan values of both arrays, however in the convolution setting, it might have different lag on the two series which would create unwanted results.
Author numpy
NumPy
numpy.org › doc › 2.2 › reference › generated › numpy.corrcoef.html
numpy.corrcoef — NumPy v2.2 Manual
>>> R3 = np.corrcoef(xarr, yarr, rowvar=False) >>> R3 array([[ 1. , 0.77598074, -0.47458546, -0.75078643, -0.9665554 , 0.22423734], [ 0.77598074, 1. , -0.92346708, -0.99923895, -0.58826587, -0.44069024], [-0.47458546, -0.92346708, 1. , 0.93773029, 0.23297648, 0.75137473], [-0.75078643, -0.99923895, 0.93773029, 1.
Medium
medium.com › @amit25173 › understanding-pearson-correlation-in-numpy-step-by-step-guide-d8073425b5dd
Understanding Pearson Correlation in NumPy (Step-by-Step Guide) | by Amit Yadav | Medium
February 8, 2025 - import numpy as np # Sample data with NaN values x = np.array([10, 20, np.nan, 40, 50]) # One value is missing y = np.array([5, 15, 25, 35, 45]) # Remove NaN values before computing correlation mask = ~np.isnan(x) & ~np.isnan(y) # Create a mask for valid values correlation_matrix = np.corrcoef(x[mask], y[mask]) print("Pearson Correlation (after handling NaN values):") print(correlation_matrix) 🔹 What’s happening here?
Dontusethiscode
dontusethiscode.com › blog › 2023-06-28_pandas_slow_corr.html
Why is DataFrame.corr() so much slower than numpy.corrcoef?
from numpy import allclose assert allclose(df.corr(), corrcoef(df.to_numpy(), rowvar=False)) Seems that our outputs match up! Let's take a slightly deeper dive by profiling the code we ran. ... # 175 function calls (169 primitive calls) in 0.121 seconds # Ordered by: internal time # List reduced from 91 to 9 due to restriction <0.1> # ncalls tottime percall cumtime percall filename:lineno(function) # 1 0.095 0.095 0.095 0.095 {pandas._libs.algos.nancorr} # 1 0.023 0.023 0.023 0.023 {method 'copy' of 'numpy.ndarray' objects} # 1 0.002 0.002 0.002 0.002 missing.py:268(_isna_array) # 1 0.000 0.00
Real Python
realpython.com › numpy-scipy-pandas-correlation-python
NumPy, SciPy, and pandas: Correlation With Python – Real Python
October 21, 2023 - In this tutorial, you'll learn what correlation is and how you can calculate it with Python. You'll use SciPy, NumPy, and pandas correlation methods to calculate three different correlation coefficients. You'll also see how to visualize data, regression lines, and correlation matrices with ...
Spark Code Hub
sparkcodehub.com › numpy › data-analysis › correlation-coefficients
Mastering Correlation Coefficients with NumPy Arrays: A Comprehensive Guide
np.corrcoef() does not natively handle np.nan, producing nan outputs if missing values are present. To address this, you can preprocess the data using ...
GitHub
github.com › numpy › numpy › issues › 5080
numpy.corrcoef RuntimeWarning and NaN (wrong output) · Issue #5080 · numpy/numpy
September 18, 2014 - wk=np.ones((400,))*0.00282490517428 print("\nCorrect output for values of ", wk[0]) print(" corrcoef=",np.corrcoef(wk,wk)[0,1]) wk2=wk*1.e13 print("\nCorrect output for values of ", wk2[0]) print(" corrcoef=",np.corrcoef(wk2,wk2)[0,1]) wk2=wk*1.e14 print("\nIncorrect output for values of ", wk2[0]) ... Incorrect output for values of 282490517428.0 /Users/nino/anaconda/envs/py3/lib/python3.3/site-packages/numpy/lib/function_base.py:1823: RuntimeWarning: invalid value encountered in true_divide return c/sqrt(multiply.outer(d, d)) corrcoef= nan
Author numpy
MathWorks
de.mathworks.com › matlabcentral › answers › 379071-how-to-ignore-nan-values-in-the-corr-function
How to ignore NaN values in the CORR Function? - MATLAB Answers - MATLAB Central
January 25, 2018 - But, If there is a NaN value in B, such as: B = [3;5;7;NaN], the correlation corr(A,B) will be NaN instead of 1.0000 (that is the correlation of the not NaN values of A (1;2;3) and B(3;5;7). What can I do to make it calculate the corr function ignoring this NaN values making it give me answers different of "NaN"?
CopyProgramming
copyprogramming.com › howto › numpy-corrcoef-compute-correlation-matrix-while-ignoring-missing-data
Python: Calculate correlation matrix using Numpy corrcoef, with the ability to disregard missing information
August 5, 2023 - Compute correlation matrix with omission of missing data using Numpy corrcoef, NaN values are returned by Pandas df.corr() function instead of correlating coefficients when the dataset contains missing values, Output of Corrcoef results in NaN, Calculating Correlation Coefficient of Two Numpy Arrays with Missing Values
Nickmccullum
nickmccullum.com › python-correlation-statistics
A Guide to Python Correlation Statistics with NumPy, SciPy, & Pandas | Nick McCullum
At this point, you know how to use the corrcoef() and pearsonr() functions to calculate the Pearson correlation coefficient. ... Run the above command then access the values of r and p by typing them on the terminal. ... Note that if you pass an array with a nan value to the pearsonr() function, it will return a ValueError. There are a number of details that you should consider. First, remember that the np.corrcoef() function can take two NumPy arrays as arguments.
Reddit
reddit.com › r/learnpython › why nan when calculating correlation?
r/learnpython on Reddit: Why nan when calculating correlation?
January 29, 2023 -
list1=[0.0007290244102478027, 0.12133669853210449, 0.0005068778991699219, 0.18646371364593506, 0.001188039779663086] list2= [0.001188039779663086, 0.001188039779663086, 0.001188039779663086, 0.001188039779663086, 0.001188039779663086] l=np.corrcoef(list1,list2)
It returns nan, how to calculate correlation between two floating values in python?
SciPy
docs.scipy.org › doc › scipy › reference › generated › scipy.stats.pearsonr.html
pearsonr — SciPy v1.17.0 Manual
In some cases, confidence limits may be NaN due to a degenerate resample, and this is typical for very small samples (~6 observations). ... If x and y do not have length at least 2. ... Raised if an input is a constant array. The correlation coefficient is not defined in this case, so np.nan is ...
Cancerdatascience
cancerdatascience.org › blog › posts › pearson-correlation
Speeding up calculation of Pearson correlation for matrices with missing data - Cancer Data Science Blog
def np_pearson_cor(x, y): xv = x - x.mean(axis=0) yv = y - y.mean(axis=0) xvss = (xv * xv).sum(axis=0) yvss = (yv * yv).sum(axis=0) result = np.matmul(xv.transpose(), yv) / np.sqrt(np.outer(xvss, yvss)) # bound the values to -1 to 1 in the event of precision issues return np.maximum(np.minimum(result, 1.0), -1.0) ... That was so much faster, let's actually make our test data much larger to get a better sense of time. ... Now, that's great, but our matrices have missing values (represented as NaNs in the matrices X and Y).
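The snippet above stops at the point where missing values enter. One possible way to extend the same matrix-product idea to NaNs is sketched below; it is not necessarily the approach the blog post takes. It assumes pairwise deletion (each pair of columns uses only the rows where both are present), and the function name nan_pairwise_corr and the test data are hypothetical.

```python
import numpy as np

def nan_pairwise_corr(X):
    """Column-by-column Pearson correlation using, for each pair of
    columns, only the rows where both values are present."""
    X = np.asarray(X, dtype=float)
    present = ~np.isnan(X)
    P = present.astype(float)
    X0 = np.where(present, X, 0.0)     # NaN -> 0 so it drops out of the sums

    n = P.T @ P                        # n[i, j]: rows where i and j are both present
    s = X0.T @ P                       # s[i, j]: sum of column i over those rows
    ss = (X0 ** 2).T @ P               # ss[i, j]: sum of squares of column i over those rows
    sxy = X0.T @ X0                    # sxy[i, j]: sum of products over those rows

    with np.errstate(invalid='ignore', divide='ignore'):
        cov = sxy - s * s.T / n
        var_i = ss - s ** 2 / n        # variance term of column i within pair (i, j)
        return cov / np.sqrt(var_i * var_i.T)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
X[rng.random(X.shape) < 0.3] = np.nan  # knock out roughly 30% of the entries

R = nan_pairwise_corr(X)
print(R)
```

The one-pass sums used here trade some numerical robustness for speed, so for data with a large mean relative to its spread it may be worth centring the columns first.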
NumPy
numpy.org › doc › 2.0 › reference › generated › numpy.ma.corrcoef.html
numpy.ma.corrcoef — NumPy v2.0 Manual
These arguments had no effect on the return values of the function and can be safely ignored in this and previous versions of numpy. ... >>> x = np.ma.array([[0, 1], [1, 1]], mask=[0, 1, 0, 1]) >>> np.ma.corrcoef(x) masked_array( data=[[--, --], [--, --]], mask=[[ True, True], [ True, True]], ...
Itdaan
itdaan.com › tw › ac11d3052e7963c0e9e703a00c240352
numpy corrcoef - compute correlation matrix while ignoring missing data - Developer Knowledge Base
July 24, 2015 - One of the main features of pandas is being NaN friendly. To calculate the correlation matrix, simply call df_counties.corr(). Below is an example to demonstrate that df.corr() is NaN tolerant whereas np.corrcoef is not. · import pandas as pd import numpy as np # data # ============================== np.random.seed(0) df = pd.DataFrame(np.random.randn(100,5), columns=list('ABCDE')) df[df < 0] = np.nan df A B C D E 0 1.7641 0.4002 0.9787 2.2409 1.8676 1 NaN 0.9501 NaN NaN 0.4106 2 0.1440 1.4543 0.7610 0.1217 0.4439 3 0.3337 1.4941 NaN 0.3131 NaN 4 NaN 0.6536 0.8644 NaN 2.2698 5 NaN 0.0458 NaN 1.5328 1.4694 6 0.1549 0.3782 NaN NaN NaN 7 0.1563 1.2303 1.2024 NaN NaN 8 NaN NaN NaN 1.9508 NaN 9 NaN NaN 0.7775 NaN NaN ..