Short answer:
regression.predict([[60]])
Long answer:
regression.predict takes a 2D array of the values you want to predict on. Each item in the array is a "point" you want your model to predict on. Suppose we want to predict on the points 60, 52, and 31. Then we'd say regression.predict([[60], [52], [31]]).
We need a 2D array because linear regression is not limited to two dimensions. For example, we could do linear regression in a 3D space. Suppose we want to predict "z" for a given data point (x, y). Then we'd need to say regression.predict([[x, y]]).
Taking this example further, we could predict "z" for a set of "x" and "y" points. For example, to predict the "z" values for the points (0, 2), (3, 7), and (10, 8), we would say regression.predict([[0, 2], [3, 7], [10, 8]]). This shows why regression.predict takes a 2D array: each row is one point, and each column is one coordinate (feature) of that point.
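To make the two-feature case concrete, here is a minimal sketch. The training data is made up for illustration (it follows z = 2x + 3y + 1 exactly), and the names model, X_train and z_train are not from the original answer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: z = 2*x + 3*y + 1 (made up for illustration)
X_train = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]])
z_train = 2 * X_train[:, 0] + 3 * X_train[:, 1] + 1

model = LinearRegression().fit(X_train, z_train)

# One row per point, one column per feature (x, y)
points = [[0, 2], [3, 7], [10, 8]]
z_pred = model.predict(points)

print(z_pred.shape)  # (3,): one prediction per row of `points`
print(z_pred)
```

Because the made-up data is exactly linear, the predictions should come out very close to 7, 28 and 45.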
Answer from Will Lyles on Stack Overflow
The ValueError is fairly clear: predict expects a 2D array, but you passed a scalar.
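import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Random heights and weights, each reshaped to one column (10 samples, 1 feature)
hgt = np.random.randint(50, 70, 10).reshape(-1, 1)
wgt = np.random.randint(90, 120, 10).reshape(-1, 1)

regression = LinearRegression()
regression.fit(hgt, wgt)
# Pass a 2D array: one sample with one feature
regression.predict([[60]])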
You get something like this (the exact value varies because the training data is random):
array([[105.10013717]])
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
from sklearn.linear_model import LinearRegression

csvfile = r"C:\Users\User1\Documents\Python\1.01.+Simple+linear+regression.csv"
data = pd.read_csv(csvfile)
data.head()

x = data['SAT']
y = data['GPA']

# x, y are vectors of length 84
x.shape
y.shape

# Data needs to be reshaped as 2D
# x_matrix = x.values.reshape(84,1) or use this x_matrix = x.values.reshape(-1,1)
x_matrix = x.values.reshape(-1,1)
x_matrix.shape

# Create instance object of linear regression class
reg = LinearRegression()
reg.fit(x_matrix,y)

# R squared coefficient
reg.score(x_matrix,y)
reg.coef_
reg.intercept_
reg.predict(1740)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[24], line 1
----> 1 reg.predict(1740)
File ~\anaconda3\Lib\site-packages\sklearn\linear_model\_base.py:386, in LinearModel.predict(self, X)
372 def predict(self, X):
373 """
374 Predict using the linear model.
375
(...)
384 Returns predicted values.
385 """
--> 386 return self._decision_function(X)
File ~\anaconda3\Lib\site-packages\sklearn\linear_model\_base.py:369, in LinearModel._decision_function(self, X)
366 def _decision_function(self, X):
367 check_is_fitted(self)
--> 369 X = self._validate_data(X, accept_sparse=["csr", "csc", "coo"], reset=False)
370 return safe_sparse_dot(X, self.coef_.T, dense_output=True) + self.intercept_
File ~\anaconda3\Lib\site-packages\sklearn\base.py:604, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, cast_to_ndarray, **check_params)
602 out = X, y
603 elif not no_val_X and no_val_y:
--> 604 out = check_array(X, input_name="X", **check_params)
605 elif no_val_X and not no_val_y:
606 out = _check_y(y, **check_params)
File ~\anaconda3\Lib\site-packages\sklearn\utils\validation.py:932, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
929 if ensure_2d:
930 # If input is scalar raise error
931 if array.ndim == 0:
--> 932 raise ValueError(
933 "Expected 2D array, got scalar array instead:\narray={}.\n"
934 "Reshape your data either using array.reshape(-1, 1) if "
935 "your data has a single feature or array.reshape(1, -1) "
936 "if it contains a single sample.".format(array)
937 )
938 # If input is 1D raise error
939 if array.ndim == 1:
ValueError: Expected 2D array, got scalar array instead:
array=1740.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
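The two reshape calls suggested in the error message fill different axes. The following NumPy-only sketch shows the resulting shapes; all values other than 1740 are made up for illustration.

```python
import numpy as np

# What predict(1740) effectively receives: a 0-dimensional array
scalar = np.asarray(1740)
print(scalar.ndim)        # 0

# Several samples of a single feature: reshape(-1, 1) gives one column
sat_scores = np.array([1740, 1600, 1850]).reshape(-1, 1)
print(sat_scores.shape)   # (3, 1)

# A single sample with several features: reshape(1, -1) gives one row
one_sample = np.array([1740.0, 3.2]).reshape(1, -1)
print(one_sample.shape)   # (1, 2)

# A single sample with a single feature: both calls give shape (1, 1)
single = scalar.reshape(1, -1)
print(single.shape)       # (1, 1)
```

For the question above, the model was fitted on a single feature, so a lone value such as 1740 has to arrive as a (1, 1) array, for example [[1740]].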
Try:
Y_pred = regressor.predict(np.array([6.5]).reshape(1, 1))
scikit-learn does not work with scalars (just one single value). It expects a shape $(m\times n)$ where $m$ is the number of observations (samples) and $n$ is the number of features; both are 1 in your case. Equivalently, you can pass a nested list:
Y_pred = regressor.predict([[6.5]])
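As a rough check that these forms are interchangeable, the sketch below fits a model on made-up data (y = 2x + 1, not from the original question) and predicts for 6.5 in three ways. The use of np.atleast_2d is an extra option not mentioned in the answer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data following y = 2*x + 1 (made up for illustration)
X_train = np.arange(10, dtype=float).reshape(-1, 1)  # 10 samples, 1 feature
y_train = 2 * X_train.ravel() + 1

regressor = LinearRegression().fit(X_train, y_train)

# Three ways to pass the single value 6.5 as a (1, 1) array
pred_list = regressor.predict([[6.5]])
pred_reshape = regressor.predict(np.array([6.5]).reshape(1, 1))
pred_atleast = regressor.predict(np.atleast_2d(6.5))

print(pred_list, pred_reshape, pred_atleast)
```

Note that np.atleast_2d turns a 1D array of several values into a single row (one sample with many features), so for many values of one feature reshape(-1, 1) is the safer choice.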