The values in X_train are probably not in ascending order, so the plotted line jumps back and forth between points instead of tracing a curve. Sort them before plotting. For instance, if X_train is just a list of numbers, you could say:
X_train.sort()
Answer from Neb on Stack Overflow (python - Sklearn logistic regression, plotting probability curve graph)
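When each x value has a matching y value or probability, sorting x on its own would break the pairing. A minimal sketch of one way around that, assuming NumPy arrays and made-up data (the names x, p and order are illustrative and not from the original question):

```python
import numpy as np

# Hypothetical data: x values in arbitrary order, one probability per x.
x = np.array([3.0, 1.0, 4.0, 2.0])
p = np.array([0.70, 0.10, 0.90, 0.35])

# argsort returns the indices that would put x in ascending order.
order = np.argsort(x)
x_sorted = x[order]
p_sorted = p[order]  # reordered with the same indices, so pairs stay aligned

print(x_sorted)
print(p_sorted)
```

Passing x_sorted and p_sorted to plt.plot would then draw the points from left to right.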
You can plot a smooth curve by first fitting an interpolating spline to the points with scipy.interpolate.make_interp_spline() and then evaluating it on a dense grid of x values:
import numpy as np
from scipy.interpolate import make_interp_spline
import matplotlib.pyplot as plt
# Dataset
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([20, 30, 5, 12, 39, 48, 50, 3])
X_Y_Spline = make_interp_spline(x, y)
# Returns evenly spaced numbers
# over a specified interval.
X_ = np.linspace(x.min(), x.max(), 500)
Y_ = X_Y_Spline(X_)
# Plotting the Graph
plt.plot(X_, Y_)
plt.title("Plot Smooth Curve Using scipy.interpolate.make_interp_spline()")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
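You can use seaborn's regplot with logistic=True, which fits and draws a logistic regression curve (seaborn relies on statsmodels for this fit, so it must be installed):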
import seaborn as sns
sns.regplot(x='balance', y='default', data=data, logistic=True)
You are using predict(X), which returns the predicted class label for each sample.
Replace predict(X) with predict_proba(X)[:,1], which returns the probability that each sample belongs to class 1.
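The difference between the two calls can be sketched as follows. This is only an illustration under assumptions: the one-feature dataset is synthetic, and the names rng, grid, labels and proba are made up for the example rather than taken from the original question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical one-feature dataset: larger x makes class 1 more likely.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = (x + rng.normal(0, 1.5, size=200) > 5).astype(int)

model = LogisticRegression()
model.fit(x.reshape(-1, 1), y)  # scikit-learn expects a 2-D feature matrix

# Evaluate on an evenly spaced, already sorted grid so the line is smooth.
grid = np.linspace(x.min(), x.max(), 300).reshape(-1, 1)
labels = model.predict(grid)             # hard class labels: 0 or 1
proba = model.predict_proba(grid)[:, 1]  # P(class 1), values between 0 and 1

print(labels[:5], proba[:5])
```

Plotting grid against proba gives the S-shaped probability curve, while plotting grid against labels gives a step from 0 to 1.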
Hi, I am a beginner in Python and machine learning, and I am trying to learn what goes on under the hood of logistic regression and how to make it run in Python. I have been tasked with plotting and ranking the weights/coefficients of the logistic regression below in order to remove the features with the least impact. But while I've added a basic plot, it doesn't help me rank the coefficients/thetas. Any help pointing me in the right direction would be appreciated.
This uses the Wisconsin breast cancer dataset (https://www.kaggle.com/uciml/breast-cancer-wisconsin-data)
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

df = pd.read_csv("cancerdata.csv")
X = df.values[:, 2:-1].astype('float64')
X = (X - np.mean(X, axis=0)) / np.std(X, axis=0)
X = MinMaxScaler().fit_transform(X)
# Add the bias column after scaling: MinMaxScaler would map a constant column of ones to zeros.
X = np.hstack([np.ones((X.shape[0], 1)), X])
Y = df["diagnosis"].map({'M': 1, 'B': 0})
Y = np.array(Y)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25)

def Sigmoid(z):
    return 1 / (1 + np.exp(-z))

def Hypothesis(theta, x):
    return Sigmoid(x @ theta)

def Cost_Function(X, Y, theta, m):
    hi = Hypothesis(theta, X)
    _y = Y.reshape(-1, 1)
    J = 1 / float(m) * np.sum(-_y * np.log(hi) - (1 - _y) * np.log(1 - hi))
    return J

def Cost_Function_Derivative(X, Y, theta, m, alpha):
    # Gradient of the cost, already multiplied by the learning rate alpha.
    hi = Hypothesis(theta, X)
    _y = Y.reshape(-1, 1)
    J = alpha / float(m) * X.T @ (hi - _y)
    return J

def Gradient_Descent(X, Y, theta, m, alpha):
    new_theta = theta - Cost_Function_Derivative(X, Y, theta, m, alpha)
    return new_theta

def Accuracy(theta):
    correct = 0
    length = len(X_test)
    prediction = (Hypothesis(theta, X_test) > 0.5)
    _y = Y_test.reshape(-1, 1)
    correct = prediction == _y
    my_accuracy = (np.sum(correct) / length) * 100
    print('LR Accuracy %: ', my_accuracy)

def Logistic_Regression(X, Y, alpha, theta, num_iters):
    m = len(Y)
    for x in range(num_iters):
        new_theta = Gradient_Descent(X, Y, theta, m, alpha)
        theta = new_theta
        if x % 100 == 0:
            Accuracy(theta)
    plt.plot(theta)
    plt.show()

ep = .012
initial_theta = np.random.rand(X_train.shape[1], 1) * 2 * ep - ep
alpha = 0.5
iterations = 2000
Logistic_Regression(X_train, Y_train, alpha, initial_theta, iterations)
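
One possible way to rank the weights, sketched here with made-up numbers, is to sort them by absolute value. This assumes index 0 of theta is the intercept (as in the code above, where the column of ones comes first) and that the features share a comparable scale. The theta values and feature names below are placeholders, not results from the dataset.

```python
import numpy as np

# Hypothetical fitted weights; index 0 is the intercept (bias) term.
theta = np.array([[0.2], [-1.5], [0.05], [2.3], [-0.4]])
feature_names = ["radius", "texture", "smoothness", "concavity"]  # placeholders

weights = theta.ravel()[1:]                # drop the intercept before ranking
order = np.argsort(np.abs(weights))[::-1]  # largest magnitude first

ranked = [(feature_names[i], float(weights[i])) for i in order]
for name, w in ranked:
    print(f"{name:12s} {w:+.2f}")
```

The features at the bottom of this ranking are the candidates for removal, and a horizontal bar chart of the sorted weights (for example with plt.barh) tends to be easier to read than a line plot of theta.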