In programming, you must know your data types (classes). You wanted to use this cast method:
Column.cast(dataType: Union[pyspark.sql.types.DataType, str]) → pyspark.sql.column.Column
Read the signature as a pattern:
A.cast(B) → C
A: The object the method is called on. It must be a pyspark.sql.column.Column (a.k.a. pyspark.sql.Column).
B: The method's input. According to the documentation line above, it can be either a pyspark.sql.types.DataType or a str.
C: The output class. According to the documentation line above, it's pyspark.sql.column.Column.
In your case, your actual A has the wrong data type to be chained with cast. In other words, the class of A doesn't have a cast method. Since your A = number1 - number2 / number3 * number4 evaluates to a plain Python float, the error tells you precisely that: "'float' object has no attribute 'cast'".
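A minimal sketch reproducing the problem, with illustrative numbers in place of your variables:

```python
# Plain Python arithmetic produces a built-in float, not a Spark Column
x = 100 - 2 / 50 * 100  # 96.0

# float has no cast method, so chaining .cast(...) fails:
try:
    x.cast('float')
except AttributeError as e:
    print(e)  # 'float' object has no attribute 'cast'
```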
Regarding the translation of your Python code to PySpark, it doesn't really make sense as written, because you do the calculation on plain variables — just two of them. pyspark.sql.Column objects are called columns because they contain many different values. So you must create a dataframe (columns alone are not enough for actual calculations) and put some values in its columns to make the translation of the formula to PySpark meaningful.
I'll just show you how it may work if you had just one row.
Creating Spark session (not needed if you run the code in PySpark shell):
from pyspark.sql import SparkSession, functions as F
spark = SparkSession.builder.getOrCreate()
Creating and printing the dataframe:
df = spark.createDataFrame([(2, 50)], ['null_count', 'total'])
df.show()
# +----------+-----+
# |null_count|total|
# +----------+-----+
# | 2| 50|
# +----------+-----+
Adding a column using your logic, but working with Spark columns instead of Python variables.
df = df.withColumn('d', F.round(100 - F.col('null_count') / F.col('total') * 100, 2).cast('float'))
df.show()
# +----------+-----+----+
# |null_count|total| d|
# +----------+-----+----+
# | 2| 50|96.0|
# +----------+-----+----+
Python's round was also replaced with PySpark's F.round, because the argument to the function is now a Spark column expression (i.e. a column) rather than a single value or variable.
Try this instead,
print(
"{:.3f}% {} ({} sentences)".format(pcent, gender, nsents)
)
Refer to the latest docs for more examples, and check your Python version!
You could also use {:.3%} instead of {:.3f}%.
It transforms the value into a percentage automatically.
That means "{:.3%}".format(0.3) prints "30.000%", whereas with the plain float spec you have to write "{:.3f}%".format(0.3 * 100) to get the same "30.000%".
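A quick sketch comparing the two format specs (plain Python, no dependencies):

```python
pcent = 0.3

# The %-spec multiplies by 100 and appends '%' for you
auto = "{:.3%}".format(pcent)

# The f-spec needs the multiplication and the '%' sign done by hand
manual = "{:.3f}%".format(pcent * 100)

print(auto)    # 30.000%
print(manual)  # 30.000%
```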
Probably there's something wrong with the input values for X and/or T. The function from the question works ok:
import numpy as np
from math import e
def sigmoid(X, T):
    return 1.0 / (1.0 + np.exp(-1.0 * np.dot(X, T)))
X = np.array([[1, 2, 3], [5, 0, 0]])
T = np.array([[1, 2], [1, 1], [4, 4]])
print(X.dot(T))
# Just to see if values are ok
print([1. / (1. + e ** el) for el in [-5, -10, -15, -16]])
print()
print(sigmoid(X, T))
Result:
[[15 16]
[ 5 10]]
[0.9933071490757153, 0.9999546021312976, 0.999999694097773, 0.9999998874648379]
[[ 0.99999969 0.99999989]
[ 0.99330715 0.9999546 ]]
Probably it's the dtype of your input arrays. Changing X to:
X = np.array([[1, 2, 3], [5, 0, 0]], dtype=object)
Gives:
Traceback (most recent call last):
File "/[...]/stackoverflow_sigmoid.py", line 24, in <module>
print sigmoid(X, T)
File "/[...]/stackoverflow_sigmoid.py", line 14, in sigmoid
return 1.0 / (1.0 + np.exp(-1.0 * np.dot(X, T)))
AttributeError: exp
Alternatively, convert the result of np.dot(X, T) to float32 inside the function, so that np.exp receives a numeric array:
def sigmoid(X, T):
    z = np.array(np.dot(X, T), dtype=np.float32)
    return 1.0 / (1.0 + np.exp(-z))
Hopefully it will finally work!
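Equivalently, you can fix the inputs rather than the output — cast the object-dtype array back to a numeric dtype before calling the function. A sketch using the arrays from the example above:

```python
import numpy as np

def sigmoid(X, T):
    return 1.0 / (1.0 + np.exp(-1.0 * np.dot(X, T)))

# An object-dtype array is what makes np.exp fail (the "AttributeError: exp"
# from the traceback above)
X_bad = np.array([[1, 2, 3], [5, 0, 0]], dtype=object)
T = np.array([[1, 2], [1, 1], [4, 4]])

# Casting to a numeric dtype restores normal ufunc behavior
X_ok = X_bad.astype(np.float64)
result = sigmoid(X_ok, T)
print(result)
```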
[Solved - thanks to DisasterArt]
https://codeshare.io/246gXj
I keep getting this error:
AttributeError: 'float' object has no attribute 'time'
I don't see anything wrong. Thanks!
The error points to this line:
df['content'] = df['content'].apply(lambda x: " ".join(x.lower() for x in x.split() \
if x not in stop_words))
split is being used here as a method of Python's built-in str class. Your error indicates that one or more values in df['content'] are of type float — for example a null value (NaN), or a non-null float.
One workaround, which will stringify floats, is to just apply str on x before using split:
df['content'] = df['content'].apply(lambda x: " ".join(x.lower() for x in str(x).split() \
if x not in stop_words))
Alternatively, and possibly a better solution, be explicit and use a named function with a try / except clause:
def converter(x):
    try:
        return ' '.join([x.lower() for x in str(x).split() if x not in stop_words])
    except AttributeError:
        return None  # or some other value
df['content'] = df['content'].apply(converter)
Since pd.Series.apply is just a loop with overhead, you may find a list comprehension or map more efficient:
df['content'] = [converter(x) for x in df['content']]
df['content'] = list(map(converter, df['content']))
split() is a Python method which is only applicable to strings. It seems that your column "content" contains not only strings but also other values, such as floats, to which you cannot apply the .split() method.
Try converting the values to strings by using str(x).split(), or by converting the entire column to strings first, which would be more efficient. You can do this as follows:
df['column_name'].astype(str)
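A small sketch of the whole-column approach; the DataFrame contents and stop-word set are made-up assumptions:

```python
import pandas as pd

stop_words = {'the', 'a'}  # hypothetical stop-word set

# A column mixing strings with a NaN, which pandas stores as a float
df = pd.DataFrame({'content': ['The quick fox', float('nan')]})

# Convert the whole column to str first, then split safely.
# Note that astype(str) turns NaN into the literal string 'nan',
# which you may want to filter out afterwards.
df['content'] = df['content'].astype(str)
df['content'] = df['content'].apply(
    lambda x: ' '.join(w for w in x.lower().split() if w not in stop_words)
)
print(df['content'].tolist())  # ['quick fox', 'nan']
```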