To pick up from the comment: "I was doing this:"
df = [df.hc== 2]
What you create there is a "mask": an array with booleans that says which part of the index fulfilled your condition.
To filter your dataframe on your condition you want to do this:
df = df[df.hc == 2]
A bit more explicit is this:
mask = df.hc == 2
df = df[mask]
If you want to keep the entire dataframe and only want to replace specific values, there are methods such as replace: Python pandas equivalent for replace. Another (performance-wise great) method would be creating a separate DataFrame with the from/to values as columns and using pd.merge to combine it into the existing DataFrame. Using your mask to set values is also possible (note the .loc form, since chained indexing like df[mask]['fname'] = ... would assign to a copy):
df.loc[mask, 'fname'] = 'Johnson'
But for a larger set of replacements you would want to use one of the two other methods, or use apply with a lambda function (for value transformations). Last but not least: you can use .fillna('bla') to rapidly fill up NA values.
(Answer from Carst on Stack Overflow.)
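A minimal sketch of the merge-based approach described above (the DataFrame, column names, and values here are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'fname': ['john', 'jane', 'john'], 'hc': [2, 1, 2]})

# Mapping table: from/to values as columns (hypothetical names)
mapping = pd.DataFrame({'fname': ['john', 'jane'],
                        'fname_new': ['Johnson', 'Janeson']})

# Left-merge the mapping in, then fall back to the original value
# wherever the mapping had no match
df = df.merge(mapping, on='fname', how='left')
df['fname'] = df['fname_new'].fillna(df['fname'])
df = df.drop(columns='fname_new')
print(df)
```

For large replacement tables this tends to beat row-wise apply, since the merge is vectorized.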
The traceback indicates that df is a list, not a DataFrame as your line of code expects.
It means that between df = pd.read_csv("test.csv") and df.loc[df.ID == 103, ['fname', 'lname']] = 'Michael', 'Johnson' you have other lines of code that assign a list object to df. Review that piece of code to find your bug.
In pandas the object type is used when there is not a clear distinction between the types stored in the column.
So I guess that in your column some objects are float type and some are str type. Or maybe you are also dealing with NaN objects; NaN values are float objects.
a) Convert the column to string: Are you getting your DataFrame from a CSV or XLS file? Then at the moment of reading the file you can specify that the column is of str type, or just convert the type of the column you are dealing with.
b) After that, you can apply the string changes and/or deal with the NaN objects.
c) Finally, you transform your column into float type.
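A sketch of those three steps on a made-up column (the column name and cleanup rules are hypothetical):

```python
import pandas as pd

# Hypothetical column mixing strings, floats and a missing value
df = pd.DataFrame({'amount': ['1.5', 2.0, None]})

# a) convert the column to string
df['amount'] = df['amount'].astype(str)

# b) apply string cleanup and deal with the NaN objects
#    (astype(str) turned None into the literal string 'None')
df['amount'] = df['amount'].str.strip().replace('None', '0')

# c) transform the column into float type
df['amount'] = df['amount'].astype(float)
print(df['amount'].tolist())
```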
Maybe it's a very rudimentary method but I would just do
listt = []
for i in data['column_name']:
    listt.append(float(i))
data['FloatData'] = listt
I am learning pandas right now and I am working on a little project for fun with Netflix movies. I am trying to filter the dataset, so it contains only movies with a genre specified by the user; however, I am getting the following error for this chunk of code after an input has been made:
genre = input("Genre: ")
for index in df.index:
    if df.loc[index, genre] == 0:
        df = df.drop(index, inplace = True)
print(df.head())
if df.loc[index, genre] == 0:
^^^^^^
AttributeError: 'NoneType' object has no attribute 'loc'
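The cause is the same NoneType pattern described above: DataFrame.drop with inplace=True returns None, so reassigning the result to df destroys the DataFrame on the first dropped row. One possible fix is a boolean filter instead of dropping rows one by one (a sketch; the genre columns are assumed to hold 0/1 flags, as in the question):

```python
import pandas as pd

# Hypothetical stand-in for the Netflix dataset
df = pd.DataFrame({'title': ['A', 'B', 'C'], 'Comedy': [1, 0, 1]})
genre = 'Comedy'

# Keep only rows where the genre flag is non-zero; no inplace needed
df = df[df[genre] != 0]
print(df.head())
```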
Try this instead,
print(
    "{:.3f}% {} ({} sentences)".format(pcent, gender, nsents)
)
Refer to the latest docs for more examples, and check your Python version!
You could also use {:.3%} instead of {:.3f}%.
It will scale the value to a percentage automatically.
That means "{:.3%}".format(0.3) prints "30.000%", while you would have to write "{:.3f}%".format(0.3 * 100) to get the same result.
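A quick sketch comparing the two (the value here is a made-up fraction):

```python
pcent = 0.425  # a fraction, not yet scaled to a percentage

# {:.3%} multiplies by 100 and appends '%' for you
a = "{:.3%}".format(pcent)

# {:.3f}% requires scaling by hand
b = "{:.3f}%".format(pcent * 100)

print(a, b)  # both give 42.500%
```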
[Solved - thanks to DisasterArt]
https://codeshare.io/246gXj
I keep getting this error:
AttributeError: 'float' object has no attribute 'time'
I don't see anything wrong? Thanks!
The error points to this line:
df['content'] = df['content'].apply(lambda x: " ".join(x.lower() for x in x.split() \
if x not in stop_words))
split is being used here as a method of Python's built-in str class. Your error indicates that one or more values in df['content'] are of type float. This could be because there is a null value, i.e. NaN, or a non-null float value.
One workaround, which will stringify floats, is to just apply str on x before using split:
df['content'] = df['content'].apply(lambda x: " ".join(x.lower() for x in str(x).split() \
if x not in stop_words))
Alternatively, and possibly a better solution, be explicit and use a named function with a try / except clause:
def converter(x):
    try:
        return ' '.join([x.lower() for x in str(x).split() if x not in stop_words])
    except AttributeError:
        return None  # or some other value
df['content'] = df['content'].apply(converter)
Since pd.Series.apply is just a loop with overhead, you may find a list comprehension or map more efficient:
df['content'] = [converter(x) for x in df['content']]
df['content'] = list(map(converter, df['content']))
split() is a Python method which is only applicable to strings. It seems that your column "content" not only contains strings but also other values, like floats, to which you cannot apply the .split() method.
Try converting the values to a string by using str(x).split(), or by converting the entire column to strings first, which would be more efficient. You do this as follows:
df['column_name'] = df['column_name'].astype(str)
The answer is already provided in the comments by @mattdmo and @tdelaney:
NumPy 1.20 (release notes) deprecated numpy.float, numpy.int, and similar aliases, causing them to issue a DeprecationWarning.
NumPy 1.24 (release notes) removed these aliases altogether, causing an error when they are used.
In many cases you can simply replace the deprecated NumPy types by the equivalent Python built-in type, e.g., numpy.float becomes a "plain" Python float.
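For example, on NumPy >= 1.24 the alias itself raises, while the replacements below behave the same on every version (a minimal sketch):

```python
import numpy as np

# x = np.float(3)   # deprecated in 1.20, raises AttributeError from 1.24 on

x = float(3)        # plain Python builtin: the usual drop-in replacement
y = np.float64(3)   # explicit NumPy scalar, if you want exact precision

print(x, y)
```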
For detailed guidelines on how to deal with various deprecated types, have a closer look at the table and guideline in the release notes for 1.20:
...
To give a clear guideline for the vast majority of cases, for the types bool, object, str (and unicode) using the plain version is shorter and clear, and generally a good replacement. For float and complex you can use float64 and complex128 if you wish to be more explicit about the precision.
For np.int a direct replacement with np.int_ or int is also good and will not change behavior, but the precision will continue to depend on the computer and operating system. If you want to be more explicit and review the current use, you have the following alternatives:
- np.int64 or np.int32 to specify the precision exactly. This ensures that results cannot depend on the computer or operating system.
- np.int_ or int (the default), but be aware that it depends on the computer and operating system.
- The C types: np.cint (int), np.int_ (long), np.longlong.
- np.intp, which is 32bit on 32bit machines, 64bit on 64bit machines. This can be the best type to use for indexing.
...
If you have dependencies that use the deprecated types, a quick workaround would be to roll back your NumPy version to 1.23 or less (as suggested in some of the other answers), while waiting for the dependency to catch up. Alternatively, you could create a patch yourself and open a pull request, or monkey patch the dependency in your own code.
In the 1.24 version:
The deprecation for the aliases np.object, np.bool, np.float, np.complex, np.str, and np.int is expired (introduced in NumPy 1.20). Some of these will now give a FutureWarning in addition to raising an error, since they will be mapped to the NumPy scalars in the future.
pip install "numpy<1.24" to work around it.
In [1]: import numpy as np
In [2]: np.__version__
Out[2]: '1.23.5'
In [3]: np.float(3)
<ipython-input-3-8262e04d58e1>:1: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
np.float(3)
Out[3]: 3.0
As pointed out by warren-weckesser this can also happen if you use dtype object (and in fact this is likelier the issue you are facing):
>>> s = pd.Series([1.0], dtype='object')
>>> s
0    1.0
dtype: object
>>> np.log(s)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'float' object has no attribute 'log'
You can address this by setting the dtype to float explicitly:
>>> np.log(s.astype('float64'))
0    0.0
dtype: float64
In your case:
np.log(df['price'].astype('float'))
Note: You can have more control using to_numeric.
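A sketch of the to_numeric route on made-up values (errors='coerce' turns unparseable entries into NaN instead of raising):

```python
import numpy as np
import pandas as pd

# Hypothetical object column mixing numeric strings, junk, and floats
s = pd.Series(['10.0', 'n/a', 1.0], dtype='object')

numeric = pd.to_numeric(s, errors='coerce')  # float64; 'n/a' becomes NaN
result = np.log(numeric)
print(result)
```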
First/alternative answer:
You have a float variable np in scope.
The problem is that:
import numpy as np
np = 1
np.log
is perfectly valid python.
>>> import numpy as np
>>> np = 1.
>>> np.log
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'float' object has no attribute 'log'
The solution is not to use np as a variable name, nor other popular import abbreviations like pd or dt, etc.
You can pick this kind of error up using a linter.
The problem is outside of the code that you posted. Your code works, at least if I assume that df is a dict; I cannot assume anything more, because your question does not specify it.
import numpy as np
df = {'price': 10.0}
df['ln_price'] = np.log(df['price'])
print(df)
{'price': 10.0, 'ln_price': 2.3025850929940459}