Series.describe() returns a Series containing the descriptive statistics. The dtype shown for that Series is not the dtype of the original column but the dtype of the statistics, which is float.
The name of the result is price because that is the name of the Series autos["price"], and describe() carries it over.
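A minimal sketch of the point above, assuming a small made-up autos frame (the prices are illustrative, not from the original data). It checks that the statistics come back as floats even though the column holds integers, and that the result keeps the column's name.

```python
import pandas as pd

# Hypothetical stand-in for the autos dataframe; the values are invented.
autos = pd.DataFrame({"price": [5000, 8500, 8990, 4350, 1350]})

stats = autos["price"].describe()

print(autos["price"].dtype)  # integer dtype of the original column
print(stats.dtype)           # dtype of the statistics
print(stats.name)            # name carried over from autos["price"]
```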
If I control the number of display digits, will I get the data I want?
pd.set_option('display.float_format', lambda x: '%.5f' % x)
df['X'].describe().apply("{0:.5f}".format)
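The two lines above do different things, which a small sketch can make visible. The frame and its values are invented for illustration, and pd.option_context is used here in place of pd.set_option only so the setting does not leak outside the block. The display option changes how floats are printed, while apply with a format string turns each statistic into text.

```python
import pandas as pd

# Hypothetical data; any numeric column behaves the same way.
df = pd.DataFrame({"X": [1.0, 2.5, 4.25, 1234567.891]})
stats = df["X"].describe()

# Option 1: change only how floats are printed; the values stay floats.
with pd.option_context("display.float_format", "{:.5f}".format):
    shown = repr(stats)

# Option 2: convert every statistic to a formatted string.
as_text = stats.apply("{0:.5f}".format)

print(shown)
print(as_text)
```

If the statistics are needed for further arithmetic, the first option is usually the safer choice, since the second leaves you with strings.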
data.describe, written without parentheses, is a bound method. It is bound to your dataframe, and the representation of a bound method includes the repr() output of whatever it is bound to.
You can see this at the start of the output:
<bound method NDFrame.describe of ...>
The rest is just the same string as what repr(data) produces.
Note that the Python interactive interpreter always echoes the representation of whatever the last expression produced (unless it produced None). data.describe produces the bound method; data.describe() produces whatever that method was designed to produce.
You can create the same kind of output for any bound method:
>>> class Foo:
...     def __repr__(self):
...         return "[This is an instance of the Foo class]"
...     def bar(self):
...         return "This is what Foo().bar() produces"
...
>>> Foo()
[This is an instance of the Foo class]
>>> Foo().bar
<bound method Foo.bar of [This is an instance of the Foo class]>
>>> Foo().bar()
'This is what Foo().bar() produces'
Note that Foo() has a custom __repr__ method, which is what is called to produce the representation of an instance.
You can see the same kind of output (the representation of the whole dataframe) for any method on the dataframe you don't actually call, e.g. data.sum, data.query, data.pivot, or data.__repr__.
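As a rough check of that claim, the sketch below uses a small invented dataframe named data and compares the representation of an uncalled method with the representation of the frame itself.

```python
import pandas as pd

# Hypothetical frame standing in for the asker's data.
data = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

method_repr = repr(data.sum)  # method object, not called
called = data.sum()           # method actually called

print(method_repr)
print(called)
```

The first print shows the bound-method wrapper with the whole frame inside it; the second shows the column sums.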
A bound method is part of the process by which Python passes in the instance as the first argument when you call it, the argument usually named self. It is basically a proxy object with references to the original function (data.describe.__func__) and the instance to pass in before all other arguments (data.describe.__self__). See the descriptor HOWTO for details on how binding works.
If you wanted to express the __repr__ implementation of a bound method as Python code, it would be:
def __repr__(self):
    return f"<bound method {self.__func__.__qualname__} of {self.__self__!r}>"
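The __self__ and __func__ attributes, and the representation formula, can be checked with plain Python. This sketch reuses the Foo class from the earlier example; the variable names are only for illustration.

```python
class Foo:
    def __repr__(self):
        return "[This is an instance of the Foo class]"

    def bar(self):
        return "This is what Foo().bar() produces"


foo = Foo()
bound = foo.bar  # looked up on the instance, but not called

# The bound method keeps references to the instance and the plain function.
same_instance = bound.__self__ is foo
same_function = bound.__func__ is Foo.bar

# Calling the bound method is equivalent to passing the instance explicitly.
explicit_call = bound.__func__(bound.__self__)

# Rebuild the representation by hand, following the formula above.
rebuilt = f"<bound method {bound.__func__.__qualname__} of {bound.__self__!r}>"

print(same_instance, same_function)
print(explicit_call == bound())
print(rebuilt == repr(bound))
```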
At the risk of over-simplifying:
.describe is a method defined on the NDFrame class, which can be called to get stats on your frame.
You use this method by calling it, with parentheses: data.describe().
For more detail, and an excellent low-level explanation - see Martijn's answer.
As of pandas v0.15.0, pass include='all', as in DataFrame.describe(include = 'all'), to get a summary of all the columns when the dataframe has mixed column types. The default behavior is to summarize only the numerical columns.
Example:
In[1]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'$a':['a', 'b', 'c', 'd', 'a'], '$b': np.arange(5)})
df.describe(include = 'all')
Out[1]:
         $a        $b
count     5  5.000000
unique    4       NaN
top       a       NaN
freq      2       NaN
mean    NaN  2.000000
std     NaN  1.581139
min     NaN  0.000000
25%     NaN  1.000000
50%     NaN  2.000000
75%     NaN  3.000000
max     NaN  4.000000
The numerical columns will have NaNs for summary statistics pertaining to objects (strings) and vice versa.
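One way to work with that mixed summary is to drop the NaN rows per column, which leaves only the statistics that apply to each column. This is a sketch on the same example frame; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"$a": ["a", "b", "c", "d", "a"], "$b": np.arange(5)})
summary = df.describe(include="all")

# Keep only the rows that are defined for each column.
text_stats = summary["$a"].dropna()
numeric_stats = summary["$b"].dropna()

print(text_stats)
print(numeric_stats)
```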
Summarizing only numerical or object columns
- To call describe() on just the numerical columns, use describe(include = [np.number]).
- To call describe() on just the objects (strings), use describe(include = ['O']).
In[2]:
df.describe(include = [np.number])
Out[2]:
             $b
count  5.000000
mean   2.000000
std    1.581139
min    0.000000
25%    1.000000
50%    2.000000
75%    3.000000
max    4.000000
In[3]:
df.describe(include = ['O'])
Out[3]:
       $a
count   5
unique  4
top     a
freq    2
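Setting pd.options.display.max_columns = DATA.shape[1] will work. It raises the display limit to the number of columns in DATA, so no column of the describe() output is hidden.
Here DATA is the two-dimensional data set (DATA.shape[1] is its number of columns), and the code above will display the stats vertically.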
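A small sketch of the effect, assuming DATA is a DataFrame with more columns than the display limit (the 30-column frame here is invented). pd.option_context is used instead of assigning to pd.options so the setting is restored afterwards; the option name itself is the same.

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame: 4 rows, 30 numeric columns.
DATA = pd.DataFrame(np.arange(120).reshape(4, 30))
default_limit = pd.get_option("display.max_columns")

# With a small limit, the middle columns are replaced by "...".
with pd.option_context("display.max_columns", 5):
    truncated = repr(DATA.describe())

# With the limit raised to the number of columns, every column is printed.
with pd.option_context("display.max_columns", DATA.shape[1]):
    full = repr(DATA.describe())

print(truncated)
print(full)
```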