You can use the assign function:
df = df.assign(industry='yyy')
Answer from Mina HE on Stack Overflow
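A minimal sketch of why assign sidesteps the problem: it returns a new DataFrame and leaves the one it was called on untouched. The sample data and the names df_all, specific_id, and subset are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df_all = pd.DataFrame({"issueid": [1, 1, 2], "value": [10, 20, 30]})
specific_id = 1

# Select the rows of interest, as in the question.
subset = df_all.loc[df_all["issueid"] == specific_id, :]

# assign returns a new DataFrame with the extra column;
# neither subset nor df_all is modified.
result = subset.assign(industry="yyy")

print(list(result.columns))
print("industry" in subset.columns)
print("industry" in df_all.columns)
```

Because nothing is written into subset itself, there is no assignment on a derived object for pandas to warn about.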
Python can do unexpected things when new objects are defined from existing ones. You stated in a comment above that your dataframe is defined along the lines of df = df_all.loc[df_all['issueid']==specific_id,:]. In this case pandas does not guarantee that df is an independent object: it treats df as derived from df_all, so a later assignment to df might not behave the way you expect, and that is what the warning is telling you.
To avoid these issues altogether, I often have to remind myself to use the copy module, which explicitly forces objects to be copied in memory so that methods called on the new objects are not applied to the source object. I had the same problem as you, and avoided it using the deepcopy function.
In your case, this should get rid of the warning message:
from copy import deepcopy
df = deepcopy(df_all.loc[df_all['issueid']==specific_id,:])
df['industry'] = 'yyy'
EDIT: Also see David M.'s excellent comment below!
df = df_all.loc[df_all['issueid']==specific_id,:].copy()
df['industry'] = 'yyy'
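To check that the copy really is independent, here is a small sketch. The sample data is an assumption for illustration; only the .loc selection followed by .copy() comes from the answer above.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df_all = pd.DataFrame({"issueid": [1, 1, 2], "value": [10, 20, 30]})
specific_id = 1

# .copy() makes df an independent DataFrame.
df = df_all.loc[df_all["issueid"] == specific_id, :].copy()

# Add a column and change a value in the copy.
df["industry"] = "yyy"
df.loc[0, "value"] = 99

# The source DataFrame keeps its original columns and values.
print(df_all.columns.tolist())
print(df_all.loc[0, "value"])
```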
If you only need the unique values, you can just use the unique method.
If you want a Python set, use set(some_series):
In [1]: s = pd.Series([1, 2, 3, 1, 1, 4])
In [2]: s.unique()
Out[2]: array([1, 2, 3, 4])
In [3]: set(s)
Out[3]: {1, 2, 3, 4}
However, if you have a DataFrame, just select a series out of it first (some_data_frame['<col_name>']).
For a large series with many duplicates, set(some_series) gets slow, because it iterates over every element in Python.
Better practice is set(some_series.unique()), which removes the duplicates in pandas first.
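A short sketch comparing the two routes on a series with many duplicates. The data is made up, and no timings are claimed here; the point is only that both give the same set.

```python
import pandas as pd

# Hypothetical series with heavy duplication, for illustration only.
s = pd.Series([1, 2, 3, 1, 1, 4] * 1000)

# unique() returns a NumPy array of the distinct values.
unique_values = s.unique()

# Building the set from the already-deduplicated array touches 4 elements
# instead of 6000.
as_set = set(unique_values.tolist())

print(len(s), len(unique_values))
print(as_set == set(s))
```

Calling .tolist() first is optional; it just makes the set hold plain Python ints rather than NumPy scalars.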
If the column ranges are non-contiguous you will have to call set_column() for each range:
writer.sheets['Sheet1'].set_column('A:A', 15, my_format)
writer.sheets['Sheet1'].set_column('C:C', 15, my_format)
Note, to do this programmatically you can also use a numeric range:
for col in (0, 2):
    writer.sheets['Sheet1'].set_column(col, col, 15, my_format)
Or you could reference columns like this:
for col in ('X', 'Z'):
    writer.sheets['Sheet1'].set_column(col+':'+col, None, my_format)
Rename Specific Columns
Use the df.rename() function and specify the columns to be renamed. Not all the columns have to be renamed:
df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})
# Or rename the existing DataFrame (rather than creating a copy)
df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'}, inplace=True)
Minimal Code Example
df = pd.DataFrame('x', index=range(3), columns=list('abcde'))
df
   a  b  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x
The following methods all work and produce the same output:
df2 = df.rename({'a': 'X', 'b': 'Y'}, axis=1)
df2 = df.rename({'a': 'X', 'b': 'Y'}, axis='columns')
df2 = df.rename(columns={'a': 'X', 'b': 'Y'})
df2
   X  Y  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x
Remember to assign the result back, as the modification is not in place. Alternatively, specify inplace=True:
df.rename({'a': 'X', 'b': 'Y'}, axis=1, inplace=True)
df
   X  Y  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x
You can specify errors='raise' to raise errors if an invalid column-to-rename is specified.
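A minimal sketch of the difference between the default behaviour and errors='raise'. The label 'not_a_column' is a made-up name that does not exist in the frame.

```python
import pandas as pd

df = pd.DataFrame("x", index=range(3), columns=list("abcde"))

# Default (errors='ignore'): a label that is not present is skipped silently,
# so the result has the same column names as before.
unchanged = df.rename(columns={"not_a_column": "X"})

# errors='raise': the same call raises a KeyError instead.
try:
    df.rename(columns={"not_a_column": "X"}, errors="raise")
    raised = False
except KeyError:
    raised = True

print(list(unchanged.columns))
print(raised)
```

This is handy for catching typos in column names, which the default would otherwise let through unnoticed.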
Reassign Column Headers
Use df.set_axis() with axis=1.
df2 = df.set_axis(['V', 'W', 'X', 'Y', 'Z'], axis=1)
df2
   V  W  X  Y  Z
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x
Headers can be assigned directly:
df.columns = ['V', 'W', 'X', 'Y', 'Z']
df
   V  W  X  Y  Z
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x
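One thing worth knowing about reassigning headers, sketched below: unlike rename, both set_axis and direct assignment to .columns need a full list whose length matches the number of columns. The too-short list is a deliberate mistake for illustration.

```python
import pandas as pd

df = pd.DataFrame("x", index=range(3), columns=list("abcde"))

# A list with the right length (5) works.
df2 = df.set_axis(["V", "W", "X", "Y", "Z"], axis=1)

# A list with the wrong length is rejected with a ValueError.
try:
    df.columns = ["V", "W"]
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(list(df2.columns))
print(mismatch_raised)
print(list(df.columns))  # the failed assignment leaves df as it was
```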
Just assign the new names to the .columns attribute:
>>> df = pd.DataFrame({'$a':[1,2], '$b': [10,20]})
>>> df
   $a  $b
0   1  10
1   2  20
>>> df.columns = ['a', 'b']
>>> df
   a   b
0  1  10
1  2  20
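If there are too many columns to type out by hand, the new names can also be derived from the old ones. This is a sketch of one possible approach, not part of the answer above; it assumes the only change wanted is dropping the '$' character.

```python
import pandas as pd

df = pd.DataFrame({"$a": [1, 2], "$b": [10, 20]})

# Strip the literal '$' from every column name.
# regex=False treats '$' as a plain character, not as a regex end-of-string anchor.
df.columns = df.columns.str.replace("$", "", regex=False)

print(list(df.columns))
```

The same result can be had with df.rename(columns=lambda c: c.replace('$', '')), which returns a new DataFrame instead of modifying df.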