If you look at the documentation for pd.DataFrame.append
Append rows of other to the end of this frame, returning a new object. Columns not in this frame are added as new columns.
(emphasis mine).
Try
df_res = df_res.append(res)
Incidentally, note that pandas isn't that efficient for creating a DataFrame by successive concatenations. You might try this, instead:
all_res = []
for df in df_all:
    for i in substr:
        res = df[df['url'].str.contains(i)]
        all_res.append(res)
df_res = pd.concat(all_res)
This first creates a list of all the parts, then creates a DataFrame from all of them once at the end.
(Answer from Ami Tavory on Stack Overflow)
Why am I getting "AttributeError: 'DataFrame' object has no attribute 'append'"?
pandas >= 2.0: append has been removed; use pd.concat instead¹
Starting from pandas 2.0, append has been removed from the API. It was previously deprecated in version 1.4. See the docs on Deprecations as well as this github issue that originally proposed its deprecation.
The rationale for its removal was to discourage iteratively growing a DataFrame in a loop (which is what people typically used append for). Because append makes a full copy of the accumulated frame at each step, building a frame of n rows this way does quadratic work in time and memory.
1. This assumes you're appending one DataFrame to another. If you're appending a row to a DataFrame, the solution is slightly different - see below.
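To make the copying cost concrete, here is a minimal sketch (not from the original answer; the data is made up for illustration) contrasting the loop-concat anti-pattern with collecting parts in a list. Both produce the same result, but the first copies every accumulated row on each iteration:

```python
import pandas as pd
import numpy as np

# Fifty small parts to combine (illustrative data).
parts = [pd.DataFrame({'x': np.arange(100)}) for _ in range(50)]

# Anti-pattern: re-concatenate on every iteration. Each pd.concat
# copies all rows accumulated so far, so n parts cost O(n^2) row copies.
grown = pd.DataFrame()
for part in parts:
    grown = pd.concat([grown, part], ignore_index=True)

# Idiomatic: collect the parts, then concatenate once at the end.
collected = pd.concat(parts, ignore_index=True)

assert grown.equals(collected)
```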
The idiomatic way to append DataFrames is to collect all your smaller DataFrames into a list, and then make one single call to pd.concat. Here's a(n oversimplified) example
df_list = []
for df in some_function_that_yields_dfs():
    df_list.append(df)
final_df = pd.concat(df_list)
Note that if you are trying to append one row at a time rather than one DataFrame at a time, the solution is even simpler.
data = []
for a, b, c in some_function_that_yields_data():
    data.append([a, b, c])
df = pd.DataFrame(data, columns=['a', 'b', 'c'])
More information in Creating an empty Pandas DataFrame, and then filling it?
I think in this case concat is what you want:
In [12]:
pd.concat([df,df1], axis=0, ignore_index=True)
Out[12]:
attr_1 attr_2 attr_3 id quantity
0 0 1 NaN 1 20
1 1 1 NaN 2 23
2 1 1 NaN 3 19
3 0 0 NaN 4 19
4 1 NaN 0 5 8
5 0 NaN 1 6 13
6 1 NaN 1 7 20
7 1 NaN 1 8 25
By passing axis=0 you stack the DataFrames on top of each other, which I believe is what you want; NaN values are produced wherever a column is absent from one of the respective DataFrames.
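The frames df and df1 from that session aren't shown in this excerpt; a minimal, hypothetical pair with partially overlapping columns reproduces the behavior the output above illustrates:

```python
import pandas as pd

# Hypothetical stand-ins for df and df1 (the originals are not shown):
# both share 'id' and 'quantity', but each has an attr column the other lacks.
df = pd.DataFrame({'id': [1, 2], 'quantity': [20, 23], 'attr_1': [0, 1]})
df1 = pd.DataFrame({'id': [5, 6], 'quantity': [8, 13], 'attr_3': [0, 1]})

stacked = pd.concat([df, df1], axis=0, ignore_index=True)
# Rows from df get NaN in attr_3; rows from df1 get NaN in attr_1.
```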
The accepted answer will break if there are duplicate headers:
InvalidIndexError: Reindexing only valid with uniquely valued Index objects.
For example, here A has 3x trial columns, which prevents concat:
A = pd.DataFrame([[3, 1, 4, 1]], columns=['id', 'trial', 'trial', 'trial'])
# id trial trial trial
# 0 3 1 4 1
B = pd.DataFrame([[5, 9], [2, 6]], columns=['id', 'trial'])
# id trial
# 0 5 9
# 1 2 6
A_B = pd.concat([A, B], ignore_index=True)
# InvalidIndexError: Reindexing only valid with uniquely valued Index objects
To fix this, deduplicate the column names before you concat:
pandas 2.0+
for df in [A, B]:
    df.columns = pd.io.common.dedup_names(df.columns, is_potential_multiindex=False)
A_B = pd.concat([A, B], ignore_index=True)
#    id  trial  trial.1  trial.2
# 0   3      1        4        1
# 1   5      9      NaN      NaN
# 2   2      6      NaN      NaN
pandas < 2.0
parser = pd.io.parsers.base_parser.ParserBase({'usecols': None})
for df in [A, B]:
    df.columns = parser._maybe_dedup_names(df.columns)
A_B = pd.concat([A, B], ignore_index=True)
#    id  trial  trial.1  trial.2
# 0   3      1        4        1
# 1   5      9      NaN      NaN
# 2   2      6      NaN      NaN
pandas < 1.3
parser = pd.io.parsers.ParserBase({})
for df in [A, B]:
    df.columns = parser._maybe_dedup_names(df.columns)
A_B = pd.concat([A, B], ignore_index=True)
#    id  trial  trial.1  trial.2
# 0   3      1        4        1
# 1   5      9      NaN      NaN
# 2   2      6      NaN      NaN
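The snippets above reach into pandas internals (pd.io.common, _maybe_dedup_names), which can break across versions. A version-independent alternative is a small hand-rolled dedup helper that mimics the same name, name.1, name.2 convention; this is a sketch, not part of the original answer:

```python
import pandas as pd

def dedup_columns(cols):
    """Rename duplicate column labels to name, name.1, name.2, ...
    (mimics the pandas CSV-reader convention without private APIs)."""
    seen = {}
    out = []
    for c in cols:
        if c in seen:
            seen[c] += 1
            out.append(f"{c}.{seen[c]}")
        else:
            seen[c] = 0
            out.append(c)
    return out

A = pd.DataFrame([[3, 1, 4, 1]], columns=['id', 'trial', 'trial', 'trial'])
B = pd.DataFrame([[5, 9], [2, 6]], columns=['id', 'trial'])
for df in [A, B]:
    df.columns = dedup_columns(df.columns)
A_B = pd.concat([A, B], ignore_index=True)
# Columns: id, trial, trial.1, trial.2
```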