Answer from Gord Thompson on Stack Overflow
Top answer
1 of 6
24

Update: You can save yourself some typing by using this method.


If you are using PostgreSQL 9.5 or later you can perform the UPSERT using a temporary table and an INSERT ... ON CONFLICT statement:

import pandas as pd
import sqlalchemy as sa

# …

with engine.begin() as conn:
    # step 0.0 - create test environment
    conn.exec_driver_sql("DROP TABLE IF EXISTS main_table")
    conn.exec_driver_sql(
        "CREATE TABLE main_table (id int primary key, txt varchar(50))"
    )
    conn.exec_driver_sql(
        "INSERT INTO main_table (id, txt) VALUES (1, 'row 1 old text')"
    )
    # step 0.1 - create DataFrame to UPSERT
    df = pd.DataFrame(
        [(2, "new row 2 text"), (1, "row 1 new text")], columns=["id", "txt"]
    )
    
    # step 1 - create temporary table and upload DataFrame
    conn.exec_driver_sql(
        "CREATE TEMPORARY TABLE temp_table AS SELECT * FROM main_table WHERE false"
    )
    df.to_sql("temp_table", conn, index=False, if_exists="append")

    # step 2 - merge temp_table into main_table
    conn.exec_driver_sql(
        """\
        INSERT INTO main_table (id, txt) 
        SELECT id, txt FROM temp_table
        ON CONFLICT (id) DO
            UPDATE SET txt = EXCLUDED.txt
        """
    )

    # step 3 - confirm results
    result = conn.exec_driver_sql("SELECT * FROM main_table ORDER BY id").all()
    print(result)  # [(1, 'row 1 new text'), (2, 'new row 2 text')]
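If you reuse this pattern across tables, the statement in step 2 can be assembled by a small helper that builds the INSERT … ON CONFLICT text from column names. This is a sketch (not part of the original answer); the table and column names are illustrative, and it assumes a staging table named `<table>_staging`:

```python
def build_upsert_sql(table, columns, conflict_cols):
    """Build an INSERT ... ON CONFLICT DO UPDATE statement that merges
    rows from a staging table named <table>_staging into <table>."""
    col_list = ", ".join(columns)
    # update every non-key column from the incoming (EXCLUDED) row
    updates = ", ".join(
        f"{c} = EXCLUDED.{c}" for c in columns if c not in conflict_cols
    )
    return (
        f"INSERT INTO {table} ({col_list}) "
        f"SELECT {col_list} FROM {table}_staging "
        f"ON CONFLICT ({', '.join(conflict_cols)}) DO UPDATE SET {updates}"
    )

print(build_upsert_sql("main_table", ["id", "txt"], ["id"]))
# INSERT INTO main_table (id, txt) SELECT id, txt FROM main_table_staging ON CONFLICT (id) DO UPDATE SET txt = EXCLUDED.txt
```

Note this interpolates identifiers directly, so it is only safe for table and column names you control.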
2 of 6
18

I have needed this so many times, I ended up creating a gist for it.

The function is below; it creates the table the first time the dataframe is persisted, and updates the table if it already exists:

import pandas as pd
import sqlalchemy
import uuid
import os

def upsert_df(df: pd.DataFrame, table_name: str, engine: sqlalchemy.engine.Engine):
    """Implements the equivalent of pd.DataFrame.to_sql(..., if_exists='update')
    (which does not exist). Creates or updates the db records based on the
    dataframe's records.
    Conflicts for the update are determined by the dataframe's index.
    This sets a unique constraint on the table equal to the index names.
    1. Create a temp table from the dataframe
    2. Insert/update from the temp table into table_name
    Returns: True if successful
    """

    # If the table does not exist, we should just use to_sql to create it.
    # (engine.execute() was removed in SQLAlchemy 2.0, so run the check on a
    # connection instead.)
    with engine.begin() as conn:
        table_exists = conn.exec_driver_sql(
            f"""SELECT EXISTS (
                SELECT FROM information_schema.tables
                WHERE  table_schema = 'public'
                AND    table_name   = '{table_name}');
                """
        ).first()[0]
    if not table_exists:
        df.to_sql(table_name, engine)
        return True

    # If it already exists...
    temp_table_name = f"temp_{uuid.uuid4().hex[:6]}"
    df.to_sql(temp_table_name, engine, index=True)

    index = list(df.index.names)
    index_sql_txt = ", ".join([f'"{i}"' for i in index])
    columns = list(df.columns)
    headers = index + columns
    headers_sql_txt = ", ".join(
        [f'"{i}"' for i in headers]
    )  # index1, index2, ..., column 1, col2, ...

    # col1 = EXCLUDED.col1, col2 = EXCLUDED.col2, ...
    update_column_stmt = ", ".join([f'"{col}" = EXCLUDED."{col}"' for col in columns])

    # For the ON CONFLICT clause, postgres requires a unique constraint on the columns
    query_pk = f"""
    ALTER TABLE "{table_name}" DROP CONSTRAINT IF EXISTS unique_constraint_for_upsert;
    ALTER TABLE "{table_name}" ADD CONSTRAINT unique_constraint_for_upsert UNIQUE ({index_sql_txt});
    """
    with engine.begin() as conn:
        conn.exec_driver_sql(query_pk)

    # Compose and execute upsert query, then drop the temp table
    query_upsert = f"""
    INSERT INTO "{table_name}" ({headers_sql_txt})
    SELECT {headers_sql_txt} FROM "{temp_table_name}"
    ON CONFLICT ({index_sql_txt}) DO UPDATE
    SET {update_column_stmt};
    """
    with engine.begin() as conn:
        conn.exec_driver_sql(query_upsert)
        conn.exec_driver_sql(f'DROP TABLE "{temp_table_name}"')

    return True
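To see what the string-building inside the function produces, here is the same clause construction in isolation (the index and column names are made up for illustration):

```python
# Mimic the identifier lists the function derives from a dataframe
index = ["id"]
columns = ["txt", "qty"]
headers = index + columns

# Quoted, comma-separated column lists for the INSERT/SELECT
headers_sql_txt = ", ".join(f'"{h}"' for h in headers)
# SET clause mapping each non-key column to the incoming row's value
update_column_stmt = ", ".join(f'"{c}" = EXCLUDED."{c}"' for c in columns)

print(headers_sql_txt)     # "id", "txt", "qty"
print(update_column_stmt)  # "txt" = EXCLUDED."txt", "qty" = EXCLUDED."qty"
```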
Discussions

I made a Pandas.to_sql_upsert()
Do you have any performance tests on it? More on reddit.com
🌐 r/dataengineering
37
62
December 28, 2024
python - Insert into postgreSQL table from pandas with "on conflict" update - Stack Overflow
df.to_sql('my_table_name', ... method=postgres_upsert) And that's it. The upsert works. ... Yes.... yes this is like magic ... To follow up on Brendan's answer with an example, this is what worked for me: import os import sqlalchemy as sa import pandas as pd from ... More on stackoverflow.com
🌐 stackoverflow.com
perform upsert operation on postgres like pandas to_sql function using python - Stack Overflow
Before asking this question, I have read many links about UPSERT operation on Postgres: PostgreSQL Upsert Using INSERT ON CONFLICT statement Anyway to Upsert database using PostgreSQL in Python B... More on stackoverflow.com
🌐 stackoverflow.com
Pandas to SQL DB
Rob Mulla did this exact thing in his latest video titled “SQL databases with Pandas and python - A Complete guide”. More on reddit.com
🌐 r/dataengineering
21
23
July 17, 2023
🌐
GitHub
github.com › ryanbaumann › Pandas-to_sql-upsert
GitHub - ryanbaumann/Pandas-to_sql-upsert: Extend pandas to_sql function to perform multi-threaded, concurrent "insert or update" command in memory · GitHub
The goal of this library is to extend the Python Pandas to_sql() function to be: Muti-threaded (improving time-to-insert on large datasets) Allow the to_sql() command to run an 'insert if does not exist' to the database ...
Starred by 84 users
Forked by 16 users
Languages   Jupyter Notebook 67.7% | Python 32.3%
🌐
Readthedocs
aws-sdk-pandas.readthedocs.io › en › 3.2.1 › stubs › awswrangler.postgresql.to_sql.html
awswrangler.postgresql.to_sql — AWS SDK for pandas 3.2.1 documentation
awswrangler.postgresql.to_sql(df: DataFrame, con: pg8000.Connection, table: str, schema: str, mode: Literal['append', 'overwrite', 'upsert'] = 'append', index: bool = False, dtype: Dict[str, str] | None = None, varchar_lengths: Dict[str, int] | None = None, use_column_names: bool = False, chunksize: int = 200, upsert_conflict_columns: List[str] | None = None, insert_conflict_columns: List[str] | None = None) → None¶ · Write records stored in a DataFrame into PostgreSQL. ... This function has arguments which can be configured globally through wr.config or environment variables: ... Check out the Global Configurations Tutorial for details. ... df (pandas.DataFrame) – Pandas DataFrame https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
🌐
Pandas
pandas.pydata.org › docs › reference › api › pandas.DataFrame.to_sql.html
pandas.DataFrame.to_sql — pandas 3.0.1 documentation
>>> df3 = pd.DataFrame({"name": ['User 8', 'User 9']}) >>> df3.to_sql(name='users', con=engine, if_exists='delete_rows', ... index_label='id') 2 >>> with engine.connect() as conn: ... conn.execute(text("SELECT * FROM users")).fetchall() [(0, 'User 8'), (1, 'User 9')] Use method to define a callable insertion method to do nothing if there’s a primary key conflict on a table in a PostgreSQL database.
🌐
GitHub
github.com › ThibTrip › pangres
GitHub - ThibTrip/pangres: SQL upsert using pandas DataFrames for PostgreSQL, SQlite and MySQL with extra features · GitHub
Upsert with pandas DataFrames (ON CONFLICT DO NOTHING or ON CONFLICT DO UPDATE) for PostgreSQL, MySQL, SQlite and potentially other databases behaving like SQlite (untested) with some additional optional features (see features). Upserting can be done with primary keys or unique keys.
Starred by 234 users
Forked by 15 users
Languages   Python
🌐
PyPI
pypi.org › project › pangres
pangres · PyPI
Upsert with pandas DataFrames (ON CONFLICT DO NOTHING or ON CONFLICT DO UPDATE) for PostgreSQL, MySQL, SQlite and potentially other databases behaving like SQlite (untested) with some additional optional features (see features). Upserting can be done with primary keys or unique keys.
      » pip install pangres
    
Published   Nov 05, 2023
Version   4.2.1
🌐
Reddit
reddit.com › r/dataengineering › i made a pandas.to_sql_upsert()
r/dataengineering on Reddit: I made a Pandas.to_sql_upsert()
December 28, 2024 -

Hi guys. I made a Pandas.to_sql() upsert that uses the same syntax as Pandas.to_sql(), but allows you to upsert based on unique column(s): https://github.com/vile319/sql_upsert

This is incredibly useful to me for scraping multiple times daily with a live baseball database. The only thing is, I would prefer if pandas had this built in to the package, and I did open a pull request about it, but I think they are too busy to care.

Maybe it is just a stupid idea? I would like to know your opinions on whether or not pandas should have upsert. I think my code handles it pretty well as a workaround, but I feel like Pandas could just do this as part of their package. Maybe I am just thinking about this all wrong?

Not sure if this is the wrong subreddit to post this on. While this I guess is technically self promotion, I would much rather delete my package in exchange for pandas adopting any equivalent.

Find elsewhere
🌐
GitHub
gist.github.com › Nikolay-Lysenko › 0887f4b59dc4914cec9b236c317d06e3
Upsert (a hybrid of insert and update) from pandas.DataFrame to PostgreSQL database · GitHub
Upsert (a hybrid of insert and update) from pandas.DataFrame to PostgreSQL database - upsert_from_pandas_to_postgres.py
🌐
Medium
medium.com › @kennethhughesa › optimization-of-upsert-methods-in-postgresql-python-ac11b8471494
Optimization of Upsert Methods in PostgreSQL/Python | by Kenny Hughes | Medium
June 5, 2022 - The ETF holdings data was extracted from a webservice endpoint that returned the data in a structured csv file, by which I would convert to a Pandas DataFrame. After various stages of data cleansing in Python the end DataFrame to ingest was ~500,000 rows long. The ingestion process required records to be updated, inserted, and deleted. Given that I would be conducting two different operations concurrently (update/insert) I opted to use a SQL Upsert statement. On some non PostgreSQL database engines an UPSERT command can be used.
🌐
Readthedocs
aws-sdk-pandas.readthedocs.io › en › stable › stubs › awswrangler.postgresql.to_sql.html
awswrangler.postgresql.to_sql — AWS SDK for pandas 3.14.0 documentation
AWS SDK for pandas 3.14.0 · About · Install · At Scale · Tutorials · API Reference · License · Contribute · GitHub · awswrangler.postgresql.to_sql(df: DataFrame, con: pg8000.Connection, table: str, schema: str, mode: Literal['append', 'overwrite', 'upsert'] = 'append', overwrite_method: Literal['drop', 'cascade', 'truncate', 'truncate cascade'] = 'drop', index: bool = False, dtype: dict[str, str] | None = None, varchar_lengths: dict[str, int] | None = None, use_column_names: bool = False, chunksize: int = 200, upsert_conflict_columns: list[str] | None = None, insert_conflict_columns: list[str] | None = None, commit_transaction: bool = True) → None¶ ·
🌐
ML in a Nutshell
blog.alexparunov.com › upserting-update-and-insert-with-pandas
Upserting (Update & Insert) With Pandas - ML in a Nutshell
August 21, 2021 - Those who have been working with pandas and wanted to insert DataFrame values into Relational Database (Postgres, MySQL, etc.), most likely faced the problem of conflicting rows. This conflict occurs when statement tries to insert values with duplicated primary key column. Relational Database offers solution to this with its ON CONFLICT DO UPDATE SET column=EXCLUDED.column command that updates the rows with newly inserted data, while maintaining uniqueness constraint of primary key. Unfortunately, the pandas's to_sql(...) method doesn't have this capability built in by default.
Top answer
1 of 5
36

I know it's an old thread, but I ran into the same issue and this thread showed up in Google. None of the answers is really satisfying yet, so here's what I came up with:

My solution is pretty similar to zdgriffith's answer, but much more performant as there's no need to iterate over data_iter:

def postgres_upsert(table, conn, keys, data_iter):
    from sqlalchemy.dialects.postgresql import insert

    data = [dict(zip(keys, row)) for row in data_iter]

    insert_statement = insert(table.table).values(data)
    upsert_statement = insert_statement.on_conflict_do_update(
        constraint=f"{table.table.name}_pkey",
        set_={c.key: c for c in insert_statement.excluded},
    )
    conn.execute(upsert_statement)

Now you can use this custom upsert method in pandas' to_sql method like zdgriffith showed.

Please note that my upsert function uses the primary key constraint of the table. You can target another constraint by changing the constraint argument of .on_conflict_do_update.

This SO answer on a related thread explains the use of .excluded a bit more: https://stackoverflow.com/a/51935542/7066758
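For example, to target the columns of a unique index directly instead of a named constraint, you can pass index_elements. This is a sketch compiled against the PostgreSQL dialect (the table definition is illustrative):

```python
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql

metadata = sa.MetaData()
my_table = sa.Table(
    "my_table",
    metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("txt", sa.String(50)),
)

ins = postgresql.insert(my_table).values(id=1, txt="x")
upsert = ins.on_conflict_do_update(
    index_elements=["id"],            # conflict target: column(s) of a unique index
    set_={"txt": ins.excluded.txt},   # take the value from the incoming row
)

# Compile (without executing) to inspect the generated SQL
print(upsert.compile(dialect=postgresql.dialect()))
```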

2 of 5
19

@SaturnFromTitan, thanks for the reply to this old thread. That worked like magic. I would upvote, but I don't have the rep.

For those that are as new to all this as I am: you can cut and paste SaturnFromTitan's answer and call it with something like:

    df.to_sql('my_table_name',
              dbConnection,
              schema='my_schema',
              if_exists='append',
              index=False,
              method=postgres_upsert)

And that's it. The upsert works.
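If you want to verify the callable pattern end-to-end without a Postgres server, SQLite (3.24+) supports the same ON CONFLICT syntax, so a close analogue runs against an in-memory database. Note this is a sketch, not the original answer: it swaps in the sqlite dialect's insert() and hard-codes "id" as the conflict target:

```python
import pandas as pd
import sqlalchemy as sa

def sqlite_upsert(table, conn, keys, data_iter):
    # Same shape as the postgres_upsert callable, but using the sqlite dialect
    from sqlalchemy.dialects.sqlite import insert

    data = [dict(zip(keys, row)) for row in data_iter]
    stmt = insert(table.table).values(data)
    stmt = stmt.on_conflict_do_update(
        index_elements=["id"],
        set_={c.key: c for c in stmt.excluded if c.key != "id"},
    )
    conn.execute(stmt)

engine = sa.create_engine("sqlite://")
with engine.begin() as conn:
    conn.exec_driver_sql("CREATE TABLE t (id INTEGER PRIMARY KEY, txt TEXT)")
    conn.exec_driver_sql("INSERT INTO t VALUES (1, 'old')")

df = pd.DataFrame({"id": [1, 2], "txt": ["new", "row 2"]})
df.to_sql("t", engine, if_exists="append", index=False, method=sqlite_upsert)

with engine.connect() as conn:
    rows = conn.exec_driver_sql("SELECT id, txt FROM t ORDER BY id").all()
print(rows)  # [(1, 'new'), (2, 'row 2')]
```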

🌐
Readthedocs
sqlify.readthedocs.io › en › latest › pandas_example.html
Benchmark/Example: pandas DataFrame COPY and UPSERT — pgreaper 1.0.0 documentation
The resulting SQL table should have 4 text, 1 bigint, and 1 jsonb column. ... And for the moment of truth... ... Suppose now that we live in such an amazing economy that everybody past 50 has enough money to retire. This means we’ll need to update our data to reflect this. As you can see for yourself, this operation will affect about 160,000 rows. ... Apparently it only takes Python about 2.5 seconds to create the 160,000 row UPSERT statement (which includes properly encoding dicts, escaping quotes, and so on).
🌐
Stack Overflow
stackoverflow.com › questions › 70313318 › perform-upsert-operation-on-postgres-like-pandas-to-sql-function-using-python
perform upsert operation on postgres like pandas to_sql function using python - Stack Overflow
stackoverflow.com/questions/61366664/… interesting take here, they create temp table with to_sql() pandas and then with a query they run the normal upsert command. ... I have written a very generic code that performs UPSERT which is not supported officially in Postgres (until December 2021), using Pandas dataframe and in an efficient way.
🌐
GeeksforGeeks
geeksforgeeks.org › python › how-to-insert-a-pandas-dataframe-to-an-existing-postgresql-table
How to insert a pandas DataFrame to an existing PostgreSQL table? - GeeksforGeeks
July 23, 2025 - The create_engine() function takes ... using the method pandas.DataFrame() method. The to_sql() method is used to insert a pandas data frame into the Postgresql table....