pandas to_sql upsert postgres

How to upsert pandas DataFrame to PostgreSQL table?

stackoverflow.com › questions › 61366664 › how-to-upsert-pandas-dataframe-to-postgresql-table

Update: You can save yourself some typing by using this method.

If you are using PostgreSQL 9.5 or later you can perform the UPSERT using a temporary table and an INSERT ... ON CONFLICT statement:

import pandas as pd
import sqlalchemy as sa

# …

with engine.begin() as conn:
    # step 0.0 - create test environment
    conn.exec_driver_sql("DROP TABLE IF EXISTS main_table")
    conn.exec_driver_sql(
        "CREATE TABLE main_table (id int primary key, txt varchar(50))"
    )
    conn.exec_driver_sql(
        "INSERT INTO main_table (id, txt) VALUES (1, 'row 1 old text')"
    )
    # step 0.1 - create DataFrame to UPSERT
    df = pd.DataFrame(
        [(2, "new row 2 text"), (1, "row 1 new text")], columns=["id", "txt"]
    )
    
    # step 1 - create temporary table and upload DataFrame
    conn.exec_driver_sql(
        "CREATE TEMPORARY TABLE temp_table AS SELECT * FROM main_table WHERE false"
    )
    df.to_sql("temp_table", conn, index=False, if_exists="append")

    # step 2 - merge temp_table into main_table
    conn.exec_driver_sql(
        """\
        INSERT INTO main_table (id, txt) 
        SELECT id, txt FROM temp_table
        ON CONFLICT (id) DO
            UPDATE SET txt = EXCLUDED.txt
        """
    )

    # step 3 - confirm results
    result = conn.exec_driver_sql("SELECT * FROM main_table ORDER BY id").all()
    print(result)  # [(1, 'row 1 new text'), (2, 'new row 2 text')]
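
To try the staging-table pattern without a PostgreSQL server, here is a minimal sketch that replays the same steps against an in-memory SQLite database using only the standard library. It is an assumption-laden stand-in, not the answer's code: SQLite (3.24 or later) replaces PostgreSQL, plain tuples replace the DataFrame and `to_sql`, and the function name `staged_upsert` is hypothetical.

```python
import sqlite3


def staged_upsert(rows):
    """Upsert (id, txt) tuples into main_table through a temporary staging table."""
    conn = sqlite3.connect(":memory:")
    try:
        # Same starting state as step 0.0 above (SQLite types, not PostgreSQL).
        conn.execute("CREATE TABLE main_table (id INTEGER PRIMARY KEY, txt TEXT)")
        conn.execute("INSERT INTO main_table (id, txt) VALUES (1, 'row 1 old text')")

        # Step 1: empty staging table with the same columns, then bulk load.
        conn.execute("CREATE TEMP TABLE temp_table AS SELECT * FROM main_table WHERE 0")
        conn.executemany("INSERT INTO temp_table (id, txt) VALUES (?, ?)", rows)

        # Step 2: merge. SQLite needs a WHERE clause on the SELECT so the parser
        # does not read ON CONFLICT as a join constraint.
        conn.execute(
            """
            INSERT INTO main_table (id, txt)
            SELECT id, txt FROM temp_table WHERE 1
            ON CONFLICT (id) DO UPDATE SET txt = excluded.txt
            """
        )
        conn.commit()

        # Step 3: read back the merged rows.
        return conn.execute("SELECT * FROM main_table ORDER BY id").fetchall()
    finally:
        conn.close()


result = staged_upsert([(2, "new row 2 text"), (1, "row 1 new text")])
print(result)
```

The merge statement is the part that carries over to PostgreSQL; the `WHERE 1` workaround is specific to SQLite's parser and is not needed there.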

Answer from Gord Thompson on Stack Overflow

Stack Overflow

stackoverflow.com › questions › 61366664 › how-to-upsert-pandas-dataframe-to-postgresql-table

python - How to upsert pandas DataFrame to PostgreSQL table? - Stack Overflow

2 of 6

I have needed this so many times, I ended up creating a gist for it.

The function is below. It creates the table the first time the DataFrame is persisted and updates the table if it already exists:

import uuid

import pandas as pd
import sqlalchemy


def upsert_df(df: pd.DataFrame, table_name: str, engine: sqlalchemy.engine.Engine):
    """Implements the equivalent of pd.DataFrame.to_sql(..., if_exists='update')
    (which does not exist). Creates or updates the db records based on the
    dataframe records.
    Conflicts to determine update are based on the dataframe's index, which
    must be named.
    This will set a unique constraint on the table equal to the index names.
    1. Create a temp table from the dataframe
    2. Insert/update from temp table into table_name
    Returns: True if successful
    """
    # Engine.execute() was removed in SQLAlchemy 2.0, so run everything on a
    # connection inside one transaction (committed when the block exits).
    with engine.begin() as conn:
        # If the table does not exist, we should just use to_sql to create it.
        # The table name is passed as a bound parameter, not interpolated.
        table_exists = conn.execute(
            sqlalchemy.text(
                """SELECT EXISTS (
                SELECT FROM information_schema.tables
                WHERE  table_schema = 'public'
                AND    table_name   = :table_name)"""
            ),
            {"table_name": table_name},
        ).scalar()
        if not table_exists:
            df.to_sql(table_name, conn)
            return True

        # If it already exists...
        temp_table_name = f"temp_{uuid.uuid4().hex[:6]}"
        df.to_sql(temp_table_name, conn, index=True)

        index = list(df.index.names)
        index_sql_txt = ", ".join([f'"{i}"' for i in index])
        columns = list(df.columns)
        headers = index + columns
        headers_sql_txt = ", ".join(
            [f'"{i}"' for i in headers]
        )  # index1, index2, ..., column 1, col2, ...

        # col1 = excluded.col1, col2 = excluded.col2
        update_column_stmt = ", ".join(
            [f'"{col}" = EXCLUDED."{col}"' for col in columns]
        )

        # For the ON CONFLICT clause, postgres requires that the columns have
        # a unique constraint. One statement per call works with any driver.
        conn.exec_driver_sql(
            f'ALTER TABLE "{table_name}" '
            "DROP CONSTRAINT IF EXISTS unique_constraint_for_upsert"
        )
        conn.exec_driver_sql(
            f'ALTER TABLE "{table_name}" ADD CONSTRAINT '
            f"unique_constraint_for_upsert UNIQUE ({index_sql_txt})"
        )

        # Compose and execute upsert query
        query_upsert = f"""
        INSERT INTO "{table_name}" ({headers_sql_txt})
        SELECT {headers_sql_txt} FROM "{temp_table_name}"
        ON CONFLICT ({index_sql_txt}) DO UPDATE
        SET {update_column_stmt}
        """
        conn.exec_driver_sql(query_upsert)
        conn.exec_driver_sql(f'DROP TABLE "{temp_table_name}"')

    return True
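
The function above wraps table and column names in double quotes through f-strings, which breaks if a name itself contains a double quote. As a minimal sketch of one way to harden that step, the helpers below quote identifiers by doubling embedded quotes and assemble the same `INSERT ... SELECT ... ON CONFLICT` statement. The names `quote_ident` and `build_upsert_sql` are hypothetical and not part of the gist; the sketch also falls back to `DO NOTHING` when there are no non-key columns to update, which is an added assumption.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_upsert_sql(table, staging_table, key_cols, value_cols):
    """Build the merge statement that copies staging_table into table.

    key_cols are the conflict-target columns; value_cols are overwritten
    from EXCLUDED when a conflict occurs.
    """
    keys = ", ".join(quote_ident(c) for c in key_cols)
    cols = ", ".join(quote_ident(c) for c in [*key_cols, *value_cols])
    if value_cols:
        assignments = ", ".join(
            f"{quote_ident(c)} = EXCLUDED.{quote_ident(c)}" for c in value_cols
        )
        action = f"DO UPDATE SET {assignments}"
    else:
        # Nothing to overwrite, so conflicting rows are simply skipped.
        action = "DO NOTHING"
    return (
        f"INSERT INTO {quote_ident(table)} ({cols}) "
        f"SELECT {cols} FROM {quote_ident(staging_table)} "
        f"ON CONFLICT ({keys}) {action}"
    )


sql = build_upsert_sql("main_table", "temp_table", ["id"], ["txt"])
print(sql)
```

Keeping the statement builder separate from the database calls also makes it easy to inspect or log the generated SQL before running it.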

GitHub

gist.github.com › raaghulr › 5eddd5b9b2a97b7bf9549e53186570d5

Allow upserting a pandas dataframe to a postgres table (equivalent to df.to_sql(..., if_exists='update') · GitHub


Discussions

I made a Pandas.to_sql_upsert()

Do you have any performance tests on it? More on reddit.com

r/dataengineering

December 28, 2024

python - Insert into postgreSQL table from pandas with "on conflict" update - Stack Overflow

df.to_sql('my_table_name', ... method=postgres_upsert) And that's it. The upsert works. ... Yes.... yes this is like magic 2024-05-01T02:30:28.677Z+00:00 ... To follow up on Brendan's answer with an example, this is what worked for me: import os import sqlalchemy as sa import pandas as pd from ... More on stackoverflow.com

stackoverflow.com

perform upsert operation on postgres like pandas to_sql function using python - Stack Overflow

Before asking this question, I have read many links about UPSERT operation on Postgres: PostgreSQL Upsert Using INSERT ON CONFLICT statement Anyway to Upsert database using PostgreSQL in Python B... More on stackoverflow.com

stackoverflow.com

Pandas to SQL DB

Rob Mulla did this exact thing in his latest video titled “SQL databases with Pandas and python - A Complete guide”. More on reddit.com

r/dataengineering

July 17, 2023

GitHub

github.com › ryanbaumann › Pandas-to_sql-upsert

GitHub - ryanbaumann/Pandas-to_sql-upsert: Extend pandas to_sql function to perform multi-threaded, concurrent "insert or update" command in memory · GitHub

The goal of this library is to extend the Python Pandas to_sql() function to be: multi-threaded (improving time-to-insert on large datasets), and to allow the to_sql() command to run an 'insert if does not exist' to the database ...

Starred by 84 users

Forked by 16 users

Languages Jupyter Notebook 67.7% | Python 32.3%

Readthedocs

aws-sdk-pandas.readthedocs.io › en › 3.2.1 › stubs › awswrangler.postgresql.to_sql.html

awswrangler.postgresql.to_sql — AWS SDK for pandas 3.2.1 documentation

awswrangler.postgresql.to_sql(df: DataFrame, con: pg8000.Connection, table: str, schema: str, mode: Literal['append', 'overwrite', 'upsert'] = 'append', index: bool = False, dtype: Dict[str, str] | None = None, varchar_lengths: Dict[str, int] | None = None, use_column_names: bool = False, chunksize: int = 200, upsert_conflict_columns: List[str] | None = None, insert_conflict_columns: List[str] | None = None) → None

Write records stored in a DataFrame into PostgreSQL. ... This function has arguments which can be configured globally through wr.config or environment variables: ... Check out the Global Configurations Tutorial for details. ... df (pandas.DataFrame) – Pandas DataFrame https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html

Pandas

pandas.pydata.org › docs › reference › api › pandas.DataFrame.to_sql.html

pandas.DataFrame.to_sql — pandas 3.0.1 documentation

>>> df3 = pd.DataFrame({"name": ['User 8', 'User 9']})
>>> df3.to_sql(name='users', con=engine, if_exists='delete_rows',
...            index_label='id')
2
>>> with engine.connect() as conn:
...     conn.execute(text("SELECT * FROM users")).fetchall()
[(0, 'User 8'), (1, 'User 9')]

Use method to define a callable insertion method to do nothing if there’s a primary key conflict on a table in a PostgreSQL database.

GitHub

github.com › ThibTrip › pangres

GitHub - ThibTrip/pangres: SQL upsert using pandas DataFrames for PostgreSQL, SQlite and MySQL with extra features · GitHub

Upsert with pandas DataFrames (ON CONFLICT DO NOTHING or ON CONFLICT DO UPDATE) for PostgreSQL, MySQL, SQlite and potentially other databases behaving like SQlite (untested) with some additional optional features (see features). Upserting can be done with primary keys or unique keys.

Starred by 234 users

Forked by 15 users

Languages Python

PyPI

pypi.org › project › pangres

pangres · PyPI

      » pip install pangres

Published Nov 05, 2023

Version 4.2.1

Homepage https://github.com/ThibTrip/pangres

reddit.com › r/dataengineering › i made a pandas.to_sql_upsert()

r/dataengineering on Reddit: I made a Pandas.to_sql_upsert()

December 28, 2024 -

Hi guys. I made a Pandas.to_sql() upsert that uses the same syntax as Pandas.to_sql(), but allows you to upsert based on unique column(s): https://github.com/vile319/sql_upsert

This is incredibly useful to me for scraping multiple times daily with a live baseball database. The only thing is, I would prefer if pandas had this built in to the package, and I did open a pull request about it, but I think they are too busy to care.

Maybe it is just a stupid idea? I would like to know your opinions on whether or not pandas should have upsert. I think my code handles it pretty well as a workaround, but I feel like Pandas could just do this as part of their package. Maybe I am just thinking about this all wrong?

Not sure if this is the wrong subreddit to post this on. While this I guess is technically self promotion, I would much rather delete my package in exchange for pandas adopting any equivalent.

Top answer

1 of 5

I know it's an old thread, but I ran into the same issue and this thread showed up in Google. None of the answers is really satisfying yet, so here's what I came up with:

My solution is pretty similar to zdgriffith's answer, but much more performant as there's no need to iterate over data_iter:

def postgres_upsert(table, conn, keys, data_iter):
    from sqlalchemy.dialects.postgresql import insert

    data = [dict(zip(keys, row)) for row in data_iter]

    insert_statement = insert(table.table).values(data)
    upsert_statement = insert_statement.on_conflict_do_update(
        constraint=f"{table.table.name}_pkey",
        set_={c.key: c for c in insert_statement.excluded},
    )
    conn.execute(upsert_statement)

Now you can use this custom upsert method in pandas' to_sql method like zdgriffith showed.

Please note that my upsert function uses the primary key constraint of the table. You can target another constraint by changing the constraint argument of .on_conflict_do_update.

This SO answer on a related thread explains the use of .excluded a bit more: https://stackoverflow.com/a/51935542/7066758
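
As a sketch of that last point, the variation below names the conflict columns explicitly through `index_elements` instead of relying on the `<table>_pkey` constraint name, and leaves the conflict columns out of the `SET` clause. The factory `make_postgres_upsert` and the helper `columns_to_update` are hypothetical names introduced here, not part of the answer; the sketch assumes SQLAlchemy's PostgreSQL dialect is installed and that a unique or primary key constraint already covers the chosen columns.

```python
def columns_to_update(keys, conflict_cols):
    """Return the inserted columns that are not part of the conflict target."""
    conflict = set(conflict_cols)
    return [k for k in keys if k not in conflict]


def make_postgres_upsert(conflict_cols):
    """Build a callable suitable for DataFrame.to_sql(method=...)."""
    conflict_cols = list(conflict_cols)

    def _upsert(table, conn, keys, data_iter):
        # Imported lazily, as in the answer above.
        from sqlalchemy.dialects.postgresql import insert

        rows = [dict(zip(keys, row)) for row in data_iter]
        stmt = insert(table.table).values(rows)
        update_cols = columns_to_update(keys, conflict_cols)
        if update_cols:
            # Overwrite only the non-key columns with the incoming values.
            stmt = stmt.on_conflict_do_update(
                index_elements=conflict_cols,
                set_={c: stmt.excluded[c] for c in update_cols},
            )
        else:
            # Every column is a key column, so there is nothing to update.
            stmt = stmt.on_conflict_do_nothing(index_elements=conflict_cols)
        return conn.execute(stmt).rowcount

    return _upsert
```

It would be passed to pandas as `method=make_postgres_upsert(["id"])`, where `"id"` stands for whichever column or columns carry the unique constraint.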

2 of 5

@SaturnFromTitan, thanks for the reply to this old thread. That worked like magic. I would upvote, but I don't have the rep.

For those who are as new to all this as I am: you can cut and paste SaturnFromTitan's answer and call it with something like:

df.to_sql('my_table_name',
          dbConnection,
          schema='my_schema',
          if_exists='append',
          index=False,
          method=postgres_upsert)

And that's it. The upsert works.

Pandas

pandas.pydata.org › docs › dev › reference › api › pandas.DataFrame.to_sql.html

pandas.DataFrame.to_sql — pandas 3.0.0rc2+20.g501c5052ca documentation

PyPI

pypi.org › project › pandabase


Readthedocs

sqlify.readthedocs.io › en › latest › pandas_example.html

Benchmark/Example: pandas DataFrame COPY and UPSERT — pgreaper 1.0.0 documentation

The resulting SQL table should have 4 text, 1 bigint, and 1 jsonb column. ... And for the moment of truth... ... Suppose now that we live in such an amazing economy that everybody past 50 has enough money to retire. This means we’ll need to update our data to reflect this. As you can see for yourself, this operation will affect about 160,000 rows. ... Apparently it only takes Python about 2.5 seconds to create the 160,000 row UPSERT statement (which includes properly encoding dicts, escaping quotes, and so on).

Stack Overflow

stackoverflow.com › questions › 70313318 › perform-upsert-operation-on-postgres-like-pandas-to-sql-function-using-python

perform upsert operation on postgres like pandas to_sql function using python - Stack Overflow

stackoverflow.com/questions/61366664/… interesting take here, they create temp table with to_sql() pandas and then with a query they run the normal upsert command. ... I have written a very generic code that performs UPSERT, which is not supported officially in pandas (as of December 2021), using Pandas dataframe and in an efficient way.

GeeksforGeeks

geeksforgeeks.org › python › how-to-insert-a-pandas-dataframe-to-an-existing-postgresql-table

How to insert a pandas DataFrame to an existing PostgreSQL table? - GeeksforGeeks

July 23, 2025 - The create_engine() function takes ... using the method pandas.DataFrame() method. The to_sql() method is used to insert a pandas data frame into the Postgresql table....

Readthedocs

aws-sdk-pandas.readthedocs.io › en › 3.8.0 › stubs › awswrangler.postgresql.to_sql.html

awswrangler.postgresql.to_sql — AWS SDK for pandas 3.8.0 documentation

awswrangler.postgresql.to_sql(df: pd.DataFrame, con: pg8000.Connection, table: str, schema: str, mode: _ToSqlModeLiteral = 'append', overwrite_method: _ToSqlOverwriteModeLiteral = 'drop', index: bool = False, dtype: dict[str, str] | None = None, varchar_lengths: dict[str, int] | None = None, use_column_names: bool = False, chunksize: int = 200, upsert_conflict_columns: list[str] | None = None, insert_conflict_columns: list[str] | None = None, commit_transaction: bool = True) → None