Update: You can save yourself some typing by using this method.


If you are using PostgreSQL 9.5 or later, you can perform the UPSERT using a temporary table and an INSERT ... ON CONFLICT statement:

import pandas as pd
import sqlalchemy as sa

# …

with engine.begin() as conn:
    # step 0.0 - create test environment
    conn.exec_driver_sql("DROP TABLE IF EXISTS main_table")
    conn.exec_driver_sql(
        "CREATE TABLE main_table (id int primary key, txt varchar(50))"
    )
    conn.exec_driver_sql(
        "INSERT INTO main_table (id, txt) VALUES (1, 'row 1 old text')"
    )
    # step 0.1 - create DataFrame to UPSERT
    df = pd.DataFrame(
        [(2, "new row 2 text"), (1, "row 1 new text")], columns=["id", "txt"]
    )
    
    # step 1 - create temporary table and upload DataFrame
    conn.exec_driver_sql(
        "CREATE TEMPORARY TABLE temp_table AS SELECT * FROM main_table WHERE false"
    )
    df.to_sql("temp_table", conn, index=False, if_exists="append")

    # step 2 - merge temp_table into main_table
    conn.exec_driver_sql(
        """\
        INSERT INTO main_table (id, txt) 
        SELECT id, txt FROM temp_table
        ON CONFLICT (id) DO
            UPDATE SET txt = EXCLUDED.txt
        """
    )

    # step 3 - confirm results
    result = conn.exec_driver_sql("SELECT * FROM main_table ORDER BY id").all()
    print(result)  # [(1, 'row 1 new text'), (2, 'new row 2 text')]
Answer from Gord Thompson on Stack Overflow
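The same temp-table-then-merge pattern can be exercised without a PostgreSQL server: SQLite 3.24+ (bundled with modern Python) accepts the same INSERT ... ON CONFLICT ... DO UPDATE syntax, so a stdlib-only sketch of the flow looks like this (sqlite3 stands in for the Postgres connection; the table and rows are the same test data as above):

```python
import sqlite3

# In-memory stand-in for the PostgreSQL database from the answer above
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_table (id INTEGER PRIMARY KEY, txt TEXT)")
conn.execute("INSERT INTO main_table (id, txt) VALUES (1, 'row 1 old text')")

# Rows that df.to_sql(..., if_exists="append") would stage in the temp table
rows = [(2, "new row 2 text"), (1, "row 1 new text")]
conn.execute("CREATE TEMPORARY TABLE temp_table (id INTEGER, txt TEXT)")
conn.executemany("INSERT INTO temp_table (id, txt) VALUES (?, ?)", rows)

# Merge: insert new ids, update existing ones from the staged rows.
# SQLite requires a WHERE clause on the SELECT before ON CONFLICT
# to resolve a parsing ambiguity; "WHERE true" is enough.
conn.execute(
    """
    INSERT INTO main_table (id, txt)
    SELECT id, txt FROM temp_table WHERE true
    ON CONFLICT (id) DO UPDATE SET txt = excluded.txt
    """
)
result = conn.execute("SELECT * FROM main_table ORDER BY id").fetchall()
print(result)  # [(1, 'row 1 new text'), (2, 'new row 2 text')]
```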
GitHub
github.com › ThibTrip › pangres
GitHub - ThibTrip/pangres: SQL upsert using pandas DataFrames for PostgreSQL, SQlite and MySQL with extra features
GitHub
github.com › ryanbaumann › Pandas-to_sql-upsert
GitHub - ryanbaumann/Pandas-to_sql-upsert: Extend pandas to_sql function to perform multi-threaded, concurrent "insert or update" command in memory
The goal of this library is to extend the Python Pandas to_sql() function to be: Muti-threaded (improving time-to-insert on large datasets) Allow the to_sql() command to run an 'insert if does not exist' to the database ...
Discussions

I made a Pandas.to_sql_upsert()
Do you have any performance tests on it?
r/dataengineering, December 28, 2024
Faster loading of Dataframes from Pandas to Postgres
I believe odo implements this kind of approach.
r/Python, May 3, 2017
GitHub
gist.github.com › Nikolay-Lysenko › 0887f4b59dc4914cec9b236c317d06e3
Upsert (a hybrid of insert and update) from pandas.DataFrame to PostgreSQL database · GitHub
Upsert (a hybrid of insert and update) from pandas.DataFrame to PostgreSQL database - upsert_from_pandas_to_postgres.py
Top answer
1 of 6
24

2 of 6
18

I have needed this so many times, I ended up creating a gist for it.

The function is below. It will create the table if this is the first time the dataframe is persisted, and will update the table if it already exists:

import uuid

import pandas as pd
import sqlalchemy

def upsert_df(df: pd.DataFrame, table_name: str, engine: sqlalchemy.engine.Engine) -> bool:
    """Implements the equivalent of pd.DataFrame.to_sql(..., if_exists='update')
    (which does not exist). Creates or updates the db records based on the
    dataframe records.
    Conflicts to determine an update are based on the dataframe's index.
    This will set a unique constraint on the table covering the index columns.
    1. Create a temp table from the dataframe
    2. Insert/update from the temp table into table_name
    Returns: True if successful
    """
    with engine.begin() as conn:
        # If the table does not exist, just use to_sql to create it.
        # %(table_name)s is a driver-level (psycopg2 pyformat) parameter.
        table_exists = conn.exec_driver_sql(
            """SELECT EXISTS (
                SELECT FROM information_schema.tables
                WHERE  table_schema = 'public'
                AND    table_name   = %(table_name)s)""",
            {"table_name": table_name},
        ).scalar()
        if not table_exists:
            df.to_sql(table_name, conn)
            return True

        # If it already exists, upload the dataframe to a uniquely named temp table
        temp_table_name = f"temp_{uuid.uuid4().hex[:6]}"
        df.to_sql(temp_table_name, conn, index=True)

        index = list(df.index.names)
        index_sql_txt = ", ".join([f'"{i}"' for i in index])
        columns = list(df.columns)
        headers = index + columns
        headers_sql_txt = ", ".join(
            [f'"{i}"' for i in headers]
        )  # "index1", "index2", ..., "col1", "col2", ...

        # col1 = EXCLUDED.col1, col2 = EXCLUDED.col2, ...
        update_column_stmt = ", ".join(
            [f'"{col}" = EXCLUDED."{col}"' for col in columns]
        )

        # For the ON CONFLICT clause, Postgres requires that the conflict
        # target columns have a unique constraint
        conn.exec_driver_sql(
            f"""
            ALTER TABLE "{table_name}" DROP CONSTRAINT IF EXISTS unique_constraint_for_upsert;
            ALTER TABLE "{table_name}" ADD CONSTRAINT unique_constraint_for_upsert UNIQUE ({index_sql_txt});
            """
        )

        # Compose and execute the upsert query, then drop the temp table
        conn.exec_driver_sql(
            f"""
            INSERT INTO "{table_name}" ({headers_sql_txt})
            SELECT {headers_sql_txt} FROM "{temp_table_name}"
            ON CONFLICT ({index_sql_txt}) DO UPDATE
            SET {update_column_stmt};
            """
        )
        conn.exec_driver_sql(f'DROP TABLE "{temp_table_name}"')

    return True
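To see what upsert_df actually sends to the server, its string-building steps can be reproduced in isolation. The table name "my_table", the temp-table name "temp_ab12cd", and the columns below are hypothetical placeholders:

```python
# Rebuild the SQL that upsert_df composes, for a hypothetical table
# "my_table" indexed by "id" with data columns "txt" and "updated_at".
index = ["id"]
columns = ["txt", "updated_at"]
headers = index + columns

index_sql_txt = ", ".join(f'"{i}"' for i in index)
headers_sql_txt = ", ".join(f'"{h}"' for h in headers)

# "col" = EXCLUDED."col" for every non-index column
update_column_stmt = ", ".join(f'"{c}" = EXCLUDED."{c}"' for c in columns)

query_upsert = (
    f'INSERT INTO "my_table" ({headers_sql_txt})\n'
    f'SELECT {headers_sql_txt} FROM "temp_ab12cd"\n'
    f"ON CONFLICT ({index_sql_txt}) DO UPDATE\n"
    f"SET {update_column_stmt};"
)
print(query_upsert)
# INSERT INTO "my_table" ("id", "txt", "updated_at")
# SELECT "id", "txt", "updated_at" FROM "temp_ab12cd"
# ON CONFLICT ("id") DO UPDATE
# SET "txt" = EXCLUDED."txt", "updated_at" = EXCLUDED."updated_at";
```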
Reddit
reddit.com › r/dataengineering › i made a pandas.to_sql_upsert()
r/dataengineering on Reddit: I made a Pandas.to_sql_upsert()
December 28, 2024

Hi guys. I made a Pandas.to_sql() upsert that uses the same syntax as Pandas.to_sql(), but allows you to upsert based on unique column(s): https://github.com/vile319/sql_upsert

This is incredibly useful to me for scraping multiple times daily with a live baseball database. The only thing is, I would prefer if pandas had this built in to the package, and I did open a pull request about it, but I think they are too busy to care.

Maybe it is just a stupid idea? I would like to know your opinions on whether or not pandas should have upsert. I think my code handles it pretty well as a workaround, but I feel like Pandas could just do this as part of their package. Maybe I am just thinking about this all wrong?

Not sure if this is the wrong subreddit to post this on. While this I guess is technically self promotion, I would much rather delete my package in exchange for pandas adopting any equivalent.

GitHub
github.com › ryanbaumann › Pandas-to_sql-upsert › blob › master › Pandas_tosql_upsert.ipynb
Pandas-to_sql-upsert/Pandas_tosql_upsert.ipynb at master · ryanbaumann/Pandas-to_sql-upsert
"DB_TYPE = 'postgresql'\n", "DB_DRIVER = 'psycopg2'\n", "DB_USER = 'admin'\n", "DB_PASS = 'password'\n", "DB_HOST = 'localhost'\n", "DB_PORT = '5432'\n", "DB_NAME = 'pandas_upsert'\n", "POOL_SIZE = 50\n", "### Config update complete ###\n", "SQLALCHEMY_DATABASE_URI = '%s+%s://%s:%s@%s:%s/%s' %(DB_TYPE, DB_DRIVER, DB_USER,\n", " DB_PASS, DB_HOST, DB_PORT, DB_NAME)\n", "#Add more threads to the pool for execution\n", "engine = create_engine(SQLALCHEMY_DATABASE_URI, pool_size=POOL_SIZE, max_overflow=0)" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false ·
Author   ryanbaumann
Readthedocs
aws-sdk-pandas.readthedocs.io › en › 3.2.1 › stubs › awswrangler.postgresql.to_sql.html
awswrangler.postgresql.to_sql — AWS SDK for pandas 3.2.1 documentation
AWS SDK for pandas 3.2.1 · About · Install · At Scale · Tutorials · API Reference · License · Contribute · GitHub · awswrangler.postgresql.to_sql(df: DataFrame, con: pg8000.Connection, table: str, schema: str, mode: Literal['append', 'overwrite', 'upsert'] = 'append', index: bool = False, dtype: Dict[str, str] | None = None, varchar_lengths: Dict[str, int] | None = None, use_column_names: bool = False, chunksize: int = 200, upsert_conflict_columns: List[str] | None = None, insert_conflict_columns: List[str] | None = None) → None¶ ·
GitHub
github.com › ThibTrip › pangres › wiki › Aupsert
Aupsert
SQL upsert using pandas DataFrames for PostgreSQL, SQlite and MySQL with extra features - ThibTrip/pangres
Author   ThibTrip
GitHub
github.com › ryanbaumann › Pandas-to_sql-upsert › blob › master › readme.md
Pandas-to_sql-upsert/readme.md at master · ryanbaumann/Pandas-to_sql-upsert
The goal of this library is to extend the Python Pandas to_sql() function to be: Muti-threaded (improving time-to-insert on large datasets) Allow the to_sql() command to run an 'insert if does not exist' to the database ...
Author   ryanbaumann
Minwook-shin
minwook-shin.github.io › pandas-dataframe-to-sql-upsert
Implementing a PostgreSQL Upsert with the Pandas to_sql Method
January 23, 2022 - Today I will try upserting Pandas dataframe data into a PostgreSQL database, with the PostgreSQL database running separately on my local machine. As far as I know (as of January 23, 2022), the to_sql method provided by Pandas cannot directly apply update logic for each row when it conflicts with a primary-key or unique constraint in the database.
GitHub
gist.github.com › gordthompson › ae7a1528fde1c00c03fdbb5c53c8f90f
Build a PostgreSQL INSERT … ON CONFLICT statement and upsert a DataFrame
Build a PostgreSQL INSERT … ON CONFLICT statement and upsert a DataFrame - postgresql_df_upsert.py
Readthedocs
aws-sdk-pandas.readthedocs.io › en › 3.10.1 › stubs › awswrangler.postgresql.to_sql.html
awswrangler.postgresql.to_sql — AWS SDK for pandas 3.10.1 documentation
AWS SDK for pandas 3.10.1 · About · Install · At Scale · Tutorials · API Reference · License · Contribute · GitHub · awswrangler.postgresql.to_sql(df: DataFrame, con: pg8000.Connection, table: str, schema: str, mode: Literal['append', 'overwrite', 'upsert'] = 'append', overwrite_method: Literal['drop', 'cascade', 'truncate', 'truncate cascade'] = 'drop', index: bool = False, dtype: dict[str, str] | None = None, varchar_lengths: dict[str, int] | None = None, use_column_names: bool = False, chunksize: int = 200, upsert_conflict_columns: list[str] | None = None, insert_conflict_columns: list[str] | None = None, commit_transaction: bool = True) → None¶ ·
GitHub
github.com › reachanu21 › Pandas-to_sql-upsert
GitHub - reachanu21/Pandas-to_sql-upsert: Extend pandas to_sql function to perform multi-threaded, concurrent "insert or update" command in memory
The goal of this library is to extend the Python Pandas to_sql() function to be: Muti-threaded (improving time-to-insert on large datasets) Allow the to_sql() command to run an 'insert if does not exist' to the database ...
Author   reachanu21
PyPI
pypi.org › project › pangres
pangres · PyPI
Upsert with pandas DataFrames (ON CONFLICT DO NOTHING or ON CONFLICT DO UPDATE) for PostgreSQL, MySQL, SQlite and potentially other databases behaving like SQlite (untested) with some additional optional features (see features). Upserting can be done with primary keys or unique keys.
pip install pangres
Published   Nov 05, 2023
Version   4.2.1
GitHub
github.com › Ianphorsman › PandasSqlWrapper
GitHub - Ianphorsman/PandasSqlWrapper: Provides upsert and schema updating capabilities and wraps basic functionality expected when communicating between dataframes and sql tables.
sql_data = PandasSQLWrapper( ... database to communicate back performed actions ) Performs an upsert on a sql table and updates table schema by adding columns if necessary....
Author   Ianphorsman
GitHub
github.com › ryanbaumann › Pandas-to_sql-upsert › blob › master › to_sql_newrows.py
Pandas-to_sql-upsert/to_sql_newrows.py at master · ryanbaumann/Pandas-to_sql-upsert
May 2, 2016 - DB_TYPE = 'postgresql' DB_DRIVER = 'psycopg2' DB_USER = 'admin' DB_PASS = 'password' DB_HOST = 'localhost' DB_PORT = '5432' DB_NAME = 'pandas_upsert' POOL_SIZE = 50 · TABLENAME = 'test_upsert' SQLALCHEMY_DATABASE_URI = '%s+%s://%s:%s@%s:%s/%s' % (DB_TYPE, DB_DRIVER, DB_USER, DB_PASS, DB_HOST, DB_PORT, DB_NAME) ENGINE = create_engine( SQLALCHEMY_DATABASE_URI, pool_size=POOL_SIZE, max_overflow=0) ·
Author   ryanbaumann
Pandas
pandas.pydata.org › docs › reference › api › pandas.DataFrame.to_sql.html
pandas.DataFrame.to_sql — pandas 3.0.2 documentation
>>> df3 = pd.DataFrame({"name": ['User 8', 'User 9']}) >>> df3.to_sql(name='users', con=engine, if_exists='delete_rows', ... index_label='id') 2 >>> with engine.connect() as conn: ... conn.execute(text("SELECT * FROM users")).fetchall() [(0, 'User 8'), (1, 'User 9')] Use method to define a callable insertion method to do nothing if there’s a primary key conflict on a table in a PostgreSQL database.
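The method= callable mentioned in that snippet can be sketched as follows, adapting the PostgreSQL example from the pandas documentation. The conflict column "a" is the docs' placeholder for the column carrying the unique or primary-key constraint; a real PostgreSQL engine is needed to actually run the insert:

```python
from sqlalchemy.dialects.postgresql import insert

def insert_on_conflict_nothing(table, conn, keys, data_iter):
    """to_sql method= hook: skip rows whose key already exists.

    Column "a" is assumed to carry the unique/primary-key constraint.
    """
    data = [dict(zip(keys, row)) for row in data_iter]
    stmt = (
        insert(table.table)
        .values(data)
        .on_conflict_do_nothing(index_elements=["a"])
    )
    result = conn.execute(stmt)
    return result.rowcount

# Usage (engine and df are assumed to exist):
# df.to_sql("conflict_table", engine, if_exists="append",
#           method=insert_on_conflict_nothing)
```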
Medium
medium.com › @kennethhughesa › optimization-of-upsert-methods-in-postgresql-python-ac11b8471494
Optimization of Upsert Methods in PostgreSQL/Python | by Kenny Hughes | Medium
June 5, 2022 - I then would run a SQL DELETE statement to remove deprecated data. What I found was that when the Upsert was performed from within the database engine a significant compute advantage took place. However, when the Upsert procedure involved transferring data from Python to the database engine the ingestion time was well below compute and time performance standards (the data ingestion script was going to run on an Github Action Virtual Machine Runner).
Readthedocs
aws-sdk-pandas.readthedocs.io › en › stable › stubs › awswrangler.postgresql.to_sql.html
awswrangler.postgresql.to_sql — AWS SDK for pandas 3.14.0 documentation
AWS SDK for pandas 3.14.0 · About · Install · At Scale · Tutorials · API Reference · License · Contribute · GitHub · awswrangler.postgresql.to_sql(df: DataFrame, con: pg8000.Connection, table: str, schema: str, mode: Literal['append', 'overwrite', 'upsert'] = 'append', overwrite_method: Literal['drop', 'cascade', 'truncate', 'truncate cascade'] = 'drop', index: bool = False, dtype: dict[str, str] | None = None, varchar_lengths: dict[str, int] | None = None, use_column_names: bool = False, chunksize: int = 200, upsert_conflict_columns: list[str] | None = None, insert_conflict_columns: list[str] | None = None, commit_transaction: bool = True) → None¶ ·