Consider a temp table which would be an exact replica of your final table, cleaned out with each run:
from sqlalchemy import create_engine, text

engine = create_engine('postgresql+psycopg2://user:pswd@mydb')
df.to_sql('temp_table', engine, if_exists='replace', index=False)

sql = """
UPDATE final_table AS f
SET col1 = t.col1
FROM temp_table AS t
WHERE f.id = t.id
"""
with engine.begin() as conn:  # transaction: commits on success, rolls back on error
    conn.execute(text(sql))
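The temp-table pattern above can be sketched end to end. The version below is a minimal runnable sketch, not the answer's exact code: it uses an in-memory SQLite database instead of Postgres so it is self-contained, and a correlated subquery instead of `UPDATE ... FROM` (which older SQLite lacks; on Postgres the shorter form above is preferable). The table and column names (`final_table`, `temp_table`, `id`, `col1`) are the hypothetical ones from the answer.

```python
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")  # stand-in for the Postgres engine

# Seed a "final" table with stale values.
pd.DataFrame({"id": [1, 2, 3], "col1": ["old", "old", "old"]}).to_sql(
    "final_table", engine, index=False)

# Stage the fresh rows in a temp table, replaced on every run.
df = pd.DataFrame({"id": [1, 3], "col1": ["new", "new"]})
df.to_sql("temp_table", engine, if_exists="replace", index=False)

# Correlated-subquery form of the UPDATE (portable to SQLite).
sql = """
UPDATE final_table
SET col1 = (SELECT t.col1 FROM temp_table AS t WHERE t.id = final_table.id)
WHERE id IN (SELECT id FROM temp_table)
"""
with engine.begin() as conn:  # transaction
    conn.execute(text(sql))

print(pd.read_sql("SELECT * FROM final_table ORDER BY id", engine))
# col1 is now ['new', 'old', 'new']: ids 1 and 3 updated, id 2 untouched
```

Note this only updates rows that already exist in `final_table`; rows present only in the temp table still need a separate INSERT step.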
It looks like you are using some external data stored in df for the conditions on updating your database table. If possible, why not just do a one-line SQL UPDATE?
If you are working with a smallish database (where loading the whole table into a Python DataFrame isn't going to kill you), then you can definitely update the DataFrame conditionally after loading it with read_sql. Then you can use the keyword arg if_exists="replace" to replace the DB table with the new updated table.
import pandas

df = pandas.read_sql("select * from your_table;", engine)
# update information (equivalent to: UPDATE your_table SET column = 'new value' WHERE column = 'old value')
# you may still need to iterate for many old value/new value pairs
df.loc[df['column'] == "old value", "column"] = "new value"
# send data back to sql
df.to_sql("your_table", engine, if_exists="replace", index=False)
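A self-contained round trip of this read-modify-replace approach, sketched against an in-memory SQLite database (the table name and values are made up for illustration):

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite://")

# Seed a table with one row that needs updating and one that doesn't.
pd.DataFrame({"id": [1, 2], "column": ["old value", "x"]}).to_sql(
    "your_table", engine, index=False)

# Load, conditionally update, and write the whole table back.
df = pd.read_sql("select * from your_table;", engine)
df.loc[df["column"] == "old value", "column"] = "new value"
df.to_sql("your_table", engine, if_exists="replace", index=False)
```

Keep in mind this rewrites the entire table, so any rows changed by other clients between the read and the write are lost.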
Pandas is a powerful tool, and its limited SQL support was just a small feature at first. As time goes by, people are trying to use pandas as their only database interface, but I don't think pandas was ever meant to be an end-all for database interaction. That said, there are a lot of people working on new features all the time. See: https://github.com/pandas-dev/pandas/issues
Good morning all, hoping you can help.
I'm a bit of a programming noob, but I've written a Python script that does the below:
It first queries my MariaDB database and retrieves the maximum datetime from a table column (dateLastAction).
This datetime is then used as a filter in an API request to retrieve any items updated after the max datetime in my SQL table.
I then transform the response and normalize it to a pandas dataframe that matches the structure of the SQL table exactly.
The dataframe now contains some rows which already exist in the database, and some which aren't present at all.
So my question is: is it possible to check my MariaDB table for each value in the 'ticketId' column of my dataframe (this is the table's primary key), and if the 'ticketId' is present, replace the row, and if not, append it to the table?
If none of the rows were present in MariaDB then I would append rows of my dataframe to the SQL table using:
df.to_sql('tickets', engine, index=False, if_exists='append')
The 'if_exists' portion relates to the table itself though, not the individual rows.
Can anyone share some insight on how I can achieve this? Is it easier to split my dataframe into two: one with new rows, and one with rows to be replaced?
Code outline for what I'm trying to achieve:
from sqlalchemy import create_engine
import datetime
import requests
import pandas as pd
## STEP 1: Retrieve the max 'dateLastAction' value from MariaDB 'tickets' table
hostname = 'hostname'
dbname = 'dbname'
uname = 'uname'
pwd = 'pwd'
engine = create_engine('mysql+pymysql://{user}:{pw}@{host}/{db}'.format(host=hostname, db=dbname, user=uname, pw=pwd))
query = 'SELECT MAX(dateLastAction) FROM tickets'
with engine.connect() as conn:
    lastAction = conn.exec_driver_sql(query).scalar()
lastAction = lastAction.strftime('%Y-%m-%dT%H:%M:%S')
## STEP 2: Request tickets from PSA with dateLastAction larger than that stored in MariaDB
flt = "?$filter=LastActivityUpdate+gt+DateTime'" + lastAction + "'"
request = requests.get('url' + flt, headers=headers)  # 'url' and 'headers' are placeholders defined elsewhere
response = request.json()
df = pd.json_normalize(response)
## STEP 3: Add results to MariaDB
# ???
Thank you in advance!
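One way to sketch STEP 3, along the lines of splitting the dataframe that the question mentions: delete the rows whose 'ticketId' already exists, then append the whole dataframe. This is a hedged sketch, not the asker's code; it runs against an in-memory SQLite database as a stand-in for MariaDB (the same statements work over the pymysql engine above), and the 'status' column is made up.

```python
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")  # stand-in for the MariaDB engine

# Existing 'tickets' table; ticketId is the primary key.
pd.DataFrame({"ticketId": [1, 2], "status": ["open", "open"]}).to_sql(
    "tickets", engine, index=False)

# Freshly fetched rows: ticketId 2 already exists (replace), 3 is new (append).
df = pd.DataFrame({"ticketId": [2, 3], "status": ["closed", "open"]})

with engine.begin() as conn:  # one transaction for delete + append
    # Delete the rows we are about to replace...
    for tid in df["ticketId"]:
        conn.execute(text("DELETE FROM tickets WHERE ticketId = :tid"),
                     {"tid": int(tid)})
    # ...then append every fetched row; untouched ids survive.
    df.to_sql("tickets", conn, if_exists="append", index=False)
```

After this, ticket 1 is untouched, ticket 2 carries the new status, and ticket 3 has been appended.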
Is there an option in pandas to update existing records instead of recreating the table every time? My dataframes come from several users' spreadsheets and I'd like to not blow away other users' data if one of their spreadsheets has moved or changed and isn't picked up by Python.
I haven't used pandas personally, but if you are writing raw SQL there is the option of not creating the table when it already exists: CREATE TABLE IF NOT EXISTS [tablename].
To update existing records, check for the record in the database first before proceeding with either an insert or an update:
SELECT * FROM [tablename] WHERE [name] = [the existing record name]
If there is no result then you can insert, and if there is, use the SQL UPDATE command instead.
I think you would want to_sql(..., if_exists='append')
append: If table exists, insert data. Create if does not exist.
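Note that if_exists only describes table-level behaviour: with 'append', rows whose primary key already exists are simply inserted again (or rejected if the database enforces the key), never updated. A small sketch against an in-memory SQLite table (names made up) illustrating the duplicate problem:

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite://")
df = pd.DataFrame({"id": [1], "val": ["a"]})

# First write creates the table; the second appends the SAME row again,
# because if_exists='append' says nothing about per-row upserting.
df.to_sql("t", engine, index=False, if_exists="append")
df.to_sql("t", engine, index=False, if_exists="append")

print(pd.read_sql("SELECT COUNT(*) AS n FROM t", engine)["n"][0])  # prints 2
```

So 'append' alone is not an upsert; it needs to be combined with one of the delete-first or ON DUPLICATE KEY approaches discussed in the other answers.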
I can think of two options, but number 1 might be cleaner/faster:
1) Make SQL decide on the update/insert. Check this other question. You can iterate by rows of your 'df', from i=1 to n. Inside the loop for the insertion you can write something like:
query = """INSERT INTO table (id, name, age) VALUES(%s, %s, %s)
ON DUPLICATE KEY UPDATE name=%s, age=%s"""
engine.execute(query, (df.id[i], df.name[i], df.age[i], df.name[i], df.age[i]))
2) Define a python function that returns True or False when the record exists and then use it in your loop:
def check_existence(user_id):
    query = "SELECT EXISTS (SELECT 1 FROM your_table WHERE user_id_str = %s);"
    return list(engine.execute(query, (user_id,)))[0][0] == 1
You could iterate over rows and do this check before inserting
Please also check the solution in this question and this one too which might work in your case.
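Option 1 doesn't have to be a Python-level loop: passing a list of parameter dicts to a single execute() call issues an executemany-style batch. Below is a hedged sketch of that batching, using SQLite's ON CONFLICT ... DO UPDATE as a runnable stand-in for MySQL's ON DUPLICATE KEY UPDATE shown above (the people table and its columns are made up):

```python
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"))
    conn.execute(text("INSERT INTO people VALUES (1, 'Ann', 30)"))

# One row collides with an existing id, one is new.
df = pd.DataFrame({"id": [1, 2], "name": ["Anne", "Bob"], "age": [31, 25]})

upsert = text("""
INSERT INTO people (id, name, age) VALUES (:id, :name, :age)
ON CONFLICT(id) DO UPDATE SET name = excluded.name, age = excluded.age
""")
with engine.begin() as conn:
    # One executemany batch for all rows instead of a per-row Python loop.
    conn.execute(upsert, df.to_dict("records"))
```

On MySQL the statement body becomes `INSERT ... ON DUPLICATE KEY UPDATE name=VALUES(name), age=VALUES(age)`; the batching via a list of dicts is the same.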
Pangres is the tool for this job.
Overview here: https://pypi.org/project/pangres/
Use the function pangres.fix_psycopg2_bad_cols to "clean" the columns in the DataFrame.
Code/usage here: https://github.com/ThibTrip/pangres/wiki (in particular https://github.com/ThibTrip/pangres/wiki/Fix-bad-column-names-postgres). Example code:
# From: https://github.com/ThibTrip/pangres/wiki/Fix-bad-column-names-postgres
import pandas as pd
from pangres import fix_psycopg2_bad_cols

# fix bad col/index names with default replacements (empty string for '(', ')' and '%'):
df = pd.DataFrame({'test()': [0], 'foo()%': [0]}).set_index('test()')
print(df)
#         foo()%
# test()
# 0            0

# clean cols/index with the default replacements
df_fixed = fix_psycopg2_bad_cols(df)
print(df_fixed)
#       foo
# test
# 0       0

# fix bad col/index names with custom replacements - you MUST provide
# replacements for '(', ')' and '%':
df = pd.DataFrame({'test()': [0], 'foo()%': [0]}).set_index('test()')
df_fixed = fix_psycopg2_bad_cols(df, replacements={'%': 'percent', '(': '', ')': ''})
print(df_fixed)
#       foopercent
# test
# 0              0
Note it will only fix/correct some of the bad characters: it replaces '%', '(' and ')' (characters that won't play nicely, or at all, with psycopg2). Still, it is useful in that pangres handles both the cleanup and the upsert.
(p.s., I know this post is over 4 years old, but still shows up in Google results when searching for "pangres upsert determine number inserts and updates" as the top SO result, dated May 13, 2020.)
I think the easiest way would be to first delete those rows that are going to be "upserted". This can be done in a loop, but it's not very efficient for bigger data sets (5K+ rows), so I'd save this slice of the DF into a temporary MySQL table instead:
from sqlalchemy import text

# assuming we have already changed values in the rows and saved those
# changed rows in a separate DF: `x`
x = df[mask]  # `mask` should help us to find changed rows...

# make sure the `x` DF has a Primary Key column as its index
x = x.set_index('a')

# dump the slice with changed rows to a temporary MySQL table
x.to_sql('my_tmp', engine, if_exists='replace', index=True)

with engine.begin() as conn:  # commits on success, rolls back on error
    # delete those rows that we are going to "upsert"
    conn.execute(text('delete from test_upsert where a in (select a from my_tmp)'))
    # insert the changed rows
    x.to_sql('test_upsert', conn, if_exists='append', index=True)
PS: I didn't test this code, so it might have some small bugs, but it should give you an idea...
A MySQL-specific solution using pandas' to_sql "method" arg and SQLAlchemy's MySQL insert on_duplicate_key_update feature:
import sqlalchemy as db
from sqlalchemy.dialects.mysql import insert

def create_method(meta):
    def method(table, conn, keys, data_iter):
        sql_table = db.Table(table.name, meta, autoload_with=conn)
        insert_stmt = insert(sql_table).values([dict(zip(keys, data)) for data in data_iter])
        upsert_stmt = insert_stmt.on_duplicate_key_update({x.name: x for x in insert_stmt.inserted})
        conn.execute(upsert_stmt)
    return method

engine = db.create_engine(...)
with engine.begin() as conn:
    meta = db.MetaData()
    method = create_method(meta)
    df.to_sql(table_name, conn, if_exists='append', method=method)
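The same method= hook can be written for other backends. Below is a self-contained sketch of the idea against SQLite, building an ON CONFLICT ... DO UPDATE statement inside the callable; the items table and its columns are made up, and the primary-key column is assumed to be 'id':

```python
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE items (id INTEGER PRIMARY KEY, qty INTEGER)"))
    conn.execute(text("INSERT INTO items VALUES (1, 10)"))

def upsert_method(table, conn, keys, data_iter):
    # pandas calls this with the target table, an open connection,
    # the column names, and an iterator of row tuples.
    cols = ", ".join(keys)
    params = ", ".join(f":{k}" for k in keys)
    updates = ", ".join(f"{k} = excluded.{k}" for k in keys if k != "id")
    stmt = text(
        f"INSERT INTO {table.name} ({cols}) VALUES ({params}) "
        f"ON CONFLICT(id) DO UPDATE SET {updates}"
    )
    conn.execute(stmt, [dict(zip(keys, row)) for row in data_iter])

# id=1 collides and gets updated; id=2 is inserted.
df = pd.DataFrame({"id": [1, 2], "qty": [99, 5]})
df.to_sql("items", engine, if_exists="append", index=False, method=upsert_method)
```

Interpolating `table.name` and `keys` into the SQL is safe here only because they come from pandas, not from user input.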
pip install pandas-upsert-to-mysql