Consider a temp table that is an exact replica of your final table, cleaned out with each run:

from sqlalchemy import create_engine, text

engine = create_engine('postgresql+psycopg2://user:pswd@mydb')
df.to_sql('temp_table', engine, if_exists='replace', index=False)

sql = """
    UPDATE final_table AS f
    SET col1 = t.col1
    FROM temp_table AS t
    WHERE f.id = t.id
"""

with engine.begin() as conn:     # TRANSACTION
    conn.execute(text(sql))
Answer from Parfait on Stack Overflow
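The UPDATE above only touches rows that already exist in final_table. A minimal sketch of extending the same temp-table pattern to also insert new rows, demoed on an in-memory SQLite database (whose ON CONFLICT clause mirrors PostgreSQL's; the table and column names are the placeholders from the answer, not a real schema):

```python
import sqlite3
import pandas as pd

# in-memory stand-in for the real database
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE final_table (id INTEGER PRIMARY KEY, col1 TEXT)')
conn.execute("INSERT INTO final_table VALUES (1, 'old'), (2, 'old')")

# stage the dataframe into the temp table, replaced on every run
df = pd.DataFrame({'id': [2, 3], 'col1': ['new', 'new']})
df.to_sql('temp_table', conn, if_exists='replace', index=False)

# one statement updates matching ids and inserts missing ones
# (the WHERE true is required by SQLite's upsert grammar)
conn.execute("""
    INSERT INTO final_table (id, col1)
    SELECT id, col1 FROM temp_table WHERE true
    ON CONFLICT (id) DO UPDATE SET col1 = excluded.col1
""")

rows = conn.execute('SELECT id, col1 FROM final_table ORDER BY id').fetchall()
print(rows)  # [(1, 'old'), (2, 'new'), (3, 'new')]
```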
Adding (Insert or update if key exists) option to `.to_sql` · Issue #14553 · pandas-dev/pandas
November 1, 2016 - In this case, the id=2 row would get updated to age=44 and the id=3 row would get added ... I looked at pandas sql.py sourcecode to come up with a solution, but I couldn't follow. ... import pandas as pd from sqlalchemy import create_engine import sqlite3 conn = sqlite3.connect('example.db') c = conn.cursor() c.execute('''DROP TABLE IF EXISTS person_age;''') c.execute(''' CREATE TABLE person_age (id INTEGER PRIMARY KEY ASC, age INTEGER NOT NULL) ''') conn.commit() conn.close() ##### Create original table engine = create_engine("sqlite:///example.db") sql_df = pd.DataFrame({'id' : [1, 2], 'age'
Author   cdagnino
pandas.DataFrame.to_sql — pandas 3.0.1 documentation
GitHub · X · Mastodon · DataFrame.to_sql(name, con, *, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None, method=None)[source]# Write records stored in a DataFrame to a SQL database. Databases supported by SQLAlchemy [1] are supported. Tables can be newly created, appended to, or overwritten. Warning · The pandas library does not attempt to sanitize inputs provided via a to_sql call.
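A quick illustration of the if_exists modes from the signature above, using an in-memory SQLite database (the table name demo is made up for the example):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')
df = pd.DataFrame({'id': [1, 2], 'val': ['a', 'b']})

df.to_sql('demo', conn, index=False)                       # creates the table (if_exists='fail' is the default)
df.to_sql('demo', conn, index=False, if_exists='append')   # adds the same two rows again -> 4 rows
df.to_sql('demo', conn, index=False, if_exists='replace')  # drops the table and rewrites df -> 2 rows

n = conn.execute('SELECT COUNT(*) FROM demo').fetchone()[0]
print(n)  # 2
```

Note that none of the three modes upserts individual rows, which is why the workarounds collected on this page exist.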
Discussions

python - Pandas update sql - Stack Overflow
Is there any way to do an SQL update-where from a dataframe without iterating through each line? I have a postgresql database and to update a table in the db from a dataframe I would use psycopg2 a...
stackoverflow.com
Pandas to SQL, if row exists then replace, otherwise append.
This, probably: https://mariadb.com/kb/en/insert-on-duplicate-key-update/
r/learnpython
April 11, 2022
Pandas.DataFrame.to_sql() Default Insert Behavior Assumes Table Creation
If for any number of reasons the table is not found, the default behavior of Pandas.DataFrame.to_sql() is to create a new table with whatever types the fields of the dataframe are currently in. This table created by Pandas is missing all the indexes, primary/foreign keys, etc that the dynamic table generation process creates. A fix might be to add a new parameter, if_not_exists, that allows ...
github.com
October 21, 2019
Panda dataframe update existing table with to_sql
Hi, I am new to Python and am trying to make my first Python application. But I have some problems with pandas. How do I update an existing table with a pandas dataframe without getting duplicate errors saying the key already exists? Is it possible to skip records that already exist, or what is best practice?
discuss.python.org
May 30, 2024
GitHub - ryanbaumann/Pandas-to_sql-upsert: Extend pandas to_sql function to perform multi-threaded, concurrent "insert or update" commands in memory
The goal of this library is to extend the Python Pandas to_sql() function to be: multi-threaded (improving time-to-insert on large datasets), and to allow the to_sql() command to run an 'insert if does not exist' or update against the database.
Top answer
1 of 8
86

2 of 8
15

It looks like you are using external data stored in df for the conditions on updating your database table. If it is possible, why not just do a one-line SQL UPDATE?

If you are working with a smallish database (where loading the whole table into a Python dataframe object isn't going to kill you), then you can definitely conditionally update the dataframe after loading it with read_sql. Then you can use the keyword arg if_exists="replace" to replace the DB table with the new, updated table.

import pandas

df = pandas.read_sql("select * from your_table;", engine)

# update information (update your_table set column = "new value" where column = "old value")
# still may need to iterate for many old value/new value pairs
df.loc[df['column'] == "old value", "column"] = "new value"

# send data back to sql
df.to_sql("your_table", engine, if_exists="replace", index=False)

Pandas is a powerful tool, where limited SQL support was just a small feature at first. As time goes by people are trying to use pandas as their only database interface software. I don't think pandas was ever meant to be an end-all for database interaction, but there are a lot of people working on new features all the time. See: https://github.com/pandas-dev/pandas/issues
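A runnable sketch of that read-modify-replace flow against an in-memory SQLite database; your_table and the column name are the placeholders from the answer, not a real schema:

```python
import sqlite3
import pandas as pd

# seed a throwaway table so the example is self-contained
conn = sqlite3.connect(':memory:')
pd.DataFrame({'id': [1, 2], 'column': ['old value', 'other']}).to_sql(
    'your_table', conn, index=False)

# load, conditionally update, write back with if_exists="replace"
df = pd.read_sql('select * from your_table;', conn)
df.loc[df['column'] == 'old value', 'column'] = 'new value'
df.to_sql('your_table', conn, if_exists='replace', index=False)

updated = pd.read_sql('select * from your_table;', conn)['column'].tolist()
print(updated)  # ['new value', 'other']
```

One caveat worth keeping in mind: 'replace' rewrites the whole table, so indexes, keys, and constraints on the original table are lost.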

gist.github.com › pedrovgp › b46773a1240165bf2b1448b3f70bed32
Allow upserting a pandas dataframe to a postgres table (equivalent to df.to_sql(..., if_exists='update'))
This will set primary keys on the table equal to the index names 1. Create a temp table from the dataframe 2. Insert/update from temp table into table_name Returns: True if successful """ # If the table does not exist, we should just use to_sql to create it if not engine.execute( f"""SELECT EXISTS ( SELECT FROM information_schema.tables WHERE table_schema = '{schema}' AND table_name = '{table_name}'); """ ).first()[0]: df.to_sql(table_name, engine, schema=schema, dtype=dtypes) return True # If it already exists...
r/learnpython on Reddit: Pandas to SQL, if row exists then replace, otherwise append.
April 11, 2022

Good morning all, hoping you can help.

I'm a bit of programming noob, but I've written a python script that does the below:

  • first queries my MariaDB SQL database and retrieves the maximum datetime from a table column (dateLastAction).

  • This datetime is then used as a filter in an API request to retrieve any items updated after the max datetime from my SQL table.

  • I then transform the response and normalize it to a pandas dataframe which matches the structure of the SQL table exactly.

The dataframe now contains some rows which do exist in the database, and some which aren't present at all.

So my question is: is it possible to check my MariaDB table for each 'ticketId' in my pandas dataframe (this is the primary key for the table), and if the 'ticketId' is present, replace the row, and if it's not present, append it to the table?

If none of the rows were present in MariaDB then I would append rows of my dataframe to the SQL table using:

df.to_sql('tickets', engine, index=False, if_exists='append')

the 'if_exists' portion is relating to the table itself though, not the individual rows.

Can anyone share some insight on how I can achieve this? Is it easier to split my dataframe into two, one for new rows, and then rows to be replaced?

Code outline for what I'm trying to achieve:

from sqlalchemy import create_engine
import datetime
import requests
import pandas as pd


## STEP 1: Retrieve the max 'dateLastAction' value from MariaDB 'tickets' table
hostname = 'hostname'
dbname = 'dbname'
uname = 'uname'
pwd = 'pwd'

engine = create_engine('mysql+pymysql://{user}:{pw}@{host}/{db}'.format(host=hostname, db=dbname, user=uname, pw=pwd))

query = 'SELECT dateLastAction FROM tickets WHERE dateLastAction IN (SELECT max(dateLastAction) FROM tickets)'
with engine.connect() as conn:
    result = conn.exec_driver_sql(query).fetchone()
lastAction = result[0]
lastAction = lastAction.strftime('%Y-%m-%dT%H:%M:%S')


## STEP 2: Request tickets from PSA with dateLastAction larger than that stored in MariaDB
filter = "?$filter=LastActivityUpdate+gt+DateTime'" + str(lastAction) + "'"
request = requests.get('url' + filter, headers=headers)
response = request.json()
df = pd.json_normalize(response)


## STEP 3: Add results to MariaDB
???

Thank you in advance!
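One way to sketch the missing STEP 3 is the split the poster suggests: divide the dataframe into rows whose ticketId already exists (delete, then re-append) and genuinely new rows. The demo below runs on an in-memory SQLite database with an invented two-column schema; on MariaDB the same flow applies, or a single INSERT ... ON DUPLICATE KEY UPDATE can replace the delete-and-append.

```python
import sqlite3
import pandas as pd

# stand-in for the MariaDB 'tickets' table
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE tickets (ticketId INTEGER PRIMARY KEY, status TEXT)')
conn.execute("INSERT INTO tickets VALUES (1, 'open'), (2, 'open')")

# dataframe from the API: ticketId 2 exists already, ticketId 3 is new
df = pd.DataFrame({'ticketId': [2, 3], 'status': ['closed', 'open']})

existing = pd.read_sql('SELECT ticketId FROM tickets', conn)['ticketId']
to_replace = df[df['ticketId'].isin(existing)]
to_append = df[~df['ticketId'].isin(existing)]

# "replace" = delete the stale rows, then append everything
conn.executemany('DELETE FROM tickets WHERE ticketId = ?',
                 [(int(t),) for t in to_replace['ticketId']])
pd.concat([to_replace, to_append]).to_sql(
    'tickets', conn, index=False, if_exists='append')

result = [tuple(r) for r in conn.execute('SELECT * FROM tickets ORDER BY ticketId')]
print(result)  # [(1, 'open'), (2, 'closed'), (3, 'open')]
```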

tedboy.github.io › pandas › _modules › pandas › io › sql.html
pandas.io.sql — Pandas Doc
If None, all rows will be written ... name to SQL type, default None Optional specifying the datatype for columns. The SQL type should be a SQLAlchemy type, or a string for sqlite3 fallback connection. If all columns are of the same type, one single value can be used. """ if if_exists not in ('fail', 'replace', 'append'): raise ValueError("'{0}' is not valid for if_exists".format(if_exists)) pandas_sql = ...
Pandas.DataFrame.to_sql() Default Insert Behavior Assumes Table Creation · Issue #29138 · pandas-dev/pandas
October 21, 2019 - This table created by Pandas is ... process creates. A fix might be to add a new parameter, if_not_exists, that allows a user to set the behavior of to_sql() if a table is not detected....
Author   JordanPavlic
GitHub - jwcook23/mssql_dataframe: Update, Upsert, and Merge from Python dataframes to SQL Server and Azure SQL database.
These more advanced methods are ... offers 3 options if the SQL table already exists with the parameter if_exists={'fail', 'replace', 'append'}. See QUICKSTART for a full overview of functionality.
ENH: DataFrame.to_sql with if_exists='replace' should do truncate table instead of drop table · Issue #37210 · pandas-dev/pandas
October 17, 2020 - Closed.
Author   tokorhon
Top answer
1 of 3
10

I can think of two options, but number 1 might be cleaner/faster:

1) Make SQL decide on the update/insert. Check this other question. You can iterate over the rows of your df, from i=1 to n. Inside the loop, for the insertion you can write something like:

query = """INSERT INTO table (id, name, age) VALUES(%s, %s, %s)
ON DUPLICATE KEY UPDATE name=%s, age=%s"""
with engine.begin() as conn:
    conn.exec_driver_sql(query, (df.id[i], df.name[i], df.age[i], df.name[i], df.age[i]))

2) Define a python function that returns True or False when the record exists and then use it in your loop:

def check_existence(user_id):
    query = "SELECT EXISTS (SELECT 1 FROM your_table WHERE user_id_str = %s);"
    with engine.connect() as conn:
        return bool(conn.exec_driver_sql(query, (user_id,)).scalar())

You could iterate over the rows and run this check before inserting.

Please also check the solution in this question and this one too which might work in your case.
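Option 1 can be sketched end-to-end on an in-memory SQLite database, whose ON CONFLICT clause plays the role of MySQL's ON DUPLICATE KEY UPDATE (the person table and its columns are invented for the demo):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)')
conn.execute("INSERT INTO person VALUES (1, 'Ann', 30)")

# id 1 collides (gets updated), id 2 is new (gets inserted)
df = pd.DataFrame({'id': [1, 2], 'name': ['Ann', 'Bob'], 'age': [31, 25]})

for row in df.itertuples(index=False):
    conn.execute(
        'INSERT INTO person (id, name, age) VALUES (?, ?, ?) '
        'ON CONFLICT (id) DO UPDATE SET name = excluded.name, age = excluded.age',
        (int(row.id), row.name, int(row.age)),
    )

rows = conn.execute('SELECT id, name, age FROM person ORDER BY id').fetchall()
print(rows)  # [(1, 'Ann', 31), (2, 'Bob', 25)]
```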

2 of 3
1

Pangres is the tool for this job.

Overview here: https://pypi.org/project/pangres/

Use the function pangres.fix_psycopg2_bad_cols to "clean" the columns in the DataFrame.

Code/usage here: https://github.com/ThibTrip/pangres/wiki https://github.com/ThibTrip/pangres/wiki/Fix-bad-column-names-postgres Example code:

# From: https://github.com/ThibTrip/pangres/wiki/Fix-bad-column-names-postgres
import pandas as pd
from pangres import fix_psycopg2_bad_cols

# fix bad col/index names with default replacements (empty string for '(', ')' and '%'):
df = pd.DataFrame({'test()': [0],
                   'foo()%': [0]}).set_index('test()')
print(df)
#         foo()%
# test()
# 0            0

# clean cols/index with the default replacements
df_fixed = fix_psycopg2_bad_cols(df)
print(df_fixed)
#       foo
# test
# 0       0

# fix bad col/index names with custom replacements - you MUST provide
# replacements for '(', ')' and '%':

# reset df
df = pd.DataFrame({'test()': [0],
                   'foo()%': [0]}).set_index('test()')

# clean cols/index with user-specified replacements
df_fixed = fix_psycopg2_bad_cols(df, replacements={'%': 'percent', '(': '', ')': ''})
print(df_fixed)
#       foopercent
# test
# 0              0

It will only fix/correct some of the bad characters:

it replaces '%', '(' and ')' (characters that won't play nicely, or even at all).

But it is useful in that it handles both the cleanup and the upsert.

(p.s., I know this post is over 4 years old, but still shows up in Google results when searching for "pangres upsert determine number inserts and updates" as the top SO result, dated May 13, 2020.)

ENH: pandas.DataFrame.to_sql - if_exists replace with cascade option · Issue #46933 · pandas-dev/pandas
May 3, 2022
Author   r7butler
BUG: to_sql if_exists not working properly when schema is set on PostgreSQL · Issue #35594 · pandas-dev/pandas
August 7, 2020 - (optional) I have confirmed this bug exists on the master branch of pandas.

import sqlalchemy
import pandas as pd

uri = "some uri to postgresql"
engine = sqlalchemy.create_engine(uri)
conn = engine.connect()
df = pd.DataFrame([[1, 2, 3], [2, 1, 3]], columns=["A", "B", "C"])

# works
df.to_sql("tt", conn, schema="pg_temp", if_exists="append", index=False)
# causes error
df.to_sql("tt", conn, schema="pg_temp", if_exists="append", index=False)
# works
df.to_sql("tt", conn, if_exists="append", index=False)
Author   westhyena
linuxtut.com › en › b329ce4f9d6c229382ff
I tried Pandas' Sql Upsert
November 30, 2019 - "Upsert" means Insert plus Update. There are two main flavors of SQL upsert here: based on the primary key, insert only rows that do not yet exist and leave existing rows alone (**upsert_keep**); or, based on the primary key, update rows that exist and insert those that do not (**upsert_overwrite**).
Top answer
1 of 4
20

I think the easiest way would be to:

first delete those rows that are going to be "upserted". This can be done in a loop, but it's not very efficient for bigger data sets (5K+ rows), so I'd save this slice of the DF into a temporary MySQL table:

from sqlalchemy import text

# assuming we have already changed values in the rows and saved those changed rows in a separate DF: `x`
x = df[mask]  # `mask` should help us to find changed rows...

# make sure `x` DF has a Primary Key column as index
x = x.set_index('a')

# dump a slice with changed rows to temporary MySQL table
x.to_sql('my_tmp', engine, if_exists='replace', index=True)

# run the delete and the re-insert in one transaction, so a failed
# insert also rolls back the delete
with engine.begin() as conn:
    # delete those rows that we are going to "upsert"
    conn.execute(text('delete from test_upsert where a in (select a from my_tmp)'))

    # insert changed rows
    x.to_sql('test_upsert', conn, if_exists='append', index=True)

PS i didn't test this code so it might have some small bugs, but it should give you an idea...

2 of 4
9

A MySQL-specific solution using pandas' to_sql "method" arg and SQLAlchemy's MySQL insert on_duplicate_key_update feature:

import sqlalchemy as db
from sqlalchemy.dialects.mysql import insert

def create_method(meta):
    def method(table, conn, keys, data_iter):
        sql_table = db.Table(table.name, meta, autoload_with=conn)
        insert_stmt = insert(sql_table).values([dict(zip(keys, data)) for data in data_iter])
        upsert_stmt = insert_stmt.on_duplicate_key_update({x.name: x for x in insert_stmt.inserted})
        conn.execute(upsert_stmt)

    return method

engine = db.create_engine(...)
with engine.begin() as conn:
    meta = db.MetaData()
    method = create_method(meta)
    df.to_sql(table_name, conn, if_exists='append', method=method)
ENH: df.to_sql lacks a parameter if_not_exist with options [create, fail] · Issue #40647 · pandas-dev/pandas
March 26, 2021 -

tableName = '...'
schemaName = '...'
dbEngine = ....

# dataframe df is holding the data for target table named in [tableName]
# currently quite some boilerplate is needed to fail safe, if the database
# table does not exist, to avoid creating one
if tableExists(tableName):
    df.to_sql(tableName, schema=schemaName, con=dbEngine, index=True, if_exists='append')
    print("INSERT-I-done")
else:
    print(f"INSERT-I-failed: table with expected name [{schemaName}].[{tableName}] does not yet or no more exist")

This missing feature leads to another level of encapsulation in a wrapper class. It would be better to integrate this as an option directly into pandas.
Author   olippuner