Yes, at the end of the day it will be committed automatically.
For SQLAlchemy connections, pandas calls the executemany method:

conn.executemany(self.insert_statement(), data_list)
And according to the SQLAlchemy docs, executemany issues a commit at the end.
For SQLite connections, commit is called explicitly:

@contextmanager
def run_transaction(self):
    cur = self.con.cursor()
    try:
        yield cur
        self.con.commit()
    except Exception:
        self.con.rollback()
        raise
    finally:
        cur.close()
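The same commit-on-success / rollback-on-error pattern can be sketched as a standalone context manager with the standard-library sqlite3 module (names here are illustrative, not pandas internals):

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def run_transaction(con):
    # Yield a cursor; commit if the block succeeds, roll back if it raises.
    cur = con.cursor()
    try:
        yield cur
        con.commit()
    except Exception:
        con.rollback()
        raise
    finally:
        cur.close()

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.commit()

with run_transaction(con) as cur:
    cur.execute("INSERT INTO t VALUES (1)")      # committed

try:
    with run_transaction(con) as cur:
        cur.execute("INSERT INTO t VALUES (2)")
        raise RuntimeError("simulated failure")  # triggers the rollback
except RuntimeError:
    pass

rows = con.execute("SELECT x FROM t").fetchall()  # only the committed row survives
```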
(Answer from MaxU on Stack Overflow.)
I had a similar problem when trying to write a DataFrame with df.to_sql (from pandas) using a SQLAlchemy engine created with mssql+pymssql:
sqlalchemy.exc.OperationalError: (pymssql._pymssql.OperationalError) Cannot commit transaction: (3902, b'The COMMIT TRANSACTION request has no corresponding BEGIN TRANSACTION.DB-Lib error message 20018, severity 16:\nGeneral SQL Server error: Check messages from the SQL Server\n')
Turns out the issue had to do with properly committing queries and closing connections. The easiest way to manage this was to use SQLAlchemy's built-in support for Python's with statement (context managers):
SQL_CONNECTION = sqlalchemy.create_engine(
    'mssql+pymssql://' + SQL_USERNAME + ':' + qp(SQL_PASSWORD)
    + '@' + SQL_SERVER + '/' + SQL_DB
)  # qp here is urllib.parse.quote_plus; TODO make username dynamic

with SQL_CONNECTION.connect() as connection:
    with connection.begin():
        df.to_sql(SQL_TABLE, connection, schema='dbo', if_exists='replace')
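The same pattern can be tried end to end against an in-memory SQLite engine (substituted here for mssql+pymssql so the sketch is self-contained; the table and column names are made up):

```python
import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine("sqlite://")
df = pd.DataFrame({"a": [1, 2, 3]})

with engine.connect() as connection:
    with connection.begin():  # commits on success, rolls back on error
        df.to_sql("my_table", connection, if_exists="replace", index=False)

# Read back on a fresh connection to confirm the write was persisted.
with engine.connect() as connection:
    count = connection.execute(
        sqlalchemy.text("SELECT COUNT(*) FROM my_table")
    ).scalar()
```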
I had the same issue; I realised you need to tell pyodbc which database you want to use. For me the default was master, so my data ended up there.
There are two ways you can do this, either:
connection.execute("USE <dbname>")
Or define the schema in df.to_sql():
df.to_sql(name='<TABLENAME>', con=connection, schema='<dbname>.dbo')
In my case the schema was <dbname>.dbo. I think .dbo is the default, so it could be something else if you have defined an alternative schema.
This was referenced in another answer; it took me a bit longer to realise what the schema name should be.
It connects (I can read back data) but it isn't inserting, and it gives no errors. I'm using sqlalchemy with pymssql as the driver. My first thought was to set autocommit to true, but that hasn't fixed it.
Any ideas?
I'm trying to execute a set of SQLAlchemy commands (a delete and an insert) and then finally write a pandas.DataFrame to the DB. I call to_sql for that.
I need a way to roll it all back if anything goes wrong. The SQLAlchemy commands are easy since they can be contained within a transaction. Not so with to_sql.
pandas.DataFrame.to_sql takes a named argument con of type SQLAlchemy engine. Presumably, it creates its own connection.
Here's essentially what I'm doing...
connection = engine.connect()
transaction = connection.begin()
try:
    delete(my_table, clause).execute()
    my_dataframe.to_sql('another_table', connection)
    transaction.commit()
except:
    transaction.rollback()

How do I roll back to_sql?
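A hedged sketch of one way this can work, against an in-memory SQLite engine. Note that with the default pysqlite driver the CREATE TABLE emitted by to_sql may not be transactional, so this sketch pre-creates the table and demonstrates rolling back just the inserted rows:

```python
import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine("sqlite://")
my_dataframe = pd.DataFrame({"a": [1, 2]})

# Create the target table up front, committed outside the demo transaction.
with engine.begin() as conn:
    conn.execute(sqlalchemy.text("CREATE TABLE another_table (a INTEGER)"))

connection = engine.connect()
transaction = connection.begin()
try:
    my_dataframe.to_sql("another_table", connection,
                        if_exists="append", index=False)
    raise RuntimeError("simulated failure")  # pretend something went wrong
    # transaction.commit() would go here on success
except Exception:
    transaction.rollback()                   # undoes the to_sql insert
finally:
    connection.close()

# The inserted rows should be gone after the rollback.
with engine.connect() as conn:
    count = conn.execute(
        sqlalchemy.text("SELECT COUNT(*) FROM another_table")
    ).scalar()
```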
I had a similar use case -- load data into SQL Server with Pandas, call a stored procedure that does heavy lifting and writes to tables, then capture the result set into a new DataFrame.
I solved it by using a context manager and explicitly committing the transaction:
# Connect to SQL Server
engine = sqlalchemy.create_engine('db_string')
with engine.connect() as connection:
    # Write dataframe to table with replace
    df.to_sql(name='myTable', con=connection, if_exists='replace')
    with connection.begin() as transaction:
        # Execute verification routine and capture results
        df_processed = pandas.read_sql(sql='exec sproc', con=connection)
        transaction.commit()
read_sql won't commit because, as the method name implies, its goal is to read data, not to write it. That's a good design choice by pandas: it prevents accidental writes and allows interesting scenarios like running a procedure and reading its effects without persisting anything. Expressing intent directly is a gold-standard principle.
A more explicit way to express your intent would be to execute (with commit) explicitly before fetchall. But because pandas offers no simple way to read from a cursor object, you would lose the peace of mind provided by read_sql and have to create the DataFrame yourself.
So all in all your solution is fine: by setting autocommit=True you're indicating that your database interactions will persist whatever they do, so there should be no accidents. It's a bit weird to read, but if you named your sql_template variable something like write_then_read_sql, or explained it in a docstring, the intent would be clearer.
You can set the connection to autocommit by:
db_engine = db_engine.execution_options(autocommit=True)
From https://docs.sqlalchemy.org/en/13/core/connections.html#understanding-autocommit:
The “autocommit” feature is only in effect when no Transaction has otherwise been declared. This means the feature is not generally used with the ORM, as the Session object by default always maintains an ongoing Transaction.
In your code you have not presented any explicit transactions, and so the engine used as the con is in autocommit mode (as implemented by SQLA).
Note that SQLAlchemy implements its own autocommit that is independent from the DB-API driver's possible autocommit / non-transactional features.
Hence the "simplest, safest way for adding auto-commit behavior", or explicitly committing after every dataframe write, is what you already had, unless to_sql() emits some funky statements that SQLAlchemy does not recognize as data-changing operations, which it does not, at least of late.
It might be that the SQLA autocommit feature is on the way out in the next major release, but we'll have to wait and see.
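For what it's worth, in SQLAlchemy 1.4 the legacy autocommit execution option is deprecated (and it is removed in 2.0); the supported replacement is the "AUTOCOMMIT" isolation level. A minimal sketch against an in-memory SQLite engine, assuming SQLAlchemy 1.4+:

```python
import sqlalchemy

# Request driver-level autocommit instead of the legacy autocommit flag.
engine = sqlalchemy.create_engine("sqlite://", isolation_level="AUTOCOMMIT")

with engine.connect() as conn:
    conn.execute(sqlalchemy.text("CREATE TABLE t (x INTEGER)"))
    conn.execute(sqlalchemy.text("INSERT INTO t VALUES (1)"))
    # No commit needed: each statement is persisted immediately.

with engine.connect() as conn:
    rows = conn.execute(sqlalchemy.text("SELECT x FROM t")).fetchall()
```

Without AUTOCOMMIT, closing the first connection would roll the insert back; with it, the second connection sees the row.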
Alternatively, there is the fast-to-sql package:

pip install fast-to-sql