Answer from tomp on Stack Overflow
Top answer
1 of 5
83

Simply add the primary key after uploading the table with pandas.

group_export.to_sql(con=engine, name='example_table', if_exists='replace',
                    index=False)  # the old 'flavor' argument was removed in pandas 0.23

with engine.connect() as con:
    # under SQLAlchemy 2.0, wrap the raw SQL string in sqlalchemy.text()
    con.execute('ALTER TABLE `example_table` ADD PRIMARY KEY (`ID_column`);')
2 of 5
37

Disclaimer: this answer is more experimental than practical, but it may be worth mentioning.

I found that the class pandas.io.sql.SQLTable has a named argument keys, and if you assign it the name of a column then that column becomes the primary key:

Unfortunately you can't pass this argument through the DataFrame.to_sql() function. To use it you should:

  1. create a pandas.io.sql.SQLDatabase instance

    import pandas as pd
    import sqlalchemy as sa

    engine = sa.create_engine('postgresql:///somedb')
    # the 'flavor' argument was removed from pandasSQL_builder in later pandas
    pandas_sql = pd.io.sql.pandasSQL_builder(engine, schema=None)
    
  2. define a function analogous to pandas.io.sql.SQLDatabase.to_sql() but with an additional **kwargs argument which is passed to the pandas.io.sql.SQLTable object created inside it (I've just copied the original to_sql() method and added **kwargs):

    def to_sql_k(self, frame, name, if_exists='fail', index=True,
                 index_label=None, schema=None, chunksize=None, dtype=None,
                 **kwargs):
        if dtype is not None:
            from sqlalchemy.types import to_instance, TypeEngine
            for col, my_type in dtype.items():
                if not isinstance(to_instance(my_type), TypeEngine):
                    raise ValueError('The type of %s is not a SQLAlchemy '
                                     'type ' % col)

        table = pd.io.sql.SQLTable(name, self, frame=frame, index=index,
                                   if_exists=if_exists, index_label=index_label,
                                   schema=schema, dtype=dtype, **kwargs)
        table.create()
        table.insert(chunksize)
    
  3. call this function with your SQLDatabase instance and the dataframe you want to save:

    to_sql_k(pandas_sql, df2save, 'tmp',
            index=True, index_label='id', keys='id', if_exists='replace')
    

And we get something like

CREATE TABLE public.tmp
(
  id bigint NOT NULL DEFAULT nextval('tmp_id_seq'::regclass),
...
)

in the database.

PS You can of course monkey-patch DataFrame, io.SQLDatabase and io.to_sql() functions to use this workaround for convenience.
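A lighter-weight way to get the same effect, without copying pandas internals, is pd.io.sql.get_schema, which also accepts a keys argument. A minimal sketch against an in-memory SQLite database (table and column names here are illustrative):

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "score": [0.5, 0.7, 0.9]})
con = sqlite3.connect(":memory:")

# get_schema emits a CREATE TABLE statement that includes a
# PRIMARY KEY constraint for the columns named in `keys`
schema = pd.io.sql.get_schema(df, "scores", keys="id", con=con)
con.execute(schema)

# with the table already in place, to_sql just appends the rows
df.to_sql("scores", con, if_exists="append", index=False)

# PRAGMA table_info reports a non-zero pk flag for key columns
pk_cols = [row[1] for row in con.execute("PRAGMA table_info(scores)") if row[5]]
print(pk_cols)
```

The generated statement is a plain SQL string, so the same pattern works against any DB-API connection; with a SQLAlchemy engine, pass the engine as con instead.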

🌐
Pandas
pandas.pydata.org › docs › reference › api › pandas.DataFrame.to_sql.html
pandas.DataFrame.to_sql — pandas 3.0.1 documentation
>>> from sqlalchemy.dialects.postgresql import insert
>>> def insert_on_conflict_nothing(table, conn, keys, data_iter):
...     # "a" is the primary key in "conflict_table"
...     data = [dict(zip(keys, row)) for row in data_iter]
...     stmt = insert(table.table).values(data).on_conflict_do_nothing(index_elements=["a"])
...
Discussions

[pandas] How can I create a primary key when writing a datafield to sql (df.to_sql)?
Without some code or proper visualization of the process, we can only guess the steps you're taking and the constraints you're dealing with More on reddit.com
🌐 r/learnpython
9
1
January 16, 2020
python - how to set the primary key when writing a pandas dataframe to a sqlite database table using df.to_sql - Stack Overflow
I have created a sqlite database using pandas df.to_sql however accessing it seems considerably slower than just reading in the 500mb csv file. I need to: set the primary key for each table usin... More on stackoverflow.com
🌐 stackoverflow.com
Adding (Insert or update if key exists) option to `.to_sql`
Suppose you have an existing SQL table called person_age, where id is the primary key: age id 1 18 2 42 and you also have new data in a DataFrame called extra_data age id 2 44 3 95 then it would be useful to have an option on extra_data.... More on github.com
🌐 github.com
59
September 13, 2016
Using SQLAlchemy and Pandas to create a database for your Flask app

I use more of the numerical/scientific computing side of the Python community and I have a project that I want to serve up as a web app. This is perfect for me since the project contains heavy use of pandas. Thanks for this!

More on reddit.com
🌐 r/Python
5
46
December 16, 2013
🌐
GitHub
github.com › pandas-dev › pandas › issues › 62980
ENH: Add primary_key and auto_increment parameters to DataFrame.to_sql() · Issue #62980 · pandas-dev/pandas
November 4, 2025 - Data Migration: Transferring data ... (primary_key and auto_increment) to enable primary key creation and auto-increment functionality when writing DataFrames to SQL databases....
Author   eddiethedean
🌐
Reddit
reddit.com › r/learnpython › [pandas] how can i create a primary key when writing a datafield to sql (df.to_sql)?
r/learnpython on Reddit: [pandas] How can I create a primary key when writing a datafield to sql (df.to_sql)?
January 16, 2020 -

I have a bunch of Excel files, I read them in, do stuff with them and then write them into a SQLite DB.

I need to query those tables; my basic query takes 32 minutes because there are no primary keys set. Doing that by hand doesn't work because I read Excel files and write to the DB on a regular basis, so redoing the primary keys every day is not a solution.

Haven't found anything in the documentation so far.

Any ways to do that in pandas?

Top answer
1 of 7
20

Unfortunately there is currently no way to set a primary key in the pandas df.to_sql() method. Additionally, just to make things more of a pain, there is no way to set a primary key on a column in sqlite after a table has been created.

However, a work around at the moment is to create the table in sqlite with the pandas df.to_sql() method. Then you could create a duplicate table and set your primary key followed by copying your data over. Then drop your old table to clean up.

It would be something along the lines of this.

import pandas as pd
import sqlite3

df = pd.read_csv("/Users/data/" + filename)
columns = df.columns
columns = [i.replace(' ', '_') for i in columns]
df.columns = columns

#connect to the database
conn = sqlite3.connect('database')
c = conn.cursor()

#write the pandas dataframe to a sqlite table
#(older pandas also took a 'flavor' argument, removed in 0.23)
df.to_sql('my_table', conn, schema=None, if_exists='replace', index=True,
          index_label=None, chunksize=None, dtype=None)

c.executescript('''
    PRAGMA foreign_keys=off;

    BEGIN TRANSACTION;
    ALTER TABLE my_table RENAME TO old_table;

    /*create a new table with the same column names and types while
    defining a primary key for the desired column*/
    CREATE TABLE new_table (col_1 TEXT PRIMARY KEY NOT NULL,
                            col_2 TEXT);

    INSERT INTO new_table SELECT * FROM old_table;

    DROP TABLE old_table;
    COMMIT TRANSACTION;

    PRAGMA foreign_keys=on;''')

#close out the connection
c.close()
conn.close()

I have done this in the past when I faced this issue; I just wrapped the whole thing in a function to make it more convenient...

In my limited experience with sqlite I have found that not being able to add a primary key after a table has been created, not being able to perform update-inserts or UPSERTs (though SQLite 3.24+ now supports UPSERT via ON CONFLICT), and the lack of UPDATE ... JOIN have caused a lot of frustration and some unconventional workarounds.

Lastly, the pandas df.to_sql() method has a dtype keyword argument that can take a dictionary of column names to types, e.g. dtype={'col_1': 'TEXT'}.

2 of 7
8

In pandas version 0.15, to_sql() gained a dtype argument, which can be used to set both the column type and the primary key attribute for all columns:

import sqlite3
import pandas as pd

df = pd.DataFrame({'MyID': [1, 2, 3], 'Data': [3, 2, 6]})
with sqlite3.connect('foo.db') as con:
    df.to_sql('df', con=con, dtype={'MyID': 'INTEGER PRIMARY KEY',
                                    'Data': 'FLOAT'})
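A quick way to confirm the key actually took is to inspect the table with PRAGMA table_info. This sketch is a variant of the snippet above, using an in-memory database and index=False so the DataFrame index doesn't land in the table as an extra column:

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({'MyID': [1, 2, 3], 'Data': [3.0, 2.0, 6.0]})
with sqlite3.connect(':memory:') as con:
    # string values in dtype are pasted directly into the DDL when
    # writing through a raw sqlite3 connection
    df.to_sql('df', con=con, index=False,
              dtype={'MyID': 'INTEGER PRIMARY KEY', 'Data': 'FLOAT'})
    # column 5 of each PRAGMA table_info row is the pk flag
    pk = [row[1] for row in con.execute('PRAGMA table_info(df)') if row[5]]
print(pk)
```

Note that string dtypes like 'INTEGER PRIMARY KEY' only work with the sqlite3 fallback; a SQLAlchemy engine expects SQLAlchemy type objects instead.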
🌐
GitHub
github.com › pandas-dev › pandas › issues › 14553
Adding (Insert or update if key exists) option to `.to_sql` · Issue #14553 · pandas-dev/pandas
September 13, 2016 -

import pandas as pd
from sqlalchemy import create_engine
import sqlite3

conn = sqlite3.connect('example.db')
c = conn.cursor()
c.execute('''DROP TABLE IF EXISTS person_age;''')
c.execute('''
CREATE TABLE person_age
    (id INTEGER PRIMARY KEY ASC, age INTEGER NOT NULL)
''')
conn.commit()
conn.close()

##### Create original table
engine = create_engine("sqlite:///example.db")
sql_df = pd.DataFrame({'id' : [1, 2], 'age' : [18, 42]})
sql_df.to_sql('person_age', engine, if_exists='append', index=False)

#### Extra data to insert/update
extra_data = pd.DataFrame({'id' : [2, 3], 'age' : [44, 95]})
extra_data.set_index('id', inplace=True)

#### extra_data.to_sql() with row update or insert option
expected_df = pd.DataFrame({'id': [1, 2, 3], 'age': [18, 44, 95]})
expected_df.set_index('id', inplace=True)
Author   cdagnino
🌐
Reddit
reddit.com › r/python › using sqlalchemy and pandas to create a database for your flask app
r/Python on Reddit: Using SQLAlchemy and Pandas to create a database for your Flask app
December 16, 2013 -

Had an issue with this today and figured others might benefit from the solution.

I've been creating some of the tables for the Postgres database in my Flask app with Pandas' to_sql method (the datasource is messy and Pandas handles all the issues very well with very little coding on my part). The rest of the tables are initialized with a SQLAlchemy model, which allows me to easily query the tables with session.query:

db.session.query(AppUsers).filter_by(id = uId).first()

I couldn't however query the tables created by Pandas because they weren't part of the model. Enter automap_base from sqlalchemy.ext.automap (tableNamesDict is a dict with only the Pandas tables):

metadata = MetaData()
metadata.reflect(db.engine, only=tableNamesDict.values())
Base = automap_base(metadata=metadata)
Base.prepare()

Which would have worked perfectly, except for one problem, automap requires the tables to have a primary key. Ok, no problem, I'm sure Pandas to_sql has a way to indicate the primary key... nope. This is where it gets a little hacky:

for df in dfs.keys():
    cols = dfs[df].columns
    cols = [str(col) for col in cols if 'id' in col.lower()]
    schema = pd.io.sql.get_schema(dfs[df],df, con=db.engine, keys=cols)
    db.engine.execute('DROP TABLE ' + df + ';')
    db.engine.execute(schema)
    dfs[df].to_sql(df,con=db.engine, index=False, if_exists='append')

I iterate through the dict of DataFrames, get a list of the columns to use for the primary key (i.e. those containing id), use get_schema to create the empty tables, then append the DataFrame to the table.

Now that you have the models, you can explicitly name and use them (i.e. User = Base.classes.user) with session.query or create a dict of all the classes with something like this:

alchemyClassDict = {}
for t in Base.classes.keys():
    alchemyClassDict[t] = Base.classes[t]

And query with:

res = db.session.query(alchemyClassDict['user']).first()

Hope this helps someone.
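The automap flow the post describes can be reduced to a self-contained sketch. This uses an in-memory SQLite database and an illustrative table name; the one hard requirement, which is the whole point of the post, is that the reflected table has a primary key:

```python
from sqlalchemy import MetaData, create_engine, text
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session

engine = create_engine("sqlite://")

# a table with an explicit primary key -- automap skips tables without one
with engine.begin() as con:
    con.execute(text("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)"))
    con.execute(text("INSERT INTO user (id, name) VALUES (1, 'alice')"))

# reflect only the tables of interest, then automap them into classes
metadata = MetaData()
metadata.reflect(engine, only=["user"])
Base = automap_base(metadata=metadata)
Base.prepare()

User = Base.classes.user
with Session(engine) as session:
    row = session.query(User).filter_by(id=1).first()
print(row.name)
```

If a table had no primary key, it would simply not appear in Base.classes, which is exactly the failure mode that forced the get_schema workaround above.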

🌐
GitHub
github.com › pandas-dev › pandas › issues › 15988
When using to_sql(), continue if duplicate primary keys are detected? · Issue #15988 · pandas-dev/pandas
April 12, 2017 - It would make sense for to_sql(if_exists='append') to merely warn the user which rows had duplicate keys and just continue to add the new rows, not completely stop executing.
Author   rosstripi
🌐
Medium
medium.com › @erkansirin › worst-way-to-write-pandas-dataframe-to-database-445ec62025e0
Worst Way to Write Pandas Dataframe to Database | by Erkan Şirin | Medium
December 27, 2022 - By specifying the schema as we write. Now let’s create a schema. This schema is given as the dtype argument to the to_sql method and this argument type is a dictionary. In this dictionary, keys represent column names and values represent data types. So what about data types? mysql or pandas?
🌐
DNMTechs
dnmtechs.com › creating-a-table-with-primary-key-using-python-pandas-to_sql
Creating a Table with Primary Key using Python Pandas to_sql – DNMTechs – Sharing and Storing Technology Knowledge
After executing the code, the table users will be created in the SQLite database with the specified columns and data. The id column will serve as the primary key, ensuring uniqueness for each record. Using Pandas to_sql, we can easily create tables with primary keys in a database.
🌐
w3resource
w3resource.com › pandas › dataframe › dataframe-to_sql.php
Pandas DataFrame: to_sql() function - w3resource
August 19, 2022 - DataFrame.to_sql(self, name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None, method=None)
🌐
Stack Overflow
stackoverflow.com › questions › tagged › pandas-to-sql
Unanswered 'pandas-to-sql' Questions - Stack Overflow
I want to add some data to the database with pandas .to_sql command. Is there a way I can get the auto-generated primary-key of the inserted objects as I need it for creating foreign keys?
🌐
GitHub
github.com › pandas-dev › pandas › issues › 7984
API to_sql method doesn't provide the option to specify unique indexes · Issue #7984 · pandas-dev/pandas
June 2, 2014 - sqlite> .schema CREATE TABLE "Sniffs" ( "ID" TEXT, "Day" TEXT, "ModelName" TEXT, "PassRate" FLOAT, "RtxName" TEXT, "Shadow" TEXT, "Time" TEXT, "Unit" TEXT ); CREATE INDEX "ix_Sniffs_ID" ON "Sniffs" ("ID"); Looking at these docs: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html#pandas.DataFrame.to_sql
Author   manistal
🌐
Reddit
reddit.com › r/pythontips › how to update an sql database with a dataframe referencing a primary key/column id?
r/pythontips on Reddit: How to update an SQL database with a dataframe referencing a primary key/column ID?
December 17, 2021 -

Lets say I have an excel with columns [Class ID (reference key), Mean Score, Mode Score, Median Score] and I have an SQL database with columns [Class_ID (primary key), Mean_Score, Mode_Score, Median_Score, Last_updated]

  1. How can I use pandas on python to read in the excel and update the SQL database with reference to the primary key?

  2. Would the difference in column names matter?

  3. How can I keep track of when was the last time the class id was updated?

  4. What if there is new class ID of data that does not exist in the SQL database?

  5. Looking at the df.itterrows() function, what if I have a lot of columns is there a way I tuple,list,dictionary it?

Top answer
1 of 2
4
Pandas DataFrame has a .to_sql() function. You would use .set_index() to set the primary key. Differences in column names will matter. My suggestion is to export your DataFrame with .to_dict() and write a function to pass that dict to a SQLAlchemy connection.
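The to_dict suggestion can be sketched like this (table and column names are illustrative): to_dict('records') produces one dict per row, which maps directly onto named DB-API placeholders.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"class_id": [1, 2], "mean_score": [80.0, 91.5]})

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE scores (class_id INTEGER PRIMARY KEY, mean_score REAL)")

# each record dict feeds the named :placeholders of the INSERT
con.executemany(
    "INSERT INTO scores (class_id, mean_score) VALUES (:class_id, :mean_score)",
    df.to_dict("records"))

n = con.execute("SELECT COUNT(*) FROM scores").fetchone()[0]
print(n)
```

Renaming DataFrame columns to match the table (and dropping extras) before the to_dict call is what makes the placeholder names line up.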
2 of 2
2
Read the excel to a DF, then you could rename your columns in the DF and drop any columns in your DF that are not in your SQL table. You can convert it to a dictionary which can be passed to SQLAlchemy for update/insert/delete.

Do you have control of the table you are loading to? If so, add a trigger to set a "lastmodified" column whenever the row is updated, or set this in the create/update that you pass.

Since you are using SQLAlchemy to create the engine anyway, why not utilize it to handle your checks and updates? You can bulk check if your PKs exist (using get_primary_keys()) to determine if there are any truly new rows to insert. If you then have logic to update existing rows, you can loop through your data and use the query to check which rows need to be updated if there is a change. You then pass that dict in an update.

Edit: Additional thought - Depending on the use case you could set up Python as the Extract and Load in your ELT. Always append data to a raw table with a timestamp, then have a stored proc in the DB do a merge statement (you could also apply an SCD type 2 if you desired, with an is_current flag and start_date/end_date) to maintain historical tracking. You then have your "raw" table that holds every row inserted/updated during each run in case anything needs to be rolled back.
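The check-then-update logic in this answer can often be collapsed into a single statement: SQLite 3.24+ (like Postgres) supports upsert via ON CONFLICT, letting the primary key do the duplicate detection. A sketch with illustrative names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE scores (
    class_id     INTEGER PRIMARY KEY,
    mean_score   REAL,
    last_updated TEXT DEFAULT CURRENT_TIMESTAMP)""")

# insert new rows, or update existing ones keyed on class_id
upsert = """
    INSERT INTO scores (class_id, mean_score) VALUES (?, ?)
    ON CONFLICT(class_id) DO UPDATE SET
        mean_score   = excluded.mean_score,
        last_updated = CURRENT_TIMESTAMP
"""

con.executemany(upsert, [(1, 80.0), (2, 91.5)])   # initial load
con.executemany(upsert, [(1, 85.0), (3, 70.0)])   # update id 1, insert id 3

rows = con.execute(
    "SELECT class_id, mean_score FROM scores ORDER BY class_id").fetchall()
print(rows)
```

The excluded pseudo-table refers to the row that failed the uniqueness check, so the UPDATE branch always sees the incoming values.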
🌐
Stack Overflow
stackoverflow.com › questions › 77944319 › is-it-possible-to-create-foreign-keys-with-pandas-to-sql › 77978949
sqlalchemy - Is it possible to create foreign keys with pandas to_sql - Stack Overflow
... Pandas has no concept of primary or foreign key, or indeed any relations between dataframes, so your dataframes don't have this relationship (except in the sense that it exists in your mind).
🌐
Stack Overflow
stackoverflow.com › questions › 52070463 › how-to-get-primary-keys-from-pandas-to-sql-insert
django models - How to get primary keys from pandas.to_sql insert - Stack Overflow
Hi, does this solution help? stackoverflow.com/questions/26770489/… I've had the same question before, and I also just used the (inelegant) solution of returning the MAX(id), then adding it to the dataframe/other SQL table as a foreign key. ... how to set the primary key when writing a pandas dataframe to a sqlite database table using df.to_sql