Simply add the primary key after uploading the table with pandas.

group_export.to_sql(con=engine, name='example_table', if_exists='replace',
                    index=False)

with engine.connect() as con:
    con.execute('ALTER TABLE `example_table` ADD PRIMARY KEY (`ID_column`);')
(Answer from tomp on Stack Overflow.)
Disclaimer: this answer is more experimental than practical, but maybe worth mentioning.
I found that the class pandas.io.sql.SQLTable has a named argument keys, and if you assign it the name of a field then that field becomes the primary key.

Unfortunately you can't just pass this argument through the DataFrame.to_sql() function. To use it you should:

1. Create a pandas.io.sql.SQLDatabase instance:

engine = sa.create_engine('postgresql:///somedb')
pandas_sql = pd.io.sql.pandasSQL_builder(engine, schema=None, flavor=None)

2. Define a function analogous to pandas.io.sql.SQLDatabase.to_sql() but with an additional **kwargs argument which is passed to the pandas.io.sql.SQLTable object created inside it (I've just copied the original to_sql() method and added **kwargs):

def to_sql_k(self, frame, name, if_exists='fail', index=True,
             index_label=None, schema=None, chunksize=None, dtype=None,
             **kwargs):
    if dtype is not None:
        from sqlalchemy.types import to_instance, TypeEngine
        for col, my_type in dtype.items():
            if not isinstance(to_instance(my_type), TypeEngine):
                raise ValueError('The type of %s is not a SQLAlchemy '
                                 'type ' % col)

    table = pd.io.sql.SQLTable(name, self, frame=frame, index=index,
                               if_exists=if_exists, index_label=index_label,
                               schema=schema, dtype=dtype, **kwargs)
    table.create()
    table.insert(chunksize)

3. Call this function with your SQLDatabase instance and the dataframe you want to save:

to_sql_k(pandas_sql, df2save, 'tmp',
         index=True, index_label='id', keys='id', if_exists='replace')
And we get something like
CREATE TABLE public.tmp
(
id bigint NOT NULL DEFAULT nextval('tmp_id_seq'::regclass),
...
)
in the database.
P.S. You can of course monkey-patch DataFrame, io.SQLDatabase and io.to_sql() to use this workaround for convenience.
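As a lighter alternative to monkey-patching pandas internals, a similar helper can be sketched on top of the public pd.io.sql.get_schema function, which also accepts a keys argument. The helper name to_sql_pk and the sample data below are invented for illustration, and sqlite3 is used for brevity:

```python
import sqlite3

import pandas as pd


def to_sql_pk(frame, name, con, keys):
    """Create table `name` with a PRIMARY KEY on `keys`, then insert `frame`.

    Hypothetical helper: get_schema emits the CREATE TABLE statement with the
    key constraint, and to_sql(if_exists='append') fills the pre-made table.
    """
    schema = pd.io.sql.get_schema(frame, name, keys=keys, con=con)
    con.execute('DROP TABLE IF EXISTS "%s"' % name)
    con.execute(schema)
    frame.to_sql(name, con, index=False, if_exists='append')


df = pd.DataFrame({'id': [1, 2], 'val': ['a', 'b']})
con = sqlite3.connect(':memory:')
to_sql_pk(df, 'tmp', con, keys=['id'])
```

This avoids copying any private pandas code: only the DDL step is taken over, and the insert still goes through the normal to_sql path.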
[pandas] How can I create a primary key when writing a datafield to sql (df.to_sql)?
python - how to set the primary key when writing a pandas dataframe to a sqlite database table using df.to_sql - Stack Overflow
Adding (Insert or update if key exists) option to `.to_sql`
Using SQLAlchemy and Pandas to create a database for your Flask app
I have a bunch of Excel files, I read them in, do stuff with them and then write them into a SQLite DB.
I need to query those tables; my basic query takes 32 minutes because there are no primary keys set. Doing that by hand doesn't work because I read the Excel files and write to the DB on a regular basis, so redoing the primary keys every day is not a solution.
Haven't found anything in the documentation so far.
Any ways to do that in pandas?
Unfortunately there is no way right now to set a primary key in the pandas df.to_sql() method. Additionally, just to make things more of a pain, there is no way to set a primary key on a column in sqlite after a table has been created.
However, a work around at the moment is to create the table in sqlite with the pandas df.to_sql() method. Then you could create a duplicate table and set your primary key followed by copying your data over. Then drop your old table to clean up.
It would be something along the lines of this.
import pandas as pd
import sqlite3

# connect to the database
conn = sqlite3.connect('database')

df = pd.read_csv("/Users/data/" + filename)
# replace spaces in the column names so they are valid SQL identifiers
df.columns = [col.replace(' ', '_') for col in df.columns]

# write the pandas dataframe to a sqlite table
df.to_sql('my_table', conn, schema=None, if_exists='replace', index=True,
          index_label=None, chunksize=None, dtype=None)

c = conn.cursor()
c.executescript('''
PRAGMA foreign_keys=off;

BEGIN TRANSACTION;
ALTER TABLE my_table RENAME TO old_table;

/* create a new table with the same column names and types while
   defining a primary key for the desired column */
CREATE TABLE my_table (col_1 TEXT PRIMARY KEY NOT NULL,
                       col_2 TEXT);

INSERT INTO my_table SELECT * FROM old_table;

DROP TABLE old_table;
COMMIT TRANSACTION;

PRAGMA foreign_keys=on;''')

# close out the connection
c.close()
conn.close()
In the past I have done this when I faced this issue; I just wrapped the whole thing in a function to make it more convenient.

In my limited experience with sqlite I have found that not being able to add a primary key after a table has been created, and the lack of update-inserts (UPSERTs) and UPDATE ... JOIN, has caused a lot of frustration and some unconventional workarounds.

Lastly, the pandas df.to_sql() method has a dtype keyword argument that can take a dictionary of column names to types, i.e. dtype={'col_1': 'TEXT'}.
In pandas version 0.15, to_sql() got an argument dtype, which can be used to set both dtype and the primary key attribute for all columns:
import sqlite3
import pandas as pd
df = pd.DataFrame({'MyID': [1, 2, 3], 'Data': [3, 2, 6]})
with sqlite3.connect('foo.db') as con:
    df.to_sql('df', con=con, dtype={'MyID': 'INTEGER PRIMARY KEY',
                                    'Data': 'FLOAT'})
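A quick way to confirm the key was actually applied (my addition, not part of the answer; note I also pass index=False so MyID stays an ordinary column rather than the frame's index) is PRAGMA table_info:

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({'MyID': [1, 2, 3], 'Data': [3, 2, 6]})
con = sqlite3.connect(':memory:')
df.to_sql('df', con=con, index=False,
          dtype={'MyID': 'INTEGER PRIMARY KEY', 'Data': 'FLOAT'})

# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
info = con.execute('PRAGMA table_info("df")').fetchall()
pk_cols = [row[1] for row in info if row[5] > 0]
```

The pk flag in the last column of table_info is non-zero exactly for the primary-key columns.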
Had an issue with this today and figured others might benefit from the solution.
I've been creating some of the tables for the Postgres database in my Flask app with the Pandas to_sql method (the data source is messy and Pandas handles all the issues very well with very little coding on my part). The rest of the tables are initialized with a SQLAlchemy model, which allows me to easily query the tables with session.query:
db.session.query(AppUsers).filter_by(id = uId).first()
I couldn't however query the tables created by Pandas because they weren't part of the model. Enter automap_base from sqlalchemy.ext.automap (tableNamesDict is a dict with only the Pandas tables):
metadata = MetaData()
metadata.reflect(db.engine, only=tableNamesDict.values())
Base = automap_base(metadata=metadata)
Base.prepare()
Which would have worked perfectly, except for one problem, automap requires the tables to have a primary key. Ok, no problem, I'm sure Pandas to_sql has a way to indicate the primary key... nope. This is where it gets a little hacky:
for df in dfs.keys():
    cols = dfs[df].columns
    cols = [str(col) for col in cols if 'id' in col.lower()]
    schema = pd.io.sql.get_schema(dfs[df], df, con=db.engine, keys=cols)
    db.engine.execute('DROP TABLE ' + df + ';')
    db.engine.execute(schema)
    dfs[df].to_sql(df, con=db.engine, index=False, if_exists='append')
I iterate through the dict of DataFrames, get a list of the columns to use for the primary key (i.e. those containing id), use get_schema to create the empty tables, then append the DataFrame to the table.
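The automap constraint driving all this, that tables need a primary key to get a mapped class, is easy to demonstrate in isolation. A minimal sketch with an in-memory SQLite engine (the table names here are invented):

```python
from sqlalchemy import MetaData, create_engine, text
from sqlalchemy.ext.automap import automap_base

engine = create_engine('sqlite://')
with engine.begin() as conn:
    # one table WITH a primary key, and one WITHOUT
    conn.execute(text(
        'CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT)'))
    conn.execute(text('CREATE TABLE log (message TEXT)'))

metadata = MetaData()
metadata.reflect(engine, only=['user', 'log'])
Base = automap_base(metadata=metadata)
Base.prepare()

# only tables with a primary key get a mapped class
mapped = set(Base.classes.keys())
```

The keyless log table is silently skipped by automap, which is exactly why the Pandas-created tables could not be queried through the model.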
Now that you have the models, you can explicitly name and use them (i.e. User = Base.classes.user) with session.query or create a dict of all the classes with something like this:
alchemyClassDict = {}
for t in Base.classes.keys():
    alchemyClassDict[t] = Base.classes[t]

And query with:
res = db.session.query(alchemyClassDict['user']).first()
Hope this helps someone.
I use more of the numerical/scientific computing side of the Python community and I have a project that I want to serve up as a web app. This is perfect for me since the project contains heavy use of pandas. Thanks for this!
I'm sure Pandas to_sql has a way to indicate the primary key... nope.
Pandas to_sql method does have an index_label parameter: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html
What version of Pandas are you using?
Let's say I have an Excel file with columns [Class ID (reference key), Mean Score, Mode Score, Median Score] and I have a SQL database with columns [Class_ID (primary key), Mean_Score, Mode_Score, Median_Score, Last_updated].
How can I use pandas on python to read in the excel and update the SQL database with reference to the primary key?
Would the difference in column names matter?
How can I keep track of when was the last time the class id was updated?
What if there is new class ID of data that does not exist in the SQL database?
Looking at the df.iterrows() function: what if I have a lot of columns, is there a way to get each row as a tuple, list, or dictionary?
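One way to sketch the update-by-key part of this question (all table and column names below are invented to match it, and SQLite >= 3.24 is assumed for ON CONFLICT) combines DataFrame.itertuples() with an upsert; itertuples also answers the last point, since it yields each row as a named tuple:

```python
import sqlite3
from datetime import datetime, timezone

import pandas as pd

# toy stand-ins for the Excel sheet and the database in the question
excel = pd.DataFrame({'Class ID': [1, 3], 'Mean Score': [80.0, 75.0]})

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE scores (Class_ID INTEGER PRIMARY KEY, '
            'Mean_Score REAL, Last_updated TEXT)')
con.execute("INSERT INTO scores VALUES (1, 70.0, '2020-01-01')")

# the column names differ only by spaces vs underscores, so rename first
excel = excel.rename(columns=lambda c: c.replace(' ', '_'))

now = datetime.now(timezone.utc).isoformat()
for row in excel.itertuples(index=False):
    con.execute(
        'INSERT INTO scores (Class_ID, Mean_Score, Last_updated) '
        'VALUES (?, ?, ?) '
        'ON CONFLICT(Class_ID) DO UPDATE SET '
        'Mean_Score = excluded.Mean_Score, '
        'Last_updated = excluded.Last_updated',
        # cast numpy scalars to plain Python types for sqlite3
        (int(row.Class_ID), float(row.Mean_Score), now))
con.commit()
```

Existing class IDs are updated (and get a fresh Last_updated timestamp), while unseen IDs are inserted as new rows, which covers the "new class ID" case as well.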