Simply add the primary key after uploading the table with pandas.

group_export.to_sql(con=engine, name='example_table', if_exists='replace',
                    index=False)

with engine.connect() as con:
    con.execute('ALTER TABLE `example_table` ADD PRIMARY KEY (`ID_column`);')
(Answer from tomp on Stack Overflow.)
Disclaimer: this answer is more experimental than practical, but maybe worth mentioning.
I found that the class pandas.io.sql.SQLTable has a named argument keys, and if you assign it the name of a field then that field becomes the primary key.

Unfortunately you can't just pass this argument through the DataFrame.to_sql() function. To use it you should:

1. Create a pandas.io.sql.SQLDatabase instance:

engine = sa.create_engine('postgresql:///somedb')
pandas_sql = pd.io.sql.pandasSQL_builder(engine, schema=None, flavor=None)

2. Define a function analogous to pandas.io.sql.SQLDatabase.to_sql() but with an additional **kwargs argument which is passed to the pandas.io.sql.SQLTable object created inside it (I've just copied the original to_sql() method and added **kwargs):

def to_sql_k(self, frame, name, if_exists='fail', index=True,
             index_label=None, schema=None, chunksize=None, dtype=None,
             **kwargs):
    if dtype is not None:
        from sqlalchemy.types import to_instance, TypeEngine
        for col, my_type in dtype.items():
            if not isinstance(to_instance(my_type), TypeEngine):
                raise ValueError('The type of %s is not a SQLAlchemy '
                                 'type ' % col)

    table = pd.io.sql.SQLTable(name, self, frame=frame, index=index,
                               if_exists=if_exists, index_label=index_label,
                               schema=schema, dtype=dtype, **kwargs)
    table.create()
    table.insert(chunksize)

3. Call this function with your SQLDatabase instance and the dataframe you want to save:

to_sql_k(pandas_sql, df2save, 'tmp',
         index=True, index_label='id', keys='id', if_exists='replace')
And we get something like
CREATE TABLE public.tmp
(
id bigint NOT NULL DEFAULT nextval('tmp_id_seq'::regclass),
...
)
in the database.
P.S. You can of course monkey-patch DataFrame, io.SQLDatabase and io.to_sql() to use this workaround for convenience.
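As a lighter alternative to monkey-patching pandas internals, a similar helper can be sketched on top of the public pd.io.sql.get_schema function, which also accepts a keys argument. The helper name to_sql_pk and the sample data below are invented for illustration, and sqlite3 is used for brevity:

```python
import sqlite3

import pandas as pd


def to_sql_pk(frame, name, con, keys):
    """Create table `name` with a PRIMARY KEY on `keys`, then insert `frame`.

    Hypothetical helper: get_schema emits the CREATE TABLE statement with the
    key constraint, and to_sql(if_exists='append') fills the pre-made table.
    """
    schema = pd.io.sql.get_schema(frame, name, keys=keys, con=con)
    con.execute('DROP TABLE IF EXISTS "%s"' % name)
    con.execute(schema)
    frame.to_sql(name, con, index=False, if_exists='append')


df = pd.DataFrame({'id': [1, 2], 'val': ['a', 'b']})
con = sqlite3.connect(':memory:')
to_sql_pk(df, 'tmp', con, keys=['id'])
```

This avoids copying any private pandas code: only the DDL step is taken over, and the insert still goes through the normal to_sql path.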
[pandas] How can I create a primary key when writing a datafield to sql (df.to_sql)?
python - how to set the primary key when writing a pandas dataframe to a sqlite database table using df.to_sql - Stack Overflow
Adding (Insert or update if key exists) option to `.to_sql`
Using SQLAlchemy and Pandas to create a database for your Flask app
I have a bunch of Excel files, I read them in, do stuff with them and then write them into a SQLite DB.
I need to query those tables; my basic query takes 32 minutes because there are no primary keys set. Doing that by hand doesn't work because I read the Excel files and write to the DB on a regular basis, so redoing the primary keys every day is not a solution.
Haven't found anything in the documentation so far.
Any ways to do that in pandas?
Unfortunately there is no way right now to set a primary key in the pandas df.to_sql() method. Additionally, just to make things more of a pain, there is no way to set a primary key on a column in sqlite after a table has been created.
However, a work around at the moment is to create the table in sqlite with the pandas df.to_sql() method. Then you could create a duplicate table and set your primary key followed by copying your data over. Then drop your old table to clean up.
It would be something along the lines of this.
import pandas as pd
import sqlite3

# connect to the database
conn = sqlite3.connect('database')

df = pd.read_csv("/Users/data/" + filename)
# replace spaces in the column names so they are valid SQL identifiers
df.columns = [col.replace(' ', '_') for col in df.columns]

# write the pandas dataframe to a sqlite table
df.to_sql('my_table', conn, schema=None, if_exists='replace', index=True,
          index_label=None, chunksize=None, dtype=None)

c = conn.cursor()
c.executescript('''
PRAGMA foreign_keys=off;

BEGIN TRANSACTION;
ALTER TABLE my_table RENAME TO old_table;

/* create a new table with the same column names and types while
   defining a primary key for the desired column */
CREATE TABLE my_table (col_1 TEXT PRIMARY KEY NOT NULL,
                       col_2 TEXT);

INSERT INTO my_table SELECT * FROM old_table;

DROP TABLE old_table;
COMMIT TRANSACTION;

PRAGMA foreign_keys=on;''')

# close out the connection
c.close()
conn.close()
In the past I have done this when I faced this issue; I just wrapped the whole thing in a function to make it more convenient.

In my limited experience with sqlite I have found that not being able to add a primary key after a table has been created, and the lack of update-inserts (UPSERTs) and UPDATE ... JOIN, has caused a lot of frustration and some unconventional workarounds.

Lastly, the pandas df.to_sql() method has a dtype keyword argument that can take a dictionary of column names to types, i.e. dtype={'col_1': 'TEXT'}.
In pandas version 0.15, to_sql() got an argument dtype, which can be used to set both dtype and the primary key attribute for all columns:
import sqlite3
import pandas as pd
df = pd.DataFrame({'MyID': [1, 2, 3], 'Data': [3, 2, 6]})
with sqlite3.connect('foo.db') as con:
    df.to_sql('df', con=con, dtype={'MyID': 'INTEGER PRIMARY KEY',
                                    'Data': 'FLOAT'})
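A quick way to confirm the key was actually applied (my addition, not part of the answer; note I also pass index=False so MyID stays an ordinary column rather than the frame's index) is PRAGMA table_info:

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({'MyID': [1, 2, 3], 'Data': [3, 2, 6]})
con = sqlite3.connect(':memory:')
df.to_sql('df', con=con, index=False,
          dtype={'MyID': 'INTEGER PRIMARY KEY', 'Data': 'FLOAT'})

# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
info = con.execute('PRAGMA table_info("df")').fetchall()
pk_cols = [row[1] for row in info if row[5] > 0]
```

The pk flag in the last column of table_info is non-zero exactly for the primary-key columns.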
Had an issue with this today and figured others might benefit from the solution.
I've been creating some of the tables for the Postgres database in my Flask app with the Pandas to_sql method (the data source is messy and Pandas handles all the issues very well with very little coding on my part). The rest of the tables are initialized with a SQLAlchemy model, which allows me to easily query the tables with session.query:
db.session.query(AppUsers).filter_by(id = uId).first()
I couldn't however query the tables created by Pandas because they weren't part of the model. Enter automap_base from sqlalchemy.ext.automap (tableNamesDict is a dict with only the Pandas tables):
metadata = MetaData()
metadata.reflect(db.engine, only=tableNamesDict.values())
Base = automap_base(metadata=metadata)
Base.prepare()
Which would have worked perfectly, except for one problem, automap requires the tables to have a primary key. Ok, no problem, I'm sure Pandas to_sql has a way to indicate the primary key... nope. This is where it gets a little hacky:
for df in dfs.keys():
    cols = dfs[df].columns
    cols = [str(col) for col in cols if 'id' in col.lower()]
    schema = pd.io.sql.get_schema(dfs[df], df, con=db.engine, keys=cols)
    db.engine.execute('DROP TABLE ' + df + ';')
    db.engine.execute(schema)
    dfs[df].to_sql(df, con=db.engine, index=False, if_exists='append')
I iterate through the dict of DataFrames, get a list of the columns to use for the primary key (i.e. those containing id), use get_schema to create the empty tables, then append the DataFrame to the table.
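The automap constraint driving all this, that tables need a primary key to get a mapped class, is easy to demonstrate in isolation. A minimal sketch with an in-memory SQLite engine (the table names here are invented):

```python
from sqlalchemy import MetaData, create_engine, text
from sqlalchemy.ext.automap import automap_base

engine = create_engine('sqlite://')
with engine.begin() as conn:
    # one table WITH a primary key, and one WITHOUT
    conn.execute(text(
        'CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT)'))
    conn.execute(text('CREATE TABLE log (message TEXT)'))

metadata = MetaData()
metadata.reflect(engine, only=['user', 'log'])
Base = automap_base(metadata=metadata)
Base.prepare()

# only tables with a primary key get a mapped class
mapped = set(Base.classes.keys())
```

The keyless log table is silently skipped by automap, which is exactly why the Pandas-created tables could not be queried through the model.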
Now that you have the models, you can explicitly name and use them (i.e. User = Base.classes.user) with session.query or create a dict of all the classes with something like this:
alchemyClassDict = {}
for t in Base.classes.keys():
    alchemyClassDict[t] = Base.classes[t]

And query with:
res = db.session.query(alchemyClassDict['user']).first()
Hope this helps someone.
I use more of the numerical/scientific computing side of the Python community and I have a project that I want to serve up as a web app. This is perfect for me since the project contains heavy use of pandas. Thanks for this!
I'm sure Pandas to_sql has a way to indicate the primary key... nope.
Pandas to_sql method does have an index_label parameter: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html
What version of Pandas are you using?
Let's say I have an Excel file with columns [Class ID (reference key), Mean Score, Mode Score, Median Score] and I have a SQL database with columns [Class_ID (primary key), Mean_Score, Mode_Score, Median_Score, Last_updated].
How can I use pandas on python to read in the excel and update the SQL database with reference to the primary key?
Would the difference in column names matter?
How can I keep track of when was the last time the class id was updated?
What if there is new class ID of data that does not exist in the SQL database?
Looking at the df.iterrows() function: what if I have a lot of columns, is there a way to get each row as a tuple, list, or dictionary?
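One way to sketch the update-by-key part of this question (all table and column names below are invented to match it, and SQLite >= 3.24 is assumed for ON CONFLICT) combines DataFrame.itertuples() with an upsert; itertuples also answers the last point, since it yields each row as a named tuple:

```python
import sqlite3
from datetime import datetime, timezone

import pandas as pd

# toy stand-ins for the Excel sheet and the database in the question
excel = pd.DataFrame({'Class ID': [1, 3], 'Mean Score': [80.0, 75.0]})

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE scores (Class_ID INTEGER PRIMARY KEY, '
            'Mean_Score REAL, Last_updated TEXT)')
con.execute("INSERT INTO scores VALUES (1, 70.0, '2020-01-01')")

# the column names differ only by spaces vs underscores, so rename first
excel = excel.rename(columns=lambda c: c.replace(' ', '_'))

now = datetime.now(timezone.utc).isoformat()
for row in excel.itertuples(index=False):
    con.execute(
        'INSERT INTO scores (Class_ID, Mean_Score, Last_updated) '
        'VALUES (?, ?, ?) '
        'ON CONFLICT(Class_ID) DO UPDATE SET '
        'Mean_Score = excluded.Mean_Score, '
        'Last_updated = excluded.Last_updated',
        # cast numpy scalars to plain Python types for sqlite3
        (int(row.Class_ID), float(row.Mean_Score), now))
con.commit()
```

Existing class IDs are updated (and get a fresh Last_updated timestamp), while unseen IDs are inserted as new rows, which covers the "new class ID" case as well.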