Answer from AsAP_Sherb on Stack Overflow: "Writing to MySQL database with pandas using SQLAlchemy, to_sql"
Using the engine in place of the raw_connection() worked:
import pandas as pd
import mysql.connector
from sqlalchemy import create_engine
engine = create_engine('mysql+mysqlconnector://[user]:[pass]@[host]:[port]/[schema]', echo=False)
data.to_sql(name='sample_table2', con=engine, if_exists='append', index=False)
I'm not clear on why this gave me the earlier error when I tried it yesterday.
Alternatively, use the pymysql package...
import pymysql
from sqlalchemy import create_engine
cnx = create_engine('mysql+pymysql://[user]:[pass]@[host]:[port]/[schema]', echo=False)
data = pd.read_sql('SELECT * FROM sample_table', cnx)
data.to_sql(name='sample_table2', con=cnx, if_exists='append', index=False)
Had an issue with this today and figured others might benefit from the solution.
I've been creating some of the tables for the Postgres database in my Flask app with the pandas to_sql method (the data source is messy and pandas handles all the issues very well with very little coding on my part). The rest of the tables are initialized with a SQLAlchemy model, which allows me to easily query the tables with session.query:
db.session.query(AppUsers).filter_by(id = uId).first()
I couldn't however query the tables created by Pandas because they weren't part of the model. Enter automap_base from sqlalchemy.ext.automap (tableNamesDict is a dict with only the Pandas tables):
metadata = MetaData()
metadata.reflect(db.engine, only=tableNamesDict.values())
Base = automap_base(metadata=metadata)
Base.prepare()
Which would have worked perfectly, except for one problem, automap requires the tables to have a primary key. Ok, no problem, I'm sure Pandas to_sql has a way to indicate the primary key... nope. This is where it gets a little hacky:
for df in dfs.keys():
    cols = dfs[df].columns
    cols = [str(col) for col in cols if 'id' in col.lower()]
    schema = pd.io.sql.get_schema(dfs[df], df, con=db.engine, keys=cols)
    db.engine.execute('DROP TABLE ' + df + ';')
    db.engine.execute(schema)
    dfs[df].to_sql(df, con=db.engine, index=False, if_exists='append')
I iterate through the dict of DataFrames, get a list of the columns to use for the primary key (i.e. those containing 'id'), use get_schema to create the empty tables, then append the DataFrame to the table.
Now that you have the models, you can explicitly name and use them (i.e. User = Base.classes.user) with session.query or create a dict of all the classes with something like this:
alchemyClassDict = {}
for t in Base.classes.keys():
    alchemyClassDict[t] = Base.classes[t]

And query with:

res = db.session.query(alchemyClassDict['user']).first()
Hope this helps someone.
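The whole workaround can be sketched end-to-end against SQLite (the `user` table, `user_id` column, and sample rows are hypothetical; the original used Postgres via Flask's db.engine):

```python
import pandas as pd
from sqlalchemy import MetaData, create_engine
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session

engine = create_engine('sqlite://')

# Hypothetical messy-source DataFrame; 'user_id' will become the primary key
df = pd.DataFrame({'user_id': [1, 2], 'name': ['alice', 'bob']})

# get_schema emits CREATE TABLE DDL including a PRIMARY KEY on the given columns
schema = pd.io.sql.get_schema(df, 'user', con=engine, keys=['user_id'])
with engine.begin() as conn:
    conn.exec_driver_sql(schema)

# Append the data into the table that now has a primary key
df.to_sql('user', con=engine, index=False, if_exists='append')

# automap can now reflect the table into a mapped class
metadata = MetaData()
metadata.reflect(engine, only=['user'])
Base = automap_base(metadata=metadata)
Base.prepare()
User = Base.classes.user

session = Session(engine)
row = session.query(User).filter_by(user_id=2).first()
```

Because the table was created with a primary key before the append, automap no longer skips it.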
I use more of the numerical/scientific computing side of the Python community and I have a project that I want to serve up as a web app. This is perfect for me since the project contains heavy use of pandas. Thanks for this!
I'm sure Pandas to_sql has a way to indicate the primary key... nope.
The pandas to_sql method does have an index_label parameter: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html
What version of pandas are you using?
Below should work in most cases:
df = pd.read_sql(query.statement, query.session.bind)
See pandas.read_sql documentation for more information on the parameters.
Just to make this clearer for novice pandas programmers, here is a concrete example:
pd.read_sql(session.query(Complaint).filter(Complaint.id == 2).statement, session.bind)
Here we select the complaint with id = 2 from the complaints table (the SQLAlchemy model is Complaint).
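A self-contained version of that example, with a minimal Complaint model and SQLite standing in for the real database (the sample rows are made up):

```python
import pandas as pd
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Complaint(Base):
    __tablename__ = 'complaints'
    id = Column(Integer, primary_key=True)
    text = Column(String)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

session = Session(engine)
session.add_all([Complaint(id=1, text='slow'), Complaint(id=2, text='broken')])
session.commit()

# The ORM query's SELECT statement plus the session's bind go straight to pandas
query = session.query(Complaint).filter(Complaint.id == 2)
df = pd.read_sql(query.statement, session.bind)
```

The key move is `query.statement`: pandas accepts the compiled SELECT, so any ORM filter chain converts to a DataFrame the same way.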
I handle data from our company using pandas just fine, and I know SQL is used specifically for data management, yet I don't know the differences between them. Queries work just fine with pandas and I'm quite confident using pandas.
Setting aside having to link to the cloud, are there any benefits to storing files in a database instead of CSVs? If so, which library should I be looking into?
» pip install SQLAlchemy
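As a small illustration of the database route (the file, table, and column names here are invented): SQLite ships with Python, and SQLAlchemy gives pandas a uniform way to talk to it, so swapping a pile of CSVs for a single-file database can be this short:

```python
import pandas as pd
from sqlalchemy import create_engine

# One database instead of a folder of CSVs (in-memory here for the demo;
# use 'sqlite:///mydata.db' for a real file on disk)
engine = create_engine('sqlite://')

df = pd.DataFrame({'ticker': ['AAA', 'BBB'], 'price': [10.0, 12.5]})
df.to_sql('prices', engine, index=False, if_exists='replace')

# Unlike a CSV, the database can filter before anything is loaded into memory
cheap = pd.read_sql('SELECT * FROM prices WHERE price < 11', engine)
```

The practical benefit over CSVs is that filtering, joining, and typing happen in the database, not after a full file read.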
I have several tables with millions of rows that need to be queried for varying criteria based on data research. Currently working on creating a programmatic approach to querying rather than raw-dogging SQL, causing database issues, and getting yelled at by DBAs.
Here's the issue: pandas.read_sql_query is not performing well. Any insight on how to tinker with it and make it more efficient would be greatly appreciated.
Here's the setup:
Remote oracle database
Partitioned table
Connection via SQLAlchemy using:
Username/password, port, dialect, driver (cx_Oracle), host, and host name.
Connection variables passed into sqlalchemy.engine.create_engine()
Resulting engine is used to .connect() along with:
execution_options(stream_results=True) (my first effort at optimization, using a server-side cursor)
Resulting connection is used in pandas.read_sql_query() using:
Aforementioned streaming connection
Chunksize of 10000 (I did some tinkering with various chunk sizes, smaller and larger; this is where I've found the best performance, which is still ~60 seconds per 100k rows.)
The read_sql_query result is then concat'd onto an initially empty DataFrame repeatedly as the query iterates through the chunks.
Here's the code snippet of the query:
for chunk in read_sql_query(query, self.__connection, chunksize=10000):
    self.queryResultContainer[database] = concat([self.queryResultContainer[database], chunk], ignore_index=True)

One noteworthy mention: the query being passed into read_sql_query performs within seconds as a standalone query.
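One likely contributor, separate from the database side: concatenating each chunk onto a growing DataFrame copies all previously fetched rows on every iteration, which makes the loop quadratic in the number of chunks. Collecting chunks in a list and calling concat once is linear. A sketch of that pattern (SQLite stands in for the Oracle connection, and the table and column names are made up):

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite://')
pd.DataFrame({'x': range(25)}).to_sql('big_table', engine, index=False)

chunks = []
with engine.connect() as conn:
    # On Oracle this enables a server-side cursor; SQLite simply ignores it
    conn = conn.execution_options(stream_results=True)
    for chunk in pd.read_sql_query('SELECT * FROM big_table', conn, chunksize=10):
        chunks.append(chunk)  # O(1) per chunk; no repeated copying

# One concat at the end instead of one per chunk
result = pd.concat(chunks, ignore_index=True)
```

Same chunked streaming, same final DataFrame, but the copying cost is paid once at the end rather than on every chunk.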