So I have found a workaround: use pymssql instead of pyodbc (both in the import statement and in the engine). It lets you build your joins using database names and without specifying them in the engine. And there is no need to specify a driver in this case.
There might be a problem if you are using Python 3.6 which is not supported by pymssql oficially yet, but you can find unofficial wheels for your Python 3.6 here. It works as is supposed to with my queries.
Here is the original code with joins, rebuilt to work with pymssql:
import pandas as pd
import sqlalchemy as sql
import pymssql
server = '100.10.10.10'
myQuery = '''SELECT first.Field1, second.Field2
FROM db1.schema.Table1 AS first
JOIN db2.schema.Table2 AS second
ON first.Id = second.FirstId'''
engine = sql.create_engine('mssql+pymssql://{}'.format(server))
df = pd.read_sql_query(myQuery, engine)
As for the unofficial wheels, you need to download the file for Python 3.6 from the link I gave above, then cd to the download folder and run pip install wheels where 'wheels' is the name of the wheels file.
UPDATE:
Actually, it is possible to use pyodbc too. I am not sure if this should work for any SQL Server setup, but everything worked for me after I had set 'master' as my database in the engine. The resulting code would look like this:
import pandas as pd
import sqlalchemy as sql
import pyodbc
server = '100.10.10.10'
driver = 'SQL+Server'
db = 'master'
myQuery = '''SELECT first.Field1, second.Field2
FROM db1.schema.Table1 AS first
JOIN db2.schema.Table2 AS second
ON first.Id = second.FirstId'''
engine = sql.create_engine('mssql+pyodbc://{}/{}?driver={}'.format(server, db, driver))
df = pd.read_sql_query(myQuery, engine)
Answer from Sergey Zakharov on Stack OverflowSo I have found a workaround: use pymssql instead of pyodbc (both in the import statement and in the engine). It lets you build your joins using database names and without specifying them in the engine. And there is no need to specify a driver in this case.
There might be a problem if you are using Python 3.6 which is not supported by pymssql oficially yet, but you can find unofficial wheels for your Python 3.6 here. It works as is supposed to with my queries.
Here is the original code with joins, rebuilt to work with pymssql:
import pandas as pd
import sqlalchemy as sql
import pymssql
server = '100.10.10.10'
myQuery = '''SELECT first.Field1, second.Field2
FROM db1.schema.Table1 AS first
JOIN db2.schema.Table2 AS second
ON first.Id = second.FirstId'''
engine = sql.create_engine('mssql+pymssql://{}'.format(server))
df = pd.read_sql_query(myQuery, engine)
As for the unofficial wheels, you need to download the file for Python 3.6 from the link I gave above, then cd to the download folder and run pip install wheels where 'wheels' is the name of the wheels file.
UPDATE:
Actually, it is possible to use pyodbc too. I am not sure if this should work for any SQL Server setup, but everything worked for me after I had set 'master' as my database in the engine. The resulting code would look like this:
import pandas as pd
import sqlalchemy as sql
import pyodbc
server = '100.10.10.10'
driver = 'SQL+Server'
db = 'master'
myQuery = '''SELECT first.Field1, second.Field2
FROM db1.schema.Table1 AS first
JOIN db2.schema.Table2 AS second
ON first.Id = second.FirstId'''
engine = sql.create_engine('mssql+pyodbc://{}/{}?driver={}'.format(server, db, driver))
df = pd.read_sql_query(myQuery, engine)
The following code is working for me. I am using SQL server with SQLAlchemy
import pyodbc
import pandas as pd
cnxn = pyodbc.connect('DRIVER=ODBC Driver 17 for SQL Server;SERVER=your_db_server_id,your_db_server_port;DATABASE=pangard;UID=your_db_username;PWD=your_db_password')
query = "SELECT * FROM database.tablename;"
df = pd.read_sql(query, cnxn)
print(df)
BUG: SQLAlchemy 2.0.0 creates incompatibility with Pandas read_sql (fails to execute)
python - How to convert SQL Query to Pandas DataFrame using SQLAlchemy ORM? - Stack Overflow
python - Pandas read_sql_query with SQLAlchemy 2 - Stack Overflow
any way to increase sqlalchemy/pandas write speed?
pyodbc is slow for bulk inserts (https://github.com/mkleehammer/pyodbc/issues/120) only recently a new option has been added to mitigate the problem. (https://github.com/mkleehammer/pyodbc/wiki/Features-beyond-the-DB-API#fast_executemany)
2) Alternatively try turboodbc as the DBI driver. (http://turbodbc.readthedocs.io/en/latest/)
3) You could also rework your data processing workflow. Save the dataframe as CSV and then write a separate script bulk loading the CSV file into DB. This is the approach I ended up using. Oh and I'm using ceODB as the driver which is the fastest based on my testing.
More on reddit.com