Break this up into three parts to help isolate the problem and improve readability:
- Build the SQL string
- Set parameter values
- Execute pandas.read_sql_query
Build SQL
First, ensure the ? placeholders are being set correctly. Use str.format with str.join and len to dynamically fill in ?s based on the member_list length. The examples below assume member_list has 3 elements.
Example
member_list = (1,2,3)
sql = """select member_id, yearmonth
from queried_table
where yearmonth between {0} and {0}
and member_id in ({1})"""
sql = sql.format('?', ','.join('?' * len(member_list)))
print(sql)
Returns
select member_id, yearmonth
from queried_table
where yearmonth between ? and ?
and member_id in (?,?,?)
Set Parameter Values
Now ensure the parameter values are organized into a flat tuple.
Example
# generator to flatten values of irregular nested sequences,
# modified from answers http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python
def flatten(l):
    for el in l:
        try:
            yield from flatten(el)
        except TypeError:
            yield el
params = tuple(flatten((201601, 201603, member_list)))
print(params)
Returns
(201601, 201603, 1, 2, 3)
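For a flat member_list like the one above, the stdlib itertools.chain can build the same tuple without a custom generator (a minimal sketch of the same flattening step):

```python
from itertools import chain

member_list = (1, 2, 3)

# prepend the two yearmonth bounds, then append the member ids
params = tuple(chain((201601, 201603), member_list))
print(params)  # (201601, 201603, 1, 2, 3)
```

The recursive flatten generator is still the more general tool when the nesting is irregular.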
Execute
Finally, bring the sql and params values together in the read_sql_query call. Note that params must be passed by keyword, since the third positional argument of read_sql_query is index_col:
query = pd.read_sql_query(sql, db2conn, params=params)
Answer from Bryan on Stack Overflow
WARNING! Although my proposed solution here works, it is prone to SQL injection attacks. Therefore, it should never be used directly in backend code! It is only safe for offline analysis.
If you're using Python 3.6+, you could also use a formatted string literal (f-string) for your query (cf. https://docs.python.org/3/whatsnew/3.6.html#whatsnew36-pep498)
start, end = 201601, 201603
selected_members = (111, 222, 333, 444, 555)  # must be a tuple
query = f"""
SELECT member_id, yearmonth FROM queried_table
WHERE yearmonth BETWEEN {start} AND {end}
AND member_id IN {selected_members}
"""
df = pd.read_sql_query(query, db2conn)
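One caveat with interpolating a Python tuple directly: a single-element tuple renders with a trailing comma, which many SQL dialects reject as a syntax error. A sketch of building the parenthesized list by hand instead (members_sql is a hypothetical name):

```python
selected_members = (111,)

# interpolating the tuple directly leaves a trailing comma
print(f"IN {selected_members}")  # IN (111,)

# joining the values explicitly avoids the problem for any length
members_sql = "(" + ", ".join(str(m) for m in selected_members) + ")"
print(f"IN {members_sql}")  # IN (111)
```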
python - How to pass a list of parameter to Pandas read_sql with Teradata - Stack Overflow
read_sql should accept a sql_params parameter
python - pandas read_sql_query with params matching multiple columns - Stack Overflow
Pandas Parametized Query
The read_sql docs say this params argument can be a list, tuple or dict (see docs).
To pass the values in the sql query, there are different syntaxes possible: ?, :1, :name, %s, %(name)s (see PEP249).
But not all of these possibilities are supported by all database drivers, which syntax is supported depends on the driver you are using (psycopg2 in your case I suppose).
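Each DB-API driver advertises its style via a module-level paramstyle attribute, so you can check what your driver expects (a sketch using the stdlib sqlite3 driver):

```python
import sqlite3

# PEP 249 requires every driver module to expose this attribute
print(sqlite3.paramstyle)  # qmark -> use ? placeholders
```

psycopg2 reports pyformat, which covers the %s and %(name)s styles.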
In your second case, when using a dict, you are using 'named arguments', and according to the psycopg2 documentation, they support the %(name)s style (and so not the :name I suppose), see http://initd.org/psycopg/docs/usage.html#query-parameters.
So using that style should work:
df = psql.read_sql(('select "Timestamp","Value" from "MyTable" '
                    'where "Timestamp" BETWEEN %(dstart)s AND %(dfinish)s'),
                   db,
                   params={"dstart": datetime(2014, 6, 24, 16, 0),
                           "dfinish": datetime(2014, 6, 24, 17, 0)},
                   index_col=['Timestamp'])
I was having trouble passing a large number of parameters when reading from a SQLite table. Then it turned out that, since you pass a string to read_sql, you can just use an f-string. I tried the same with MSSQL via pyodbc and it works as well.
For SQLite, it would look like this:
# write a sample table into memory
from sqlalchemy import create_engine
df = pd.DataFrame({'Timestamp': pd.date_range('2020-01-17', '2020-04-24', 10), 'Value1': range(10)})
engine = create_engine('sqlite://', echo=False)
df.to_sql('MyTable', engine);
# query the table using a query
tpl = (1, 3, 5, 8, 9)
query = f"""SELECT Timestamp, Value1 FROM MyTable WHERE Value1 IN {tpl}"""
df = pd.read_sql(query, engine)
If the parameters are datetimes, it's a bit more complicated but calling the datetime conversion function of the SQL dialect you're using should do the job.
start, end = '2020-01-01', '2020-04-01'
query = f"""SELECT Timestamp, Value1 FROM MyTable WHERE Timestamp BETWEEN DATETIME('{start}') AND DATETIME('{end}')"""
df = pd.read_sql(query, engine)
You have a couple of issues:
- The package name teradatasql is misspelled in your example as tearadatasql.connect
- You must compose the IN-predicate with the same number of question-mark parameter markers as the number of values you intend to bind.
In your example, you intend to bind the three values contained in the column_list variable, so you must compose the IN-predicate with three question-mark parameter markers.
Generally speaking, you should dynamically compose the IN-predicate with the number of question-mark parameter markers equal to the number of values in the parameter-value list that you will bind.
Below is a modified version of your example that corrects these two issues. I actually ran this example and verified that it works.
import teradatasql
import pandas as pd
with teradatasql.connect(host="whomooz", user="guest", password="please") as connection:
    with connection.cursor() as cur:
        cur.execute("create volatile table table1 (c1 varchar(1), c2 integer) on commit preserve rows")
        cur.execute("insert into table1 values ('A', 1) ; insert into table1 values ('B', 2)")
    column_list = ['A', 'B', 'C']
    query = "select c1, c2 from table1 where c1 in ({}) order by c1".format(','.join(['?'] * len(column_list)))
    print(query)
    print("with params={}".format(column_list))
    df = pd.read_sql(query, connection, params=column_list)
    print(df)
This example produces the following output:
select c1, c2 from table1 where c1 in (?,?,?) order by c1
with params=['A', 'B', 'C']
  c1  c2
0  A   1
1  B   2
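The placeholder-counting pattern used here generalizes to a small helper (a sketch; in_clause is a hypothetical name):

```python
def in_clause(values):
    # one ? marker per value, joined into a parenthesized list
    return "({})".format(",".join("?" * len(values)))

print(in_clause(['A', 'B', 'C']))  # (?,?,?)
print(in_clause([1]))              # (?)
```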
You should fill in the %s-style placeholders with your parameters:
df = psql.read_sql(('select "column","row" from "table1" '
                    'where "column" in %(col_list)s'),
                   connection, params={'col_list': column_list})
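Note that psycopg2 adapts a Python tuple, not a list, to a parenthesized SQL list, so the bound value should be converted first (a sketch of that conversion step):

```python
column_list = ['a', 'b', 'c']

# psycopg2 renders a tuple parameter as ('a', 'b', 'c') in the SQL,
# which is what an IN predicate needs; a list would be adapted to an ARRAY
params = {'col_list': tuple(column_list)}
print(params['col_list'])  # ('a', 'b', 'c')
```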
To compare exact pairs, you could convert your lists to a dictionary and then to JSON, taking advantage of PostgreSQL JSON functions and operators, like this:
import json

# combine the lists into a dictionary, then convert to JSON
json1 = json.dumps(dict(zip(list1, list2)))
then the query should be
df = pd.read_sql_query(
    """
    select *
    from "table"
    where concat('{"',col1,'":',col2,'}')::jsonb <@ %s::jsonb
    """,
    con=conn,
    params=(json1,),
)
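With two small example lists, the dictionary-to-JSON step looks like this (a sketch with made-up values):

```python
import json

list1 = ['a', 'b']
list2 = [1, 2]

# pair the lists element-wise, then serialize to a JSON object
json1 = json.dumps(dict(zip(list1, list2)))
print(json1)  # {"a": 1, "b": 2}
```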
Or a more general approach for n columns
list1 = ['a', 'b', 'c']
list2 = [1,2,3]
list3 = ['z', 'y', 'x']
#assemble a json
json1 = ','.join(['"f1": "'+x+'"' for x in list1])
json2 = ','.join(['"f2": '+str(x) for x in list2])
json3 = ','.join(['"f3": "'+x+'"' for x in list3])
json_string = '{'+json1+', '+json2+ ', '+json3+'}'
the query
df = pd.read_sql_query(
    """
    select *
    from "table"
    where row_to_json(row(col1,col2,col3))::jsonb <@ %s::jsonb
    """,
    con=conn,
    params=(json_string,),
)
Tested with Python 3.10.6.
Found a solution that works for any number of input lists:
df_query = pd.read_sql_query(
    """
    select * from toy_table
    join (select unnest(%(list1)s) as list1, unnest(%(list2)s) as list2) as tmp
    on col1 = list1 and col2 = list2
    """,
    con=engine,
    params={"list1": list1, "list2": list2},
)
# expected output
rows = [(x, y) for x, y in zip(list1, list2)]
df_expected = df.loc[df.apply(lambda x: tuple(x.values) in rows, axis=1)]
assert df_query.equals(df_expected)
Hello Guys,
I'm trying to query data with variables using a select statement with pandas and MySQL. Is there a way I can declare the date variables and pass them to the query at runtime? I haven't come across any approach that works in my online research.
Here is my code :
from datetime import datetime
from email import encoders
import smtplib
import pandas as pd
from sqlalchemy import create_engine
from urllib.parse import quote
import mysql.connector as sql
db = create_engine('mysql://root:%s@localhost:3306/store' % quote('Mypass@12!'))
now = datetime.now()
startdate = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0).strftime("%y-%m-%d %H:%M:%S")
currentdate = now.replace(day=3,hour=23, minute=00, second=0).strftime("%y-%m-%d %H:%M:%S")
df1 = pd.read_sql("select * from orders where datecreated > %s and datecreated < %s ", con=db)
pdwriter = pd.ExcelWriter('report.xlsx', engine='xlsxwriter')
df1.to_excel(pdwriter, sheet_name='GEN')
pdwriter.save()
I would like to pass the date variables to the query.
Any suggestion is greatly appreciated. Thank you.
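One way to pass the dates at runtime is pandas' params argument, which hands the values to the driver instead of leaving the %s placeholders unfilled. A sketch below: the read_sql call itself is commented out because it needs a live connection, the datetime is fixed for illustration, and note that %Y (rather than %y) gives a four-digit year:

```python
from datetime import datetime

now = datetime(2023, 5, 10, 12, 30, 0)  # fixed value for illustration

# %Y gives a four-digit year; %y in the original code yields only two digits
startdate = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0).strftime("%Y-%m-%d %H:%M:%S")
currentdate = now.replace(day=3, hour=23, minute=0, second=0, microsecond=0).strftime("%Y-%m-%d %H:%M:%S")

query = "select * from orders where datecreated > %s and datecreated < %s"
params = (startdate, currentdate)
print(params)  # ('2023-05-01 00:00:00', '2023-05-03 23:00:00')
# df1 = pd.read_sql(query, con=db, params=params)
```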