It seems that your data are measured with resolution 0.1 and that the range is at least 18.7. My guess given the mention of "weather" is that they are Celsius temperatures.

Let's guess that the variable has a range of about 50 in those units: the tails beyond the quartiles are often longer than the interquartile range. That would mean of the order of 500 distinct values.

It seems that your sample size is of the order of 500000, so on average each distinct value occurs about 1000 times, and ties are everywhere.

It's also entirely possible that your data are quirkier than that if human readings are involved. Many observers use some final digits rather than others, although the quirks can vary, including preferences for 0 and 5 as final digits or for even digits.

Ties are likely to be the issue, together with a rule that the same values must be assigned to the same bin.

Answer from Nick Cox on Stack Exchange
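A minimal sketch of the effect described above (the data here are simulated, not the asker's): with heavy ties, quantile binning cannot produce equal-count bins, because identical values must be assigned to the same bin.

```python
import numpy as np
import pandas as pd

# Simulated readings (an assumption, not the asker's data): 500,000 values
# recorded at coarse resolution, so only a handful of distinct values exist
rng = np.random.default_rng(0)
temps = np.round(rng.normal(15, 1, 500_000), 0)

# Equal-count binning fails outright: several decile edges land on tied values
try:
    pd.qcut(temps, q=10)
except ValueError as e:
    print(e)  # "Bin edges must be unique ..."

# Dropping the duplicate edges works, but yields fewer, unequal bins
binned = pd.qcut(temps, q=10, duplicates="drop")
print(pd.Series(binned).value_counts().sort_index())
```

With finer resolution (more distinct values) the duplicate-edge problem eases, but tied values at a bin edge still make exactly equal bin counts impossible.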
Fizzy
fizzy.cc › fetch-data-from-postgresql-databases-in-python
Fetch Data from PostgreSQL Databases in Python - Fizzy
January 3, 2020 -
import datetime
import time
import psycopg2
import pandas.io.sql as sqlio
import pandas as pd

def get_data():
    bizdate = (datetime.date.today() - datetime.timedelta(1)).strftime('%Y%m%d')
    print('Define bizdate = ' + bizdate)
    print('Create Database Connection')
    conn = psycopg2.connect(
        dbname="dbname",
        user="username",
        password="password",
        host="postgresql_host",
        port="443"
    )
    print('Read SQL Queries')
    # Note: the date literal must use single quotes; double quotes
    # denote identifiers in PostgreSQL
    sql_query = """ SELECT * FROM your_table_name WHERE ds = '${bizdate}'; """
    print('Fetch Data')
    df = sqlio.read_sql_query(sql_query.replace('${bizdate}', bizdate), conn)
    print('Export CSV Files')
    df.to_csv('./csv/weixin_xiaoxi.csv', index=False, encoding='utf-8')
    print('Close Database Connection')
    conn.close()

if __name__ == "__main__":
    get_data()
    print('Finish!')
    now = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time()))
    print(now)
    print('==================================')
Joseantunes
joseantunes.tech › random code › 2018 › 09 › 07 › pandas-and-psycopg2.html
Use psycopg2 to query into a DataFrame · Jose Antunes
September 7, 2018 -
import pandas.io.sql as sqlio

SQL_QUERY = "SELECT * FROM test_table WHERE id = ANY(%s)"
test_ids = [1, 2, 3]
# The connection must come before the keyword argument
result_df = sqlio.read_sql_query(SQL_QUERY, conn, params=(test_ids,))
Stack Overflow
stackoverflow.com › questions › 73734510 › how-to-execute-sql-query-with-parameters-in-pandas
python - How to execute SQL query with parameters in Pandas - Stack Overflow
import pandas.io.sql as sqlio

def getAnalysisMetaStatsDF(self):
    session = self.connection()
    ids = self.getAnalysisIds()  # this is a list of integers
    data = sqlio.read_sql_query("Select * from analysis_stats where analysis_id in %s", [tuple(ids)], session)
    print(data)
Top answer
1 of 3
27

Break this up into three parts to help isolate the problem and improve readability:

  1. Build the SQL string
  2. Set parameter values
  3. Execute pandas.read_sql_query

Build SQL

First ensure the ? placeholders are generated correctly. Use str.format with str.join and len to fill in the right number of ?s based on the length of member_list. The examples below assume member_list has 3 elements.

Example

member_list = (1,2,3)
sql = """select member_id, yearmonth
         from queried_table
         where yearmonth between {0} and {0}
         and member_id in ({1})"""
sql = sql.format('?', ','.join('?' * len(member_list)))
print(sql)

Returns

select member_id, yearmonth
from queried_table
where yearmonth between ? and ?
and member_id in (?,?,?)

Set Parameter Values

Now ensure the parameter values are organized into a single flat tuple.

Example

# generator to flatten values of irregular nested sequences,
# modified from answers http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python
def flatten(l):
    for el in l:
        try:
            yield from flatten(el)
        except TypeError:
            yield el

params = tuple(flatten((201601, 201603, member_list)))
print(params)

Returns

(201601, 201603, 1, 2, 3)

Execute

Finally, bring the sql and params values together in the read_sql_query call. Note that params must be passed by keyword, because the third positional parameter of read_sql_query is index_col:

query = pd.read_sql_query(sql, db2conn, params=params)
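The three steps can be assembled end to end. A sketch against an in-memory SQLite table (SQLite also uses ? placeholders; the original answer targets a DB2 connection, and the table contents here are made up):

```python
import sqlite3
import pandas as pd

# In-memory SQLite stands in for the DB2 connection in the answer
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE queried_table (member_id INTEGER, yearmonth INTEGER);
    INSERT INTO queried_table VALUES (1, 201601), (2, 201602), (4, 201602);
""")

member_list = (1, 2, 3)

# 1. Build the SQL string with the right number of ? placeholders
sql = """select member_id, yearmonth
         from queried_table
         where yearmonth between {0} and {0}
         and member_id in ({1})""".format('?', ','.join('?' * len(member_list)))

# 2. Flatten all values into one tuple, in placeholder order
params = (201601, 201603) + member_list

# 3. Execute, passing params by keyword
df = pd.read_sql_query(sql, conn, params=params)
print(df)  # members 1 and 2 match; member 4 is not in the list
```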
2 of 3
15

WARNING! Although my proposed solution here works, it is prone to SQL injection attacks. Therefore, it should never be used directly in backend code! It is only safe for offline analysis.

If you're using Python 3.6+ you could also use a formatted string literal for your query (cf. https://docs.python.org/3/whatsnew/3.6.html#whatsnew36-pep498):

start, end = 201601, 201603
selected_members = (111, 222, 333, 444, 555)  # must be a tuple

query = f"""
    SELECT member_id, yearmonth FROM queried_table
    WHERE yearmonth BETWEEN {start} AND {end}
      AND member_id IN {selected_members}
"""

df = pd.read_sql_query(query, db2conn)
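For completeness, a sketch of the same query with bound parameters instead of string interpolation, which avoids the injection risk (an in-memory SQLite table stands in for db2conn here, and the rows are made up):

```python
import sqlite3
import pandas as pd

# Stand-in table so the example is runnable
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE queried_table (member_id INTEGER, yearmonth INTEGER);
    INSERT INTO queried_table VALUES (111, 201601), (222, 201602), (999, 201602);
""")

start, end = 201601, 201603
selected_members = (111, 222, 333, 444, 555)

# Same shape as the f-string query, but all values travel as bound parameters
placeholders = ",".join("?" * len(selected_members))
query = f"""
    SELECT member_id, yearmonth FROM queried_table
    WHERE yearmonth BETWEEN ? AND ?
      AND member_id IN ({placeholders})
"""
df = pd.read_sql_query(query, conn, params=(start, end, *selected_members))
print(df)  # rows for members 111 and 222 only
```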
Medium
yiruchen1993.medium.com › pandas-to-postgresql-3ab3b7216faa
Pandas to PostgreSQL. Write a pandas DataFrame to a SQL… | by imflorachen | Medium
November 24, 2020 -
# DB table to df
query_sql = "SELECT * FROM %s;" % 'mytable'
table_data = sqlio.read_sql_query(query_sql, postgreSQLConnection)
Pandas
pandas.pydata.org › docs › reference › api › pandas.read_sql_query.html
pandas.read_sql_query — pandas documentation
>>> from sqlalchemy import create_engine
>>> engine = create_engine("sqlite:///database.db")
>>> sql_query = "SELECT int_column FROM test_data"
>>> with engine.connect() as conn, conn.begin():
...     data = pd.read_sql_query(sql_query, conn)
Programtalk
programtalk.com › python-more-examples › pandas.io.sql.read_sql_query
pandas.io.sql.read_sql_query Example - Program Talk
def test_chunksize_read_type(self):
    frame = tm.makeTimeDataFrame()
    frame.index.name = "index"
    drop_sql = "DROP TABLE IF EXISTS test"
    cur = self.conn.cursor()
    cur.execute(drop_sql)
    sql.to_sql(frame, name="test", con=self.conn)
    query = "select * from test"
    chunksize = 5
    chunk_gen = read_sql_query(
        sql=query, con=self.conn, chunksize=chunksize, index_col="index"
    )
    chunk_df = next(chunk_gen)
    tm.assert_frame_equal(frame[:chunksize], chunk_df)

# From query.py (MIT License, paul-wolf):
def dataframe(self, context=None):
    """Return a pandas dataframe. This only works if pandas is installed."""
    import pandas.io.sql as sqlio
    self.context(context)
    sql = self.parse()
    df = sqlio.read_sql_query(sql, self.connection)
    df.columns = self.column_headers
    return df
GitHub
gist.github.com › jakebrinkmann › de7fd185efe9a1f459946cf72def057e
Read SQL query from psycopg2 into pandas dataframe · GitHub
import pandas as pd
import psycopg2

with psycopg2.connect("host='{}' port={} dbname='{}' user={} password={}".format(host, port, dbname, username, pwd)) as conn:
    sql = "select count(*) from table;"
    dat = pd.read_sql_query(sql, conn)
Spark By {Examples}
sparkbyexamples.com › home › pandas › pandas read sql query or table with examples
Pandas Read SQL Query or Table with Examples - Spark By {Examples}
December 2, 2024 - Pandas read_sql() function is used to read data from SQL queries or database tables into DataFrame. This function allows you to execute SQL queries and
Top answer
1 of 3
10

You need to use the params keyword argument:

f = pd.read_sql_query('SELECT open FROM NYSEMSFT WHERE date = (?)', conn, params=(date,))
2 of 3
5

As @alecxe and @Ted Petrou have already said, use explicit parameter names, especially for params: it is the fifth positional parameter of pd.read_sql_query(), and you passed your value third, where it lands in index_col.

Besides that, you can improve your code by getting rid of the for date in dates: loop with the following trick:

import sqlite3

dates=['2001-01-01','2002-02-02']
qry = 'select * from aaa where open in ({})'

conn = sqlite3.connect(r'D:\temp\.data\a.sqlite')

df = pd.read_sql(qry.format(','.join(list('?' * len(dates)))), conn, params=dates)

Demo:

Source SQLite table:

sqlite> .mode column
sqlite> .header on
sqlite> select * from aaa;
open
----------
2016-12-25
2001-01-01
2002-02-02

Test run:

In [40]: %paste
dates=['2001-01-01','2002-02-02']
qry = 'select * from aaa where open in ({})'
conn = sqlite3.connect(r'D:\temp\.data\a.sqlite')

df = pd.read_sql(qry.format(','.join(list('?' * len(dates)))), conn, params=dates)
## -- End pasted text --

In [41]: df
Out[41]:
         open
0  2001-01-01
1  2002-02-02

Explanation:

In [35]: qry = 'select * from aaa where open in ({})'

In [36]: ','.join(list('?' * len(dates)))
Out[36]: '?,?'

In [37]: qry.format(','.join(list('?' * len(dates))))
Out[37]: 'select * from aaa where open in (?,?)'

In [38]: dates.append('2003-03-03')   # <-- let's add a third parameter

In [39]: qry.format(','.join(list('?' * len(dates))))
Out[39]: 'select * from aaa where open in (?,?,?)'