It seems that your data are measured with resolution 0.1 and that the range is at least 18.7. My guess given the mention of "weather" is that they are Celsius temperatures.
Let's guess that the variable has a range of 50 in those units: the tails beyond the quartiles are often longer than the difference between the quartiles. That would mean of the order of 500 distinct values.
It seems that your sample size is of the order of 500000, so on average each distinct value occurs about 1000 times, and ties are everywhere.
It's also entirely possible that your data are quirkier than that if human readings are involved. Many observers use some final digits rather than others, although the quirks can vary, including preferences for 0 and 5 as final digits or for even digits.
Ties are likely to be the issue, together with a rule that the same values must be assigned to the same bin.
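The binning failure that heavy ties cause can be sketched with hypothetical data (assuming pandas-style qcut quantile binning; the values are invented for illustration):

```python
import pandas as pd

# Hypothetical readings at 0.1 resolution with heavy ties,
# mimicking the temperature data described above.
s = pd.Series([20.0] * 1000 + [20.5] * 1000 + [21.0] * 10)

# Naive quantile binning fails: tied values produce duplicate bin edges.
# pd.qcut(s, 4)  # raises ValueError: Bin edges must be unique

# Dropping duplicate edges works, but yields fewer bins than requested,
# because identical values must all land in the same bin.
binned = pd.qcut(s, 4, duplicates='drop')
print(binned.value_counts())
```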
Answer from Nick Cox on Stack Exchange
Break this up into three parts to help isolate the problem and improve readability:
- Build the SQL string
- Set parameter values
- Execute pandas.read_sql_query
Build SQL
First ensure the ? placeholders are generated correctly. Use str.format with str.join and len to fill in the ?s dynamically based on the length of member_list. The examples below assume member_list has 3 elements.
Example
member_list = (1,2,3)
sql = """select member_id, yearmonth
from queried_table
where yearmonth between {0} and {0}
and member_id in ({1})"""
sql = sql.format('?', ','.join('?' * len(member_list)))
print(sql)
Returns
select member_id, yearmonth
from queried_table
where yearmonth between ? and ?
and member_id in (?,?,?)
Set Parameter Values
Now ensure the parameter values are organized into a flat tuple.
Example
# generator to flatten values of irregular nested sequences,
# modified from answers at http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python
# (note: strings would recurse forever here; fine for numeric values)
def flatten(l):
    for el in l:
        try:
            yield from flatten(el)
        except TypeError:
            # el is not iterable, so it is a leaf value
            yield el
params = tuple(flatten((201601, 201603, member_list)))
print(params)
Returns
(201601, 201603, 1, 2, 3)
Execute
Finally bring the sql and params values together in the read_sql_query call. Note that params must be passed by keyword, since the third positional argument of read_sql_query is index_col:
query = pd.read_sql_query(sql, db2conn, params=params)
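Putting the three steps together, here is a self-contained sketch using an in-memory SQLite database as a stand-in for the DB2 connection (table name and rows are hypothetical):

```python
import sqlite3
import pandas as pd

# In-memory SQLite stand-in for db2conn, with made-up sample rows.
conn = sqlite3.connect(':memory:')
conn.execute('create table queried_table (member_id integer, yearmonth integer)')
conn.executemany('insert into queried_table values (?, ?)',
                 [(1, 201601), (2, 201602), (3, 201603), (4, 201602)])

member_list = (1, 2, 3)
sql = """select member_id, yearmonth
from queried_table
where yearmonth between {0} and {0}
and member_id in ({1})""".format('?', ','.join('?' * len(member_list)))

params = (201601, 201603) + member_list
df = pd.read_sql_query(sql, conn, params=params)
print(df)  # members 1, 2 and 3 only
```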
WARNING! Although my proposed solution here works, it is prone to SQL injection attacks. Therefore, it should never be used directly in backend code! It is only safe for offline analysis.
If you're using Python 3.6+, you could also use a formatted string literal for your query (cf. https://docs.python.org/3/whatsnew/3.6.html#whatsnew36-pep498)
start, end = 201601, 201603
selected_members = (111, 222, 333, 444, 555) # must be a tuple
query = f"""
SELECT member_id, yearmonth FROM queried_table
WHERE yearmonth BETWEEN {start} AND {end}
AND member_id IN {selected_members}
"""
df = pd.read_sql_query(query, db2conn)
You can use the pandas sqlio module to run a query and save the result in a pandas dataframe.
Say you have a psycopg2 connection; then you can use pandas sqlio like this.
import pandas.io.sql as sqlio
data = sqlio.read_sql_query("SELECT * FROM table", connection)
# Now data is a pandas dataframe having the results of above query.
data.head()
For me, the pandas sqlio module works fine. Please have a look at it and let me know if this is what you are looking for.
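As a self-contained check, the same call works with an in-memory SQLite connection standing in for psycopg2 (table and rows invented for the demo):

```python
import sqlite3
import pandas.io.sql as sqlio

# In-memory SQLite connection standing in for the psycopg2 connection above.
connection = sqlite3.connect(':memory:')
connection.execute('create table t (x integer)')
connection.executemany('insert into t values (?)', [(1,), (2,)])

# sqlio.read_sql_query is the same function as pd.read_sql_query.
data = sqlio.read_sql_query('SELECT * FROM t', connection)
print(data.head())
```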
This may be helpful for your case:
import pandas.io.sql as sqlio
df = sqlio.read_sql_query(query, connection)
Where in your case, query = "select * from table"
I think aus_lacy is a bit off in his solution - first you have to convert the QuerySet to a string containing the SQL backing it:
from django.db import connection
query = str(ModelToRetrive.objects.all().query)
df = pandas.read_sql_query(query, connection)
Also there is a less memory efficient but still valid solution:
df = DataFrame(list(ModelToRetrive.objects.values('id','some_attribute_1','some_attribute_2')))
You need to use Django's built in QuerySet API. More information on it can be seen here. Once you create a QuerySet you can then use pandas read_sql_query method to construct the data frame. The simplest way to construct a QuerySet is simply query the entire database which can be done like so:
db_query = YourModel.objects.all()
You can use filters which are passed in as args when querying the database to create different QuerySet objects depending on what your needs are.
Then using pandas you could do something like:
d_frame = pandas.read_sql_query(db_query, other_args...)
You can pass a cursor object to the DataFrame constructor. For postgres:
import psycopg2
from pandas import DataFrame
conn = psycopg2.connect("dbname='db' user='user' host='host' password='pass'")
cur = conn.cursor()
cur.execute("select instrument, price, date from my_prices")
df = DataFrame(cur.fetchall(), columns=['instrument', 'price', 'date'])
then set the index (note that set_index returns a new DataFrame, so assign the result):
df = df.set_index('date', drop=False)
or directly:
df.index = df['date']
Update: recent pandas have the following functions: read_sql_table and read_sql_query.
First create a db engine (a connection can also work here):
from sqlalchemy import create_engine
# see sqlalchemy docs for how to write this url for your database type:
engine = create_engine('mysql://scott:tiger@localhost/foo')
See sqlalchemy database urls.
read_sql_table
table_name = 'my_prices'
df = pd.read_sql_table(table_name, engine)
read_sql_query
df = pd.read_sql_query("SELECT instrument, price, date FROM my_prices;", engine)
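Both calls can be exercised end-to-end with an in-memory SQLite engine standing in for the MySQL URL above (table contents invented for the demo):

```python
import pandas as pd
from sqlalchemy import create_engine, text

# In-memory SQLite engine instead of the MySQL URL; the read calls are the same.
engine = create_engine('sqlite://')
with engine.begin() as conn:  # begin() commits the DDL/inserts on exit
    conn.execute(text('create table my_prices (instrument text, price real, date text)'))
    conn.execute(text("insert into my_prices values ('AAA', 1.0, '2020-01-01')"))

df = pd.read_sql_table('my_prices', engine)
df2 = pd.read_sql_query('SELECT instrument, price, date FROM my_prices;', engine)
print(df2)
```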
The old answer referenced read_frame, which has been deprecated (see the version history of this question for that answer).
It often makes sense to read first and then perform transformations to your requirements (as these are usually efficient and readable in pandas). In your example, you can pivot the result:
df.reset_index().pivot(index='date', columns='instrument', values='price')
Note: you can omit the reset_index if you don't specify an index_col in the read call.
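A runnable sketch of that pivot step (instrument/price rows are hypothetical):

```python
import pandas as pd

# Hypothetical long-format rows like those returned from my_prices.
df = pd.DataFrame({
    'date': ['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-02'],
    'instrument': ['AAA', 'BBB', 'AAA', 'BBB'],
    'price': [1.0, 2.0, 1.5, 2.5],
})

# Wide format: one column per instrument, indexed by date.
wide = df.pivot(index='date', columns='instrument', values='price')
print(wide)
```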
You need to use the params keyword argument:
f = pd.read_sql_query('SELECT open FROM NYSEMSFT WHERE date = (?)', conn, params=(date,))
As @alecxe and @Ted Petrou have already said, use explicit parameter names, especially for params: it's the fifth parameter of pd.read_sql_query(), and you passed your dates as the third positional argument (which is index_col).
But beside that you can improve your code by getting rid of the for date in dates: loop using the following trick:
import sqlite3
import pandas as pd
dates=['2001-01-01','2002-02-02']
qry = 'select * from aaa where open in ({})'
conn = sqlite3.connect(r'D:\temp\.data\a.sqlite')
df = pd.read_sql(qry.format(','.join(list('?' * len(dates)))), conn, params=dates)
Demo:
Source SQLite table:
sqlite> .mode column
sqlite> .header on
sqlite> select * from aaa;
open
----------
2016-12-25
2001-01-01
2002-02-02
Test run:
In [40]: %paste
dates=['2001-01-01','2002-02-02']
qry = 'select * from aaa where open in ({})'
conn = sqlite3.connect(r'D:\temp\.data\a.sqlite')
df = pd.read_sql(qry.format(','.join(list('?' * len(dates)))), conn, params=dates)
## -- End pasted text --
In [41]: df
Out[41]:
open
0 2001-01-01
1 2002-02-02
Explanation:
In [35]: qry = 'select * from aaa where open in ({})'
In [36]: ','.join(list('?' * len(dates)))
Out[36]: '?,?'
In [37]: qry.format(','.join(list('?' * len(dates))))
Out[37]: 'select * from aaa where open in (?,?)'
In [38]: dates.append('2003-03-03') # <-- let's add a third parameter
In [39]: qry.format(','.join(list('?' * len(dates))))
Out[39]: 'select * from aaa where open in (?,?,?)'
Background:
When using sqlalchemy with the pandas read_sql_query(query, con) method, pandas creates a SQLDatabase object and executes the query via self.connectable.execute(query). SQLDatabase.connectable is initialized to con as long as it is an instance of sqlalchemy.engine.Connectable (i.e. Engine or Connection).
Case I: when passing Engine object as con
Just as example code in your question:
from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('...')
df = pd.read_sql_query(query, con=engine)
Internally, pandas just uses result = engine.execute(query), which means:
Where above, the execute() method acquires a new Connection on its own, executes the statement with that object, and returns the ResultProxy. In this case, the ResultProxy contains a special flag known as close_with_result, which indicates that when its underlying DBAPI cursor is closed, the Connection object itself is also closed, which again returns the DBAPI connection to the connection pool, releasing transactional resources.
In this case, you don't have to worry about the Connection itself, which is closed automatically, but the engine will keep its connection pool open.
So you can either disable pooling by using:
from sqlalchemy.pool import NullPool
engine = create_engine('...', poolclass=NullPool)
or dispose the engine entirely with engine.dispose() at the end.
But following the Engine Disposal doc (the last paragraph), these two are alternatives; you don't have to use them at the same time. So in this case, for a simple one-time use of read_sql_query followed by clean-up, I think this should be enough:
# Clean up entirely after every query.
engine = create_engine('...')
df = pd.read_sql_query(query, con=engine)
engine.dispose()
Case II: when passing Connection object as con:
connection = engine.connect()
print(connection.closed) # False
df = pd.read_sql_query(query, con=connection)
print(connection.closed) # False again
# do_something_else(connection)
connection.close()
print(connection.closed) # True
engine.dispose()
You should do this whenever you want greater control over the attributes of the connection, when it gets closed, etc. For example, a very important example of this is a Transaction, which lets you decide when to commit your changes to the database. (from this answer)
But with pandas we have no control inside read_sql_query itself; the benefit of holding the Connection is that it lets you do more with it before explicitly closing it.
So generally speaking: I would use the following pattern, which gives me more control over connections and leaves room for future extension:
engine = create_engine('...')

# Context manager makes sure the `Connection` is closed safely and implicitly
with engine.connect() as conn:
    df = pd.read_sql_query(query, conn)
    print(conn.in_transaction())  # False

    # do_something_with(conn)
    trans = conn.begin()
    print(conn.in_transaction())  # True
    # do_whatever_with(trans)

    print(conn.closed)  # False

print('Is Connection with-OUT closed?', conn.closed)  # True
engine.dispose()
But for simple usage such as your example code, I think both ways are equally clean and simple for cleaning up DB I/O resources.
I have tested this, and even after the connection is closed (connection.close()), it is still present in the sys.sysprocesses table (of the database) for the rest of the script's execution.
Thus, if the script runs for another 10 minutes after closing the connection, the connection remains visible in sys.sysprocesses for those 10 minutes.
I think it is significant to draw attention to this fact: connection closed YES, process in the database closed NO.
Here are some scripts I used for testing:
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.engine import URL
from time import sleep

sql = "select * from tbltest"
s_con = '...' # connection information
con = URL.create("mssql+pyodbc", query={"odbc_connect": s_con})
engine = create_engine(con)

with engine.connect() as con:
    frame = pd.read_sql(sql=sql, con=con)
    print(con.closed) # False

print(con.closed) # True
engine.dispose()

sleep(20) # Pause for 20 seconds to launch the query with SSMS
In SSMS, the query to check the connection:
SELECT * FROM sys.sysprocesses