I tried this countless times and, despite what I read above, I do not agree with most of either the process or the conclusion.

The process

If you want to compare two methods, adding thick layers of SQLAlchemy or pandasSQL_builder (that is, pandas.io.sql.pandasSQL_builder, without so much as an import) and other non-self-contained fragments is not helpful, to say the least. The only way to compare two methods without noise is to use them as cleanly as possible and, at the very least, under similar circumstances.

The assumptions

There is a saying about assumptions... Between assuming the difference is not noticeable and bringing up irrelevant considerations about pd.read_sql_query, the point gets severely blurred. The only obvious consideration here is that if anyone is comparing pd.read_sql_query and pd.read_sql_table, it's the table, the whole table and nothing but the table. Invoking WHERE, JOIN and the rest is just a waste of time.

Furthermore, the question explicitly asks for the difference between read_sql_table and read_sql_query with a SELECT * FROM table.

My code

I ran this over and over again on SQLite, MariaDB and PostgreSQL. I use SQLAlchemy exclusively to create the engines, because pandas requires this. The data comes from the coffee-quality-database, and I preloaded the file data/arabica_data_cleaned.csv into all three engines, to a table called arabica in a DB called coffee.
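That preload step can be sketched as follows. This is a hypothetical reconstruction, not my actual loading code: it uses an in-memory SQLite engine and a tiny stand-in frame instead of the real CSV and the three real engines.

```python
# Hypothetical preload sketch: push the same 'arabica' table into each
# engine once, so every database serves identical data.
import pandas as pd
from sqlalchemy import create_engine

# Stand-in for pd.read_csv('data/arabica_data_cleaned.csv')
df = pd.DataFrame({'id': [1, 2, 3], 'cupper_points': [8.75, 8.58, 8.42]})

# Real run: the sqlite, mariadb and postgres engines shown below
engines = [create_engine('sqlite://')]
for engine in engines:
    # if_exists='replace' keeps repeated loads reproducible
    df.to_sql('arabica', engine, index=False, if_exists='replace')

check = pd.read_sql_query('SELECT COUNT(*) AS n FROM arabica;', engines[0])
print(int(check['n'][0]))
```

In the real run, pd.read_csv('data/arabica_data_cleaned.csv') replaces the stand-in frame and the list holds all three engines.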

Here's a summarised version of my script:

import time
import pandas as pd
from sqlalchemy import create_engine

sqlite_engine = create_engine('sqlite:///coffee.db', echo=False)
mariadb_engine = create_engine('mariadb+mariadbconnector://root:[email protected]:3306/coffee')
postgres_engine = create_engine('postgresql://postgres:[email protected]:5432/coffee')

for engine in [sqlite_engine, mariadb_engine, postgres_engine]:
    print(engine)
    print('\tpd.read_sql_query:')
    start_time = time.time()
    for _ in range(100):  # 100 runs per engine and method
        pd.read_sql_query('SELECT * FROM arabica;', engine)
    print(f"\t[-- TIME --] {time.time() - start_time:.2f} sec\n")
    print('\tpd.read_sql_table:')
    start_time = time.time()
    for _ in range(100):
        pd.read_sql_table('arabica', engine)
    print(f"\t[-- TIME --] {time.time() - start_time:.2f} sec\n")
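As a sanity check (my sketch, not part of the timing script), the two calls being timed do return the same frame, so the comparison measures identical work. Here against an in-memory SQLite stand-in:

```python
# Sanity check: pd.read_sql_query('SELECT * FROM t') and
# pd.read_sql_table('t') produce the same DataFrame.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite://')  # stand-in for the three real engines
pd.DataFrame({'id': [1, 2], 'variety': ['Caturra', 'Bourbon']}).to_sql(
    'arabica', engine, index=False)

via_query = pd.read_sql_query('SELECT * FROM arabica;', engine)
via_table = pd.read_sql_table('arabica', engine)
print(via_query.equals(via_table))
```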

The versions are:

  • Python: 3.9.0
  • pandas: 1.2.4
  • SQLAlchemy: 1.4.13
  • time: built-in

My results

Here's a sample output:

Engine(sqlite:///coffee.db)
        pd.read_sql_query:
        [-- TIME --] 2.58 sec

        pd.read_sql_table:
        [-- TIME --] 3.60 sec

Engine(mariadb+mariadbconnector://root:***@127.0.0.1:3306/coffee)
        pd.read_sql_query:
        [-- TIME --] 2.84 sec

        pd.read_sql_table:
        [-- TIME --] 4.15 sec

Engine(postgresql://postgres:***@127.0.0.1:5432/coffee)
        pd.read_sql_query:
        [-- TIME --] 2.18 sec

        pd.read_sql_table:
        [-- TIME --] 4.01 sec

Conclusion

The above is a sample output, but I ran this over and over again, and the one constant is that in every single run, pd.read_sql_table ALWAYS takes longer than pd.read_sql_query. This sounds counter-intuitive, but that is exactly why we isolate the issue and test before pouring knowledge here.

I haven't had the chance to run a proper statistical analysis on the results, but at first glance I would risk stating that the differences are significant: both "columns" (query and table timings) come back within close ranges from run to run, and the two are quite far apart. In some runs, table takes twice as long for some of the engines.

If/when I get the chance to run such an analysis, I will complement this answer with results and matplotlib evidence.
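The per-run collection such an analysis needs can be sketched with the standard library alone; workload below is a stand-in for one pd.read_sql_* call:

```python
# Record one elapsed time per run (perf_counter, not time.time),
# then summarise, instead of a single aggregate over 100 iterations.
import statistics
import time

def time_runs(workload, repeats=5):
    """Return one elapsed time per run, in seconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()  # e.g. lambda: pd.read_sql_table('arabica', engine)
        times.append(time.perf_counter() - start)
    return times

samples = time_runs(lambda: sum(range(10_000)))
print(f"mean={statistics.mean(samples):.6f}s stdev={statistics.stdev(samples):.6f}s")
```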

Context

My initial idea was to investigate the suitability of SQL vs. MongoDB when tables reach thousands of columns. pdmongo.read_mongo (from the pdmongo package) devastates pd.read_sql_table — which performs very poorly against large tables — but falls short of pd.read_sql_query.

With around 900 columns, pd.read_sql_query outperforms pd.read_sql_table by 5 to 10 times!

Answer from Ricardo on Stack Overflow


Another answer on the same question:

I don't think you will notice this difference.

Here is the source code of both functions:

In [398]: pd.read_sql_query??
Signature: pd.read_sql_query(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, chunksize=None)
Source:
def read_sql_query(sql, con, index_col=None, coerce_float=True, params=None,
                   parse_dates=None, chunksize=None):
    pandas_sql = pandasSQL_builder(con)
    return pandas_sql.read_query(
        sql, index_col=index_col, params=params, coerce_float=coerce_float,
        parse_dates=parse_dates, chunksize=chunksize)

and

In [399]: pd.read_sql_table??
Signature: pd.read_sql_table(table_name, con, schema=None, index_col=None, coerce_float=True, parse_dates=None, columns=None, chunksize=None)
Source:
def read_sql_table(table_name, con, schema=None, index_col=None,
                   coerce_float=True, parse_dates=None, columns=None,
                   chunksize=None):
    con = _engine_builder(con)
    if not _is_sqlalchemy_connectable(con):
        raise NotImplementedError("read_sql_table only supported for "
                                  "SQLAlchemy connectable.")
    import sqlalchemy
    from sqlalchemy.schema import MetaData
    meta = MetaData(con, schema=schema)
    try:
        meta.reflect(only=[table_name], views=True)
    except sqlalchemy.exc.InvalidRequestError:
        raise ValueError("Table %s not found" % table_name)

    pandas_sql = SQLDatabase(con, meta=meta)
    table = pandas_sql.read_table(
        table_name, index_col=index_col, coerce_float=coerce_float,
        parse_dates=parse_dates, columns=columns, chunksize=chunksize)

    if table is not None:
        return table
    else:
        raise ValueError("Table %s not found" % table_name, con)

NOTE: I have intentionally cut off the docstrings...
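The meta.reflect(...) call above is where read_sql_table does extra work: it first queries the database for the table's schema before fetching any rows, a round trip read_sql_query never makes. A minimal sketch of that reflection step (modern SQLAlchemy 1.4+ API, in-memory SQLite):

```python
# Reflection sketch: read_sql_table must look up the table's schema
# before it can fetch rows; read_sql_query skips this step entirely.
from sqlalchemy import MetaData, create_engine, text

engine = create_engine('sqlite://')
with engine.begin() as conn:
    conn.execute(text('CREATE TABLE arabica (id INTEGER, cupper_points REAL)'))

meta = MetaData()
meta.reflect(bind=engine, only=['arabica'])  # the schema-lookup round trip

cols = [c.name for c in meta.tables['arabica'].columns]
print(cols)
```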
