Unfortunately, UNION is the only way here:
WITH bar (baz) AS
(select 'a' union select 'b' union select 'c')
SELECT * from bar;
Answer from AlexYes on Stack Overflow
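A short sketch of the same pattern with a second column. UNION removes duplicate rows from its result, while UNION ALL keeps every row as written and skips that duplicate-elimination step. More columns are named in the CTE's column list. The column name qux is hypothetical.

```sql
-- UNION ALL keeps every constant row as written (no duplicate elimination).
-- Each column gets a name in the CTE header: baz from the original answer,
-- qux is a hypothetical second column.
WITH bar (baz, qux) AS (
    SELECT 'a', 1
    UNION ALL SELECT 'b', 2
    UNION ALL SELECT 'c', 3
)
SELECT * FROM bar;
```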
TL;DR: The most efficient way to simulate a multi-row VALUES clause is to build an array of arrays holding the rows and columns of the data, then unpack it and, if necessary, cast the values to the desired data types:
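-- SUPER arrays are zero-indexed: rowdata[0] is the first column, rowdata[1] the second
select rowdata[0]::varchar, rowdata[1]::decimal
from
    (select array(
        array('a', 1),
        array('b', 2)
    ) as arr) as data,
    data.arr as rowdata
(The data.arr as rowdata clause unnests the array, producing one row per inner array.)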
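The unpacked rows can be wrapped in a CTE so they behave like a named constant table that can be filtered or joined. This is a sketch built from the query above; the names constants, letter, and n are hypothetical, and the cast to int assumes the second column holds whole numbers.

```sql
-- Expose the unnested SUPER rows as a named relation, like a VALUES list.
-- constants, letter and n are hypothetical names.
WITH constants AS (
    SELECT rowdata[0]::varchar AS letter,   -- first element of each inner array
           rowdata[1]::int     AS n         -- second element, cast from SUPER
    FROM (SELECT array(
                     array('a', 1),
                     array('b', 2),
                     array('c', 3)
                 ) AS arr) AS data,
         data.arr AS rowdata                -- unnest: one row per inner array
)
SELECT letter, n
FROM constants
WHERE n > 1;
```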
UNION ALL has the unfortunate behavior that each SELECT statement is distributed across the cluster separately:
explain select * from (select 'a' union all select 'b' union all select 'c')
XN Subquery Scan derived_table1 (cost=0.00..0.09 rows=3 width=32)
  -> XN Append (cost=0.00..0.06 rows=3 width=0)
       -> XN Network (cost=0.00..0.02 rows=1 width=0)
            Distribute Round Robin
            -> XN Subquery Scan "*SELECT* 1" (cost=0.00..0.02 rows=1 width=0)
                 -> XN Result (cost=0.00..0.01 rows=1 width=0)
       -> XN Network (cost=0.00..0.02 rows=1 width=0)
            Distribute Round Robin
            -> XN Subquery Scan "*SELECT* 2" (cost=0.00..0.02 rows=1 width=0)
                 -> XN Result (cost=0.00..0.01 rows=1 width=0)
       -> XN Network (cost=0.00..0.02 rows=1 width=0)
            Distribute Round Robin
            -> XN Subquery Scan "*SELECT* 3" (cost=0.00..0.02 rows=1 width=0)
                 -> XN Result (cost=0.00..0.01 rows=1 width=0)
With more than a few rows, this overhead becomes absurd and makes queries extremely inefficient. Fortunately, the SUPER data type offers a workaround: when we select a single value (the array of arrays), the query planner sees it as a single query that it only needs to distribute to one node, which is much more efficient to execute.
sql - How to select multiple rows filled with constants in Amazon Redshift? - Stack Overflow
Function/Procedure to Return a Result in Query in Redshift
This seems like some basic proc/UDF functionality that I just can't figure out in Redshift. I currently have external tables that I'm partitioning by date. I just wanted to query the latest date in the table:
select *
from some_external_table
where date = (select max(substring(values, 3, 10))::date
              from svv_external_partitions
              where tablename = 'some_external_table');
That query against svv_external_partitions is rather ugly, and I wanted to wrap it in a UDF or procedure. SQL UDFs are very restrictive (they can't use a FROM clause?), so I'm trying to figure out whether a procedure can do this.
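A scalar SQL UDF cannot contain the FROM svv_external_partitions lookup, but it can hide the string parsing. The sketch below assumes, based on the substring(values, 3, 10) call in the query, that each values entry for a single date partition looks like ["2021-03-15"], so skipping the first two characters and taking ten yields the date. The function name f_partition_date is hypothetical.

```sql
-- Hypothetical scalar SQL UDF (no FROM clause allowed in its body).
-- Assumes input shaped like '["2021-03-15"]': skip '["' and keep 10 chars.
CREATE OR REPLACE FUNCTION f_partition_date (varchar)
  RETURNS date
IMMUTABLE
AS $$
  SELECT substring($1, 3, 10)::date
$$ LANGUAGE sql;

SELECT f_partition_date('["2021-03-15"]');  -- returns 2021-03-15
```

This only tidies the expression; the lookup of the maximum partition value still has to happen in the calling query, and applying the UDF directly to the svv_external_partitions system view has not been verified here.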
Here's my proc:
CREATE OR REPLACE PROCEDURE get_last_ds(
    schema_param IN varchar(256),
    table_param IN varchar(256),
    last_ds OUT date
)
AS $$
BEGIN
    EXECUTE 'SELECT max(substring(values, 3, 10))::date
             FROM svv_external_partitions
             WHERE schemaname = ''' || schema_param || '''
             AND tablename = ''' || table_param || ''';'
    INTO last_ds;
END;
$$ LANGUAGE plpgsql;
This works just fine, but it can only be executed using CALL:
begin;
call get_last_ds('some_external_schema', 'some_external_table');
end;
Is there a way to achieve the following?
select *
from some_external_table
where date = get_last_ds('some_external_schema', 'some_external_table');
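Independently of how the procedure is invoked, its dynamic query is built by pasting the parameters between hand-written quote marks, so a schema or table name containing a single quote would break the statement. A sketch of the same procedure using Redshift's QUOTE_LITERAL string function for the quoting; it is untested and still has to be run with CALL.

```sql
-- Same interface as get_last_ds above; only the string building changes.
CREATE OR REPLACE PROCEDURE get_last_ds(
    schema_param IN varchar(256),
    table_param  IN varchar(256),
    last_ds      OUT date
)
AS $$
BEGIN
    -- QUOTE_LITERAL wraps each value in quotes and escapes embedded quotes.
    EXECUTE 'SELECT max(substring(values, 3, 10))::date'
         || ' FROM svv_external_partitions'
         || ' WHERE schemaname = ' || quote_literal(schema_param)
         || ' AND tablename = '    || quote_literal(table_param)
    INTO last_ds;
END;
$$ LANGUAGE plpgsql;
```

Because the parameters are only compared as values, a static SELECT INTO in the procedure body should also work and avoids building a string at all.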