Opinions on Amazon Redshift versus other RDBMS?
SQL databases closest or most adaptable to Amazon Redshift?
postgresql - Bad practice to use MySQL and RedShift together? - Stack Overflow
amazon web services - Is AWS Redshift to PostgreSQL the same as AWS Aurora to MySQL? - Stack Overflow
Has anyone used both Amazon Redshift and at least one other major RDBMS like PostgreSQL, MySQL, Microsoft SQL Server, or Oracle SQL? I just joined a new company and the first project that we've been tasked with is building out an ODS for our Marketing, Sales, Finance, and Product data. I'm arriving just in time to help them decide on the RDBMS, and one of the suggestions from another team member is Amazon Redshift.
I've had plenty of experience working within the four db's mentioned above and I'm very comfortable with what they're capable of. I've worked inside a large ODS built in Microsoft SQL Server that supported multiple databases with tables that had tens of millions of records. The schemas were well designed and the database rarely suffered any performance issues or hiccups. More recently, I architected a smaller marketing/sales database in MySQL, hooked it up to Zapier for data inputs and Chartio (BI tool) for reads, and it worked like a charm. I'm confident that with the data we're looking to capture and report off of, a "traditional" RDBMS would work just fine.
That said, I want to stay open to Redshift and give it a fair shot. What can Redshift bring to the table that db's like PGSQL and MySQL cannot? What would we sacrifice by choosing Redshift? How easy will it be for me to become comfortable designing and working within Redshift if I already know PostgreSQL? I could go on asking questions, but generally I'm just looking to understand if Redshift has any distinct advantages over the db's I've worked with before.
So the startup I am potentially looking at is a small outfit, and most of their data comes from Java/MyBatis microservices. They are already hosted on Amazon (I believe).
However from what I know, the existing user base and/or data size is very small (20k users; likely to have duplicates).
The POC here is an analytics project to mine data from said users via surveys or LLM chats (there is some monetization involved on the user side).
Said data will then be used for:
- Advertising profiles/segmentation
Given how small the current data volume is, and after reading several threads here, the consensus seems to be to use RDS for small outfits like this. However, they will obviously want to expand down the road, and given their ecosystem I believe Redshift is eventually the best option.
That loops back to the question in the title, namely: which setups, in your experience, are most adaptable to Redshift?
Redshift is not PostgreSQL. It is a column store engine that uses a very heavily modified part of a very old PostgreSQL version as its front-end. Under the hood it's powered by ParAccel, a very heavily modified fork of PostgreSQL 8.0.2.
Imagine someone took MySQL 4.1 or something from that era, deleted InnoDB and MyISAM, added their own hardwired storage engine, removed a whole bunch of features and added a bunch of different ones - changing the supported SQL dialect in the process. That gives you some idea.
It's a dramatically different product for different needs. It's heavily optimised for OLAP workloads and pays a heavy price for OLTP workloads.
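To make the row store versus column store distinction concrete, here is a minimal Python sketch. It is only a toy model of the two layouts, not how Redshift or PostgreSQL store data internally, and the sample orders, column names, and function names are all made up for illustration. It shows why an aggregate over one column (a typical OLAP query) suits the columnar layout, while fetching one complete record (a typical OLTP query) suits the row layout.

```python
# Hypothetical sample data: the columns and values are invented for this sketch.
rows = [
    {"order_id": 1, "customer": "a", "region": "EU", "amount": 20.0},
    {"order_id": 2, "customer": "b", "region": "US", "amount": 35.5},
    {"order_id": 3, "customer": "a", "region": "EU", "amount": 12.5},
    {"order_id": 4, "customer": "c", "region": "US", "amount": 40.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)


def total_amount_row_store(rows):
    # Row layout: every whole row is visited even though only one field is used.
    return sum(row["amount"] for row in rows)


def total_amount_column_store(columns):
    # Column layout: only the "amount" values are read; other columns are untouched.
    return sum(columns["amount"])


def fetch_order_row_store(rows, order_id):
    # Row layout: once the row is found, all of its fields are already together.
    for row in rows:
        if row["order_id"] == order_id:
            return row
    return None


def fetch_order_column_store(columns, order_id):
    # Column layout: the record has to be reassembled from every column.
    # list.index raises ValueError if the order_id is not present.
    position = columns["order_id"].index(order_id)
    return {name: values[position] for name, values in columns.items()}


print(total_amount_row_store(rows))
print(total_amount_column_store(columns))
print(fetch_order_column_store(columns, 3))
```

Both layouts return the same answers. The difference is how much data each one has to touch to get there, which is roughly the trade-off described above.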
In general you should use PostgreSQL (on AWS RDS, or elsewhere) for your day-to-day transaction processing. If you want data warehousing and analytics and have outgrown PostgreSQL for that, then you might consider Redshift as one of your options... though it's likely you haven't really outgrown PostgreSQL, just AWS RDS.
Maybe you're looking for something more like Postgres-XL?
The other answer is accurate regarding Redshift not being the PostgreSQL equivalent of Aurora. Generally you'd use Redshift when you need to run some very heavy queries on a large dataset (the stuff that might take hours or more to finish running). Redshift is a columnar datastore: it stores and compresses each column separately and spreads query work across the nodes of a cluster, so it can finish queries that would otherwise take hours or days far more quickly. When you're done, you can delete the cluster and then repeat the process when you need it again.
In terms of getting an Aurora equivalent for PostgreSQL, I don't know how far off that is but I'm pretty sure an enterprising person could build their own with AWS EFS (https://aws.amazon.com/efs/). I'm fairly certain that's a big part of the Aurora formula.
How different are the two database platforms? I have quite a bit of experience creating and maintaining custom tables and data feeds in Redshift using S3 and other automation tools. How different and difficult would it be to learn how to administer and run an MSS database? Thank you!