Redshift — Concept

What it is

Amazon Redshift = a managed petabyte-scale data warehouse (OLAP). Columnar storage, massively parallel processing (MPP), SQL on TB–PB of historical data.

Why it exists

RDS/Aurora are OLTP (transactional, row-based) and can't efficiently scan billions of rows. Redshift is OLAP: columnar storage, compression, parallel scans — for BI dashboards, reporting, analytics over historical data.
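A toy Python model of why the column layout helps here. Summing one column in a row store reads every field of every row, while a column store reads just that column. It counts individual values rather than disk blocks and ignores compression, so treat the numbers as proportions only. The table and column names are invented.

```python
# Toy model: how many stored values a SUM over one column must touch
# in a row layout versus a column layout. Real engines read compressed
# blocks, not single values; this only shows the proportion.
from typing import Dict, List, Tuple

Row = Dict[str, object]


def to_columns(rows: List[Row]) -> Dict[str, List[object]]:
    """Pivot row-oriented records into one list per column."""
    columns: Dict[str, List[object]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_store(rows: List[Row], column: str) -> Tuple[int, int]:
    """Row layout: each row is stored contiguously, so every field is read."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # the whole row comes off disk
        total += row[column]
    return total, values_read


def sum_column_store(columns: Dict[str, List[object]], column: str) -> Tuple[int, int]:
    """Column layout: only the requested column is read."""
    values = columns[column]
    return sum(values), len(values)


if __name__ == "__main__":
    sales = [
        {"order_id": i, "region": "eu" if i % 2 else "us", "amount": i * 10, "note": "x"}
        for i in range(1, 1001)
    ]
    print(sum_row_store(sales, "amount"))                 # (5005000, 4000)
    print(sum_column_store(to_columns(sales), "amount"))  # (5005000, 1000)
```

Same answer, a quarter of the values read; with wide fact tables (dozens of columns) the gap grows accordingly.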

Architecture

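Cluster = 1 leader node + one or more compute nodes (in a single-node cluster, one node plays both roles).
Leader parses SQL, builds the execution plan, distributes the compiled work to compute nodes, and aggregates their results.
Each compute node is split into slices; table rows are distributed across slices (distribution style AUTO, EVEN, KEY or ALL) and all slices process their share in parallel.
Node types: RA3 (Redshift Managed Storage, so compute and storage scale independently; the current recommendation), DC2 (older dense-compute nodes with local SSD storage).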
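A minimal sketch of the leader/slice split, assuming KEY distribution and using CRC32 as a stand-in hash (Redshift's real hash function and slice counts differ). Rows are routed to slices by the distribution key, each slice computes a partial aggregate, and the leader merges the partials. All function names are hypothetical.

```python
# Toy MPP model: rows go to slices by hashing a distribution key (like
# DISTSTYLE KEY), each slice aggregates its own rows, and the "leader"
# merges the per-slice results. Hash choice and slice count are assumptions.
import zlib
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def slice_for(key: str, num_slices: int) -> int:
    """Deterministic hash of the distribution key -> slice number."""
    return zlib.crc32(key.encode("utf-8")) % num_slices


def distribute(rows: List[Tuple[str, int]], num_slices: int) -> Dict[int, List[Tuple[str, int]]]:
    slices: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for key, amount in rows:
        slices[slice_for(key, num_slices)].append((key, amount))
    return slices


def slice_partial_sum(rows: List[Tuple[str, int]]) -> Counter:
    """Work done independently (in parallel on a real cluster) on one slice."""
    partial: Counter = Counter()
    for key, amount in rows:
        partial[key] += amount
    return partial


def leader_merge(partials: List[Counter]) -> Dict[str, int]:
    """Leader combines the per-slice results into the final answer."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts
    return dict(total)


if __name__ == "__main__":
    rows = [("alice", 5), ("bob", 3), ("alice", 7), ("carol", 1), ("bob", 2)]
    slices = distribute(rows, num_slices=4)
    partials = [slice_partial_sum(r) for r in slices.values()]
    print(leader_merge(partials))  # totals alice=12, bob=5, carol=1 (key order may vary)
```

In this model every row for a given key lands on the same slice, so a GROUP BY on the distribution key needs no data movement between slices; grouping on any other column would first require redistributing rows.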

Redshift Serverless

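No cluster to size; capacity, measured in Redshift Processing Units (RPUs), scales automatically.
Pay for compute in RPU-hours, billed per second while queries run (60-second minimum); storage is billed separately.
Best for variable or unpredictable analytics workloads.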
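A back-of-the-envelope compute-cost estimate, assuming per-second billing with a 60-second minimum applied to each active interval. The price per RPU-hour varies by region, so the value below is a made-up placeholder; look up the current rate before relying on any figure.

```python
# Rough Redshift Serverless compute-cost estimate (a sketch, not AWS's
# billing engine). Assumptions: per-second billing with a 60-second minimum
# per active interval; the RPU-hour price is a placeholder for your region.
# Storage is billed separately and ignored here.
from typing import Iterable, Tuple

MIN_BILLED_SECONDS = 60


def rpu_hours(intervals: Iterable[Tuple[int, float]]) -> float:
    """intervals: (rpus_in_use, active_seconds) pairs."""
    total = 0.0
    for rpus, seconds in intervals:
        billed_seconds = max(seconds, MIN_BILLED_SECONDS)
        total += rpus * billed_seconds / 3600
    return total


def compute_cost(intervals: Iterable[Tuple[int, float]], price_per_rpu_hour: float) -> float:
    return rpu_hours(intervals) * price_per_rpu_hour


if __name__ == "__main__":
    # A 30 s dashboard refresh at 32 RPUs, then a 15-minute batch job at 128 RPUs.
    workload = [(32, 30), (128, 15 * 60)]
    print(round(rpu_hours(workload), 3))  # 32.533
    HYPOTHETICAL_PRICE = 0.40  # assumed USD per RPU-hour, not a real quote
    print(round(compute_cost(workload, HYPOTHETICAL_PRICE), 2))  # 13.01
```

The short refresh is billed as a full minute; idle time between the two jobs costs nothing for compute, which is why Serverless suits bursty workloads.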

Spectrum

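Query data directly in S3 (e.g. Parquet, ORC, CSV) without loading it, and join it with local Redshift tables.
External table schemas live in an external catalog, usually the AWS Glue Data Catalog.
Needs a running cluster or Serverless workgroup. On provisioned clusters, Spectrum is billed per TB of S3 data scanned, so columnar formats and partitioning cut cost. Good for querying a data lake without duplicating it into the warehouse.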
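A rough sketch of what drives Spectrum cost on a provisioned cluster, where scans are billed by the amount of S3 data read: partition pruning skips whole days of files, and a columnar format lets only the referenced columns be read. The table layout, sizes and price are assumptions for illustration.

```python
# Toy estimate of S3 bytes a Spectrum query scans. Assumptions: the external
# table is partitioned by day, stored in a columnar format so only referenced
# columns are read, and cost = TB scanned x a price you supply.
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Set

TB = 1024 ** 4  # binary TB, an assumption for this sketch


@dataclass
class Partition:
    day: date
    column_bytes: Dict[str, int]  # bytes stored per column in this partition


def bytes_scanned(partitions: List[Partition], start: date, end: date, columns: Set[str]) -> int:
    total = 0
    for p in partitions:
        if not (start <= p.day <= end):
            continue  # partition pruning: files for this day are never opened
        # column projection: only the columns the query references are read
        total += sum(size for name, size in p.column_bytes.items() if name in columns)
    return total


def scan_cost(scanned: int, price_per_tb: float) -> float:
    return scanned / TB * price_per_tb


if __name__ == "__main__":
    gb = 1024 ** 3
    table = [
        Partition(date(2024, 1, d), {"user_id": 2 * gb, "url": 6 * gb, "ts": 2 * gb})
        for d in range(1, 31)
    ]
    full = bytes_scanned(table, date(2024, 1, 1), date(2024, 1, 30), {"user_id", "url", "ts"})
    pruned = bytes_scanned(table, date(2024, 1, 29), date(2024, 1, 30), {"user_id"})
    print(full // gb, pruned // gb)  # 300 4
```

Filtering on the partition column and selecting one narrow column cuts the scan from 300 GB to 4 GB in this made-up table, and the bill shrinks in proportion.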

Concurrency

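Concurrency Scaling = automatically adds transient cluster capacity when queries start queuing (mainly read queries; some writes are supported on RA3); each cluster accrues up to one hour of free Concurrency Scaling credits per day.
Workload Management (WLM) = query queues with concurrency and memory settings. Manual WLM: you define the queues. Automatic WLM: Redshift manages concurrency and memory, and you assign query priorities (HIGHEST down to LOWEST).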
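A toy scheduler to make query priorities concrete: a fixed number of slots, higher priority dispatched first, first-come-first-served within a priority. Automatic WLM is more sophisticated (it also adjusts concurrency and memory dynamically), so this only illustrates ordering; the class name, slot count and queries are invented.

```python
# Toy priority scheduler inspired by automatic WLM query priorities.
# NOT Redshift's actual algorithm; slot count and queries are hypothetical.
import heapq
import itertools
from typing import List, Tuple

PRIORITY_RANK = {"HIGHEST": 0, "HIGH": 1, "NORMAL": 2, "LOW": 3, "LOWEST": 4}


class ToyWlm:
    def __init__(self, slots: int) -> None:
        self.slots = slots
        self._queue: List[Tuple[int, int, str]] = []
        self._arrival = itertools.count()  # tie-breaker keeps FIFO order per priority

    def submit(self, query: str, priority: str = "NORMAL") -> None:
        heapq.heappush(self._queue, (PRIORITY_RANK[priority], next(self._arrival), query))

    def dispatch(self) -> List[str]:
        """Start up to `slots` queued queries; the rest keep waiting."""
        started = []
        while self._queue and len(started) < self.slots:
            _, _, query = heapq.heappop(self._queue)
            started.append(query)
        return started


if __name__ == "__main__":
    wlm = ToyWlm(slots=2)
    wlm.submit("nightly ETL", "LOW")
    wlm.submit("CEO dashboard", "HIGHEST")
    wlm.submit("analyst ad-hoc")  # NORMAL by default
    print(wlm.dispatch())  # ['CEO dashboard', 'analyst ad-hoc']
    print(wlm.dispatch())  # ['nightly ETL']
```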

Loading & integrations

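COPY from S3 (best practice): loads files in parallel across slices, so split input into several similarly sized files, ideally a multiple of the slice count.
Federated Query: query live tables in RDS/Aurora PostgreSQL and MySQL without ETL.
Redshift Streaming Ingestion: from Kinesis Data Streams / Amazon MSK into materialized views, without staging in S3.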
AWS DMS for migration from other DBs.
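A sketch of preparing a COPY load under the multiple-of-slices guideline: choose a file count that is a multiple of the slice count, split rows evenly, and point COPY at the S3 prefix (COPY loads every object whose key starts with that prefix). The table, bucket, role ARN and slice count are hypothetical; on a real cluster the slice count can be read from the STV_SLICES system table.

```python
# Plan a parallel COPY: file count = multiple of slices, rows split evenly.
# Table name, bucket, prefix, IAM role ARN and slice count are hypothetical;
# the generated SQL would be run with any Redshift SQL client.
import math
from typing import List, Sequence


def plan_file_count(total_rows: int, target_rows_per_file: int, num_slices: int) -> int:
    """Smallest multiple of num_slices whose files stay within the target size."""
    needed = max(1, math.ceil(total_rows / target_rows_per_file))
    return math.ceil(needed / num_slices) * num_slices


def split_evenly(rows: Sequence[str], num_files: int) -> List[List[str]]:
    """Round-robin rows into num_files roughly equal chunks."""
    chunks: List[List[str]] = [[] for _ in range(num_files)]
    for i, row in enumerate(rows):
        chunks[i % num_files].append(row)
    return chunks


def copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    # COPY reads every object under the prefix, spreading files across slices.
    return f"COPY {table} FROM '{s3_prefix}' IAM_ROLE '{iam_role_arn}' FORMAT AS CSV;"


if __name__ == "__main__":
    slices = 4  # assumed total slice count for the example cluster
    print(plan_file_count(total_rows=1_000_000, target_rows_per_file=150_000, num_slices=slices))  # 8
    print([len(c) for c in split_evenly([f"row{i}" for i in range(10)], 4)])  # [3, 3, 2, 2]
    print(copy_statement("sales", "s3://example-bucket/sales/2024-01-01/",
                         "arn:aws:iam::123456789012:role/ExampleRedshiftCopy"))
```

Equal-sized files keep every slice busy for roughly the same time, so the load finishes when the slowest slice does rather than waiting on one oversized file.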

Backups

Automated incremental snapshots stored in S3 (retention 1 day by default, configurable up to 35 days); manual snapshots are kept until you delete them.
Cross-region snapshot copy for DR.
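A toy retention check, assuming automated snapshots older than the retention window are deleted while manual snapshots are kept until you delete them. Snapshot IDs and timestamps are invented, and the 1 to 35 day bounds follow the retention range noted here.

```python
# Toy retention check for automated vs manual snapshots.
# Snapshot IDs and times are hypothetical.
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Snapshot(NamedTuple):
    snapshot_id: str
    created_at: datetime
    automated: bool


def expired(snapshots: List[Snapshot], now: datetime, retention_days: int = 1) -> List[str]:
    """IDs of automated snapshots past the retention window; manual ones never expire here."""
    if not 1 <= retention_days <= 35:
        raise ValueError("automated snapshot retention must be 1 to 35 days")
    cutoff = now - timedelta(days=retention_days)
    return [s.snapshot_id for s in snapshots if s.automated and s.created_at < cutoff]


if __name__ == "__main__":
    now = datetime(2024, 3, 10, 12, 0)
    snaps = [
        Snapshot("auto-1", now - timedelta(days=3), True),
        Snapshot("auto-2", now - timedelta(hours=8), True),
        Snapshot("manual-pre-upgrade", now - timedelta(days=90), False),
    ]
    print(expired(snaps, now))                    # ['auto-1']
    print(expired(snaps, now, retention_days=7))  # []
```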

Security

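Runs in your VPC; encryption at rest with KMS; IAM authentication; audit logging to S3 or CloudWatch Logs.
Enhanced VPC routing: forces COPY/UNLOAD traffic between the cluster and data repositories through your VPC, so security groups, VPC endpoints and network ACLs apply.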

When to use vs alternatives

Use ...	Instead of ...	When ...
Redshift	Aurora	TB–PB analytical reporting, complex aggregations
Athena	Redshift	Ad-hoc queries on S3, low/no setup, pay per query
EMR / Spark	Redshift	Heavy custom transformations, ML pipelines
OpenSearch	Redshift	Log analytics and full-text search
DynamoDB	Redshift	OLTP, not analytics
Redshift Serverless	Provisioned	Variable / unknown load, less ops
Redshift Spectrum	COPY into Redshift	Data lake; want to query S3 directly
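The table can be read as an ordered rule of thumb. The sketch below encodes it with hypothetical yes/no requirement flags; real decisions also weigh cost, latency and existing tooling, and the order of the checks is itself a simplification.

```python
# The "when to use" table as a rough, ordered decision function.
# The Need flags are hypothetical simplifications of the table's rows.
from dataclasses import dataclass


@dataclass
class Need:
    oltp: bool = False                    # transactional / key-value access
    full_text_search: bool = False        # log search, full-text
    custom_spark_jobs: bool = False       # heavy custom transforms, ML pipelines
    data_in_s3_only: bool = False         # ad-hoc SQL on S3, no warehouse
    join_s3_with_warehouse: bool = False  # query S3 and warehouse together
    variable_load: bool = False           # unpredictable load, minimal ops


def choose_service(need: Need) -> str:
    if need.oltp:
        return "DynamoDB or RDS/Aurora"  # OLTP, not a warehouse job
    if need.full_text_search:
        return "OpenSearch"
    if need.custom_spark_jobs:
        return "EMR / Spark"
    if need.data_in_s3_only:
        return "Athena"
    if need.join_s3_with_warehouse:
        return "Redshift Spectrum"
    if need.variable_load:
        return "Redshift Serverless"
    return "Redshift (provisioned)"  # steady TB-PB analytical reporting


if __name__ == "__main__":
    print(choose_service(Need()))                      # Redshift (provisioned)
    print(choose_service(Need(data_in_s3_only=True)))  # Athena
```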

Common exam scenarios

"BI dashboards over 50 TB of historical data" → Redshift.
"Ad-hoc SQL on data already in S3 / no infra" → Athena.
"Query S3 + warehouse together" → Redshift Spectrum.
"Streaming ingestion of clickstream into warehouse" → Kinesis → Redshift Streaming Ingestion.
"Variable analytics workload, no ops" → Redshift Serverless.
"Federated SQL over Aurora + Redshift without ETL" → Federated Query.

Exam tip

Athena = serverless SQL over S3, ad-hoc, cheap, no infra. Redshift = persistent warehouse, faster on big joins/aggregations at scale, you load data in. EMR = full Hadoop/Spark stack for custom big-data pipelines.
