Skip to content

Other Databases

Links: 101 AWS SAA Index


Database types

  • RDBMS

    • SQL / OLTP(online transaction processing)
    • RDS, Aurora
    • great for joins
  • NoSQL database

    • DynamoDB (~JSON), ElastiCache (key / value pairs)
    • no joins, no SQL.
  • Object Store

    • S3 (for big objects) / Glacier (for backups / archives).
    • S3 and glacier don’t look like database but they are databases which are used to store large objects.
    • It is a key value store for large objects
  • Data Warehouse

    • (= SQL Analytics / BI)
    • Redshift (OLAP), Athena
  • Search

    • ElasticSearch (JSON)
    • free text, unstructured searches
  • Graphs

    • Neptune displays relationships between data
    • Neptune is fully managed.
    • social media apps

Redshift

  • Based on postgres.
  • It can run complex analytical queries.
  • Used of warehousing, OLAP (online analytical processing), BI/analytics.
  • Enterprise-level, petabyte scale, fully managed data warehousing service.
  • Columnar storage of data.
  • Massive parallel query execution.
  • We have leader (query planning) and compute nodes (performing queries).
  • Data can be loaded into Redshift using Kinesis data firehose, S3 via copy command and EC2 instances.
  • It is faster than Athena. But it is not serverless like Athena. Also it is costlier than Athena.

Disaster Recovery

  • There is no multi AZ mode in redshift.
  • For Disaster Recovery (DR) we have to use snapshots.
  • You can take snapshots manually and restore new cluster from snapshots.
  • All the snapshots are stored in S3.
  • You can configure Amazon Redshift to automatically copy snapshots (automated or manual) of a cluster to another AWS Region.
    • For this to work automated snapshots must be enabled.
    • Enable Cross-Region Snapshots Copy in your Amazon Redshift Cluster.

Redshift Spectrum

  • Perform queries directly against S3
  • No need to load data.
  • Keywords for questions are Least amount of effort, minimum cost.

Redshift enhanced VPC routing

  • Redshift forces all COPY and UNLOAD traffic moving between your cluster and data repositories through your VPCs

Glue

  • It is an ETL (Extract, Transform and Load) service in AWS.
  • It is used to prepare and transform data for analytics.
  • Fully serverless.
  • Glue data catalog
  • Takes data from databases like RDS, DynamoDB, S3 and then after ETL jobs sends it to BI tools like Redshift, Athena, EMR.

ElasticSearch

  • In DynamoDB you can only find by the primary keys or indexes.
  • With ElasticSearch we can search any field and even partial matches.
  • It is used to complement other databases.
  • Comes with Kibana (visualisation) & Logstash (log ingestion) - ELK stack

Last updated: 2023-01-15