DynamoDB

fully managed NoSQL database
no practical storage limitations
runs on SSDs behind the scenes; single-digit ms latency
DynamoDB is a collection of Tables

Table:

You can specify the performance requirements:
- Write Capacity Units (WCU) - number of 1 KB blocks written per second (always rounded up: 0.5 KB still counts as 1 WCU)
- Read Capacity Units (RCU) - number of 4 KB blocks read per second (also rounded up: 2 KB -> 1 RCU)
Read Consistency
- Eventually consistent reads by default
  - uses 0.5 RCU per 4 KB block (therefore cheaper)
- Strongly consistent reads
  - uses 1 RCU per 4 KB block
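The rounding rules above can be checked with a few lines of Python. This is a minimal sketch: the function names `write_capacity_units` and `read_capacity_units` are made up for illustration and are not part of any AWS SDK.

```python
import math

def write_capacity_units(item_size_bytes, writes_per_second=1):
    """WCU needed: the item size is rounded up to the next 1 KB block."""
    blocks = max(1, math.ceil(item_size_bytes / 1024))
    return blocks * writes_per_second

def read_capacity_units(item_size_bytes, reads_per_second=1,
                        strongly_consistent=False):
    """RCU needed: the item size is rounded up to the next 4 KB block.

    An eventually consistent read costs half of a strongly consistent one.
    """
    blocks = max(1, math.ceil(item_size_bytes / 4096))
    per_block = 1.0 if strongly_consistent else 0.5
    return blocks * per_block * reads_per_second

print(write_capacity_units(512))                            # 0.5 KB item -> 1
print(read_capacity_units(2048, strongly_consistent=True))  # 2 KB item -> 1.0
print(read_capacity_units(6 * 1024, reads_per_second=10))   # 2 blocks * 0.5 * 10 -> 10.0
```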
Data structure (data schema) is not fixed at the table level.
Elements of a table:
- Each row is called an item.
- Each row has one or more Attributes (similar to columns in SQL).
  - Items don't all need to have the same attributes.
- Special attributes:
  - Partition Key (aka Hash Key)
  - Sort Key (aka Range Key)
    - Allows 1-to-many relationship
    - Provides sorted items and efficient range queries
  - The combination of Partition Key + Sort Key must be unique in the table. If there is no sort key, the partition key value must be unique.
- Attribute types:
  - String, Number, Binary (base64-encoded), Boolean, Null, Document (List/Map, similar to JSON), Set (unordered collection of unique values of one type)
When creating a table you must specify:
- table name
- WCU/ RCU
- Partition key
DynamoDB integrations:
- The Redshift COPY command can load data directly from a DynamoDB table into Redshift.
- On EMR, you can use Apache Hive to read and write DynamoDB tables using a SQL-like language.
  - can join DynamoDB tables, copy to S3, HDFS, etc.
- Use AWS Data Pipeline to import/export data from DynamoDB (behind the scenes it spawns transient EMR clusters)
- Can specify triggers that automatically react to changes in DynamoDB tables and call AWS Lambda
- Kinesis Streams Connector library

DynamoDB Partitions

A partition relates to the underlying storage and processing nodes of DynamoDB
Initially one table = one partition
Initially all data for a table is stored in the single partition.
You have no direct control over the number of partitions (only indirect)
Max capacity of a single partition:
- 10 GB
- 3000 RCU
- 1000 WCU
Note that the configured table capacity (WCU/RCU) is split across partitions
Data distribution over partitions:
- the node is selected based on the partition key (hashed)
- A given partition key value is mapped to one, and only one partition.
- However, a partition can hold many different partition key values.
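The mapping from partition key to partition can be pictured with a small sketch. DynamoDB's real hash function is internal and not documented, so MD5 and the modulo step below are stand-ins chosen only for illustration, and `partition_for` is a hypothetical helper.

```python
import hashlib

def partition_for(partition_key, num_partitions):
    """Map a partition key value to exactly one partition index.

    Assumption: MD5 + modulo stands in for DynamoDB's internal hash function.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

# The same key always lands on the same partition;
# different keys may share a partition.
for key in ["station-1", "station-2", "station-3", "station-1"]:
    print(key, "->", partition_for(key, 4))
```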
Scaling (increase / decrease number of partitions)
- When capacity is exceeded a new partition is added automatically and data is spread between them over time.
- There is no automatic decrease in number of partitions
  - This can cause performance issues whereby a low WCU/RCU is divided among many partitions, which means the actual WCU and RCU limit for a given partition is even lower.
- in other words, table capacity is divided across partitions!
Calculating the minimum number of partitions:
- max of:
  - Desired_RCU / 3000 + Desired_WCU / 1000
  - Data size / 10 GB
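The estimate above can be written as a short function. This is a sketch that assumes each result is rounded up to a whole partition; the name `min_partitions` is made up for this example.

```python
import math

def min_partitions(rcu, wcu, size_gb):
    """Estimate the minimum number of partitions for a table."""
    by_throughput = math.ceil(rcu / 3000 + wcu / 1000)
    by_size = math.ceil(size_gb / 10)
    return max(1, by_throughput, by_size)

n = min_partitions(rcu=7500, wcu=3000, size_gb=45)
print(n)          # throughput needs 6 (2.5 + 3 rounded up), size needs 5 -> 6
print(7500 / n)   # RCU available to each partition -> 1250.0
```

The second line of output shows why many partitions hurt: the table's capacity is divided evenly, so each partition only gets a fraction of it.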
Good attributes for partition keys:
- many distinct values
- uniform write pattern across partitions
- writes distributed uniformly across time (can't usually control this)
Creating more uniform distribution of hot keys:
- add random suffix (e.g. PARTITION_KEY=CONCAT(ORIGINAL_PARTITION_KEY, RAND(0,10)))
- querying requires generating all the suffixes for a given ORIGINAL_PARTITION_KEY and querying each one of them.
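A minimal sketch of the random-suffix approach, assuming 10 suffixes and a `#` separator. Both choices, and the function names, are illustrative only.

```python
import random

NUM_SUFFIXES = 10  # assumption: number of shards per original key

def sharded_key(original_key):
    """Key to use when writing: original key plus a random suffix."""
    return f"{original_key}#{random.randrange(NUM_SUFFIXES)}"

def all_sharded_keys(original_key):
    """Keys to query when reading: every possible suffix."""
    return [f"{original_key}#{i}" for i in range(NUM_SUFFIXES)]

print(sharded_key("2017-11-01"))       # e.g. 2017-11-01#7
print(all_sharded_keys("2017-11-01"))  # the 10 keys a reader must query
```

Reads become more expensive (one query per suffix, results merged by the client), which is the price paid for spreading writes.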
To handle bursts of writes, use SQS to buffer writes
Burst Capacity
- DynamoDB provides some flexibility in per-partition throughput provisioning through burst capacity: whenever you are not fully using a partition's throughput, DynamoDB reserves a portion of that unused capacity for later bursts to handle usage spikes.
- DynamoDB retains up to five minutes (300 seconds) of unused read and write capacity. During an occasional burst of read or write activity, these extra capacity units can be consumed quickly, even faster than the per-second provisioned throughput capacity defined for the table. DynamoDB can also consume burst capacity for background maintenance and other tasks without prior notice.
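The idea can be illustrated with a simplified bucket model. This is not DynamoDB's actual algorithm, only a sketch that assumes all unused capacity is banked, capped at 300 seconds' worth; `simulate_burst` is a made-up name.

```python
def simulate_burst(provisioned, demand_per_second, window_seconds=300):
    """Return the capacity units actually served in each second."""
    max_bucket = provisioned * window_seconds
    bucket = 0
    served = []
    for demand in demand_per_second:
        available = provisioned + bucket
        used = min(demand, available)
        served.append(used)
        # Unused capacity is banked; burst usage drains the bucket.
        bucket = min(max_bucket, available - used)
    return served

# 10 idle seconds at 10 units/sec bank 100 units, then demand jumps to 50/sec.
print(simulate_burst(10, [0] * 10 + [50, 50, 50])[-3:])  # [50, 50, 30]
```

In the third busy second the bucket runs dry, so only 30 of the 50 requested units are served; the rest would be throttled.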

DynamoDB operations:

SCAN
- very inefficient linear search
QUERY:
- a single partition key or partition key + sort key
- a partition key and range of sort keys
- can also target an index, using the index's partition key + sort key
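The Query variants listed above map onto the request parameters of the low-level Query API. The sketch below only builds the parameter dictionary and sends nothing to AWS; the table name `WeatherReadings`, the attribute names and the helper `build_query` are hypothetical, and string-typed keys are assumed.

```python
def build_query(table, pk_name, pk_value, sk_name=None, sk_range=None,
                index_name=None):
    """Build Query parameters for a partition key and an optional sort key range."""
    names = {"#pk": pk_name}
    values = {":pk": {"S": pk_value}}
    condition = "#pk = :pk"
    if sk_name is not None and sk_range is not None:
        names["#sk"] = sk_name
        values[":lo"] = {"S": sk_range[0]}
        values[":hi"] = {"S": sk_range[1]}
        condition += " AND #sk BETWEEN :lo AND :hi"
    params = {
        "TableName": table,
        "KeyConditionExpression": condition,
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }
    if index_name is not None:
        params["IndexName"] = index_name  # query an LSI or GSI instead
    return params

params = build_query("WeatherReadings", "station_id", "station-1",
                     "reading_time", ("2017-11-01", "2017-11-30"))
print(params["KeyConditionExpression"])
```

With boto3 installed and credentials configured, such a dictionary could be passed as `boto3.client("dynamodb").query(**params)`.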

Indexes

Local Secondary Index (LSI)
- Must be created at table-creation, can't be created after the fact.
- The index itself contains:
  - old partition key (same as base table)
  - new sort key (must be a single scalar attribute)
  - old sort key (as a regular, non-key value)
  - a subset of attributes (called projected attributes)
- LSIs are sparse indexes: an LSI only contains items (= rows) that have a non-null value for the index sort key attribute.
- Querying (Reads):
  - If you query an attribute that is not projected, you are charged for the entire item, because DynamoDB must first query the index and then fetch the item from the main table.
    - Each sub-query is rounded up to 4 KB separately
- Writes:
  - ADD: two writes if it belongs to LSI
  - DELETE: two writes if it belongs to LSI
  - UPDATE: may require two writes on the LSI, one to delete the old entry and one to add the new entry.
- An LSI limits the total size of all items that share a partition key value to 10 GB
  - (Such a group of items is called an item collection.)
- LSI limitations:
  - Efficient queries require a partition key value. An LSI cannot efficiently answer a query that does not specify the partition key.
  - Example: partition key = weather station, LSI sort key = intrusion detected. You can't query all items that have an intrusion detected regardless of weather station. That needs a GSI, with intrusion_detected as the GSI partition key and the old partition key (weather station) as the GSI sort key.
Global Secondary Index (GSI)
- can be thought of as a copy of the original table whose replication is handled by DynamoDB
- can have its own alternative partition key and sort key
- has its own capacity (RCU/WCU), independent from the main table
- is updated asynchronously
- Options for attribute projection:
  - KEYS_ONLY:
    - new partition key
    - new sort key
    - old partition key
    - old sort key
  - INCLUDE:
    - specify custom projection values
  - ALL:
    - projects all attributes
- Note: a GSI can become a bottleneck for updates to the original table.
  - because the GSI has its own WCU/RCU limits; if the GSI's write capacity is too low, writes to the base table are throttled

DynamoDB stream

an ordered record of updates to a DynamoDB table
stores changes for 24 hours
endpoint: streams.dynamodb.us-west-2.amazonaws.com
- note: the Streams API has its own endpoint, not the same as the main DynamoDB API.
All changes show up in the stream once and only once.
Latency is low (near real-time)
What gets written to the stream is configurable with four 'views':
- KEYS_ONLY: only the key attributes are written to the stream (partition, sort)
- NEW_IMAGE: entire item after update
- OLD_IMAGE: entire item before update
- NEW_AND_OLD_IMAGES: both the old and the new item
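The effect of each view type can be sketched as a function that decides which parts of a change end up in the stream record. `stream_record` is a hypothetical helper, not an AWS API; the field names `Keys`, `NewImage` and `OldImage` follow the stream record format, and the API name of the fourth view type is `NEW_AND_OLD_IMAGES`.

```python
VIEW_TYPES = {"KEYS_ONLY", "NEW_IMAGE", "OLD_IMAGE", "NEW_AND_OLD_IMAGES"}

def stream_record(view_type, keys, old_item, new_item):
    """Return the data a stream record would carry for the given view type."""
    if view_type not in VIEW_TYPES:
        raise ValueError(f"unknown view type: {view_type}")
    record = {"Keys": keys}  # key attributes are always present
    if view_type in ("NEW_IMAGE", "NEW_AND_OLD_IMAGES") and new_item is not None:
        record["NewImage"] = new_item
    if view_type in ("OLD_IMAGE", "NEW_AND_OLD_IMAGES") and old_item is not None:
        record["OldImage"] = old_item
    return record

keys = {"user_id": "u1"}
old = {"user_id": "u1", "plan": "free"}
new = {"user_id": "u1", "plan": "pro"}
print(stream_record("KEYS_ONLY", keys, old, new))
print(stream_record("NEW_AND_OLD_IMAGES", keys, old, new))
```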
Use cases:
- Replication
  - replicate a database table in one AWS region to another in near-realtime for resiliency
  - sync writes in different regions?
  - Note: as of Nov 2017, replication is now supported natively by DynamoDB with the 'Global Tables' feature.
- Triggers
  - Lambda function triggered, e.g.
    - when new user is added to the users table
    - when items are added, for performing analytics

AWS DynamoDB

Amazon DynamoDB is a fully managed NoSQL database service that
- makes it simple and cost-effective to store and retrieve any amount of data and serve any level of request traffic.
- provides fast and predictable performance with seamless scalability
DynamoDB enables customers to offload the administrative burdens of operating and scaling distributed databases to AWS, without having to worry about hardware provisioning, setup and configuration, replication, software patching, or cluster scaling.
DynamoDB tables do not have fixed schemas: a table consists of items, and each item may have a different number of attributes.
DynamoDB synchronously replicates data across three facilities in an AWS Region, giving high availability and data durability.
DynamoDB supports fast in-place updates. A numeric attribute can be incremented or decremented in a row using a single API call
DynamoDB uses proven cryptographic methods to securely authenticate users and prevent unauthorized data access
Durability, performance, reliability, and security are built in, with SSD (solid state drive) storage and automatic 3-way replication.
DynamoDB supports two different kinds of primary keys:
- Partition Key (previously called the Hash key)
  - A simple primary key, composed of one attribute
  - DynamoDB uses the partition key’s value as input to an internal hash function; the output from the hash function determines the partition where the item will be stored.
  - No two items in a table can have the same partition key value.
- Partition Key and Sort Key (previously called the Hash and Range key)
  - A composite primary key composed of two attributes. The first attribute is the partition key, and the second attribute is the sort key.
  - DynamoDB uses the partition key value as input to an internal hash function; the output from the hash function determines the partition where the item will be stored.
  - All items with the same partition key are stored together, in sorted order by sort key value.
  - It is possible for two items to have the same partition key value, but those two items must have different sort key values.
DynamoDB Secondary indexes
- add flexibility to the queries, without impacting performance.
- are automatically maintained as sparse objects: items only appear in an index if they exist in the table on which the index is defined, which makes queries against an index very efficient
DynamoDB throughput and single-digit millisecond latency make it a great fit for gaming, ad tech, mobile, and many other applications
ElastiCache can be used in front of DynamoDB to offload a high volume of reads for infrequently changing data

DynamoDB Performance

Automatically scales horizontally
runs exclusively on Solid State Drives (SSDs).
- SSDs help achieve the design goals of predictable low-latency response times for storing and accessing data at any scale.
- The high I/O performance of SSDs enables DynamoDB to serve high-scale request workloads cost-efficiently, and to pass this efficiency along in low request pricing
allows provisioned table reads and writes
- Scale up throughput when needed
- Scale down throughput four times per UTC calendar day
automatically partitions, reallocates and re-partitions the data and provisions additional server capacity as the
- table size grows or
- provisioned throughput is increased
Global Secondary indexes (GSI)
- can be created upfront or added later

DynamoDB Consistency

Each DynamoDB table is automatically stored in three geographically distributed locations for durability
Read consistency represents the manner and timing in which the successful write or update of a data item is reflected in a subsequent read operation of that same item
DynamoDB lets the user specify, at the time of the request, whether a read should be eventually consistent or strongly consistent
- Eventually Consistent Reads (Default)
  - Eventual consistency option maximizes the read throughput.
  - Consistency across all copies is usually reached within a second
  - However, an eventually consistent read might not reflect the results of a recently completed write.
  - Repeating a read after a short time should return the updated data.
- Strongly Consistent Reads
  - Strongly consistent read returns a result that reflects all writes that received a successful response prior to the read
Query, GetItem, and BatchGetItem operations perform eventually consistent reads by default
- Query and GetItem operations can be forced to be strongly consistent
- Query operations cannot perform strongly consistent reads on Global Secondary Indexes
- BatchGetItem operations can be forced to be strongly consistent on a per-table basis
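In the low-level API, the consistency choice is the `ConsistentRead` flag on each request. The sketch below only builds request parameters; the helper names, table names and keys are hypothetical.

```python
def get_item_params(table, key, strongly_consistent=False):
    """GetItem parameters; reads are eventually consistent unless requested."""
    return {"TableName": table, "Key": key,
            "ConsistentRead": strongly_consistent}

def batch_get_params(requests):
    """BatchGetItem parameters with a per-table consistency setting.

    requests maps table name -> (list of keys, strongly_consistent flag).
    """
    return {"RequestItems": {
        table: {"Keys": keys, "ConsistentRead": consistent}
        for table, (keys, consistent) in requests.items()
    }}

user_key = {"user_id": {"S": "u1"}}
print(get_item_params("Users", user_key, strongly_consistent=True))
print(batch_get_params({
    "Users": ([user_key], True),                      # strongly consistent
    "Sessions": ([{"session_id": {"S": "s1"}}], False),  # eventually consistent
}))
```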

DynamoDB Security

Fine Grained Access Control (FGAC) gives a high degree of control over data in the table
FGAC helps control who (caller) can access which items or attributes of the table and perform what actions (read/write capability).
FGAC is integrated with IAM, which manages the security credentials and the associated permissions.

DynamoDB Advanced Topics

Refer to the DynamoDB Advanced post, which covers DynamoDB Streams, Triggers, Cross Region Replication, DAX, VPC Endpoints, etc.

DynamoDB Encryption

Data in Transit Encryption
- can be done by encrypting sensitive data on the client side or using encrypted connections (TLS)
DynamoDB supports Encryption at rest
- Encryption at rest enables encryption for the data persisted (data at rest) in the DynamoDB tables.
- Encryption at rest includes the base tables, secondary indexes
- Encryption at rest automatically integrates with AWS KMS for managing the keys used for encrypting the tables.
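- When first launched, encryption at rest could be enabled only for a new table and not for an existing table; all tables are now encrypted at rest by default
- Encryption, once enabled for a table, cannot be disabled
- DynamoDB Streams initially did not support encryption; stream data is now covered by encryption at rest as well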
- On-Demand Backups of encrypted DynamoDB tables are encrypted using S3’s Server-Side Encryption
- Encryption at rest encrypts your data using 256-bit AES encryption.

DynamoDB Costs

Index Storage
- DynamoDB is an indexed data store
  - Billable Data = Raw byte data size + 100 byte per-item storage indexing overhead
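As a quick worked example of the storage formula, assuming sizes are given in bytes (`billable_bytes` is a made-up helper name):

```python
INDEX_OVERHEAD_BYTES = 100  # per-item indexing overhead from the formula above

def billable_bytes(item_sizes):
    """Billable storage: raw size of every item plus 100 bytes per item."""
    return sum(size + INDEX_OVERHEAD_BYTES for size in item_sizes)

print(billable_bytes([200, 1024, 50]))  # 1274 raw + 3 * 100 overhead -> 1574
```

For tables with many tiny items the overhead can dominate: an item of 50 bytes is billed as 150 bytes.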
Provisioned throughput
- Pay flat, hourly rate based on the capacity reserved as the throughput provisioned for the table
- one Write Capacity Unit provides one write per second for items <1KB in size.
- one Read Capacity Unit provides one strongly consistent read (or two eventually consistent reads) per second for items <4KB in size.
- Provisioned throughput charges for every 10 units of Write Capacity and every 50 units of Read Capacity.
Reserved capacity
- Significant savings over the normal price
- Pay a one-time upfront fee

DynamoDB Best Practices

Keep item size small
Store metadata in DynamoDB and large BLOBs in Amazon S3
Use table per day, week, month etc for storing time series data
Use conditional or Optimistic Concurrency Control (OCC) updates
- Optimistic Concurrency Control is like optimistic locking in an RDBMS
- OCC is generally used in environments with low data contention, where conflicts are rare and transactions can be completed without the expense of managing locks
- OCC assumes that multiple transactions can frequently be completed without interfering with each other.
- Transactions are executed using data resources without acquiring locks on those resources and waiting for other transaction locks to be cleared
- Before a transaction is committed, it is verified whether the data was modified by any other transaction. If so, the transaction is rolled back and must be restarted with the updated data
- OCC leads to higher throughput as compared to other concurrency control methods like pessimistic locking, as locking can drastically limit effective concurrency even when deadlocks are avoided
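The read, verify and retry cycle can be sketched with an in-memory store that keeps a version number per item. This is a simulation, not DynamoDB code; in DynamoDB the same check would typically be a conditional write on a version attribute. All class and function names here are hypothetical.

```python
class ConditionFailed(Exception):
    """Raised when the stored version differs from the expected one."""

class VersionedStore:
    def __init__(self):
        self._items = {}  # key -> (value, version)

    def get(self, key):
        return self._items.get(key, (None, 0))

    def put_if_version(self, key, value, expected_version):
        """Write only if nobody else has written since we read."""
        _, current = self._items.get(key, (None, 0))
        if current != expected_version:
            raise ConditionFailed(f"expected v{expected_version}, found v{current}")
        self._items[key] = (value, current + 1)

def increment(store, key, max_retries=5):
    """Optimistic update: read, modify, conditional write, retry on conflict."""
    for _ in range(max_retries):
        value, version = store.get(key)
        try:
            store.put_if_version(key, (value or 0) + 1, version)
            return
        except ConditionFailed:
            continue  # someone else won the race; re-read and try again
    raise RuntimeError("too much contention")

store = VersionedStore()
increment(store, "page_views")
increment(store, "page_views")
print(store.get("page_views"))  # (2, 2): value 2 at version 2
```

No lock is held between the read and the write; a conflicting writer is detected only at write time, which is why this works best when conflicts are rare.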
Avoid hot keys and hot partitions

AWS DynamoDB Secondary Indexes

DynamoDB provides fast access to items in a table by specifying primary key values
DynamoDB Secondary indexes on a table allow efficient access to data with attributes other than the primary key
A DynamoDB secondary index
- is a data structure that contains a subset of attributes from a table
- is associated with exactly one table, from which it obtains its data
- requires an alternate key for the index partition key and sort key
- can additionally define projected attributes, which are copied from the base table into the index along with the primary key attributes
- is automatically maintained by DynamoDB
- is updated on any addition, modification, or deletion of items in the base table, to reflect these changes
- helps reduce the size of the data compared to the main table, depending on the projected attributes, and hence helps improve provisioned throughput performance
- is automatically maintained as a sparse object: items only appear in an index if they exist in the table on which the index is defined, making queries against an index very efficient
DynamoDB supports two types of secondary indexes
- Global secondary index – an index with a partition key and a sort key that can be different from those on the base table
- Local secondary index – an index that has the same partition key as the base table, but a different sort key

Global Secondary Indexes (GSI)

DynamoDB creates and maintains indexes for the primary key attributes for efficient access of data in the table, which allows applications to quickly retrieve data by specifying primary key values.
Global Secondary Indexes (GSI) are indexes that contain partition or composite partition-and-sort keys that can be different from the keys in the table on which the index is based.
Global secondary index is considered “global” because queries on the index can span all items in a table, across all partitions.
Multiple secondary indexes can be created on a table, and queries issued against these indexes.
Applications benefit from having one or more secondary keys available to allow efficient access to data with attributes other than the primary key.
GSIs support non-unique attributes, which increases query flexibility by enabling queries against any non-key attribute in the table
GSIs support eventual consistency only. DynamoDB automatically and asynchronously handles item additions, updates and deletes in a GSI when corresponding changes are made to the table
Data in a secondary index consists of GSI alternate key, primary key and attributes that are projected, or copied, from the table into the index.
Attributes that are part of an item in a table, but not part of the GSI key, primary key of the table, or projected attributes are not returned on querying the GSI index
GSIs manage throughput independently of the table they are based on and the provisioned throughput for the table and each associated GSI needs to be specified at creation time
- Read provisioned throughput
  - one Read Capacity Unit provides two eventually consistent reads per second for items < 4KB in size.
- Write provisioned throughput
  - one Write Capacity Unit provides one write per second for items < 1KB in size.
  - consumes 1 write capacity unit if
    - a new item is inserted into the table
    - an existing item is deleted from the table
    - an existing item is updated in its projected attributes
  - consumes 2 write capacity units if
    - an existing item is updated in its key attributes, which results in deletion of the old item from the index and addition of the new item

Local Secondary Indexes

Local secondary indexes are indexes that have the same partition key as the table, but a different sort key.
A local secondary index is “local” because every partition of a local secondary index is scoped to a table partition that has the same partition key.
An LSI allows search using a secondary index in place of the sort key, thus expanding the number of attributes that can be used for efficient queries
LSIs are updated automatically when the primary index is updated, and reads support both strongly and eventually consistent options
LSIs are read via the Query or Scan API (GetItem is not supported on an index)
LSIs cannot be added to existing tables at this time
LSIs cannot be modified once created at this time
LSIs cannot be removed from a table once created at this time
LSI consumes provisioned throughput capacity as part of the table with which it is associated
- Read Provisioned throughput
  - if the data read is the index key and projected attributes
    - one Read Capacity Unit provides one strongly consistent read (or two eventually consistent reads) per second for items < 4KB
    - data size includes the index and projected attributes only
  - if the data read is the index key and a non-projected attribute
    - consumes double the read capacity: one read from the index and one read from the table, which fetches the entire item and not just the non-projected attribute
- Write provisioned throughput
  - consumes 1 write capacity unit if
    - a new item is inserted into the table
    - an existing item is deleted from the table
    - an existing item is updated in its projected attributes
  - consumes 2 write capacity units if
    - an existing item is updated in its key attributes, which results in deletion of the old item from the index and addition of the new item
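The write-cost rules for index maintenance can be summarized in a small function. This is a simplified sketch that assumes the item is at most 1 KB and is present in the index; the operation labels and the name `index_write_units` are made up for illustration.

```python
def index_write_units(operation, key_attribute_changed=False):
    """Write capacity units consumed on the index for one base-table write."""
    if operation in ("insert", "delete"):
        return 1
    if operation == "update":
        # Changing an index key attribute means delete + re-add in the index.
        return 2 if key_attribute_changed else 1
    raise ValueError(f"unknown operation: {operation}")

print(index_write_units("insert"))                              # 1
print(index_write_units("update"))                              # 1
print(index_write_units("update", key_attribute_changed=True))  # 2
```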
