Published on

DynamoDB Lexicographical Sorting

Authors
  • Name
    jonathan Bradbury
    Twitter

Summary

The purpose of this article is to show how DynamoDB lexicographical sorting can be used for certain access patterns. For this example we will be tracking a restaurant's speed with service data, and how we can make a Leaderboard out of it.

Access Patterns

  • Query raw car data for a full business_date sorted by time
  • Query top 10 stores with lowest average for the last 15 minutes

Design for raw data storage

  • PK: STORE_ID
  • SK: YYYY-MM-DD HH:MM:SS#LANE{number}

Sort Key Breakdown

Here we are taking advantage of DynamoDB lexicographical sorting by using a consistent date timestamp for all records in the sort key. If you look at the data in the image you'll notice that it starts at 00:00 hours and ends at 00:09.

Lexicographical sorting unlocks the ability to query a date range of data. This is important since in this example we want to show who was the fastest store in the past 15 minutes (query start_time 15 mins ago, query end_time is now). This is also helpful when we want to send a final daily file (4am - 4am)

PUT

GET

With this function I can get data for any range of time, full day or 15 minutes at a time. ScanIndexForward means start at the smallest and go to the largest.

Response

Example Query: data = get_range_data('000001', '2022-08-13 00:00:00#LANE1', '2022-08-13 00:09:00#LANE2') (shorten for brevity, notice its in order)

Design for getting top 10 stores at specific time of day

  • PK: LEADERBOARD
  • SK: STORE_ID

PK

Using LEADERBOARD at all stores allows us to group all stores into 1 collection

SK

Using the store number here let's all stores easily update their average, while also allowing the store number to become an attribute in the global secondary index query.

Global Secondary Index

  • gsipk-1: LEADERBOARD
  • gsisk-1: YYYY-MM-DD#fifteenth_hour_increment#OTDAverage(as 6 digits)

gsipk-1

This is simply just the PK again. We have to reuse it so we can search the gsisk-1 as an attribute, otherwise we could only query it.

gsisk-1

  • YYYY-MM-DD: Using a date string lets us query for the current date we're on.
  • fifteenth_hour_increment: 15 minutes passes 96 times a day. We can calculate which 15th minute in the day it is which allows us to only get data for stores who are sending for the current time we want.
  • OTDAverage - Order to delivery average is the (menu1 + menu2 + qmenu1 + qmenu2 + window) / number of cars. We use DynamoDB Lexicographical sorting here to get the top 10 stores (since they have the lowest averages)

Note

You might be wondering "why do you zero pad the average OTD"? The answer is that in DynamoDB Lexicographical sorting puts 10 before 2. So you can have 1, 10, 2.

If you zero pad the numbers, you get the ordering you expect. We just have to choose a zero padded amount that won't ever be realistically reached.

Query Top 10 for the last 15 minutes

When you use a global secondary index, you can have the same attribute multiple times for that gsi sort key. This allows you to have multiple stores with the same exact OTD not overwrite each others data.

To unlock querying the most recent top 10 stores we use a "begins_with" query on gsisk-1 with ScanIndexForward equal to True. The Store number becomes an attribute that we read when we get the top 10 back.

PUT store average data

GET Top 10

You can visualize the data being sorted like this for the above query

Example call for test data:

top_stores = get_current_global_top_ten('2022-08-13#2')

Response

As you can see from the response, get back the stores with the lowest OTD times. Note that its in order