- Published on
DynamoDB Lexicographical Sorting
- Authors
- Name
- jonathan Bradbury
Summary
The purpose of this article is to show how DynamoDB lexicographical sorting can be used for certain access patterns. For this example we will be tracking a restaurant's speed with service data, and how we can make a Leaderboard out of it.
Access Patterns
- Query raw car data for a full business_date sorted by time
- Query top 10 stores with lowest average for the last 15 minutes
Design for raw data storage
- PK: STORE_ID
- SK: YYYY-MM-DD HH:MM:SS#LANE{number}
Sort Key Breakdown
Here we are taking advantage of DynamoDB lexicographical sorting by using a consistent date timestamp for all records in the sort key. If you look at the data in the image you'll notice that it starts at 00:00 hours and ends at 00:09.
Lexicographical sorting unlocks the ability to query a date range of data. This is important since in this example we want to show who was the fastest store in the past 15 minutes (query start_time 15 mins ago, query end_time is now). This is also helpful when we want to send a final daily file (4am - 4am)
PUT
GET
With this function I can get data for any range of time, full day or 15 minutes at a time. ScanIndexForward means start at the smallest and go to the largest.
Response
Example Query: data = get_range_data('000001', '2022-08-13 00:00:00#LANE1', '2022-08-13 00:09:00#LANE2') (shorten for brevity, notice its in order)
Design for getting top 10 stores at specific time of day
- PK: LEADERBOARD
- SK: STORE_ID
PK
Using LEADERBOARD at all stores allows us to group all stores into 1 collection
SK
Using the store number here let's all stores easily update their average, while also allowing the store number to become an attribute in the global secondary index query.
Global Secondary Index
- gsipk-1: LEADERBOARD
- gsisk-1: YYYY-MM-DD#fifteenth_hour_increment#OTDAverage(as 6 digits)
gsipk-1
This is simply just the PK again. We have to reuse it so we can search the gsisk-1 as an attribute, otherwise we could only query it.
gsisk-1
- YYYY-MM-DD: Using a date string lets us query for the current date we're on.
- fifteenth_hour_increment: 15 minutes passes 96 times a day. We can calculate which 15th minute in the day it is which allows us to only get data for stores who are sending for the current time we want.
- OTDAverage - Order to delivery average is the (menu1 + menu2 + qmenu1 + qmenu2 + window) / number of cars. We use DynamoDB Lexicographical sorting here to get the top 10 stores (since they have the lowest averages)
Note
You might be wondering "why do you zero pad the average OTD"? The answer is that in DynamoDB Lexicographical sorting puts 10 before 2. So you can have 1, 10, 2.
If you zero pad the numbers, you get the ordering you expect. We just have to choose a zero padded amount that won't ever be realistically reached.
Query Top 10 for the last 15 minutes
When you use a global secondary index, you can have the same attribute multiple times for that gsi sort key. This allows you to have multiple stores with the same exact OTD not overwrite each others data.
To unlock querying the most recent top 10 stores we use a "begins_with" query on gsisk-1 with ScanIndexForward equal to True. The Store number becomes an attribute that we read when we get the top 10 back.
PUT store average data
GET Top 10
You can visualize the data being sorted like this for the above query
Example call for test data:
top_stores = get_current_global_top_ten('2022-08-13#2')
Response
As you can see from the response, get back the stores with the lowest OTD times. Note that its in order