The key differentiator between a NoSQL database like AWS
Dynamodb and a relational database like Oracle is of the way data is modelled.
In a relational model; data is normalized, de-duplicated and relationships are
established between entities. In a NoSQL database like Dynamodb, data can be
duplicated and entities can be fused together. This is done to facilitate
scalability which is hallmark of today's hyper-scale cloud based applications.
One of the most common relationship within data modelling is
One-to-Many. For example, we have 2 entities; Painting and Gallery. A painting
can only reside in one gallery, whereas a gallery can have multiple paintings.
This is one to many relationship. In relational modelling world, we would have
2 tables; Painting and Gallery, such as:
Museum
|
Painting
|
|
|||
MuseumId
|
MuseumName
|
PaintingId
|
PaintingName
|
MuseumId
|
|
M1
|
Sofía
|
P1
|
Guernica
|
M1
|
|
M2
|
NSW Gallery
|
P2
|
The Great Masturbator
|
M1
|
|
M3
|
Louvre
|
P3
|
Sofala
|
M2
|
|
P4
|
Mona Lisa
|
M3
|
In above One-to-Many relationship, we have joined both
Museum and Painting with primary key and foreign key relationship. So e.g. if
want to check all paintings in Museum M1, then the query would be:
select p.Title, m.MuseumName from Painting p, Museum m where
p.MuseumId=m.MuseumId and m.MuseumId=’M1’;
Now joins are expensive, and they become more expensive as
the data grows. In huge cloud scale databases this hampers scalability and at
times become impractical to have these joins, hence the NoSQL databases like
Dynamodb.
So how do you model One-to-Many relationship in a Dynamodb
table when we cannot have joins and aggregations?
We know what the entities are, and we have an access
pattern, so now let’s design our table.
MuseumPaintingTable
|
||||
Primary
Key
|
Attributes
|
|||
PK
|
SK
|
MuseumName
|
Title
|
Artist
|
Museum#M1
|
Painting#P1
|
Sofía
|
Guernica
|
Picasso
|
Painting#P2
|
Sofía
|
The Great Masturbator
|
Salvador Dali
|
|
Museum#M2
|
Painting#P3
|
NSW Gallery
|
Sofala
|
Russel Drysdale
|
Museum#M3
|
Painting#P4
|
Louvre
|
Mona Lisa
|
Leonardo
|
If you are coming from a relational world, the above table
may seem like blasphemy even though I haven’t yet used different attributes for
each item to keep it simple. This is where you need to appreciate the flexibility
of NoSQL databases to facilitate scalability.
In above table, we have define a composite primary key,
consisting of a partition key and a sort key. A primary key in Dynamodb table
defines a unique item, where item corresponds to a row. Remember that all
access patterns are driven through the primary key, which means that the entry
point for your queries must be the primary key. Every museum and painting has
its own unique id, so we have combined them together to generate a unique item.
Note that how we have defined a one-to-many relationship in above table
using partition key and sort key within the primary key. For one Museum#M1, we have 2 paintings; Painting#P1 and Painting#P2,
and then both of these paintings have their own attributes, which can even
differ in structure. We have fused both Painting and Museum entities
together.
Now how would you write the previous relational query in DynamoDB,
where we wanted to check all paintings in the Museum M1? It would be like this:
aws dynamodb query \
--table-name
MuseumPaintingTable \
--key-condition-expression "PK = :MID" \
--expression-attribute-values
'{":MID":{"S":"M1"}}'
--projection-expression
'Title,MuseumName'
The above command is a RESTful API call to Dynamodb table.
No comments:
Post a Comment