Schema Design Workshop

Gary J. Murakami, Ph.D.
10gen (the MongoDB company)
@GaryMurakami

# Agenda 1. Basic schema design principles for MongoDB 2. Schema design over an application's lifetime 3. Common design patterns 4. Sharding
# Goals * Learn the schema design process in MongoDB * Practice applying common principles via exercises * Understand the implications of sharding
## What is a schema and why is it important?
# Schema * Map concepts and relationships to data * Set expectations for the data * Minimize overhead of iterative modifications * Ensure compatibility
## Normalization
users
  • username
  • first_name
  • last_name
books
  • title
  • isbn
  • language
  • created_by
  • author
authors
  • first_name
  • last_name
## Denormalization
users
  • username
  • first_name
  • last_name
books
  • title
  • isbn
  • language
  • created_by
  • author
    • first_name
    • last_name
## What is schema design like in MongoDB? * Schema is defined at the application-level * Design is part of each phase in its lifetime * There is no magic formula
## MongoDB Documents Storage in BSON → BSONSpec.org * Scalars * Doubles * Integers (32 or 64-bit) * UTF-8 strings * UTC Date, timestamp * Binary, regex, code * Object ID * `null` * Rich types * Objects * Arrays
## Terminology
{
    "mongodb"    : "relational db",
    "database"   : "database",
    "collection" : "table",
    "document"   : "row",
    "field"      : "column",
    "index"      : "index",
    "sharding" : {
        "shard"     : "partition",
        "shard key" : "partition key"
    }
}
## Three Considerations in MongoDB Schema Design 1. The data your application needs 2. Your application's read usage of the data 3. Your application's write usage of the data
# Case Study ## Library Web Application Different schemas are possible
## Author Schema
{
    "_id": int,
    "first_name": string,
    "last_name": string
}
## User Schema
{
    "_id": int,
    "username": string,
    "password": string
}
## Book Schema
{
    "_id": int,
    "title": string,
    "slug": string,
    "author": int,
    "available": boolean,
    "isbn": string,
    "pages": int,
    "publisher": {
        "city": string,
        "date": date,
        "name": string
    },
    "subjects": [ string, string ],
    "language": string,
    "reviews": [
       { "user": int, "text": string },
       { "user": int, "text": string }
    ],
}
# Example Documents
## Author Document
> db.authors.findOne()
{
    _id: 1,
    first_name: "F. Scott",
    last_name: "Fitzgerald"
}
## User Document
> db.users.findOne()
{
    _id: 1,
    username: "emily@10gen.com",
    password: "slsjfk4odk84k209dlkdj90009283d"
}
## Book Document
> db.books.findOne()
{
    _id: 1,
    title: "The Great Gatsby",
    slug: "9781857150193-the-great-gatsby",
    author: 1,
    available: true,
    isbn: "9781857150193",
    pages: 176,
    publisher: {
        name: "Everyman's Library",
        date: ISODate("1991-09-19T00:00:00Z"),
        city: "London"
    },
    subjects: ["Love stories", "1920s", "Jazz Age"],
    language: "English",
    reviews: [
       { user: 1, text: "One of the best…" },
       { user: 2, text: "It's hard to…" }
    ]
}
# Embedded Objects #### AKA embedded documents or sub-documents What advantages do they have? When should they be used?
## Embedded Objects
> db.books.findOne()
{
    _id: 1,
    title: "The Great Gatsby",
    slug: "9781857150193-the-great-gatsby",
    author: 1,
    available: true,
    isbn: "9781857150193",
    pages: 176,
    publisher: {
        name: "Everyman's Library",
        date: ISODate("1991-09-19T00:00:00Z"),
        city: "London"
    },
    subjects: ["Love stories", "1920s", "Jazz Age"],
    language: "English",
    reviews: [
       { user: 1, text: "One of the best…" },
       { user: 2, text: "It's hard to…" }
    ]
}
## Embedded Objects * Great for read performance * One seek to load the entire document * One round trip to the database * Writes can be slow if constantly adding to objects
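As a quick sketch against the book document above (the new review values are illustrative), one read returns everything, while the slow pattern is a write that keeps growing the same document:
// One seek, one round trip: book, publisher, and all reviews come back together
book = db.books.findOne({ _id: 1 });

// The potentially slow write pattern: every new review grows the same document
db.books.update(
    { _id: 1 },
    { $push: { reviews: { user: 3, text: "Couldn't put it down…" }}}
);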
# Linked Documents What advantages does this approach have? When should it be used?
## Linked Documents
> db.books.findOne()
{
    _id: 1,
    title: "The Great Gatsby",
    slug: "9781857150193-the-great-gatsby",
    author: 1,
    available: true,
    isbn: "9781857150193",
    pages: 176,
    publisher: {
        publisher_name: "Everyman's Library",
        date: ISODate("1991-09-19T00:00:00Z"),
        publisher_city: "London"
    },
    subjects: ["Love stories", "1920s", "Jazz Age"],
    language: "English",
    reviews: [
       { user: 1, text: "One of the best…" },
       { user: 2, text: "It's hard to…" }
    ]
}
## Linked Documents * More, smaller documents * Can make queries by ID very simple * Accessing linked document data requires extra read * What effect does this have on the system?
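A minimal sketch of that extra read, following the `author` link from the book document above:
// First read: the book stores only the author's _id
book = db.books.findOne({ _id: 1 });

// Second read: follow the link to load the author document
author = db.authors.findOne({ _id: book.author });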
# Data, RAM and Disk
(Diagrams: memory-1.png through memory-4.png)
# Arrays When should they be used?
## Array of Scalars
> db.books.findOne()
{
    _id: 1,
    title: "The Great Gatsby",
    slug: "9781857150193-the-great-gatsby",
    author: 1,
    available: true,
    isbn: "9781857150193",
    pages: 176,
    publisher: {
        name: "Everyman's Library",
        date: ISODate("1991-09-19T00:00:00Z"),
        city: "London"
    },
    subjects: ["Love stories", "1920s", "Jazz Age"],
    language: "English",
    reviews: [
       { user: 1, text: "One of the best…" },
       { user: 2, text: "It's hard to…" }
    ]
}
## Array of Objects
> db.books.findOne()
{   _id: 1,
    title: "The Great Gatsby",
    slug: "9781857150193-the-great-gatsby",
    author: 1,
    available: true,
    isbn: "9781857150193",
    pages: 176,
    publisher: {
        name: "Everyman's Library",
        date: ISODate("1991-09-19T00:00:00Z"),
        city: "London"
    },
    subjects: ["Love stories", "1920s", "Jazz Age"],
    language: "English",
    reviews: [
       { user: 1, text: "One of the best…" },
       { user: 2, text: "It's hard to…" }
    ],
}
## Exercise #1 Design a schema for users and their book reviews * Users * username (string) * email (string) * Reviews * text (string) * rating (integer) * created_at (date) *Usernames are immutable*
### Exercise #1: Solution A Reviews may be queried by user or book
// db.users (one document per user)
{   _id: ObjectId("…"),
    username: "bob",
    email: "bob@example.com"
}
// db.reviews (one document per review)
{   _id: ObjectId("…"),
    user: ObjectId("…"),
    book: ObjectId("…"),
    rating: 5,
    text: "This book is excellent!",
    created_at: ISODate("2012-10-10T21:14:07.096Z")
}
### Exercise #1: Solution B Optimized to retrieve reviews by user
// db.users (one document per user with all reviews)
{   _id: ObjectId("…"),
    username: "bob",
    email: "bob@example.com",
    reviews: [
        {   book: ObjectId("…"),
            rating: 5,
            text: "This book is excellent!",
            created_at: ISODate("2012-10-10T21:14:07.096Z")
        }
    ]
}
### Exercise #1: Solution C Optimized to retrieve reviews by book
// db.users (one document per user)
{   _id: ObjectId("…"),
    username: "bob",
    email: "bob@example.com"
}
// db.books (one document per book with all reviews)
{   _id: ObjectId("…"),
    // Other book fields…
    reviews: [
        {   user: ObjectId("…"),
            rating: 5,
            text: "This book is excellent!",
            created_at: ISODate("2012-10-10T21:14:07.096Z")
        }
    ]
}
## Schema Design Over an Application's Lifetime * Development * Production * Iterative Modifications
# Development Phase Basic CRUD functionality
# CreateRUD
author = {
    _id: 2,
    first_name: "Arthur",
    last_name: "Miller"
};

db.authors.insert(author);
* The `_id` field is unique and automatically indexed * MongoDB will generate an [ObjectId](http://www.mongodb.org/display/DOCS/Object+IDs#ObjectIDs-BSONObjectIDSpecification) if not provided
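A small sketch of the automatic `_id` (the author and the ObjectId value shown are illustrative):
// Insert without an _id…
db.authors.insert({ first_name: "Harper", last_name: "Lee" });

> db.authors.findOne({ last_name: "Lee" })
{
    _id: ObjectId("507f191e810c19729de860ea"),  // generated; value illustrative
    first_name: "Harper",
    last_name: "Lee"
}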
# CReadUD
> db.authors.find({ "last_name": "Miller" })
{
    _id: 2,
    first_name: "Arthur",
    last_name: "Miller"
}
## Reads and Indexing Examine the query after creating an index.
> db.books.ensureIndex({ "slug": 1 })

> db.books.find({ "slug": "the-great-gatsby" }).explain()
{
    "cursor": "BtreeCursor slug_1",
    "isMultiKey" : false,
    "n" : 1,
    "nscannedObjects" : 1,
    "nscanned" : 1,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    // Other fields follow…
}
## Multi-key Indexes Index all values in an array field.
> db.books.ensureIndex({ "subjects": 1 })
## Indexing Embedded Fields Index an embedded object's field.
> db.books.ensureIndex({ "publisher.name": 1 })
## Query operators * Conditional operators * `$gt`, `$gte`, `$lt`, `$lte`, `$ne`, `$all`, `$in`, `$nin`, `$size`, `$and`, `$or`, `$nor`, `$mod`, `$type`, `$exists` * Regular expressions * Value in an array * `$elemMatch` * Cursor methods and modifiers * `count()`, `limit()`, `skip()`, `snapshot()`, `sort()`, `batchSize()`, `explain()`, `hint()`
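A sketch combining a few of these against the book schema above:
// Books over 150 pages on either subject, newest edition first, 5 at a time
db.books.find({
    pages: { $gt: 150 },
    subjects: { $in: ["Jazz Age", "Love stories"] }
}).sort({ "publisher.date": -1 }).limit(5);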
# CRUpdateD
review = {
    user: 1,
    text: "I did NOT like this book."
};

db.books.update(
    { _id: 1 },
    { $push: { reviews: review }}
);
## Atomic Modifiers Update specific fields within a document * `$set`, `$unset` * `$push`, `$pushAll` * `$addToSet`, `$pop` * `$pull`, `$pullAll` * `$rename` * `$bit`
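A brief sketch combining two modifiers in one atomic update (the "Classics" subject is an illustrative value):
db.books.update(
    { _id: 1 },
    {
        $set: { available: false },         // change one field in place
        $addToSet: { subjects: "Classics" } // add only if not already present
    }
);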
# CRUDelete
> db.books.remove({ _id: 1 })
# Production Phase Evolve schema to meet the application's read and write patterns
## Read Usage Finding books by an author's first name
authors = db.authors.find({ first_name: /^f.*/i }, { _id: 1 });

authorIds = authors.map(function(x) { return x._id; });

db.books.find({author: { $in: authorIds }});
## Read Usage "Cache" the author name in an embedded document
> db.books.findOne()
{
    _id: 1,
    title: "The Great Gatsby",
    author: {
        first_name: "F. Scott",
        last_name: "Fitzgerald"
    }
    // Other fields follow…
}
Queries are now one step
> db.books.find({ "author.first_name": /^f.*/i })
## Write Usage Users can review a book
review = {
    user: 1,
    text: "I thought this book was great!",
    rating: 5
};

> db.books.update(
    { _id: 3 },
    { $push: { reviews: review }}
);
* Document size limit (16MB) * Storage fragmentation after many updates/deletes
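One rough way to keep an eye on the 16MB limit from the shell is to check a document's BSON size:
// Size, in bytes, of the book document including all embedded reviews
Object.bsonsize(db.books.findOne({ _id: 3 }));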
## Exercise #2 * Display the 10 most recent reviews by a user * Make efficient use of memory and disk seeks
### Exercise #2: Solution Store users' reviews in monthly buckets
// db.reviews (one document per user per month)
{   _id: "bob-201210",
    reviews: [
        {   _id: ObjectId("…"),
            rating: 5,
            text: "This book is excellent!",
            created_at: ISODate("2012-10-10T21:14:07.096Z")
        },
        {   _id: ObjectId("…"),
            rating: 2,
            text: "I didn't really enjoy this book.",
            created_at: ISODate("2012-10-11T20:12:50.594Z")
        }
    ]
}
### Exercise #2: Solution Adding a new review to the appropriate bucket
myReview = {
    _id: ObjectId("…"),
    rating: 3,
    text: "An average read.",
    created_at: ISODate("2012-10-13T12:26:11.502Z")
};

> db.reviews.update(
      { _id: "bob-201210" },
      { $push: { reviews: myReview }}
);
### Exercise #2: Solution Display the 10 most recent reviews by a user
// Buckets sort newest-first by _id; $slice: -10 keeps each bucket's newest reviews
cursor = db.reviews.find(
    { _id: /^bob-/ },
    { reviews: { $slice: -10 }}
).sort({ _id: -1 });

num = 0;

while (cursor.hasNext() && num < 10) {
    doc = cursor.next();

    // Reviews are pushed in chronological order, so walk each bucket from the end
    for (var i = doc.reviews.length - 1; i >= 0 && num < 10; --i, ++num) {
        printjson(doc.reviews[i]);
    }
}
### Exercise #2: Solution Deleting a review
db.reviews.update(
    { _id: "bob-201210" },
    { $pull: { reviews: { _id: ObjectId("…") }}}
);
# Iterative Modifications Schema design is evolutionary
## Allow users to browse by book subject
> db.subjects.findOne()
{
    _id: 1,
    name: "American Literature",
    sub_category: {
         name: "1920s",
         sub_category: { name: "Jazz Age" }
   }
}
* How can you search this collection? * Be aware of document size limitations * Benefit from hierarchy being in same document
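As a sketch of the search problem, each level of nesting needs its own dotted path, so the query must know how deep the subject sits:
// Only finds "Jazz Age" if it is exactly two levels down
db.subjects.find({ "sub_category.sub_category.name": "Jazz Age" });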
## Tree Structures
> db.subjects.find()
{   _id: "American Literature" }

{   _id : "1920s",
    ancestors: ["American Literature"],
    parent: "American Literature"
}

{   _id: "Jazz Age",
    ancestors: ["American Literature", "1920s"],
    parent: "1920s"
}

{   _id: "Jazz Age in New York",
    ancestors: ["American Literature", "1920s", "Jazz Age"],
    parent: "Jazz Age"
}
## Tree Structures Find sub-categories of a given subject
> db.subjects.find({ ancestors: "1920s" })
{
    _id: "Jazz Age",
    ancestors: ["American Literature", "1920s"],
    parent: "1920s"
}

{
    _id: "Jazz Age in New York",
    ancestors: ["American Literature", "1920s", "Jazz Age"],
    parent: "Jazz Age"
}
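Two supporting operations for this pattern (the index is an assumption, not part of the original example):
// Multi-key index on ancestors makes sub-tree queries fast (assumed addition)
db.subjects.ensureIndex({ ancestors: 1 });

// Direct children only: match on parent instead of ancestors
db.subjects.find({ parent: "1920s" });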
## Exercise #3 * Allow users to borrow library books * User sends a loan request * Library approves or not * Requests time out after seven days * Approval process is asynchronous * Requests may be prioritized
### Exercise #3: Solution * Need to maintain order and state * Ensure that updates are atomic
// Create a new loan request
db.loans.insert({
    _id: { borrower: "bob", book: ObjectId("…") },
    pending: false,
    approved: false,
    priority: 1,
});

// Find the highest priority request and mark as pending approval
request = db.loans.findAndModify({
    query: { pending: false },
    sort: { priority: -1 },
    update: { $set: { pending: true, started: new ISODate() }},
    new: true
});
### Exercise #3: Solution * Updated and added fields * Modified document was returned
{
    _id: { borrower: "bob", book: ObjectId("…") },
    pending: true,
    approved: false,
    priority: 1,
    started: ISODate("2012-10-11T22:09:42.542Z")
}
### Exercise #3: Solution
// Library approves the loan request
db.loans.update(
    { _id: { borrower: "bob", book: ObjectId("…") }},
    { $set: { pending: false, approved: true }}
);
### Exercise #3: Solution
// Request times out after seven days
limit = new Date();
limit.setDate(limit.getDate() - 7);

db.loans.update(
    { pending: true, started: { $lt: limit }},
    { $set: { pending: false, approved: false }}
);
## Exercise #4 Allow users to recommend books * Users can recommend each book only once * Display a book's current recommendations
### Exercise #4: Solution
// db.recommendations (one document per user per book)
db.recommendations.insert({
    book: ObjectId("…"),
    user: ObjectId("…")
});

// Unique index ensures users can't recommend twice
db.recommendations.ensureIndex(
    { book: 1, user: 1 },
    { unique: true }
);

// Count the number of recommendations for a book
db.recommendations.count({ book: ObjectId("…") });
### Exercise #4: Solution * Indexes in MongoDB do not maintain counts * Counts are computed via index scans * Denormalize totals onto books
db.books.update(
    { _id: ObjectId("…") },
    { $inc: { recommendations: 1 }}
);
# Common Design Patterns
# One-to-one Relationship Let's pretend that authors only write one book.
## Linking Either side, or both, can track the relationship.
> db.books.findOne()
{
    _id: 1,
    title: "The Great Gatsby",
    slug: "9781857150193-the-great-gatsby",
    author: 1,
    // Other fields follow…
}

> db.authors.findOne({ _id: 1 })
{
    _id: 1,
    first_name: "F. Scott",
    last_name: "Fitzgerald"
    book: 1,
}
## Embedded Object
> db.books.findOne()
{
    _id: 1,
    title: "The Great Gatsby",
    slug: "9781857150193-the-great-gatsby",
    author: {
        first_name: "F. Scott",
        last_name: "Fitzgerald"
    }
    // Other fields follow…
}
# One-to-many Relationship In reality, authors may write multiple books.
## Array of IDs The "one" side tracks the relationship.
> db.authors.findOne()
{
    _id: 1,
    first_name: "F. Scott",
    last_name: "Fitzgerald",
    books: [1, 3, 20]
}
* Flexible and space-efficient
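Two operations this shape keeps simple, sketched against the author document above (book 21 is an illustrative id):
// Which author wrote book 3? The array match finds it
db.authors.find({ books: 3 });

// Record a newly published book for the author (id 21 is illustrative)
db.authors.update({ _id: 1 }, { $push: { books: 21 }});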
## Single Field with ID The "many" side tracks the relationship.
> db.books.find({ author: 1 })
{
    _id: 1,
    title: "The Great Gatsby",
    slug: "9781857150193-the-great-gatsby",
    author: 1,
    // Other fields follow…
}

{
    _id: 3,
    title: "This Side of Paradise",
    slug: "9780679447238-this-side-of-paradise",
    author: 1,
    // Other fields follow…
}
## Array of Objects
> db.authors.findOne()
{
    _id: 1,
    first_name: "F. Scott",
    last_name: "Fitzgerald",
    books: [
        { _id: 1, title: "The Great Gatsby" },
        { _id: 3, title: "This Side of Paradise" }
    ]
    // Other fields follow…
}
Use the `$slice` operator to return a subset of books
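For example, returning only the first two embedded books:
db.authors.find({ _id: 1 }, { books: { $slice: 2 }});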
# Many-to-many Relationship Some books may also have co-authors.
## Array of IDs on Both Sides
> db.books.findOne()
{
    _id: 1,
    title: "The Great Gatsby",
    authors: [1, 5]
    // Other fields follow…
}
> db.authors.findOne()
{
    _id: 1,
    first_name: "F. Scott",
    last_name: "Fitzgerald",
    books: [1, 3, 20]
}
## Array of IDs on Both Sides Query for all books by a given author
db.books.find({ authors: 1 });
Query for all authors of a given book
db.authors.find({ books: 1 });
## Array of IDs on One Side
> db.books.findOne()
{
    _id: 1,
    title: "The Great Gatsby",
    authors: [1, 5]
    // Other fields follow…
}
> db.authors.find({ _id: { $in: [1, 5] }})
{
    _id: 1,
    first_name: "F. Scott",
    last_name: "Fitzgerald"
}

{
    _id: 5,
    first_name: "Unknown",
    last_name: "Co-author"
}
## Array of IDs on One Side Query for all books by a given author
db.books.find({ authors: 1 });
Query for all authors of a given book
book = db.books.findOne(
    { title: "The Great Gatsby" },
    { authors: 1 }
);

db.authors.find({ _id: { $in: book.authors }});
## Exercise #5 * Tracking time series data * Graph recommendations per unit of time * Count by: day, hour, minute
### Exercise #5: Solution A
// db.rec_ts (time series buckets, hour and minute sub-docs)
db.rec_ts.insert({
    book: ObjectId("…"),
    day: ISODate("2012-10-11T00:00:00.000Z")
    total: 0,
    hour:   { "0": 0, "1": 0, /* … */ "23": 0 },
    minute: { "0": 0, "1": 0, /* … */ "1439": 0 }
});

// Record a recommendation created one minute before midnight
db.rec_ts.update(
    { book: ObjectId("…"), day: ISODate("2012-10-11T00:00:00.000Z") },
    { $inc: { total: 1, "hour.23": 1, "minute.1439": 1 }}
);
## BSON Storage * Sequence of key/value pairs * Not a hash map * Optimized to scan quickly
minute [0] [1] … [1439]
What is the cost of updating the minute before midnight?
## BSON Storage We can skip sub-documents
hour0 [0] [1] … [59]
hour23 [1380] … [1439]
How could this change the schema?
### Exercise #5: Solution B
// db.rec_ts (time series buckets, each hour a sub-doc)
db.rec_ts.insert({
    book: ObjectId("…"),
    day: ISODate("2012-10-11T00:00:00.000Z")
    total: 148,
    hour: {
        "0": { total: 7, "0": 0, /* … */ "59": 2 },
        "1": { total: 3, "60": 1, /* … */ "119": 0 },
        // Other hours…
        "23": { total: 12, "1380": 0, /* … */ "1439": 3 }
    }
});

// Record a recommendation created one minute before midnight
db.rec_ts.update(
    { book: ObjectId("…"), day: ISODate("2012-10-11T00:00:00.000Z") },
    { $inc: { total: 1, "hour.23.total": 1, "hour.23.1439": 1 }}
);
## Single-collection Inheritance Take advantage of MongoDB's features * Documents need not all have the same fields * [Sparsely index](http://www.mongodb.org/display/DOCS/Indexes#Indexes-sparse%3Atrue) only present fields
## Schema Flexibility
> db.books.findOne()
{
    _id: 47,
    title: "The Wizard Chase",
    type: "series",
    series_title: "The Wizard's Trilogy",
    volume: 2
    // Other fields follow…
}
Find all books that are part of a series
db.books.find({ type: "series" });

db.books.find({ series_title: { $exists: true }});

db.books.find({ volume: { $gt: 0 }});
## Index Only Present Fields Documents without these fields will not be indexed.
db.books.ensureIndex({ series_title: 1 }, { sparse: true });

db.books.ensureIndex({ volume: 1 }, { sparse: true });
## Exercise #6 Users can recommend at most 10 books
### Exercise #6: Solution
// db.user_recs (track user's remaining and given recommendations)
db.user_recs.insert({
    _id: "bob",
    remaining: 8,
    books: [3, 10]
});

// Record a recommendation if possible
db.user_recs.update(
    { _id: "bob", remaining: { $gt: 0 }, books: { $ne: 4 }},
    { $inc: { remaining: -1 }, $push: { books: 4 }}
);
### Exercise #6: Solution * One fewer unassigned recommendation remaining * The newly recommended book is now in the `books` array
> db.user_recs.findOne()
{
    _id: "bob",
    remaining: 7,
    books: [3, 10, 4]
}
## Exercise #7 * Statistic buckets * Each book has a listing page in our application * Record referring website domains for each book * Count each domain independently
### Exercise #7: Solution A
> db.book_refs.findOne()
{   book: 1,
    referrers: [
        { domain: "google.com", count: 4 },
        { domain: "yahoo.com", count: 1 }
    ]
}
db.book_refs.update(
    { book: 1, "referrers.domain": "google.com" },
    { $inc: { "referrers.$.count": 1 }}
);
### Exercise #7: Solution A The positional operator `$` updates the first matched array element.
db.book_refs.update(
    { book: 1, "referrers.domain": "google.com" },
    { $inc: { "referrers.$.count": 1 }}
);
> db.book_refs.findOne()
{   book: 1,
    referrers: [
        { domain: "google.com", count: 5 },
        { domain: "yahoo.com", count: 1 }
    ]
}
What if a new referring website is used?
### Exercise #7: Solution B
> db.book_refs.findOne()
{   book: 1,
    referrers: {
        "google_com": 5,
        "yahoo_com": 1
    }
}
db.book_refs.update(
    { book: 1 },
    { $inc: { "referrers.bing_com": 1 }},
    true
);
* Replace dots with underscores for key names * Increment to add a new referring website * Upsert in case this is the book's first referrer
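A sketch of building such a key in application code (the `domainKey` helper is hypothetical, not part of the original example):
// Hypothetical helper: dots are not allowed in key names, so "bing.com" becomes "bing_com"
function domainKey(domain) {
    return "referrers." + domain.replace(/\./g, "_");
}

inc = {};
inc[domainKey("bing.com")] = 1;

// Upsert (third argument) covers the book's first referrer
db.book_refs.update({ book: 1 }, { $inc: inc }, true);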
# Sharding
## Sharding * Ad-hoc partitioning * Consistent hashing (Amazon DynamoDB) * Range-based partitioning (Google BigTable, Yahoo! PNUTS, MongoDB)
## Sharding in MongoDB * Automated management * Range-based partitioning * Convert to a sharded system with no downtime * Fully consistent
## Sharding a Collection
db.runCommand({ addshard : "shard1.example.com" });

db.runCommand({ enableSharding: "library" });

db.runCommand({
    shardCollection: "library.books",
    key: { _id : 1}
});
* Keys range from −∞ to +∞ * Ranges are stored as chunks
## Sharding Data by Chunks
db.books.save({ _id: 35, title: "Call of the Wild" });
db.books.save({ _id: 40, title: "Tropic of Cancer" });
db.books.save({ _id: 45, title: "The Jungle" });
db.books.save({ _id: 50, title: "Of Mice and Men" });
  • 1 chunk: (−∞, +∞)
  • 2 chunks: (−∞, 40), [40, +∞)
  • 3 chunks: (−∞, 40), [40, 50), [50, +∞)
Ranges are split into chunks as data is inserted
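One way to see the resulting chunks is to query the cluster metadata through `mongos` (a sketch):
// Chunk ranges and their owning shards live in the config database
db.getSiblingDB("config").chunks.find(
    { ns: "library.books" },
    { min: 1, max: 1, shard: 1 }
);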
## Adding New Shards
shard1
  • (−∞, 40)
  • [40, 50)
  • [50, 60)
  • [60, +∞)
## Adding New Shards
db.runCommand({ addshard : "shard2.example.com" });
shard1
  • (−∞, 40)
  • [50, 60)
shard2
  • [40, 50)
  • [60, +∞)
Chunks are migrated to balance shards
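The shell's status helper shows how chunks end up distributed (output shape varies by version):
// Summarizes shards, sharded collections, and chunk distribution
db.printShardingStatus();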
## Adding New Shards
db.runCommand({ addshard : "shard3.example.com" });
shard1
  • (−∞, 40)
shard2
  • [40, 50)
  • [60, +∞)
shard3
  • [50, 60)
(Diagrams: sharding_cache-1.png and sharding_cache-2.png)
## Sharding Components * `mongos` * Config servers * Shards: `mongod` processes, replica sets
## Sharded Writes * Inserts: shard key required, routed * Updates and removes: shard key optional, may be routed or scattered
## Sharded Reads * Queries: routed by shard key, otherwise scatter/gather * Sorted queries: routed in order by shard key, otherwise distributed merge sort
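With `library.books` sharded on `_id` as above, the difference looks like this (a sketch using the documents inserted earlier):
// Routed: the shard key is in the query, so mongos targets a single shard
db.books.find({ _id: 50 });

// Scatter/gather: no shard key in the query, so every shard is asked
db.books.find({ title: "Of Mice and Men" });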
## Exercise #8 Users can upload images for books
images
  • image_id: ???
  • data: binary

The collection will be sharded by `image_id`. What should `image_id` be?
## Exercise #8: Solutions What's the best shard key for our use case? * Auto-increment (ObjectId) * MD5 of data * Time (e.g. month) and MD5
  • shard_key-1.png: Right-balanced Access
  • shard_key-2.png: Random Access
  • shard_key-3.png: Segmented Access
## Over-generalizations ### Normalization/Denormalization 1. Normalization reduces redundancy: the good principle of Don't Repeat Yourself (DRY) 2. Denormalization introduces redundancy: common practice, such as caching for performance
### Consistency techniques * Background or nightly process for eventual consistency * Look-aside cache for immediate consistency, etc.
## Over-generalizations ### The needs of your application * High consistency - yes: link, no: embed * High read performance - yes: embed, no: link * High write performance - yes: link, no: embed * High scaling - yes: link, no: embed
## Over-generalizations ### Relationships * one-to-one - yes: embed, no: ... * one-to-many - yes: embed, no: ... * many-to-many - yes: link, no: ...
## Over-generalizations ### The bad news There is no magic formula (that requires no thought). ### The good news Flexible document schema to match your application.
# Summary * Schema design is different in MongoDB. * Basic data design principles apply. * It's about your application. * It's about your data and how it's used. * It's about the entire lifetime of your application.
# Thanks! ### Questions?