relational-documents/concepts/a-brief-history-of-relational-data.md

98 lines
6.6 KiB
Markdown

# A Brief History of Relational Data
## Relational Data
Relational databases were not the first data structure, but when people talk about document databases, their behavior is usually contrasted with relational databases. We do not need a PhD-level knowledge of these databases, but some high-level concepts will serve us well.
The "relation" in "relational database" names the concept that instances of data can be linked (related) to other pieces of data. Think of a system that keeps track of books for a library; some items they would need to track the book's title, its author, how many copies they have, and who has a copy checked out. If they were to make a new "book" entry for each physical copy, they would end up repeating nearly everything. With a relational database, though, we could structure tables where very little would be duplicated.
- An `author` table could hold the author's name, plus biographical information, etc.
- A `patron` table could hold the library cardholder's information
- The `book` table could have the name of the book and how many copies the library owns
- A `book_author` table has the ID of a book and the ID of an author
- A `book_checked_out` table could have the ID of a book, the ID of a patron, and the return date
In this example, we have 5 tables to hold the information, and two of those are there solely for the purpose of associating entities with each other. When we think about this structure, there are some interesting ways of parsing the data that weren't covered by the description above.
- Books can have multiple authors; this structure also provides an easy way to find books that an author has written.
- We can count the occurrences of a book in the `book_checked_out` table and subtract that from the copies we own to determine how many copies are available for check out.
- We can easily track what book a single patron has checked out.
- If an author's name changes, when we update the `author` table, the system picks up the new name.
Notice the word "could" in the descriptions of the tables; there are different ways to define relations among entities, and database purists could also come up with scenarios that this structure does not cover. The intent here is to present an isolated yet non-trivial working example that we can use as we think through how this data is structured.
## The ORM Bridge
In high-level programming languages, developers create structures with data representing some entity. Most database drivers deal in tables, rows, and columns. If we were to use one of those libraries, we would end up writing several queries and lots of property setting to constitute a domain object from the relational data in our tables.
An Object-Relational Mapper (ORM) library helps to translate between these structures and the relational database. There are lots of these tools, and they have some pretty solid up-sides in most cases. Take, for example, C# objects and Microsoft's Entity Framework Core (EF Core).
```csharp
// This is the definition of a book
public class Book
{
public long Id {get; set;} = 0L;
public string Title {get; set;} = "";
public int CopiesOnHand {get; set;} = 0;
}
// This is how we retrieve a book
// - ctx is an open database context (connection)
// - theId is the variable with the ID of the book we want to retrieve
var book = await ctx.Books.FirstOrDefaultAsync(book => book.Id = theId);
```
This illustrates the simplicity of code using an ORM library. Rather than create a query, execute it, check to see if anything was returned, then assign each column from the result to a property in the class - it's the single line above. (If no `Book` for that ID exists, the `book` variable will be `null`.)
That's great for the book information, but that's not all we need to access; we need authors for the book and we need patrons to check them out. We're not trying to become EF Core experts, but adding this information looks something like this...
```csharp
public class Author
{
public long Id {get; set;} = 0L;
public string Name {get; set;} = ""; // naive
// dates of birth/death, bio, etc.
// A "navigation property" to find books by author
public ICollection<Book> Books {get; init;} = new List<Book>();
}
public class Patron
{
public long Id {get; set;} = 0L;
public string Name {get; set;} = "";
public string Phone {get; set;} = "";
// skipping the navigation property here
}
public class CheckOut
{
public long BookId {get; set;} = 0L;
public long PatronId {get; set;} = 0L;
public DateTime ReturnDate {get; set;} = DateTime.Now;
// "Navigation properties"
public Book Book {get; set;} = default!;
public Patron Patron {get; set;} = default!;
}
// A new Book class
public class Book
{
// properties as before, then...
// ...more navigation properties
public ICollection<Author> Authors {get; init;} = new List<Author>();
public ICollection<CheckOut> CheckOuts {get; init;} = new List<CheckOut>();
}
```
Notice that the `Author` has a collection of `Book`s, and a `Book` has a collection of `Author`s. This is how the `book_author` table is represented. For checked-out books, we actually have a type that links a `Book` to a `Patron` and also stores the return date. EF Core's "navigation properties" are how it exposes these other entities within .NET code. If we do that same simple `Book` retrieval command from the first example, we can now traverse these properties to display the author's name, who has books checked out, and when they're due back.
## Referential Integrity
The term "referential integrity" is used to describe how these relations are kept in sync. In our example above, we wouldn't want a `book_author` record pointing to either a book or an author that does not exist. We would not want to allow a patron to be deleted while they still had books checked out. (And, we may not want to allow them to be deleted at all! If we need to keep records of who checked out what book, when, and when they were returned, we would lose those records if we deleted the patron.)
This is an area where relational databases excel. The IDs we have in some tables that point to tables where rows have that ID are called foreign keys. Relational databases allow us to define foreign keys that must exist; ones that must exist or can be missing; and what should happen when the parent record is deleted. To accomplish this, indexes will be applied to these key fields; this not only lets the integrity checks happen quickly, these can also be used to create queries to join the information efficiently.
## So... What's the Problem?
There are no problems, only trade-offs. Let's look at how we get to documents from here.