# What Are Documents?

## Structure Optional

The majority of the [previous page][prev] was dedicated to describing a conceptual structure of our data, and how that structure is expressed in a high-level language with an ORM library. This is not a bad thing on its own; most data has a defined structure. What happens when that structure changes? Or, what happens when we may not know the structure? This is where the document database can provide benefits.

We did not show the SQL to create the tables in the library example, but our book type might look something like this in SQLite:

```sql
CREATE TABLE book (
    id NUMBER NOT NULL PRIMARY KEY,
    title TEXT NOT NULL,
    copies_on_hand INTEGER NOT NULL DEFAULT 0);
```

If we wanted to add, for example, the date the library obtained the book, we would have to change the structure of the table...

```sql
ALTER TABLE book ADD COLUMN date_obtained DATE;
```

Document databases do not require anything like this. For example, creating a `book` collection in MongoDB, using their JavaScript API, is...

```javascript
db.createCollection('book')
```

The only structure requirement is that each document have some field that can serve as an identifier for documents in that collection. MongoDB uses `_id`; some other document databases let us choose the identifier field for each collection.

## Mapping the Entities

In our library, we had books, authors, and patrons as entities. In an equivalent document database setup, we would likely still have separate collections for each. A `book` document might look something like...

```json
{
    "Id": 342136,
    "Title": "Little Women",
    "CopiesOnHand": 3
}
```

Because no assumptions are made on structure, if we began adding books with a `DateObtained` field, the database would simply add it, no questions asked.

```json
{
    "Id": 452343,
    "Title": "The Hunt for Red October",
    "DateObtained": "1986-10-20",
    "CopiesOnHand": 1
}
```

The only field the database cares about is `Id`, assuming we specified that for our collection's ID.

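To make the "no questions asked" behavior concrete, here is a minimal sketch that uses a plain JavaScript `Map` as a stand-in for a collection. It is not MongoDB's API; the `insert` and `summarize` helpers are hypothetical names invented for this illustration. It stores the two `book` documents above side by side and shows that code reading them has to handle both shapes.

```javascript
// In-memory stand-in for a document collection; this is NOT the MongoDB API.
// Documents are plain objects, keyed by their `Id` field.
const book = new Map();

// Hypothetical helper: the only structural rule is that a document has an `Id`.
function insert(collection, doc) {
  if (doc.Id === undefined) {
    throw new Error("document needs an Id");
  }
  collection.set(doc.Id, doc);
}

// Two documents with different sets of fields go into the same collection.
insert(book, { Id: 342136, Title: "Little Women", CopiesOnHand: 3 });
insert(book, {
  Id: 452343,
  Title: "The Hunt for Red October",
  DateObtained: "1986-10-20",
  CopiesOnHand: 1,
});

// Hypothetical helper: anything reading the collection must cope with
// documents that lack `DateObtained`.
function summarize(doc) {
  const obtained = doc.DateObtained ?? "unknown";
  return `${doc.Title} (obtained: ${obtained})`;
}

const descriptions = [...book.values()].map(summarize);
console.log(descriptions.join("\n"));
```

The stand-in collection accepted both shapes without complaint; the decision about what a missing `DateObtained` means moved into our application code.
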
## Mapping the Relations

We certainly could bring `book_author` and `book_checked_out` across as documents in their own collections. However, document databases do not (generally) have the concept of foreign keys.

Let's first tackle the book/author relationship. JSON has an array type, which lets a single property hold multiple entries. We can add an `Authors` property to our `book` document:

```json
{
    "Id": 342136,
    "Title": "Little Women",
    "Authors": [55923],
    "CopiesOnHand": 3
}
```

With this structure, if we're rendering search results and want to display the author's name(s) next to the title, we will either need to query the `author` collection for each ID in our `Authors` array, or come up with a projection that crosses two collections. Since we're still storing properties of a `book`, though, we could include the author's name.

```json
{
    "Id": 342136,
    "Title": "Little Women",
    "Authors": [{ "Id": 55923, "Name": "Alcott, Louisa May" }],
    "CopiesOnHand": 3
}
```

This document does a lot for us; we can now see the title and the authors all together, and the IDs being there would allow us to dig into the data further. If we were writing a Single-Page Application (SPA), this could be used without any transformation at all.

Conversely, any application code would have to be aware of this structure. Our C# code from the last page would now likely need a `DisplayAuthor` type, and `Authors` would be `ICollection<DisplayAuthor>`. We also see our first instance of repeated data. The next page will be a deeper discussion of the trade-offs we should consider.

For now, though, we still need to represent the checked out books. We can use a technique similar to the one we used for authors, including the return date.

```json
{
    "Id": 342136,
    "Title": "Little Women",
    "Authors": [{ "Id": 55923, "Name": "Alcott, Louisa May" }],
    "CopiesOnHand": 3,
    "CheckedOut": [
        { "Id": 45112, "Name": "Anderson, Alice", "ReturnDate": "2025-04-02" },
        { "Id": 38472, "Name": "Brown, Barry", "ReturnDate": "2025-03-27" }
    ]
}
```

## Structure Reconsidered

One of the big marketing points for document databases is their ability to handle "unstructured data." I won't go as far as saying that's something that doesn't exist, but the _vast_ majority of data described this way is data whose structure is unknown to the person considering doing something with it. The data itself has structure, but they do not know what it is when they get started - usually a prerequisite for creating the data store.

On rare occasions, there may be data sets with several structures mixed together in the same set; even in these data sets, though, the cacophony usually turns out to be a finite set of structures, mixed inconsistently.

Keep that in mind as we look at some of the trade-offs between document and relational databases. Just as your body needs its skeletal structure against which your muscles and organs can work, your data _has_ structure. Document databases do not abstract that away.

[prev]: ./a-brief-history-of-relational-data.md "A Brief History of Relational Data • Relational Documents • Bit Badger Solutions"