2025-04-14 12:02:20 +00:00
7 changed files with 113 additions and 5 deletions
--- a/concepts/application-trade-offs.md
+++ b/concepts/application-trade-offs.md
@ -20,6 +20,7 @@ One big downside, when contrasted with ORMs, is that document database drivers a

 Another difference is that an "update," in document database terms, usually refers to replacing an entire document. The only property which cannot be changed is the ID; any other property can be added, updated, or removed by an update. To partially update a document, the term is "patch." A patch specifies the JSON which should be present after the update, and search criteria to identify documents to be patched. (Patching by ID is fine.)

+> [!WARNING]
 > This is where some consistency risk increases. There is no currency check on what the value was when it was obtained, and even if some other process has patched it, the patch command will not be rejected. While patches that use commands like "increment" or "append" (provided by some document databases) will succeed as expected, others may not. Imagine a library patron whose last name was entered as `Johnsen`. The patron goes to the check-out desk to have it updated to `Janssen` - but someone in the back was doing a quality check on new patrons, and decided to "fix the typo" by correcting it to `Johnson`. If both these updates happened after users had retrieved `Johnsen`'s record, the last one to press "Save" would overwrite the previous one.

 ### Overfetching
--- a/concepts/document-design-considerations.md
+++ b/concepts/document-design-considerations.md
@ -0,0 +1,3 @@
+# Document Design Considerations
+
+_todo_
--- a/concepts/hybrid-data-stores.md
+++ b/concepts/hybrid-data-stores.md
@ -0,0 +1,93 @@
+# Hybrid Data Stores
+
+If you have been reading this series in order, you likely have thought about applications you may have worked with in the past, and considered how a document structure may look. Perhaps there were many thoughts like "Oh yeah, that makes sense," or "Oh, that would be way easier than...." _(If so, then this series of articles has had its intended effect - so far.)_
+
+Conversely, there may have been other thoughts - things like "We can't do that because..." or "We really need ACID guarantees for \[scenario\]." While some information may work well as documents, other information may not. This limiting factor means that the data store will stay with a relational database, in spite of the extra work it takes to maintain a rigid structure throughout the application. Besides, we're already used to it!
+
+If both paragraphs above capture your thoughts, you are in the target market for a hybrid data store. Rather than jettison one data structure and switch to another, we can incorporate the good parts from both in a single data store.
+
+## Relational Data Structures
+
+### Can Be Flexible...
+
+There is a reason that relational data stores are the default representation for data at rest throughout the industry, both professional and hobbyist. Despite their depiction as rigid and structured, they are incredibly flexible as to the rigid structure one can define - and nullable columns and foreign keys provide the ability to model that a relationship may not exist.
+
+Content management systems (CMSs) are the textbook example of how one can build a flexible application using relational tables. Applications such as WordPress and SharePoint are commonplace - you likely know both without any further research - yet both are built on relational databases (MySQL and SQL Server, respectively). As of this writing, WordPress powers more than 40% of all websites, and SharePoint is the _de facto_ standard in enterprises, serving countless sites both on the public Internet and in private networks (and acting as the backend store for Microsoft Teams' file sharing).
+
+### ...But Can Be Complex
+
+At one point in his career, this author joked about making a table called `the_table` with two columns (`id` - 32-character string - and `the_field`, an unbounded text field), which would have been the ultimate flexible schema for the application in question (which, at the time, used a data model that was neither relational nor document-oriented). This joke was taken as such (and appreciated) because it pointed out that, for all the complaints about how inflexible our current data structure was, it _could_ be replaced with something even worse. Even the human body would be reduced to a puddle of goo were it not for the skeletal structure on which our organs rely.
+
+_As segues go - is there any entity more complex than the human body? Thankfully, database structures are well above the level of molecular biology or physiology!_
+
+The two examples above, though, show different approaches to the flexible-relational paradigm. The [WordPress schema][wp-schema] is not huge - 12 tables as of this writing - yet all the content for some major media sites is contained (mostly; plugins can create tables for their own use) in those tables. In the `wp_posts` table, there are:
+
+- Blog posts (initial post, maybe updates, comments, categorizations, etc.)
+- Static pages (timeless content that may change infrequently; similar to this page)
+- Any other custom content item
+
+The key to this is the `post_type` column, a free-form 20-character field used to indicate what sort of content this "post" represents. But, because of this flexibility, the schema supports data structures that are not valid in the application. Blog posts can have categories (broad topics to which the post applies) and tags (distinct points mentioned in the particular post), and both are stored in the `wp_term_relationships` table. Static pages, also stored in `wp_posts`, have neither of these applied to them, yet there is nothing in the data model that prevents lots of `wp_term_relationships` rows for these as well (which may be valid, if a plugin adds some other way of categorizing pages).
+
+> [!NOTE]
+> The purpose of the above is not to rip on WordPress's schema. What that project has implemented with just a few tables, and the way they ensure that major and minor upgrade versions are supported, is nigh-heroic. This author's biggest quibble with the WordPress schema has to do with using plural nouns for their tables; it's the "post" table, not the "posts" table, darn it....
+
+If, from a data structure perspective, WordPress is complex - SharePoint blows it out of the water. It uses _so many tables_ - dynamically creating them in some cases - that any analysis and extension is very difficult. And, once an outside observer figures it out, that observer should harbor no reasonable expectation that the structure will not change with the next release.
+
+Perhaps an enterprise-level application that creates sites for an arbitrary number of organizations, with an arbitrary structure and arbitrary content, needs this level of complexity. (Again - no shade on SharePoint here, but no one can claim it _does not_ use a complex schema!) This author suspects that the average reader here does not.
+
+## Document Data Structures
+
+> Complexity is a subsidy.<br>_<small>&ndash; Jonah Goldberg</small>_
+
+The above quote has, admittedly, been yanked from its original context, but it applies here more than we may initially think. The original context refers to government regulations which impose certain burdens on businesses; any legal business must comply with them. As the compliance cost rises, businesses which cannot absorb the overhead of that compliance become non-viable. What may be "budget dust" for a large business may be a cost-prohibitive capital expenditure for a small one.
+
+What does this have to do with databases? Conceptually, we are dealing with similar issues. Shoe-horning a flexible data structure into a relational database is not without costs; and, while this is the fifth in a series of articles explaining the simpler way, what we are after is really that - a simpler way to manage structurally-flexible data.
+
+### Thank You, {vendor_name}
+
+The heading above is rendered correctly. Nearly every relational data store has incorporated a JSON data type; [Oracle][], [SQL Server][], [MySQL][] and [MariaDB][] _(sadly, with diverging implementations, developed mostly after the project fork)_, [PostgreSQL][], and [SQLite][] have all recognized the advantages of documents, and incorporated them in their database engines to varying degrees.
+
+> [!NOTE]
+> As of this writing, PostgreSQL is the winner for document integration. It has two different options for JSON columns (`JSON`, which stores the original text given; and `JSONB`, which stores a parsed binary representation of the text). Additionally, its indexing options can provide efficient document access for any field in the document. It also provides querying options by "containment" (a given document is contained in the field) and by JSON Path (a given document matches an expression). SQLite's implementation was (admittedly) inspired by PostgreSQL's operators.
+
+Thanks to these vendors' efforts, there is a very high likelihood that whatever relational data storage solution you are currently using supports this hybrid structure today - no upgrades or patching needed!
+
+## Mixing Relational Tables and Documents
+
+So... how would we tie this all together? The solution sounds simple (and, in some cases, may be) - create tables with document columns for data where the document paradigm fits nicely, while leaving the needs-to-be-relational data in tables. As with many things in the realm of software development, though, a simple idea can lead to a complex implementation.
+
+### Theory
+
+Let's tackle the "simple" part (for no other reason than that simplifying the data structure is _the entire point_ of this series). If we have an accounting system with balances, ledgers, debits and credits, etc., we likely have a scenario where we will need a relational, constraint-enforced data store. Customer support calls for this account, though, can vary; they can be general, they can apply to an account overall, or they can apply to a specific transaction. While a relational data store _could_ implement this, a document may be a better choice, particularly for capturing the various calls and actions which may occur over time.
+
+In this scenario, we could have an `account` table along with an `account_transaction` table. Each transaction is tied to an account, as well as the transaction preceding it; we store the "balance forward", the amount of the transaction, and the new balance, as well as a mandatory link to the previous transaction. This prevents miscoded applications (or nefarious database accessors) from removing a large debit transaction, making the account have a higher balance than it should.
+
+We could also have a `support_ticket` table which records communications from the customer from the initial contact through to resolution. We could easily use a document for this, with an array of notes for each communication back-and-forth between the client and the customer. This document could also have an optional link to the account or transaction to which the incident referred, as well as a link to the customer in question.
+
+We - of course - do not want to lose any of that data; however, most of these relationships are optional. What happens if the account is closed - or, beyond that, if a customer is deleted? A purely relational architecture could specifically address this; however, storing support tickets as documents gives us a different form of traceability by default. We will still have a record of the interaction, because the `support_ticket`'s presence did not prevent further business action on the account or customer. At the same time, support tickets did not prevent us from closing an account; the business remained able to take action where needed.
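+
+As a minimal sketch of the structure described above, the following uses Python's built-in `sqlite3` module. The column names, the shape of the ticket document, and the sample values are all assumptions made for illustration; none of them come from the libraries documented here. It also assumes the SQLite build bundled with Python includes the JSON functions, which is typical for current builds.
+
+```python
+import json
+import sqlite3
+
+conn = sqlite3.connect(":memory:")
+conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
+conn.executescript("""
+    CREATE TABLE account (
+        id   INTEGER PRIMARY KEY,
+        name TEXT NOT NULL
+    );
+    -- Relational side: each transaction links to its account and its predecessor
+    -- (previous_id is NULL only for the first transaction; amounts are in cents)
+    CREATE TABLE account_transaction (
+        id              INTEGER PRIMARY KEY,
+        account_id      INTEGER NOT NULL REFERENCES account (id),
+        previous_id     INTEGER REFERENCES account_transaction (id),
+        balance_forward INTEGER NOT NULL,
+        amount          INTEGER NOT NULL,
+        new_balance     INTEGER NOT NULL,
+        CHECK (new_balance = balance_forward + amount)
+    );
+    -- Document side: one JSON document per ticket, with no foreign keys
+    CREATE TABLE support_ticket (data TEXT NOT NULL);
+""")
+
+conn.execute("INSERT INTO account VALUES (1, 'Checking')")
+conn.execute("INSERT INTO account_transaction VALUES (1, 1, NULL, 0, 50000, 50000)")
+conn.execute("INSERT INTO account_transaction VALUES (2, 1, 1, 50000, -20000, 30000)")
+
+ticket = {
+    "id": "t-1",
+    "accountId": 1,       # optional link, stored as plain data in the document
+    "transactionId": 2,   # optional link, stored as plain data in the document
+    "notes": [{"from": "customer", "text": "I do not recognize this debit."}],
+}
+conn.execute("INSERT INTO support_ticket VALUES (?)", (json.dumps(ticket),))
+
+# Removing transaction 1 is rejected, because transaction 2 links back to it
+try:
+    conn.execute("DELETE FROM account_transaction WHERE id = 1")
+    delete_rejected = False
+except sqlite3.IntegrityError:
+    delete_rejected = True
+
+# Tickets can still be found by a field inside the document
+rows = conn.execute(
+    "SELECT data FROM support_ticket WHERE json_extract(data, '$.accountId') = ?",
+    (1,),
+).fetchall()
+tickets = [json.loads(row[0]) for row in rows]
+```
+
+Because `support_ticket` declares no foreign keys, nothing about a ticket would block later action on the account or customer; the links live in the document as ordinary data, while the transaction chain stays constraint-enforced.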
+
+### Practice
+
+As one reviews the APIs in the `Custom` type for any of the projects here, one notices that the document-returning functions take - as their last parameter - a mapping function, which translates between the database row and the expected return item. Each library has predefined functions to return domain items; return a JSON string with matching results; or write the JSON results directly to the output.
+
+As these are designed to expect functions, though, these libraries can be used not only to return deserialized (or raw) JSON documents, but also to build any required domain item. The function passed to these `Custom` calls can select one field and deserialize it; or, it can pluck various fields from a row result and construct a domain item; or, it can transform these results into some other form.
+
+This enables a true relational-and/or-document (AKA hybrid) data store. Tables against which `Find`, `Document.insert`, etc. are executed are assumed to be document tables, while `Custom` functions/methods allow relational data as well. While not an object-relational mapper (ORM), writing a to/from mapping for a domain object allows either model to be used in the same data-access paradigm.
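+
+The idea can be sketched in a few lines. The following is not the actual API of any of these libraries; `custom_list`, both mapping functions, and the table layout are hypothetical names chosen for illustration, written in Python against its built-in `sqlite3` module.
+
+```python
+import json
+import sqlite3
+from dataclasses import dataclass
+from typing import Any, Callable, TypeVar
+
+T = TypeVar("T")
+
+
+def custom_list(conn: sqlite3.Connection, sql: str, params: tuple[Any, ...],
+                map_func: Callable[[sqlite3.Row], T]) -> list[T]:
+    """Hypothetical helper: run a query, translating each row with map_func."""
+    return [map_func(row) for row in conn.execute(sql, params)]
+
+
+@dataclass
+class Account:
+    id: int
+    name: str
+
+
+def from_data(row: sqlite3.Row) -> dict:
+    """Document table: deserialize the JSON held in the `data` column."""
+    return json.loads(row["data"])
+
+
+def to_account(row: sqlite3.Row) -> Account:
+    """Relational table: pluck individual columns into a domain item."""
+    return Account(id=row["id"], name=row["name"])
+
+
+conn = sqlite3.connect(":memory:")
+conn.row_factory = sqlite3.Row  # allows access to columns by name
+conn.executescript("""
+    CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
+    CREATE TABLE support_ticket (data TEXT NOT NULL);
+""")
+conn.execute("INSERT INTO account VALUES (1, 'Checking')")
+conn.execute("INSERT INTO support_ticket VALUES (?)",
+             (json.dumps({"id": "t-1", "accountId": 1}),))
+
+# The same call shape serves both a relational table and a document table
+accounts = custom_list(conn, "SELECT id, name FROM account WHERE id = ?", (1,), to_account)
+tickets = custom_list(conn, "SELECT data FROM support_ticket", (), from_data)
+```
+
+Passing the mapping as the last argument keeps the query and the shape of the result independent of each other; swapping `to_account` for `from_data` changes what comes back without touching the data-access call itself.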
+
+> [!NOTE]
+> These libraries provide a nice API for these actions - and, of course, are the reason this page exists! However, even if one were to never use any of these libraries, these principles still stand.
+
+## Is This Right for Me?
+
+If one has read from the beginning up to this point, but is still looking for permission to take the leap - a dry-erase board is your friend. Diagram the tables / documents, brainstorm their interactions, consider the real-world constraints vs. the ones each paradigm lets you model/enforce via the database, and decide from there. (For this author, this is a much simpler data structure which fits all of his side projects perfectly, and one he wishes he could embrace at his more enterprise-y day job.)
+
+Even if the answer is "no," please skim the top part of the next article; some design considerations transcend the document/table decision. In the final article in this series, we will consider the best way to design data structures.
+
+
+[wp-schema]: https://codex.wordpress.org/Database_Description "Database Description &bull; WordPress Codex"
+[Oracle]: https://docs.oracle.com/en/database/oracle/oracle-database/21/adjsn/json-in-oracle-database.html "JSON in Oracle Database &bull; Oracle"
+[SQL Server]: https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server?view=sql-server-ver16 "JSON Data in SQL Server &bull; Microsoft Learn"
+[MySQL]: https://dev.mysql.com/doc/refman/8.0/en/json.html "The JSON Data Type &bull; MySQL"
+[MariaDB]: https://mariadb.com/kb/en/json/ "JSON Data Type &bull; MariaDB"
+[PostgreSQL]: https://www.postgresql.org/docs/current/functions-json.html "JSON Functions &bull; PostgreSQL"
+[SQLite]: https://sqlite.org/json1.html "JSON Functions and Operators &bull; SQLite"
--- a/concepts/toc.yml
+++ b/concepts/toc.yml
@ -5,4 +5,8 @@
 - name: Relational / Document Trade-Offs
  href: relational-document-trade-offs.md
 - name: Application Trade-Offs
-  href: application-trade-offs.md
+  href: application-trade-offs.md
+- name: Hybrid Data Stores
+  href: hybrid-data-stores.md
+- name: Document Design Considerations
+  href: document-design-considerations.md
--- a/docfx.json
+++ b/docfx.json
@ -15,7 +15,8 @@
      {
        "files": [
          "images/**",
-          "bitbadger-doc.png"
+          "bitbadger-doc.png",
+          "favicon.ico"
        ]
      }
    ],
@ -29,6 +30,7 @@
      "_appName": "Relational Documents",
      "_appTitle": "Relational Documents",
      "_appLogoPath": "bitbadger-doc.png",
+      "_appFaviconPath": "favicon.ico",
      "_appFooter": "Hand-crafted documentation created with <a href=https://dotnet.github.io/docfx target=_blank class=external>docfx</a> by <a href=https://bitbadger.solutions target=_blank class=external>Bit Badger Solutions</a>",
      "_enableSearch": true,
      "pdf": false
--- a/favicon.ico
+++ b/favicon.ico
--- a/index.md
+++ b/index.md
@ -6,7 +6,7 @@ _layout: landing

 _(this is a work-in-progress landing page for libraries that allow PostgreSQL and SQLite to be treated as document databases; it will eventually explain the concepts behind this, allowing the documentation for each library to focus more on "how" and less on "why")_

-## Libraries
+## Code

 These libraries provide a convenient <abbr title="Application Programming Interface">API</abbr> to treat PostgreSQL or SQLite as document stores.

@ -19,10 +19,13 @@ Use for PHP applications (8.2+)
 **solutions.bitbadger.documents** ~ Documentation _(soon)_ ~ Git _(soon)_<br>
 Use for <abbr title="Java Virtual Machine">JVM</abbr> applications (Java, Kotlin, Groovy, Scala)

-## Learning
+## Concepts

 When we use the term "documents" in the context of databases, we are referring to a database that stores its entries in a data format (usually a form of JavaScript Object Notation, or JSON). Unlike relational databases, document databases tend to have a relaxed schema; often, document collections or tables are the only definition required - and some even create those on-the-fly the first time one is accessed!

+> [!NOTE]
+> This content was originally hosted on the [Bit Badger Solutions][] main site; references to "the software that runs this site" refer to [myWebLog][], an application which uses the .NET version of this library to store its data in a hybrid relational / document format.
+
 _Documents marked as "wip" are works in progress (i.e., not complete). All of these pages should be considered draft quality; if you are reading this, welcome to the early access program!_

 **[A Brief History of Relational Data][hist]**<br>Before we dig in on documents, we'll take a look at some relational database concepts
@ -42,8 +45,10 @@ _Documents marked as "wip" are works in progress (i.e., not complete). All of th
 [pdoc-git]: https://git.bitbadger.solutions/bit-badger/pdo-document "PDODocument • Bit Badger Solutions Git"
 [jvm-dox]: ./jvm/ "solutions.bitbadger.documents • Bit Badger Solutions"
 [jvm-git]: https://git.bitbadger.solutions/bit-badger/solutions.bitbadger.documents "solutions.bitbadger.documents • Bit Badger Solutions Git"
+[Bit Badger Solutions]: https://bitbadger.solutions "Bit Badger Solutions"
+[myWebLog]: https://bitbadger.solutions/open-source/myweblog/ "myWebLog &bull; Bit Badger Solutions"
 [hist]: ./concepts/a-brief-history-of-relational-data.md "A Brief History of Relational Data • Bit Badger Solutions"
 [what]: ./concepts/what-are-documents.md "What Are Documents? • Bit Badger Solutions"
 [trade]: ./concepts/relational-document-trade-offs.md "Relational / Document Trade-Offs • Bit Badger Solutions"
 [app]: ./concepts/application-trade-offs.md "Application Trade-Offs • Bit Badger Solutions"
-[hybrid]: ./hybrid-data-stores.html "Hybrid Data Stores • Bit Badger Solutions"
+[hybrid]: ./concepts/hybrid-data-stores.md "Hybrid Data Stores • Bit Badger Solutions"