Getting Started
Determining the Lay of the Land
While the aim is to provide a library that makes it easy to get up-and-running, thinking through some decisions up front will pay off as we go.
- Each module has document access functions which take a JDBC `Connection` as their last parameter, and a version of those functions which do not. For the latter, the library will create a connection for each execution. For all languages except Java, there are also extension methods on the `Connection` object itself. We'll discuss connection management more below.
- Document IDs default to the `id` property for the given domain object/document, but can be named whatever you like. The library can also generate three forms of automatic IDs for documents. As with connection management, more discussion and considerations will be presented below.
- No assumption is made as to how documents are serialized and deserialized. While this allows the library to also have no dependencies on any (possibly conflicting) JSON library, it does require a two-method interface implementation. (The exception here is `kotlinx`, which does have a built-in serializer based on `kotlinx.serialization`.)
Projects will need a dependency on their chosen library. The `core` module is a dependency of the other modules, so Groovy, Scala, and KotlinX projects only need the `groovy`, `scala`, or `kotlinx` dependencies, respectively. Java projects, and Kotlin projects using reflection-based serialization, can depend on `core` directly.
Connections
Any database library will need a connection string, and this one is no different. If you are using a library or framework which provides a way to configure connection strings, use that. If not, a great way to configure connections (or the sensitive parts of them) is via environment variables. Java's `System.getenv(key)` static method can read the value of an environment variable.
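For example, an application might pull its connection string from the environment, falling back to a local SQLite file for development. (The variable name `DOC_DB_URL` and the fallback value below are illustrative, not anything the library requires.)

```java
public class ConnectionStringLookup {

    /** Read the connection string from the environment, with a development fallback. */
    public static String connectionString() {
        // DOC_DB_URL is a hypothetical variable name; use whatever your deployment defines
        String fromEnv = System.getenv("DOC_DB_URL");
        return (fromEnv == null || fromEnv.isBlank())
                ? "jdbc:sqlite:./dev.db" // local fallback for development
                : fromEnv;
    }

    public static void main(String[] args) {
        System.out.println(connectionString());
    }
}
```

The resulting value is then handed to the library as described next.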
However the connection string is configured, the library needs to know about it. The `Configuration` class (found in the `solutions.bitbadger.documents` package, part of the `core` module) has a static property `connectionString`; set it to the connection string you have configured.
Connection Management
Once the connection string has been configured, `Configuration.dbConn()` will return new connections to that database. Combining this with extension methods provides several options:
- Do nothing. This will result in a new `Connection` being obtained for each request, which may seem crazy! However, SQLite connections are inexpensive local operations, and a PostgreSQL connection pool can absorb the overhead of the multiple connections required to satisfy a particular application action.
- Use a connection from your DI container. Combined with the extension methods (or the functions which take a `Connection` parameter), this can be a great way to introduce documents into an existing application. All queries will be executed on the given connection, and the DI container can manage its lifetime (in the context of web requests, likely per-request).
- Configure this library to provide the DI container's connection. If you can set up your container to run custom code to return its objects (i.e., factories), `Configuration.dbConn()` can be treated as a connection factory.
Note
Those are a lot of options (and they do not even cover ad-hoc or hybrid approaches). On the other hand, this is a low-stress decision for those getting started. For some, one of these options will trigger the "Yeah, that's it!" response; in that case, go with it. Otherwise, pick one and get started. For web applications, the DI-provided connection is a good choice. The library still needs to be configured so it knows what type of database it is targeting, but the connection does not have to be provided by the library; any JDBC connection will do.
Document IDs
Naming IDs
As mentioned above, the default configuration uses the document field `id` as the identifier for each document. For projects that want to use a different name (e.g., `key`), set the `Configuration.idField` property to whatever value will be used.
Unlike the connection strategy, this is a decision to make up front; once documents exist, this cannot be easily changed.
Automatic IDs
Relational databases provide several ways to create automatic IDs, the most common being ever-increasing numbers or UUIDs/GUIDs. This library provides a replacement (or approximation) of these options, all defined in the `AutoId` enum.
- `DISABLED` - no automatic IDs are applied; your IDs are your ~~problem~~ responsibility.
- `NUMBER` - a `MAX + 1`-style algorithm is applied if the document has a numeric ID with the value `0`. (This is applied as a subquery on the `INSERT` statement; it should not be considered nearly as robust as a sequence.)
- `UUID` - a `String` UUID is generated for documents with blank string ID fields.
- `RANDOM_STRING` - a string of random hex characters is generated for documents with blank string ID fields; the length of this string is controlled by `Configuration.idStringLength`.
In all automatic generation cases, if the document being inserted has an ID value already, it is passed through unmodified.
Warning
For `NUMBER` auto IDs, both PostgreSQL and SQLite will have trouble if any document with a string ID is written. Numbers can be treated as strings, but strings cannot be treated as numbers. (SQLite will do its best - if a string has a numeric value, it will work - but PostgreSQL will fail spectacularly in this case.)
Tip
`AutoId.generateRandomString(length)` can be used to generate random hex strings of a specified length, not just the one specified in the configuration. Also, `AutoId.generateUUID()` can be used to generate a lowercase UUID with no dashes, regardless of the configured `AutoId` value. (Non-Kotlin projects may need to specify `AutoId.Companion` to see these functions.)
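As a rough, JDK-only illustration of the two string ID forms described above (this is not the library's implementation, just a sketch of the kinds of values produced):

```java
import java.security.SecureRandom;
import java.util.UUID;

public class IdSketch {
    private static final SecureRandom RNG = new SecureRandom();

    /** A lowercase UUID with the dashes removed, like the UUID strategy produces. */
    public static String uuidId() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    /** A random hex string of the requested length, like RANDOM_STRING produces. */
    public static String randomHexString(int length) {
        StringBuilder sb = new StringBuilder(length);
        while (sb.length() < length) {
            sb.append(Integer.toHexString(RNG.nextInt(16)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(uuidId());            // 32 lowercase hex characters
        System.out.println(randomHexString(16)); // 16 random hex characters
    }
}
```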
Document Serialization
Traditional (AKA "reflection-based")
With many applications already defining a JSON API, a document data store can utilize whatever JSON serialization strategies these applications already employ. In this case, implementing a `DocumentSerializer` (found in the `solutions.bitbadger.documents` namespace) is trivial; its methods can delegate to the existing serialization and deserialization process.
For new applications, or applications that do not already have JSON serialization as part of their normal process, the integration tests for the `core`, `groovy`, and `scala` modules have examples of a `DocumentSerializer` implementation using Jackson's default options. The project will need a dependency on `jackson-databind`, but the implementation is trivial (which is why it is duplicated in each module's integration tests).
Once the serializer is created, set the `DocumentConfig.serializer` property to an instance of that serializer. (`DocumentConfig` is in the `solutions.bitbadger.documents.java` package.)
Using kotlinx.serialization
The `kotlinx` module configures the serializer with the following default options:
- Coerce Input Values = true; this means that `null` values in JSON will be represented by the class's default property value rather than being `null`.
- Encode Defaults = true; this means properties with default values will have those values encoded as part of the output JSON.
- Explicit Nulls = false; this means that `null` values will not be written to the output JSON. For documents with many optional values, this can make a decent size difference once many documents are stored.
Any of the KotlinX `Json` properties can be set on the `options` property of `DocumentConfig` in the `solutions.bitbadger.documents.kotlinx` package. As with reflection-based serialization, if the project already has a set of `Json` properties, the existing configuration can be replaced with that set.
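For illustration, a kotlinx `Json` instance matching the defaults above could be declared as follows. (The property names come from kotlinx.serialization's `JsonBuilder`; depending on the library version, `explicitNulls` may require opting in to the experimental serialization API, as shown.)

```kotlin
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.json.Json

@OptIn(ExperimentalSerializationApi::class)
val options = Json {
    coerceInputValues = true // JSON nulls become the property's default value
    encodeDefaults = true    // properties left at their defaults are still written
    explicitNulls = false    // null properties are omitted from the output
}
```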
Document Tables
Note
If you want to customize the document's `id` field, this needs to be done before tables are created.
The final step to being able to store and retrieve documents is to define one or more tables for them. The `Definition` class (in the `.java`, `.scala`, or `.kotlinx` packages) provides an `ensureTable` method that creates both a table and its ID index if that table does not already exist. (`ensureTable` is also a `Connection` extension method.)
To create a document table named `hotel` in Java...

```java
// Function that creates its own connection
Definition.ensureTable("hotel");

// ...or, on a connection variable named "conn"
conn.ensureTable("hotel");
```
Note
Most operations could throw `DocumentException`, which is a checked exception. Java consumers must catch or declare it; for other languages, "must" becomes "should consider".
The repeatable nature of this call means that your application can run through a set of `ensureTable` calls at startup.
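A startup routine can ensure every table the application uses and handle the checked exception in one place. The sketch below stubs the library call with a functional interface so the shape of the pattern is runnable on its own; in a real project, the lambda would be replaced by `Definition::ensureTable` and the catch would name the library's `DocumentException`. The table names are hypothetical.

```java
import java.util.List;

public class StartupSketch {
    /** Stands in for Definition.ensureTable; Exception is used here for illustration. */
    @FunctionalInterface
    interface TableEnsurer {
        void ensureTable(String name) throws Exception;
    }

    /** Ensure each table in turn, wrapping any failure with the table's name. */
    static int ensureAll(TableEnsurer ensurer, List<String> tables) {
        int ensured = 0;
        for (String table : tables) {
            try {
                ensurer.ensureTable(table);
                ensured++;
            } catch (Exception ex) {
                throw new IllegalStateException("Could not ensure table " + table, ex);
            }
        }
        return ensured;
    }

    public static void main(String[] args) {
        int count = ensureAll(name -> System.out.println("ensured " + name),
                List.of("hotel", "room", "guest"));
        System.out.println(count + " tables ensured");
    }
}
```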
Indexing Documents
Both PostgreSQL and SQLite support indexing individual fields in the document, just as they can index columns in a relational table. `Definition` provides the `ensureFieldIndex` method to establish these indexes. These functions take a table name, an index name, and a collection of field names from the document which should be indexed.
For example, imagine we had a `user` document, but we allow users to sign in via their e-mail address. In Java, this may look something like...

```java
// Create an index named idx_user_email on the user table
Definition.ensureFieldIndex("user", "email", List.of("email"));
```
Multiple-field indexes are also possible; passing more than one field name in the collection creates a single index covering all of those fields.
PostgreSQL also provides a GIN index which can cover the entire document. This index can be a full document index, which can be used for both containment queries and JSON Path matching queries, or one that is optimized for JSON Path operations. This library has a `DocumentIndex` enum with values `Full` and `Optimized` to specify which type of index is required; `ensureDocumentIndex` creates the index.

```java
// This index will be named idx_user_doc and is suitable for any operation
Definition.ensureDocumentIndex("user", DocumentIndex.Full);
```
Indexes are not as important an up-front decision as some other aspects; nothing prevents a developer from adding an index when they realize they may need one.
We have access to a data store, we know how to create documents, and we have a place to store them. Now, it's time to use all that!