Getting Started
Determining the Lay of the Land
While the aim is to provide a library that makes it easy to get up-and-running, thinking through some decisions up front will pay off as we go.
- Each module has document access functions which take a JDBC `Connection` as their last parameter, and a version of those functions which do not. For the latter, the library will create a connection for each execution. For all languages except Java, there are also extension methods on the `Connection` object itself. We'll discuss connection management more below.
- Document IDs default to the `id` property for the given domain object/document, but can be named whatever you like. The library can also generate three forms of automatic IDs for documents. As with connection management, more discussion and considerations will be presented below.
- No assumption is made as to how documents are serialized and deserialized. While this allows the library to also have no dependencies on any (possibly conflicting) JSON library, it does require a two-method interface implementation. (The exception here is `kotlinx`, which does have a built-in serializer based on `kotlinx.serialization`.)
Projects will need a dependency on their chosen library. The `core` module is a dependency of the other modules, so Groovy, Scala, and KotlinX projects only need the `groovy`, `scala`, or `kotlinx` dependencies, respectively. Java projects, and Kotlin projects using reflection-based serialization, can depend on `core` directly.
Connections
Any database library will need a connection string, and this one is no different. If you are using a library or framework which provides a way to configure connection strings, use that. If not, a great way to configure connections (or the sensitive parts of them) is via environment variables. Java's `System.getenv(key)` static method can read the value of an environment variable.
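For example, an application might pull its connection string from the environment, falling back to a local SQLite file for development. (The variable name `DOC_DB_URL` and the fallback value below are illustrative, not anything the library requires.)

```java
public class ConnectionStringLookup {

    /** Read the connection string from the environment, with a development fallback. */
    public static String connectionString() {
        // DOC_DB_URL is a hypothetical variable name; use whatever your deployment defines
        String fromEnv = System.getenv("DOC_DB_URL");
        return (fromEnv == null || fromEnv.isBlank())
                ? "jdbc:sqlite:./dev.db" // local fallback for development
                : fromEnv;
    }

    public static void main(String[] args) {
        System.out.println(connectionString());
    }
}
```

The resulting value is then handed to the library as described next.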
However the connection string is configured, the library needs to know about it. The `Configuration` class (found in the `solutions.bitbadger.documents` package, part of the `core` module) has a static property `connectionString`; set it to the connection string you have configured.
Connection Management
Once the connection string has been configured, `Configuration.dbConn()` will return new connections to that database. Combining this with extension methods provides several options:
- Do nothing. This will result in a new `Connection` being obtained for each request, which may seem crazy! However, SQLite connections are inexpensive local operations, and a PostgreSQL connection pool can absorb the overhead of the multiple connections required to satisfy a particular application action.
- Use a connection from your DI container. Combined with the extension methods (or the functions which take a `Connection` parameter), this can be a great way to introduce documents into an existing application. All queries will be executed on the given connection, and the DI container can manage its lifetime (in the context of web requests, likely per-request).
- Configure this library to provide the DI container's connection. If you can set up your container to run custom code to return its objects (i.e., factories), `Configuration.dbConn()` can be treated as a connection factory.
Note
Those are a lot of options (and they do not even cover ad-hoc or hybrid approaches). On the other hand, this is a low-stress decision for those getting started. For some, one of these options will trigger the "Yeah, that's it!" response; in that case, go with it. Otherwise, pick one and get started. For web applications, the DI-provided connection is a good choice. The library still needs to be configured so it knows what type of database it is targeting, but the connection does not have to be provided by the library; any JDBC connection will do.
Document IDs
Naming IDs
As mentioned above, the default configuration uses the document field `id` as the identifier for each document. For projects that want to use a different name (e.g., `key`), set the `Configuration.idField` property to whatever value will be used.
Unlike the connection strategy, this is a decision to make up front; once documents exist, this cannot be easily changed.
Automatic IDs
Relational databases provide several ways to create automatic IDs, the most common being ever-increasing numbers or UUIDs/GUIDs. This library provides a replacement (or approximation) of these options, all defined in the `AutoId` enum.
- `DISABLED` - no automatic IDs are applied; your IDs are your ~~problem~~ responsibility.
- `NUMBER` - a `MAX + 1`-style algorithm is applied if the document has a numeric ID with the value `0`. (This is applied as a subquery on the `INSERT` statement; it should not be considered nearly as robust as a sequence.)
- `UUID` - a `String` UUID is generated for documents with blank string ID fields.
- `RANDOM_STRING` - a string of random hex characters is generated for documents with blank string ID fields; the length of this string is controlled by `Configuration.idStringLength`.
In all automatic generation cases, if the document being inserted has an ID value already, it is passed through unmodified.
Warning
For `NUMBER` auto IDs, both PostgreSQL and SQLite will have trouble if any document with a string ID is written. Numbers can be treated as strings, but strings cannot be treated as numbers. (SQLite will do its best - if a string has a numeric value, it will work - but PostgreSQL will fail spectacularly in this case.)
Tip
`AutoId.generateRandomString(length)` can be used to generate random hex strings of a specified length, not just the one specified in the configuration. Also, `AutoId.generateUUID()` can be used to generate a lowercase UUID with no dashes, regardless of the configured `AutoId` value. (Non-Kotlin projects may need to specify `AutoId.Companion` to see these functions.)
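As a rough, JDK-only illustration of the two string ID forms described above (this is not the library's implementation, just a sketch of the kinds of values produced):

```java
import java.security.SecureRandom;
import java.util.UUID;

public class IdSketch {
    private static final SecureRandom RNG = new SecureRandom();

    /** A lowercase UUID with the dashes removed, like the UUID strategy produces. */
    public static String uuidId() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    /** A random hex string of the requested length, like RANDOM_STRING produces. */
    public static String randomHexString(int length) {
        StringBuilder sb = new StringBuilder(length);
        while (sb.length() < length) {
            sb.append(Integer.toHexString(RNG.nextInt(16)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(uuidId());            // 32 lowercase hex characters
        System.out.println(randomHexString(16)); // 16 random hex characters
    }
}
```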
Document Serialization
Traditional (AKA "reflection-based")
With many applications already defining a JSON API, a document data store can utilize whatever JSON serialization strategies these applications already employ. In this case, implementing a `DocumentSerializer` (found in the `solutions.bitbadger.documents` namespace) is trivial; its methods can delegate to the existing serialization and deserialization process.
For new applications, or applications that do not already have JSON serialization as part of their normal process, the integration tests for the `core`, `groovy`, and `scala` modules have examples of a `DocumentSerializer` implementation using Jackson's default options. The project will need a dependency on `jackson-databind`, but the implementation is trivial (which is why it is duplicated in each module's integration tests).
Once the serializer is created, set the `DocumentConfig.serializer` property to an instance of that serializer. (`DocumentConfig` is in the `solutions.bitbadger.documents.java` package.)
Using kotlinx.serialization
The `kotlinx` module configures the serializer with the following default options:
- Coerce Input Values = true; this means that `null` values in JSON will be represented by the class's default property value rather than being `null`.
- Encode Defaults = true; this means properties with default values will have those values encoded as part of the output JSON.
- Explicit Nulls = false; this means that `null` values will not be written to the output JSON. For documents with many optional values, this can make a decent size difference once many documents are stored.
Any of the KotlinX `Json` properties can be set on the `options` property of `DocumentConfig` in the `solutions.bitbadger.documents.kotlinx` package. As with reflection-based serialization, if the project already has a set of `Json` properties, the existing configuration can be replaced with that set.
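For illustration, a kotlinx `Json` instance matching the defaults above could be declared as follows. (The property names come from kotlinx.serialization's `JsonBuilder`; depending on the library version, `explicitNulls` may require opting in to the experimental serialization API, as shown.)

```kotlin
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.json.Json

@OptIn(ExperimentalSerializationApi::class)
val options = Json {
    coerceInputValues = true // JSON nulls become the property's default value
    encodeDefaults = true    // properties left at their defaults are still written
    explicitNulls = false    // null properties are omitted from the output
}
```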
Document Tables
Note
If you want to customize the document's `id` field, this needs to be done before tables are created.
The final step to being able to store and retrieve documents is to define one or more tables for them. The `Definition` class (in the `.java`, `.scala`, or `.kotlinx` packages) provides an `ensureTable` method that creates both a table and its ID index if that table does not already exist. (`ensureTable` is also a `Connection` extension method.)
To create a document table named `hotel` in Java...

```java
// Function that creates its own connection
Definition.ensureTable("hotel");

// ...or, on a connection variable named "conn"
conn.ensureTable("hotel");
```
Note
Most operations could throw `DocumentException`, which is a checked exception. Java consumers must catch or declare it; for other languages, "must" becomes "should consider".
The repeatable nature of this call means that your application can run through a set of `ensureTable` calls at startup.
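A startup routine can ensure every table the application uses and handle the checked exception in one place. The sketch below stubs the library call with a functional interface so the shape of the pattern is runnable on its own; in a real project, the lambda would be replaced by `Definition::ensureTable` and the catch would name the library's `DocumentException`. The table names are hypothetical.

```java
import java.util.List;

public class StartupSketch {
    /** Stands in for Definition.ensureTable; Exception is used here for illustration. */
    @FunctionalInterface
    interface TableEnsurer {
        void ensureTable(String name) throws Exception;
    }

    /** Ensure each table in turn, wrapping any failure with the table's name. */
    static int ensureAll(TableEnsurer ensurer, List<String> tables) {
        int ensured = 0;
        for (String table : tables) {
            try {
                ensurer.ensureTable(table);
                ensured++;
            } catch (Exception ex) {
                throw new IllegalStateException("Could not ensure table " + table, ex);
            }
        }
        return ensured;
    }

    public static void main(String[] args) {
        int count = ensureAll(name -> System.out.println("ensured " + name),
                List.of("hotel", "room", "guest"));
        System.out.println(count + " tables ensured");
    }
}
```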
Indexing Documents
Both PostgreSQL and SQLite support indexing individual fields in the document, just as they can index columns in a relational table. `Definition` provides the `ensureFieldIndex` method to establish these indexes. These functions take a table name, an index name, and a collection of field names from the document which should be indexed.
For example, imagine we had a `user` document, but we allow users to sign in via their e-mail address. In Java, this may look something like...

```java
// Create an index named idx_user_email on the user table
Definition.ensureFieldIndex("user", "email", List.of("email"));
```
Multiple-field indexes are also possible; passing more than one field name in the collection creates a single index covering all of those fields.
PostgreSQL also provides a GIN index which can cover the entire document. This index can be a full document index, which can be used for both containment queries and JSON Path matching queries, or one that is optimized for JSON Path operations. This library has a `DocumentIndex` enum with values `Full` and `Optimized` to specify which type of index is required; `ensureDocumentIndex` creates the index.

```java
// This index will be named idx_user_doc and is suitable for any operation
Definition.ensureDocumentIndex("user", DocumentIndex.Full);
```
Indexes are not as important an up-front decision as some other aspects; nothing prevents a developer from adding an index when they realize they may need one.
We have access to a data store, we know how to create documents, and we have a place to store them. Now, it's time to use all that!