307 lines
17 KiB
Markdown

# Related Documents and Custom Queries
> [!NOTE]
> This page is longer than the ideal documentation page. Understanding how to assemble custom queries requires understanding how data is stored, and the list of ways to retrieve information can be... a lot. The hope is that one reading will serve as education, and the lists of options will serve as reference lists that will assist you in crafting your queries.
## Overview
Document stores generally have fewer relationships than traditional relational databases, particularly those that arise when data is structured in [Third Normal Form][tnf]; related collections are stored in the document, and ever-increasing surrogate keys (_a la_ sequences and such) do not play well with distributed data. Unless all data is stored in a single document, though, there will still be a natural relation between documents.
Thinking back to our earlier examples, we did not store the collection of rooms in each hotel's document; each room is its own document and contains the ID of the hotel as one of its properties.
```php
class Hotel
{
string $id = '';
// ... more properties
}
class Room
{
string $id = '';
string $hotelId = '';
// ... more properties
}
```
## Document Table SQL in Depth
The library creates tables with a `data` column of type `JSONB` (PostgreSQL) or `TEXT` (SQLite), with a unique index on the configured ID name that serves as the primary key (for these examples, we'll assume it's the default `id`). The indexes created by the library all apply to the `data` column. The by-ID query for a hotel would be...
```sql
SELECT data FROM hotel WHERE data->>'id' = :id
```
...with the ID passed as the `:id` parameter.
> [!TIP]
> Using a `Query` "building block" function, `Query::whereById`, will create the `data->>'id' = :id` criteria using [the configured ID name][id].
Finding all the rooms for a hotel, using our indexes we created earlier, could use a field comparison query...
```sql
SELECT data FROM room WHERE data->>'hotelId' = :field
```
...with `:field` being "abc123"; PostgreSQL could also use a JSON containment query...
```sql
SELECT data FROM room WHERE data @> :criteria
```
...with something like `['hotelId' => 'abc123']` (serialized to JSON) passed as the matching document in the `:criteria` parameter.
So far, so good; but, if we're looking up a room, we do not want to have to make 2 queries just to also be able to display the hotel's name. The `WHERE` clause on the first query above uses the expression `data->>'id'`; this extracts a field from a JSON column as `TEXT` in PostgreSQL (or "best guess" in SQLite, but usually text). Since this is the value our unique index indexes, and we are using a relational database, we can write an efficient JOIN between these two tables.
```sql
SELECT r.data, h.data AS hotel_data
FROM room r
INNER JOIN hotel h ON h.data->>'id' = r.data->>'hotelId'
WHERE r.data->>'id' = :id
```
_(This syntax would work without the unique index; for PostgreSQL, it would default to using the GIN index (`Full` or `Optimized`), if it exists, but it wouldn't be quite as efficient as a zero-or-one unique index lookup. For SQLite, this would result in a full table scan. Both PostgreSQL and SQLite also support a `->` operator, which extracts the field as a JSON value instead of its text.)_
## Using Building Blocks
Most of the data access methods in both libraries are built up from query fragments and reusable functions; these are exposed for use in building custom queries.
### Queries
For every method or function described in [Basic Usage][], the `Query` static class or `Query` namespace contains the query for that operation.
In the `Query` class, you'll find:
- **selectFromTable** takes a table name and generates a `SELECT` statement from that name.
- **whereByFields** takes an array of field criteria and how they should be matched (`FieldMatch::Any` uses `OR`, while `FieldMatch::All` uses `AND`). `Field` has constructor functions for each `Op` it supports (`Op` is short for "operation"), and each is camelCased based on the `Op` it constructs. These functions generally take a field name and a value, but exceptions are noted below. _(Earlier versions used mostly 2-character names; these still exist for compatibility.)_
- **equal** uses `=` to create an equality comparison
- **greater** uses `>` to create a greater-than comparison
- **greaterOrEqual** uses `>=` to create a greater-than-or-equal-to comparison
- **less** uses `<` to create a less-than comparison
- **lessOrEqual** uses `<=` to create a less-than-or-equal-to comparison
- **notEqual** uses `<>` to create a not-equal comparison
- **between** uses `BETWEEN` to create a range comparison (it expects two values instead of one)
- **in** uses `IN` to create a comparison matching a set of values (it expects an array of values)
- **inArray** uses `?|` in PostgreSQL and a combination of `EXISTS / json_each / IN` in SQLite to mimic the behavior of `IN` on an array within a document (it expects the table name and an array of values)
- **exists** uses `IS NOT NULL` to create an existence comparison (requires no value)
- **notExists** uses `IS NULL` to create a non-existence comparison; fields are considered null if they are either not part of the document, or if they are part of the document but explicitly set to `null` (requires no value)
- **whereById** takes a parameter name and generates a field `Equal` comparison against the configured ID field.
- **whereDataContains** takes an optional parameter name (default is `:criteria`) and generates a JSON containment query _(PostgreSQL only)_
- **whereJsonPathMatches** takes an optional parameter name (default is `:path`) and generates a JSON Path match query _(PostgreSQL only)_
- **insert**, **save**, and **update** are the queries for those actions; all specify a `:data` parameter, and `update` also specifies an `:id` parameter against the configured ID field
Within the `PDODocument\Query` namespace, there are classes for each operation:
- **Definition** contains methods/functions to ensure tables, their keys, and field indexes exist.
- **Count**, **Exists**, **Find**, and **Delete** all require at least a table name. Their **byId** queries specify an `:id` parameter against the configured ID field (there is no `Count::byId`). Their **byFields** queries require a `Field` instance array and will use a `:field[n]` parameter if a parameter name is not provided (unless `Op::Exists` or `Op::NotExists` are used). `Count` has an `all` query which takes no further parameters and specifies no parameters.
- **Patch** and **RemoveFields** both perform partial updates. (Patching to `null` is similar, but not quite the same, as completely removing the field from the document.) Both these have the same `by*` functions as other operations.
That's a lot of reading! Some examples a bit below will help this make sense.
### Parameters
The **Parameters** class contains functions that turn values into parameters.
- **id** generates an `:id` parameter. If the ID field is an integer, it will be used as the value; otherwise, the string value of the ID will be used.
- **json** generates a user-provided-named JSON-formatted parameter for the value passed _(this can be used for PostgreSQL's JSON containment queries as well)_
- **nameFields** takes an array of `Field` criteria and generates the `:field[n]` name if it does not have a name already. This modifies the given array.
- **addFields** appends an array of `Field` criteria to the given parameter list.
- **fieldNames** creates parameters for the list of field names to be removed; for PostgreSQL, this returns a single parameter, while SQLite returns a list of parameters
In the event that no parameters are needed, pass an empty array (`[]`) in its place.
### Mapping Results
The `PDODocument\Mapper` namespace has an interface definition (`Mapper`) and several implementations of it. All mappers declare a single method, `map()`, which takes an associative array representing a database row and transforms it to its desired state.
* **DocumentMapper** deserializes the document from the given column name (default is `data`).
* **CountMapper** returns the numeric value of the first array entry.
* **ExistsMapper** returns a boolean value based on the first array entry.
* **StringMapper** returns the string value of the named field.
* **ArrayMapper** return the array as-is, with no deserialization.
We will see below how a simple custom mapper can extend or replace any of these.
## Putting It All Together
The **Custom** class has five functions:
- **list** requires a query, parameters, and a `Mapper`, and returns a `DocumentList<TDoc>` (described in an earlier section).
- **array** is the same as `::list`, except the result consumes the generator and returns the results in memory.
- **single** requires a query, parameters, and a `Mapper`, and returns one or no documents (`BitBadger\InspiredByFSharp\Option<TDoc>`).
- **scalar** requires a query, parameters, and a `Mapper`, and returns a scalar value (non-nullable; used for counts, existence, etc.)
- **nonQuery** requires a query and parameters and has no return value
> [!NOTE]
> Every other call in the library is written in terms of `Custom::list`, `Custom::scalar`, or `Custom::nonQuery`; your custom queries will use the same path the provided ones do!
Let's jump in with an example. When we query for a room, let's say that we also want to retrieve its hotel information as well. We saw the query above, but here is how we can implement it using a custom query.
```php
use PDODocument\{Configuration, Custom, Parameters, Query};
use PDODocument\Mapper\{DocumentMapper, Mapper};
// ...
// return type is Option<[Room, Hotel]>
$data = Custom::single(
"SELECT r.data AS room_data, h.data AS hotel_data
FROM room r
INNER JOIN hotel h ON h.data->>'" . Configuration::$idField . "' = r.data->>'hotelId'
WHERE r." . Query::whereById(),
[Parameters::id('my-room-key')],
new class implements Mapper {
public function map(array $result): array {
return [
(new DocumentMapper(Room::class, 'room_data'))->map($result),
(new DocumentMapper(Hotel::class, 'hotel_data'))->map($result)
];
}
});
if ($data->isSome()) {
[$room, $hotel] = $data->get();
// do stuff with the room and hotel data
}
```
This query uses `Configuration::idField` and `Query::whereById` to use the configured ID field. Creating custom queries using these building blocks allows us to utilize the configured value without hard-coding it throughout our custom queries. If the configuration changes, these queries will pick up the new field name seamlessly.
This also demonstrates a custom `Mapper`, which we can define inline as an anonymous class. It uses two different `DocumentMapper` instances to map each type, while both documents were retrieved with one query. Of course, though this example retrieved the entire document, we do not have to retrieve everything. If we only care about the name of the associated hotel, we could amend the query to retrieve only that information.
```php
use PDODocument\{Configuration, Custom, Parameters, Query};
use PDODocument\Mapper\{DocumentMapper, Mapper};
// ...
// return type is Option<[Room, string]>
$data = Custom::single(
"SELECT r.data, h.data->>'name' AS hotel_name
FROM room r
INNER JOIN hotel h ON h.data->>'" . Configuration::$idField . "' = r.data->>'hotelId'
WHERE r." . Query::whereById(),
[Parameters::id('my-room-key')],
new class implements Mapper {
public function map(array $result): array {
return [
(new DocumentMapper(Room::class, 'room_data'))->map($result),
$result['hotel_name']
];
}
});
if ($data->isSome()) {
[$room, $hotelName] = $data->get();
// do stuff with the room and hotel name
}
```
These queries are amazingly efficient, using 2 unique index lookups to return this data. Even though we do not have a foreign key between these two tables, simply being in a relational database allows us to retrieve this related data.
## Going Even Further
### Updating Data in Place
One drawback to document databases is the inability to update values in place; however, with a bit of creativity, we can do a lot more than we initially think. For a single field, SQLite has a `json_set` function that takes an existing JSON field, a field name, and a value to which it should be set. This allows us to do single-field updates in the database. If we wanted to raise our rates 10% for every room, we could use this query:
```sql
-- SQLite
UPDATE room SET data = json_set(data, 'rate', data->>'rate' * 1.1)
```
If we get any more complex, though, Common Table Expressions (CTEs) can help us. Perhaps we decided that we only wanted to raise the rates for hotels in New York, Chicago, and Los Angeles, and we wanted to exclude any brand with the word "Value" in its name. A CTE lets us select the source data we need to craft the update, then use that in the `UPDATE`'s clauses.
```sql
-- SQLite
WITH to_update AS
(SELECT r.data->>'id' AS room_id, r.data->>'rate' AS current_rate, r.data AS room_data
FROM room r
INNER JOIN hotel h ON h.data->>'id' = r.data->>'hotelId'
WHERE h.data ->> 'City' IN ('New York', 'Chicago', 'Los Angeles')
AND LOWER(h.data->>'name') NOT LIKE '%value%')
UPDATE room
SET data = json_set(to_update.room_data, 'rate', to_update.current_rate * 1.1)
WHERE room->>'id' = to_update.room_id
```
Both PostgreSQL and SQLite provide JSON patching, where multiple fields (or entire structures) can be changed at once. Let's revisit our rate increase; if we are making the rate more than $500, we'll apply a status of "Premium" to the room. If it is less than that, it should keep its same value.
First up, PostgreSQL:
```sql
-- PostgreSQL
WITH to_update AS
(SELECT r.data->>'id' AS room_id, (r.data->>'rate')::decimal AS rate, r.data->>'status' AS status
FROM room r
INNER JOIN hotel h ON h.data->>'id' = r.data->>'hotelId'
WHERE h.data->>'city' IN ('New York', 'Chicago', 'Los Angeles')
AND LOWER(h.data->>'name') NOT LIKE '%value%')
UPDATE room
SET data = data ||
('{"rate":' || to_update.rate * 1.1 || '","status":"'
|| CASE WHEN to_update.rate * 1.1 > 500 THEN 'Premium' ELSE to_update.status END
|| '"}')
WHERE room->>'id' = to_update.room_id
```
In SQLite:
```sql
-- SQLite
WITH to_update AS
(SELECT r.data->>'id' AS room_id, r.data->>'rate' AS rate, r.data->>'status' AS status
FROM room r
INNER JOIN hotel h ON h.data->>'id' = r.data->>'hotelId'
WHERE h.data->>'city' IN ('New York', 'Chicago', 'Los Angeles')
AND LOWER(h.data->>'name') NOT LIKE '%value%')
UPDATE room
SET data = json_patch(data, json(
'{"rate":' || to_update.rate * 1.1 || '","status":"'
|| CASE WHEN to_update.rate * 1.1 > 500 THEN 'Premium' ELSE to_update.status END
|| '"}'))
WHERE room->>'id' = to_update.room_id
```
For PostgreSQL, `->>` always returns text, so we need to cast the rate to a number. In either case, we do not want to use this technique for user-provided data; however, in place, it allowed us to complete all of our scenarios without having to load the documents into our application and manipulate them there.
Updates in place may not need parameters (though it would be easy to foresee a "rate adjustment" feature where the 1.1 adjustment was not hard-coded); in fact, none of the samples in this section used the document libraries at all. These queries can be executed by `Custom::nonQuery`, though, providing parameters as required.
### Using This Library for Non-Document Queries
The `Custom` functions can be used with non-document tables as well. This may be a convenient and consistent way to access your data, while delegating connection management to the library and its configured data source. The included `ArrayMapper` class will return the array from the result, and you can easily write a mapper for your classes to populate them.
Let's walk through a short example:
```php
use PDODocument\{Custom, DocumentList};
use PDODocument\Mapper\Mapper;
// Stores metadata for a given user
class MetaData
{
public string $id = '';
public string $userId = '';
public string $key = '';
public string $value = '';
// Define a static method that returns the mapper
public static function mapper(): Mapper
{
return new class implements Mapper {
public function map(array $results): MetaData
{
$it = new MetaData();
$it->id = $results['id'];
$it->userId = $results['userId'];
$it->key = $results['key'];
$it->value = $results['value'];
return $it;
}
};
}
}
// somewhere retrieving data; type is DocumentList<MetaData>
function metaDataForUser(string $userId): DocumentList
{
return Custom::list("SELECT * FROM user_metadata WHERE user_id = :userId",
[":userId" => $userId)], MetaData::mapper());
}
```
[tnf]: https://en.wikipedia.org/wiki/Third_normal_form "Third Normal Form &bull; Wikipedia"
[id]: ../getting-started.md#configuring-id-fields "Getting Started (ID Fields) &bull; PDODocument &bull; Relational Documents"
[Basic Usage]: ../basic-usage.md "Basic Usage &bull; PDODocument &bull; Relational Documents"