JournalSpecs¶
JournalSpecs declare and configure journals managed by a Gazette broker cluster.
Users generally work with JournalSpecs in YAML form using the gazctl
tool,
in a very similar way to the management of Kubernetes resources using kubectl
.
Journals names form a flat key-space. While it’s common practice to capture some semantic hierarchy in the path components of journal names, it’s important to understand that these have no particular meaning to the broker cluster.
To make it easy to work with JournalSpecs in YAML form, however, the gazctl
tool converts to and from a hierarchical representation for end-user
presentation. Intermediate nodes of the hierarchy are “directories”
(indicated by a trailing ‘/’ in their name), and terminal nodes represent
journals (which never have a trailing ‘/’).
Journal YAMLs are mapped to complete JournalSpecs by merging configuration from parent to child. In other words, where a journal provides no value for a configuration property it derives a value from its closest parent which specifies a non-zero value.
When deriving new YAML to present a set of selected JournalSpecs, gazctl
“hoists” property values shared by JournalSpecs to a representative parent
directory node which has a common prefix of those JournalSpecs.
This conversion happens entirely within the tool – JournalSpecs sent to or queried from brokers are full and complete individual specifications.
Example YAML¶
It’s common for teams to version-control journal YAMLs which configure journals owned by applications under the team’s purview. For example the Gazette repository versions YAML for journals used by example applications of the repository:
# examples.journalspace.yaml declares JournalSpecs and configuration
# used by example applications of the gazette repository.
# Root "directory" of this journal YAML.
name: examples/
# Replication is the number of brokers required to participate in write
# transactions of the journal. I.e. a value of 3 means an Append RPC will proceed
# only after data has been replicated to three distinct brokers in at least two
# failure zones.
replication: 3
# Labels are key/value pairs that are attached to journals. They're intended
# to represent identifying or organizing attributes of journals which are
# meaningful to users and applications, but have no meaning to the broker itself.
labels:
- name: example-journals
# Fragment defines how the broker will map accepted writes into fragments.
fragment:
# Desired length of each journal fragment. Note fragments can be
# substantially smaller or slightly larger under normal operation, as
# journal assignments change or to ensure atomicity of writes.
length: 268435456 # 256MB.
# Stores enumerates the fragment backing stores of the journal. More
# than one store may be provided. New fragments are always persisted
# to the first store in the list, but all stores are refreshed when
# building the fragment index.
stores:
- s3://examples/fragments/?profile=minio&endpoint=http%3A%2F%2Fminio%3A9000
# Refresh interval defines the frequency with which stores are re-listed.
refresh_interval: 1m0s
# Retention is the time interval after which the fragment is eligible
# for pruning from the backing store.
retention: 720h0m0s
# Compression codec used to compress fragments. One of:
# NONE, GZIP, GZIP_OFFLOAD_DECOMPRESSION, SNAPPY, ZSTANDARD.
compression_codec: SNAPPY
# Flush interval defines the minimum frequency at which fragments are flushed.
flush_interval: 10m0s
# A path postfix is a Go template which can be used to further refine the
# path of fragments persisted under a journal path within the fragment store.
# A primary use case is to maintain a Hive-compatible partitioning of a
# journal's fragment files on the fragment's UTC creation date.
path_postfix_template: date={{ .Spool.FirstAppendTime.Format "2006-01-02" }}
children:
# Journals used by go.gazette.dev/core/examples/bike-share
- name: examples/bike-share/
labels:
- name: example-name
value: bike-share
children:
# Partitions to which completed graph cycles are written.
- name: examples/bike-share/cycles/
labels:
- name: app.gazette.dev/message-type
value: bike_share.Cycle
- name: content-type
value: application/x-ndjson
children:
- name: examples/bike-share/cycles/part-000
- name: examples/bike-share/cycles/part-001
# Recovery logs of the two example ShardSpecs which use SQLite.
# (other shards use a remote SQL store, and don't need recovery logs).
- name: examples/bike-share/recovery-logs/
labels:
- name: content-type
value: application/x-gazette-recoverylog
children:
- name: examples/bike-share/recovery-logs/cycles-part-002
- name: examples/bike-share/recovery-logs/cycles-part-003
# Partitions to which input CSV records of the dataset are written.
- name: examples/bike-share/rides/
labels:
- name: app.gazette.dev/message-type
value: bike_share.Ride
- name: content-type
value: text/csv
children:
- name: examples/bike-share/rides/part-000
- name: examples/bike-share/rides/part-001
- name: examples/bike-share/rides/part-002
- name: examples/bike-share/rides/part-003
# Journal for basic "hello world" testing and curl-based examples.
- name: examples/foobar
# Journals used by go.gazette.dev/core/examples/stream-sum
- name: examples/stream-sum/
labels:
- name: example-name
value: stream-sum
fragment:
# The stream-sum example is used for crash tests, and
# refresh_interval influences the recovery time of consumers in
# corner cases where a broker crashes on a not-quite-caught-up
# consumer. That consumer may retry its read to a broker which
# must wait for the fragment to appear in cloud storage. Use an
# aggressive interval to ensure reads are satisfied within the
# test's required SLA.
refresh_interval: 5s
children:
# Partitions to which stream Chunks are written.
- name: examples/stream-sum/chunks/
labels:
- name: app.gazette.dev/message-type
value: stream_sum.Chunk
- name: content-type
value: application/x-ndjson
children:
- name: examples/stream-sum/chunks/part-000
- name: examples/stream-sum/chunks/part-001
- name: examples/stream-sum/chunks/part-002
- name: examples/stream-sum/chunks/part-003
- name: examples/stream-sum/chunks/part-004
- name: examples/stream-sum/chunks/part-005
- name: examples/stream-sum/chunks/part-006
- name: examples/stream-sum/chunks/part-007
# Recovery logs of stream_sum.Summer ShardSpecs.
- name: examples/stream-sum/recovery-logs/
labels:
- name: content-type
value: application/x-gazette-recoverylog
children:
- name: examples/stream-sum/recovery-logs/chunks-part-000
- name: examples/stream-sum/recovery-logs/chunks-part-001
- name: examples/stream-sum/recovery-logs/chunks-part-002
- name: examples/stream-sum/recovery-logs/chunks-part-003
- name: examples/stream-sum/recovery-logs/chunks-part-004
- name: examples/stream-sum/recovery-logs/chunks-part-005
- name: examples/stream-sum/recovery-logs/chunks-part-006
- name: examples/stream-sum/recovery-logs/chunks-part-007
# Journal to which completed stream Sums are written.
- name: examples/stream-sum/sums
labels:
- name: app.gazette.dev/message-type
value: stream_sum.Sum
- name: content-type
value: application/x-ndjson
# Journals used by the word-count example.
- name: examples/word-count/
labels:
- name: example-name
value: word-count
children:
# Partitions to which NGramCounts are written.
- name: examples/word-count/deltas/
labels:
- name: app.gazette.dev/message-type
value: word_count.NGramCount
- name: content-type
value: application/x-protobuf-fixed
children:
- name: examples/word-count/deltas/part-000
- name: examples/word-count/deltas/part-001
- name: examples/word-count/deltas/part-002
- name: examples/word-count/deltas/part-003
# Recovery logs of word_count.Counter ShardSpecs.
- name: examples/word-count/recovery-logs/
labels:
- name: content-type
value: application/x-gazette-recoverylog
children:
- name: examples/word-count/recovery-logs/shard-000
- name: examples/word-count/recovery-logs/shard-001
- name: examples/word-count/recovery-logs/shard-002
- name: examples/word-count/recovery-logs/shard-003
Etcd Revisions¶
JournalSpecs retrieved by the gazctl
tool will include their respective
Etcd modification revisions as field revision
within the rendered YAML.
When applying YAML specs via gazctl journals apply
explicitly specified revision
can only exist on the leaf-level journals
of the spec. Any specs which omit revision
assume a value of zero
(implying the journal must not exist). The revisions of specs are always compared
to the current Etcd store revision,
and an apply will fail if there’s a mismatch. This prevents a gazctl journals edit
or list => modify => apply sequence from overwriting specification changes which
may have been made in the meantime.
name: examples/foobar
replication: 3
revision: 6
fragment:
... etc ...
A applied revision value -1
can be used to explicitly signal that the Etcd
store revision should be ignored. This is helpful when the desired source-of-truth
for a set of JournalSpecs is versioned source control, and where an apply
of those specs should always overwrite any existing Etcd versions.
Deleting JournalSpecs¶
One or more JournalSpecs may be deleted by adding a delete: true
stanza to the
YAML returned by gazctl
and then applying it – for example, as part of a
gazctl journals edit
workflow. A delete
stanza set on a parent node also
applies to all children.
name: examples/foobar
delete: true
revision: 6
replication: 3
fragment:
... etc ...
Once applied, brokers will immediately stop serving the journal. Note that existing journal fragments are not impacted and must be manually deleted.