JournalSpecs

JournalSpecs declare and configure journals managed by a Gazette broker cluster. Users generally work with JournalSpecs in YAML form using the gazctl tool, much as Kubernetes resources are managed in YAML using kubectl.

Journal names form a flat key-space. Although it’s common practice to encode some semantic hierarchy in the path components of journal names, these components have no particular meaning to the broker cluster.

To make it easy to work with JournalSpecs in YAML form, however, the gazctl tool converts to and from a hierarchical representation for end-user presentation. Intermediate nodes of the hierarchy are “directories” (indicated by a trailing ‘/’ in their name), and terminal nodes represent journals (which never have a trailing ‘/’).
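The relationship between a flat journal name and the "directory" nodes a hierarchical view would place above it can be sketched as follows. The helpers parentDirs and isDirectory are hypothetical and only illustrate the trailing-'/' convention; they are not gazctl's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// parentDirs returns the hypothetical "directory" nodes above a journal,
// from the root down. Brokers see only the flat name; these prefixes exist
// purely in the tool's presentation.
func parentDirs(journal string) []string {
	var dirs []string
	for i, r := range journal {
		if r == '/' {
			// '/' is a single byte, so journal[:i+1] ends exactly at the slash.
			dirs = append(dirs, journal[:i+1])
		}
	}
	return dirs
}

// isDirectory reports whether a node name denotes a directory (trailing '/')
// rather than a journal.
func isDirectory(name string) bool {
	return strings.HasSuffix(name, "/")
}

func main() {
	name := "examples/bike-share/cycles/part-000"
	fmt.Println(parentDirs(name))
	fmt.Println(isDirectory(name), isDirectory("examples/bike-share/"))
}
```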

Journal YAMLs are mapped to complete JournalSpecs by merging configuration from parent to child. In other words, where a journal provides no value for a configuration property, it derives a value from its closest parent that specifies a non-zero value.
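The parent-to-child merge can be sketched as a top-down tree walk. This is a minimal sketch assuming a hypothetical Spec type that models only three properties; real JournalSpecs carry many more fields. Because each directory's spec is merged before it is passed down, every journal ends up with the value of its closest ancestor that sets a non-zero value.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Spec is a hypothetical, heavily simplified stand-in for a JournalSpec,
// modeling only three properties from the YAML example.
type Spec struct {
	Replication     int32
	RefreshInterval time.Duration
	Codec           string
}

// Node is one entry of the hierarchical YAML: a directory (name ends in '/')
// or a journal, holding its own possibly partial Spec.
type Node struct {
	Name     string
	Spec     Spec
	Children []Node
}

// inherit fills each zero-valued field of child from parent.
func inherit(child, parent Spec) Spec {
	if child.Replication == 0 {
		child.Replication = parent.Replication
	}
	if child.RefreshInterval == 0 {
		child.RefreshInterval = parent.RefreshInterval
	}
	if child.Codec == "" {
		child.Codec = parent.Codec
	}
	return child
}

// flatten merges configuration from parent to child and records complete
// specs for journal (non-directory) nodes only.
func flatten(n Node, parent Spec, out map[string]Spec) {
	merged := inherit(n.Spec, parent)
	if !strings.HasSuffix(n.Name, "/") {
		out[n.Name] = merged
		return
	}
	for _, c := range n.Children {
		flatten(c, merged, out)
	}
}

func main() {
	root := Node{
		Name: "examples/",
		Spec: Spec{Replication: 3, RefreshInterval: time.Minute, Codec: "SNAPPY"},
		Children: []Node{
			{Name: "examples/foobar"},
			{
				Name:     "examples/stream-sum/",
				Spec:     Spec{RefreshInterval: 5 * time.Second},
				Children: []Node{{Name: "examples/stream-sum/sums"}},
			},
		},
	}
	out := map[string]Spec{}
	flatten(root, Spec{}, out)
	fmt.Println(out["examples/foobar"].RefreshInterval)          // 1m0s
	fmt.Println(out["examples/stream-sum/sums"].RefreshInterval) // 5s
	fmt.Println(out["examples/stream-sum/sums"].Replication)     // 3
}
```

One consequence of this "non-zero wins" rule is that a child cannot use an explicit zero to clear a value set by its parent.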

When deriving new YAML to present a set of selected JournalSpecs, gazctl “hoists” property values shared by those JournalSpecs to a representative parent directory node whose name is a common prefix of their names.
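Hoisting is roughly the inverse of the merge. The sketch below handles a single property and a single level: it finds the longest common directory prefix of the selected names and, if every spec shares the same value, moves that value to the parent and clears it from the children. The Spec type and the commonDir and hoist functions are hypothetical; gazctl's actual algorithm works over many properties and builds nested directories.

```go
package main

import (
	"fmt"
	"strings"
)

// Spec is a hypothetical stand-in for a flat JournalSpec with one property.
type Spec struct {
	Name  string
	Codec string
}

// commonDir returns the longest directory prefix (ending in '/') shared by
// all names, or "" if they share none.
func commonDir(names []string) string {
	if len(names) == 0 {
		return ""
	}
	prefix := names[0]
	for _, n := range names[1:] {
		// HasPrefix(n, "") is always true, so this loop terminates.
		for !strings.HasPrefix(n, prefix) {
			prefix = prefix[:len(prefix)-1]
		}
	}
	// Trim back to the last '/' so the prefix names a directory.
	return prefix[:strings.LastIndex(prefix, "/")+1]
}

// hoist moves Codec to a parent directory node when every spec shares the
// same value, clearing it from the (copied) children.
func hoist(specs []Spec) (Spec, []Spec) {
	names := make([]string, len(specs))
	for i, s := range specs {
		names[i] = s.Name
	}
	parent := Spec{Name: commonDir(names)}
	children := append([]Spec(nil), specs...)
	if len(specs) == 0 {
		return parent, children
	}
	for _, s := range specs[1:] {
		if s.Codec != specs[0].Codec {
			return parent, children // Values differ: nothing to hoist.
		}
	}
	parent.Codec = specs[0].Codec
	for i := range children {
		children[i].Codec = ""
	}
	return parent, children
}

func main() {
	parent, children := hoist([]Spec{
		{Name: "examples/word-count/deltas/part-000", Codec: "SNAPPY"},
		{Name: "examples/word-count/deltas/part-001", Codec: "SNAPPY"},
	})
	fmt.Printf("%+v\n%+v\n", parent, children)
}
```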

This conversion happens entirely within the tool: JournalSpecs sent to or queried from brokers are always complete, individual specifications.

Example YAML

It’s common for teams to version-control journal YAMLs which configure the journals owned by applications under the team’s purview. For example, the Gazette repository versions YAML for the journals used by its example applications:

# examples.journalspace.yaml declares JournalSpecs and configuration
# used by example applications of the gazette repository.

# Root "directory" of this journal YAML.
name: examples/
# Replication is the number of brokers required to participate in write
# transactions of the journal. I.e. a value of 3 means an Append RPC will proceed
# only after data has been replicated to three distinct brokers in at least two
# failure zones.
replication: 3
# Labels are key/value pairs that are attached to journals. They're intended
# to represent identifying or organizing attributes of journals which are
# meaningful to users and applications, but have no meaning to the broker itself.
labels:
  - name: example-journals
# Fragment defines how the broker will map accepted writes into fragments.
fragment:
  # Desired length of each journal fragment. Note fragments can be
  # substantially smaller or slightly larger under normal operation, as
  # journal assignments change or to ensure atomicity of writes.
  length: 268435456 # 256MB.
  # Stores enumerates the fragment backing stores of the journal. More
  # than one store may be provided. New fragments are always persisted
  # to the first store in the list, but all stores are refreshed when
  # building the fragment index.
  stores:
    - s3://examples/fragments/?profile=minio&endpoint=http%3A%2F%2Fminio%3A9000
  # Refresh interval defines the frequency with which stores are re-listed.
  refresh_interval: 1m0s
  # Retention is the time interval after which the fragment is eligible
  # for pruning from the backing store.
  retention: 720h0m0s
  # Compression codec used to compress fragments. One of:
  # NONE, GZIP, GZIP_OFFLOAD_DECOMPRESSION, SNAPPY, ZSTANDARD.
  compression_codec: SNAPPY
  # Flush interval defines the minimum frequency at which fragments are flushed.
  flush_interval: 10m0s
  # A path postfix is a Go template which can be used to further refine the
  # path of fragments persisted under a journal path within the fragment store.
  # A primary use case is to maintain a Hive-compatible partitioning of a
  # journal's fragment files on the fragment's UTC creation date.
  path_postfix_template: date={{ .Spool.FirstAppendTime.Format "2006-01-02" }}
children:

  # Journals used by go.gazette.dev/core/examples/bike-share
  - name: examples/bike-share/
    labels:
      - name:  example-name
        value: bike-share
    children:
      # Partitions to which completed graph cycles are written.
      - name: examples/bike-share/cycles/
        labels:
          - name:  app.gazette.dev/message-type
            value: bike_share.Cycle
          - name:  content-type
            value: application/x-ndjson
        children:
          - name: examples/bike-share/cycles/part-000
          - name: examples/bike-share/cycles/part-001
      # Recovery logs of the two example ShardSpecs which use SQLite.
      # (other shards use a remote SQL store, and don't need recovery logs).
      - name: examples/bike-share/recovery-logs/
        labels:
          - name:  content-type
            value: application/x-gazette-recoverylog
        children:
          - name: examples/bike-share/recovery-logs/cycles-part-002
          - name: examples/bike-share/recovery-logs/cycles-part-003
      # Partitions to which input CSV records of the dataset are written.
      - name: examples/bike-share/rides/
        labels:
          - name:  app.gazette.dev/message-type
            value: bike_share.Ride
          - name:  content-type
            value: text/csv
        children:
          - name: examples/bike-share/rides/part-000
          - name: examples/bike-share/rides/part-001
          - name: examples/bike-share/rides/part-002
          - name: examples/bike-share/rides/part-003

  # Journal for basic "hello world" testing and curl-based examples.
  - name: examples/foobar

  # Journals used by go.gazette.dev/core/examples/stream-sum
  - name: examples/stream-sum/
    labels:
      - name:  example-name
        value: stream-sum
    fragment:
      # The stream-sum example is used for crash tests, and
      # refresh_interval influences the recovery time of consumers in
      # corner cases where a broker crashes on a not-quite-caught-up
      # consumer. That consumer may retry its read to a broker which
      # must wait for the fragment to appear in cloud storage. Use an
      # aggressive interval to ensure reads are satisfied within the
      # test's required SLA.
      refresh_interval:  5s
    children:
      # Partitions to which stream Chunks are written.
      - name: examples/stream-sum/chunks/
        labels:
          - name:  app.gazette.dev/message-type
            value: stream_sum.Chunk
          - name:  content-type
            value: application/x-ndjson
        children:
          - name: examples/stream-sum/chunks/part-000
          - name: examples/stream-sum/chunks/part-001
          - name: examples/stream-sum/chunks/part-002
          - name: examples/stream-sum/chunks/part-003
          - name: examples/stream-sum/chunks/part-004
          - name: examples/stream-sum/chunks/part-005
          - name: examples/stream-sum/chunks/part-006
          - name: examples/stream-sum/chunks/part-007
      # Recovery logs of stream_sum.Summer ShardSpecs.
      - name: examples/stream-sum/recovery-logs/
        labels:
          - name:  content-type
            value: application/x-gazette-recoverylog
        children:
          - name: examples/stream-sum/recovery-logs/chunks-part-000
          - name: examples/stream-sum/recovery-logs/chunks-part-001
          - name: examples/stream-sum/recovery-logs/chunks-part-002
          - name: examples/stream-sum/recovery-logs/chunks-part-003
          - name: examples/stream-sum/recovery-logs/chunks-part-004
          - name: examples/stream-sum/recovery-logs/chunks-part-005
          - name: examples/stream-sum/recovery-logs/chunks-part-006
          - name: examples/stream-sum/recovery-logs/chunks-part-007
      # Journal to which completed stream Sums are written.
      - name: examples/stream-sum/sums
        labels:
          - name:  app.gazette.dev/message-type
            value: stream_sum.Sum
          - name:  content-type
            value: application/x-ndjson

  # Journals used by the word-count example.
  - name: examples/word-count/
    labels:
      - name:  example-name
        value: word-count
    children:
      # Partitions to which NGramCounts are written.
      - name: examples/word-count/deltas/
        labels:
          - name:  app.gazette.dev/message-type
            value: word_count.NGramCount
          - name:  content-type
            value: application/x-protobuf-fixed
        children:
          - name: examples/word-count/deltas/part-000
          - name: examples/word-count/deltas/part-001
          - name: examples/word-count/deltas/part-002
          - name: examples/word-count/deltas/part-003
      # Recovery logs of word_count.Counter ShardSpecs.
      - name: examples/word-count/recovery-logs/
        labels:
          - name:  content-type
            value: application/x-gazette-recoverylog
        children:
          - name: examples/word-count/recovery-logs/shard-000
          - name: examples/word-count/recovery-logs/shard-001
          - name: examples/word-count/recovery-logs/shard-002
          - name: examples/word-count/recovery-logs/shard-003

Etcd Revisions

JournalSpecs retrieved by the gazctl tool include their respective Etcd modification revisions as a revision field within the rendered YAML.

When applying YAML specs via gazctl journals apply, an explicitly specified revision may appear only on the leaf-level journals of the spec. Any spec which omits revision assumes a value of zero (implying the journal must not already exist). The revision of each spec is always compared to the journal’s current Etcd modification revision, and the apply fails if there’s a mismatch. This prevents a gazctl journals edit, or a list => modify => apply sequence, from overwriting specification changes which may have been made in the meantime.

name: examples/foobar
replication: 3
revision: 6
fragment:
    ... etc ...

An applied revision value of -1 can be used to explicitly signal that the Etcd store revision should be ignored. This is helpful when the desired source of truth for a set of JournalSpecs is version control, and an apply of those specs should always overwrite any existing Etcd versions.

Deleting JournalSpecs

One or more JournalSpecs may be deleted by adding a delete: true stanza to the YAML returned by gazctl and then applying it, for example as part of a gazctl journals edit workflow. A delete stanza set on a parent node also applies to all of its children.

name: examples/foobar
delete: true
revision: 6
replication: 3
fragment:
    ... etc ...

Once the deletion is applied, brokers immediately stop serving the journal. Existing journal fragments are not affected, however, and must be deleted manually from the fragment store.
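How a delete stanza on a parent propagates to its descendants can be sketched as another tree walk. The Node type and toDelete function are hypothetical simplifications, not gazctl's code. Note that the sketch only selects JournalSpecs; as described above, fragments in the backing store are left untouched.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical, simplified node of journal YAML carrying only the
// fields needed to show how a delete stanza propagates.
type Node struct {
	Name     string
	Delete   bool
	Children []Node
}

// toDelete returns the journals (non-directory nodes) marked for deletion,
// where a delete on a parent applies to every descendant.
func toDelete(n Node, inherited bool) []string {
	del := inherited || n.Delete
	if !strings.HasSuffix(n.Name, "/") {
		if del {
			return []string{n.Name}
		}
		return nil
	}
	var out []string
	for _, c := range n.Children {
		out = append(out, toDelete(c, del)...)
	}
	return out
}

func main() {
	root := Node{
		Name: "examples/word-count/",
		Children: []Node{
			{Name: "examples/word-count/deltas/", Children: []Node{{Name: "examples/word-count/deltas/part-000"}}},
			{
				Name:   "examples/word-count/recovery-logs/",
				Delete: true,
				Children: []Node{
					{Name: "examples/word-count/recovery-logs/shard-000"},
					{Name: "examples/word-count/recovery-logs/shard-001"},
				},
			},
		},
	}
	fmt.Println(toDelete(root, false))
	// [examples/word-count/recovery-logs/shard-000 examples/word-count/recovery-logs/shard-001]
}
```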