This document offers some recommendations based on lessons learned from running Gazette in production at LiveRamp.
At LiveRamp, Gazette is deployed entirely in the cloud: both clients and brokers run on a Kubernetes cluster whose underlying nodes are spread across multiple zones, and journal content is persisted to cloud storage. While relying on cloud infrastructure makes it easy to scale usage up and down as needed, it can also make the task of predicting costs more difficult, especially for high-traffic systems like ours that deal with data on the order of terabytes each month.
1. For non-recovery log journals, define retention policies for each category of data according to its usage requirements, then enforce them by configuring lifecycle policies with the cloud storage provider. For example, a bucket’s files could be transitioned to cold storage X days after creation (which keeps the data backed up, though not immediately accessible, at a much lower cost than standard storage) and eventually expired.
2. For backing stores that do not support lifecycle policies, and for more fine-grained control over retention, use gazctl journals prune. Invoking gazctl journals prune -l my-selector deletes, across all configured backing stores, every journal fragment that matches the label selector and is older than the currently configured retention (as specified by the fragment.retention property in the JournalSpec). This could be automated with a cron job or a scheduled Kubernetes job. The tradeoff of this approach compared with (1) is that the deletes entail additional API operations, which may cost more depending on the store being used.
3. For recovery log journals, run the gazctl shards prune command-line tool periodically to delete fragments that are no longer needed. A simple time-based lifecycle policy like the one described in (1) will not work for recovery logs, because historical portions of a recovery log may still contain current database content. Whether a recovery log fragment is still needed depends on the shard’s current hints (which is what gazctl shards prune examines), not on the age of the fragment.
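The lifecycle policy described in the first recommendation above can be sketched as a small configuration generator. This is a minimal illustration that assumes Google Cloud Storage's lifecycle rule format (an assumption; the text above does not name a provider, and other stores use a different schema). The function name build_lifecycle_rules, the COLDLINE storage class, and the 30 and 365 day thresholds are hypothetical choices, not values taken from LiveRamp's deployment.

```python
import json


def build_lifecycle_rules(cold_after_days: int, delete_after_days: int) -> dict:
    """Build a lifecycle configuration dict in the GCS rule format (assumed).

    Objects move to cold storage after `cold_after_days` and are deleted
    after `delete_after_days`, both measured from object creation.
    """
    if not 0 < cold_after_days < delete_after_days:
        raise ValueError("expected 0 < cold_after_days < delete_after_days")
    return {
        "rule": [
            {
                # Transition to a cheaper, colder storage class.
                "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
                "condition": {"age": cold_after_days},
            },
            {
                # Expire the object entirely.
                "action": {"type": "Delete"},
                "condition": {"age": delete_after_days},
            },
        ]
    }


# Hypothetical thresholds for one category of non-recovery-log data.
policy = build_lifecycle_rules(cold_after_days=30, delete_after_days=365)
print(json.dumps(policy, indent=2))
```

The printed JSON would then be applied to the bucket with the provider's own tooling. As the third recommendation explains, a policy like this should not be applied to a bucket that holds recovery log fragments.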
Data Transfer Costs
If clients and brokers are spread across multiple availability zones or even regions, consider the costs incurred by inter-zone traffic. In the case of writes, the client must issue its write to the current primary broker of the journal, regardless of whether that primary is in the same zone as the client. The primary in turn replicates the data to all its peers, again regardless of their zones. All journals are assigned to brokers across more than one zone to ensure the durability of writes in the face of zone failures, which means that a write will always lead to inter-zone traffic.
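To make the write path above concrete, the following sketch estimates how many bytes of a single append cross a zone boundary. It is a rough model under stated assumptions: the primary sends one full copy of the payload directly to each peer, and protocol overhead is ignored. The function name and the zone names are hypothetical.

```python
def cross_zone_bytes(payload_bytes, client_zone, primary_zone, peer_zones):
    """Estimate the bytes of one append that cross a zone boundary.

    Counts the client-to-primary hop if the zones differ, plus one copy
    for every peer that sits in a different zone than the primary.
    """
    hops = 0
    if client_zone != primary_zone:
        hops += 1
    hops += sum(1 for zone in peer_zones if zone != primary_zone)
    return hops * payload_bytes


GIB = 2**30

# Client co-located with the primary; both peers are in other zones.
same_zone_client = cross_zone_bytes(GIB, "zone-a", "zone-a", ["zone-b", "zone-c"])

# Client in a different zone than the primary: one extra cross-zone hop.
remote_client = cross_zone_bytes(GIB, "zone-b", "zone-a", ["zone-b", "zone-c"])

print(same_zone_client // GIB, remote_client // GIB)
```

Multiplying the result by the provider's per-byte inter-zone rate gives a first-order estimate of the transfer cost of a write workload.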
Cross-zone data transfers are a price that must be paid for highly durable appends, but Gazette does attempt to minimize cross-zone transfers incurred at read time by having clients prefer to read from a replica in their own zone. This preference is specified automatically if one uses the official Helm charts for deployment.
In those charts, we supply the Gazette client with the preferred zone (i.e. the zone that the client itself is in) by populating the consumer.zone flag with the correct zone: https://github.com/gazette/core/blob/master/charts/consumer/templates/deployment.yaml#L54. (That zone is generated by a script, node-zone.sh, that takes a Kubernetes node name as input and gives the zone of that node as output.) We do the same for the Gazette broker by populating the broker.zone flag: https://github.com/gazette/core/blob/master/charts/gazette/templates/deployment.yaml#L47. Anyone who is not using the official Helm charts and wants zone-aware reads must ensure that the consumer.zone and broker.zone flags are set correctly during deployment.
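The read preference described above can be illustrated with a short sketch: prefer a replica in the client's own zone, and fall back to any replica otherwise. This is not Gazette's actual routing code; the Replica type, the pick_read_replica function, and the broker and zone names are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Replica:
    broker_id: str
    zone: str


def pick_read_replica(
    replicas: Sequence[Replica],
    client_zone: str,
    rng: Optional[random.Random] = None,
) -> Replica:
    """Pick a replica to read from, preferring the client's own zone."""
    if not replicas:
        raise ValueError("no replicas available")
    rng = rng if rng is not None else random.Random()
    # Same-zone replicas avoid inter-zone transfer charges on reads.
    local = [r for r in replicas if r.zone == client_zone]
    # Fall back to all replicas when none shares the client's zone.
    return rng.choice(local or list(replicas))


replicas = [
    Replica("broker-1", "zone-a"),
    Replica("broker-2", "zone-b"),
    Replica("broker-3", "zone-c"),
]
chosen = pick_read_replica(replicas, client_zone="zone-b")
print(chosen.broker_id)
```

The sketch also shows why both flags matter: the preference can only be honored when the client knows its own zone and each broker advertises the zone it runs in.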