Zookeeper
Astra uses Zookeeper as a metadata store accessible by all components, with Apache Curator recipes wrapping all major node operations.
Recommended architecture
- Five nodes in quorum
- Observers serving all direct client traffic
- Quorum members excluded from DNS, only serving forwarded observer requests
flowchart TD
astra[Astra]
astra -- astra-zookeeper-observers.internal --> observers
subgraph observers[observers]
obs1[observer]
obs2[observer]
obs3[observer]
obs4[observer]
obs5[observer]
end
observers --> quorum
subgraph quorum
fol1[follower]
fol2[follower]
fol3[follower]
fol4[follower]
leader{{leader}}
end
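The observer layout above could be expressed in zoo.cfg roughly as follows. This is only a sketch: the hostnames are hypothetical placeholders, and the ports (2888 for quorum traffic, 3888 for leader election) are the values conventionally used in Zookeeper examples. The function name print_observer_cfg is also illustrative. The `:observer` suffix on server lines and the `peerType=observer` setting come from the Zookeeper observers guide.

```shell
# Prints a zoo.cfg fragment for an OBSERVER host.
# All hostnames below are hypothetical; replace them with your own.
print_observer_cfg() {
cat <<'EOF'
# Set only on observer hosts, never on quorum members
peerType=observer
# Voting quorum members (kept out of client-facing DNS)
server.1=zk-quorum-1.internal:2888:3888
server.2=zk-quorum-2.internal:2888:3888
server.3=zk-quorum-3.internal:2888:3888
server.4=zk-quorum-4.internal:2888:3888
server.5=zk-quorum-5.internal:2888:3888
# Observers (the only hosts behind astra-zookeeper-observers.internal)
server.6=zk-observer-1.internal:2888:3888:observer
server.7=zk-observer-2.internal:2888:3888:observer
server.8=zk-observer-3.internal:2888:3888:observer
server.9=zk-observer-4.internal:2888:3888:observer
server.10=zk-observer-5.internal:2888:3888:observer
EOF
}
```

Quorum members would carry the same server list, including the `:observer` suffixes, but without the `peerType=observer` line.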
Recommended configs
znode.container.maxNeverUsedIntervalMs
This is the amount of time a container znode can exist without ever having had children before it becomes eligible for deletion. Childless containers are left behind when a node crashes partway through creating a znode, so that only the parent container exists (partitioned metadata stores).
znode.container.maxNeverUsedIntervalMs=10000
znode.container.maxNeverUsedIntervalMs : (Java system property only) New in 3.6.0: The maximum interval in milliseconds that a container that has never had any children is retained. Should be long enough for your client to create the container, do any needed work and then create children. Default is “0” which is used to indicate that containers that have never had any children are never deleted.
https://zookeeper.apache.org/doc/r3.6.1/zookeeperAdmin.html#sc_performance_options
Note that this is a Java system property, so it must be passed to the JVM as a -D flag, similar to the following:
exec java -cp "$CLASSPATH" \
-Dznode.container.maxNeverUsedIntervalMs=10000 \
-XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xms2g -Xmx2g \
org.apache.zookeeper.server.quorum.QuorumPeerMain "$@"
Troubleshooting
jute.maxbuffer
Zookeeper is designed to store small values, and not a large number of them per path. This is enforced with a size limit: attempting to read a response larger than the configured amount returns an error. The error typically occurs when listing the children of a specific path, because the combined list of child names can exceed the configured jute.maxbuffer.
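To get a feel for when a child listing approaches the limit, the response size can be estimated from the number of children and their average name length. The sketch below is a rough approximation only: it assumes the jute wire format writes a 4-byte count for the list and a 4-byte length before each name, and it ignores the small fixed reply header. The function name and the sample numbers are illustrative.

```shell
# Approximate serialized size, in bytes, of a children listing.
#   $1 = number of child znodes under the path
#   $2 = average child name length in bytes
estimate_children_bytes() {
  count=$1
  avg_name_len=$2
  # 4-byte list count + (4-byte length prefix + name bytes) per child
  echo $(( 4 + count * (4 + avg_name_len) ))
}

# Default limit is 0xfffff bytes (just under 1 MB)
DEFAULT_JUTE_MAXBUFFER=1048575

# Succeeds (exit 0) when the estimate is larger than the default limit
exceeds_default_limit() {
  [ "$(estimate_children_bytes "$1" "$2")" -gt "$DEFAULT_JUTE_MAXBUFFER" ]
}
```

For example, `estimate_children_bytes 100000 20` gives 2400004 bytes, more than twice the default limit, while 20000 children with the same name length stay under it.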
The default jute.maxbuffer value for Zookeeper is just under 1MB (0xfffff bytes). Changes to this limit should be made on both the server and the clients. For additional documentation, Solr provides an excellent writeup on this: https://solr.apache.org/guide/7_4/setting-up-an-external-zookeeper-ensemble.html.
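Because jute.maxbuffer is also a Java system property, raising it could look something like the following. This is a sketch under assumptions: the 4 MiB value is an arbitrary example rather than a recommendation, and the start_zookeeper function simply mirrors the launch command shown under the recommended configs.

```shell
# Example value only (4 MiB); use the smallest limit that fits your largest response.
JUTE_MAXBUFFER=4194304
JUTE_FLAG="-Djute.maxbuffer=${JUTE_MAXBUFFER}"

# Server side: pass the flag alongside the other -D options at launch.
start_zookeeper() {
  exec java -cp "$CLASSPATH" \
    "$JUTE_FLAG" \
    org.apache.zookeeper.server.quorum.QuorumPeerMain "$@"
}

# Client side: the JVM that runs the Zookeeper client (here, Astra) needs the
# same flag. How it is passed depends on how that JVM is launched; one generic
# option is the standard JAVA_TOOL_OPTIONS environment variable, for example:
#   export JAVA_TOOL_OPTIONS="$JUTE_FLAG"
```

Keeping the value in a single variable makes it easier to keep the server and client settings identical.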