Wednesday 3 August 2016

What is ZooKeeper?

ZooKeeper is a high-performance coordination service for distributed applications (like HBase). It exposes common services, such as naming, configuration management, synchronization, and group services, through a simple interface so you don't have to write them from scratch. You can use it off the shelf to implement consensus, group management, leader election, and presence protocols, and you can build on it for your own specific needs.

1. At a high level

ZooKeeper is, first of all, a service that clients connect to. It gives clients access to a tree-like structure, or a "hierarchical namespace" as the ZooKeeper documentation calls it. So why do we need this tree? For storing data, of course, which is why it is called a "data tree". On that tree you can perform the whole set of CRUD operations, and you use GET and SET operations to read and modify a zNode's data. It uses the standard UNIX notation for file system paths. For example, we use /A/B/C to denote the path to zNode C, where C has B as its parent and B has A as its parent. Here is a sample:
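The path notation can be made concrete with a small sketch. split_path and ancestors are hypothetical helpers written for this post, not part of any ZooKeeper client; the checks follow ZooKeeper's path rules that a path is absolute, does not end with "/", and has no empty, "." or ".." components.

```python
# Breaking a ZooKeeper-style path such as /A/B/C into parent and node name.
# split_path and ancestors are hypothetical helpers, not a ZooKeeper API.
from __future__ import annotations


def split_path(path: str) -> tuple[str, str]:
    """Return (parent_path, node_name) for an absolute zNode path."""
    if not path.startswith("/"):
        raise ValueError("zNode paths must be absolute (start with '/')")
    if path == "/":
        raise ValueError("the root zNode has no parent")
    if path.endswith("/"):
        raise ValueError("zNode paths must not end with '/'")
    parts = path.split("/")[1:]          # drop the empty string before the leading '/'
    if any(p in ("", ".", "..") for p in parts):
        raise ValueError("empty, '.' and '..' components are not allowed")
    parent = "/" + "/".join(parts[:-1])  # "/" when the node sits directly under the root
    return parent, parts[-1]


def ancestors(path: str) -> list[str]:
    """List every ancestor of a zNode, from the root down to its direct parent."""
    result = []
    while path != "/":
        path, _ = split_path(path)
        result.append(path)
    return list(reversed(result))


print(split_path("/A/B/C"))   # ('/A/B', 'C')
print(ancestors("/A/B/C"))    # ['/', '/A', '/A/B']
```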




That looks very much like a UNIX file system, right? But here is the terminology:
1. Each node in the tree is called a zNode.
2. Every zNode in the tree is identified by a path.
3. zNode types: persistent and ephemeral (an ephemeral zNode is deleted automatically when the client session that created it ends).
4. Each zNode stores a value (data) and may have child nodes.
5. zNodes cannot be renamed.
6. We can add/remove WATCHes on zNodes.
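The terminology above fits into a toy, in-memory model of the data tree. This is only a sketch under simplifying assumptions: DataTree and its method names are hypothetical (they are not the real client API), sessions are plain strings, and there is no networking, persistence or replication. It does reflect a few real rules: a parent must exist before a child is created, ephemeral zNodes disappear with their session and cannot have children, a zNode with children cannot be deleted, and there is no rename.

```python
# A toy, in-memory model of the zNode tree (not the real ZooKeeper client API).
# DataTree and its method names are hypothetical; sessions are plain strings here.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ZNode:
    data: bytes = b""
    version: int = 0               # incremented on every successful set()
    owner: str | None = None       # owning session for ephemeral zNodes, None if persistent
    children: dict[str, ZNode] = field(default_factory=dict)


class DataTree:
    """Paths, data plus children, persistent vs ephemeral; deliberately no rename()."""

    def __init__(self) -> None:
        self.root = ZNode()

    def _lookup(self, path: str) -> ZNode:
        node = self.root
        for name in filter(None, path.split("/")):
            node = node.children[name]     # KeyError if any component is missing
        return node

    def create(self, path: str, data: bytes = b"", session: str | None = None) -> None:
        parent_path, name = path.rsplit("/", 1)
        parent = self._lookup(parent_path)  # the parent must already exist
        if parent.owner is not None:
            raise ValueError("ephemeral zNodes cannot have children")
        if name in parent.children:
            raise ValueError(f"{path} already exists")
        parent.children[name] = ZNode(data=data, owner=session)

    def get(self, path: str) -> bytes:
        return self._lookup(path).data

    def set(self, path: str, data: bytes) -> None:
        node = self._lookup(path)
        node.data, node.version = data, node.version + 1

    def get_children(self, path: str) -> list[str]:
        return sorted(self._lookup(path).children)

    def delete(self, path: str) -> None:
        parent_path, name = path.rsplit("/", 1)
        parent = self._lookup(parent_path)
        if parent.children[name].children:
            raise ValueError("cannot delete a zNode that still has children")
        del parent.children[name]

    def close_session(self, session: str) -> None:
        """Drop every ephemeral zNode owned by the session."""
        def prune(node: ZNode) -> None:
            for name in list(node.children):
                if node.children[name].owner == session:
                    del node.children[name]
                else:
                    prune(node.children[name])
        prune(self.root)


tree = DataTree()
tree.create("/A", b"a")
tree.create("/A/B", b"b")
tree.create("/A/B/C", b"c")                # C's parent is B, B's parent is A
tree.create("/A/worker-1", session="s1")   # ephemeral, lives as long as session s1
print(tree.get_children("/A"))             # ['B', 'worker-1']
tree.close_session("s1")
print(tree.get_children("/A"))             # ['B']
```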
That's it. Someone rightly said ZooKeeper is feature-light and that you build recipes on top of it. Take a look at Netflix's Curator framework (now an Apache project); many say it makes using ZooKeeper much easier from a client perspective. WATCHes are interesting: you can SET a WATCH on a zNode path to be notified when something there changes, a bit like subscribing to changes on that path. A standard watch is a one-time trigger, though: once it fires, the client has to set it again to hear about later changes.
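The one-shot behaviour of a standard watch is easy to miss, so here is a toy model of it. WatchedNode, get_data and set_data are hypothetical names (real clients, Curator included, have their own APIs); the only ZooKeeper behaviour modeled is that a watch fires once and must be re-armed. The event name NodeDataChanged is borrowed from ZooKeeper's event types for readability.

```python
# A sketch of one-time WATCH semantics on a single zNode.
# WatchedNode, get_data and set_data are hypothetical names for this model.
from __future__ import annotations

from typing import Callable


class WatchedNode:
    def __init__(self, data: bytes = b"") -> None:
        self.data = data
        self._watchers: list[Callable[[str], None]] = []

    def get_data(self, watch: Callable[[str], None] | None = None) -> bytes:
        # Reading with a watch registers interest in the *next* change only.
        if watch is not None:
            self._watchers.append(watch)
        return self.data

    def set_data(self, data: bytes) -> None:
        self.data = data
        # Fire and clear: each registered watch is triggered at most once.
        watchers, self._watchers = self._watchers, []
        for w in watchers:
            w("NodeDataChanged")


events: list[str] = []
node = WatchedNode(b"v1")
node.get_data(watch=events.append)
node.set_data(b"v2")   # fires the watch
node.set_data(b"v3")   # nothing registered any more, nothing fires
print(events)          # ['NodeDataChanged']

log: list[tuple[str, bytes]] = []


def on_change(event: str) -> None:
    # Re-read the data and re-arm the watch in the same call.
    log.append((event, node.get_data(watch=on_change)))


node.get_data(watch=on_change)
node.set_data(b"v4")
node.set_data(b"v5")
print(log)   # [('NodeDataChanged', b'v4'), ('NodeDataChanged', b'v5')]
```

Re-reading inside the callback matters in the real system too: changes that happen between a notification and the re-registration are not delivered one by one, so a client should trust the fresh read rather than the event alone.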

2. One level deeper

So let's consider another example where the goal is to store some configuration in <K,V> format and make it available across a cluster of machines. The <K,V> pairs should be persistent (disk-based), highly available (replicated), and fault tolerant. ZooKeeper is a natural fit for this use case.
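One way to lay out such a store is one zNode per key under a common parent. The sketch below assumes a /config prefix and hypothetical names (ConfigStore, BadVersion); it models only the layout and a versioned, conditional update that loosely mirrors ZooKeeper's setData, where the caller passes the version it last read and -1 means "any version". Persistence and replication, the parts ZooKeeper actually provides, are not modeled.

```python
# A <K,V> config store laid out as zNodes under /config (toy model).
# ConfigStore, BadVersion and the /config prefix are illustrative assumptions.
from __future__ import annotations


class BadVersion(Exception):
    """Raised when a conditional update sees a stale version."""


class ConfigStore:
    def __init__(self, prefix: str = "/config") -> None:
        self.prefix = prefix
        self._nodes: dict[str, tuple[str, int]] = {}   # path -> (value, version)

    def _path(self, key: str) -> str:
        return f"{self.prefix}/{key}"                   # one zNode per key

    def put(self, key: str, value: str) -> None:
        path = self._path(key)
        if path in self._nodes:
            raise KeyError(f"{path} already exists")
        self._nodes[path] = (value, 0)

    def get(self, key: str) -> tuple[str, int]:
        return self._nodes[self._path(key)]

    def set(self, key: str, value: str, expected_version: int = -1) -> int:
        """Conditional update; -1 matches any version. Returns the new version."""
        path = self._path(key)
        _, version = self._nodes[path]
        if expected_version not in (-1, version):
            raise BadVersion(f"{path}: expected {expected_version}, found {version}")
        self._nodes[path] = (value, version + 1)
        return version + 1

    def keys(self) -> list[str]:
        cut = len(self.prefix) + 1
        return sorted(p[cut:] for p in self._nodes)


store = ConfigStore()
store.put("timeout.ms", "3000")
value, version = store.get("timeout.ms")          # ('3000', 0)
store.set("timeout.ms", "5000", expected_version=version)
try:
    # A second writer still holding version 0 loses the race.
    store.set("timeout.ms", "9000", expected_version=version)
except BadVersion as err:
    print("rejected:", err)
print(store.get("timeout.ms"))   # ('5000', 1)
```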




Role of ZooKeeper in HBase:

1. HBase relies completely on ZooKeeper. HBase gives you the option to use its built-in ZooKeeper, which is started whenever you start HBase. That is not a good choice on a production cluster, though; in such scenarios it is better to run a dedicated ZooKeeper cluster and integrate it with your HBase cluster.

2. A distributed HBase relies completely on ZooKeeper for cluster configuration and management. In Apache HBase, ZooKeeper coordinates, communicates, and shares state between the Masters and RegionServers. HBase has a design policy of using ZooKeeper only for transient data (that is, for coordination and state communication). Thus if HBase's ZooKeeper data is removed, only the transient operations are affected: data can continue to be written to and read from HBase.
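The "transient state" idea is easiest to see with ephemeral zNodes used for presence. Below is a toy model, not HBase code: Registry and its methods are hypothetical, the /hbase/rs path and the server names are used for illustration, and listeners stay registered (unlike real one-shot watches) to keep it short. Each RegionServer holds an ephemeral zNode, and the master derives the live set from those zNodes.

```python
# Toy model: RegionServers announce themselves with ephemeral zNodes, and the
# master derives the set of live servers from them.
# Registry, the /hbase/rs path and the server names are illustrative assumptions.
from __future__ import annotations

from typing import Callable


class Registry:
    def __init__(self) -> None:
        self.ephemeral: dict[str, str] = {}                      # zNode path -> owning session
        self.listeners: list[Callable[[list[str]], None]] = []   # e.g. the master's callback

    def register(self, server: str, session: str) -> None:
        self.ephemeral[f"/hbase/rs/{server}"] = session
        self._notify()

    def expire_session(self, session: str) -> None:
        # A dead session takes its ephemeral zNodes with it, so the master learns
        # about the failure without the RegionServer having to report anything.
        self.ephemeral = {p: s for p, s in self.ephemeral.items() if s != session}
        self._notify()

    def live_servers(self) -> list[str]:
        return sorted(p.rsplit("/", 1)[1] for p in self.ephemeral)

    def _notify(self) -> None:
        for callback in self.listeners:
            callback(self.live_servers())


history: list[list[str]] = []
registry = Registry()
registry.listeners.append(history.append)   # the "master" records every membership view
registry.register("rs1", session="s1")
registry.register("rs2", session="s2")
registry.expire_session("s1")               # rs1 crashes or loses its session
print(history)   # [['rs1'], ['rs1', 'rs2'], ['rs2']]
```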

Technically speaking: by default HBase manages ZooKeeper itself, i.e. it starts and stops the ZooKeeper quorum (the cluster of ZooKeeper nodes) when we start and stop HBase. To verify this setting, look in the file conf/hbase-env.sh (in your HBase directory); there must be a line:

Q. Why should the number of nodes in ZooKeeper be odd?
Ans.

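ZooKeeper requires that a quorum of servers be up, where the quorum is a strict majority: floor(N/2) + 1. For a 3-server ensemble, that means 2 servers must be up at any time; for a 5-server ensemble, 3 servers must be up at any time. A 4-server ensemble needs 3 servers up, so it still tolerates only one failure, the same as 3 servers: the extra server adds cost without adding fault tolerance.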
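The quorum arithmetic is short enough to tabulate. This sketch assumes only the strict-majority rule stated above; quorum and tolerated_failures are names chosen for this post.

```python
# Majority-quorum arithmetic for an ensemble of n servers.

def quorum(n: int) -> int:
    """Smallest strict majority of n servers: floor(n/2) + 1."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many servers can be down while a quorum is still up."""
    return n - quorum(n)


for n in range(1, 8):
    print(f"{n} servers: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

In the printed table, going from 3 to 4 servers (or from 5 to 6) raises the quorum by one while the number of tolerated failures stays the same (1 and 2 respectively), which is the usual argument for odd ensemble sizes.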

Reliability:

A single ZooKeeper server (standalone) is essentially a coordinator with no reliability: a single serving node failure brings down the ZK service.

A 3-server ensemble (you need to jump to 3 and not 2 because ZK works based on simple majority voting) allows a single server to fail while the service remains available.

So if you want reliability, go with at least 3. We typically recommend having 5 servers in "online" production serving environments. This allows you to take 1 server out of service (say, for planned maintenance) and still be able to sustain an unexpected outage of one of the remaining servers without interruption of the service.
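The scenarios above can be checked mechanically. In this sketch (function names are hypothetical), available() applies the majority rule to the standalone, 2-, 3- and 5-server cases from the text, and sides_with_quorum() shows why majority voting is used at all: in any two-way network split, at most one side can still form a quorum, so two halves of the ensemble can never both keep serving.

```python
# Checking the reliability scenarios with the strict-majority rule.
from __future__ import annotations

from itertools import combinations


def quorum(n: int) -> int:
    return n // 2 + 1


def available(n: int, down: int) -> bool:
    """True if the servers still up can form a quorum."""
    return n - down >= quorum(n)


def sides_with_quorum(n: int, side_a: set[int]) -> int:
    """Count how many sides of a two-way network split still have a quorum."""
    side_b = set(range(n)) - side_a
    return sum(len(side) >= quorum(n) for side in (side_a, side_b))


print(available(1, 1))   # False: the standalone server is gone
print(available(2, 1))   # False: 2 servers tolerate no failures
print(available(3, 1))   # True
print(available(5, 2))   # True: one in maintenance plus one unexpected outage

for n in (2, 3, 4, 5):
    worst = max(
        sides_with_quorum(n, set(group))
        for k in range(n + 1)
        for group in combinations(range(n), k)
    )
    print(f"{n} servers: at most {worst} side(s) of a split can hold a quorum")
```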

Q2. Why is the number of nodes in ZooKeeper odd?
Ans. Because of the majority voting mechanism.
