Kerberos Master/Slave HA configuration

Our cluster has only one KDC, which makes it a SPOF (Single Point of Failure), so I set up a master/slave KDC pair to remove it.

The steps below convert the single KDC into an HA setup.

Description
master2.hadoop is the existing KDC (the master); master1.hadoop is the new node that will run the slave KDC.

  1. Install the KDC on the new node (master1.hadoop).
    yum -y install krb5-server
  2. Edit /etc/krb5.conf on the original KDC (master2.hadoop):

    [libdefaults]
    default_realm = PG.COM
    dns_lookup_kdc = false
    dns_lookup_realm = false
    ticket_lifetime = 7d
    renew_lifetime = 30d
    forwardable = true
    #default_tgs_enctypes = rc4-hmac
    #default_tkt_enctypes = rc4-hmac
    #permitted_enctypes = rc4-hmac
    udp_preference_limit = 1
    kdc_timeout = 3000
    [realms]
    PG.COM = {
      kdc = master2.hadoop
      kdc = master1.hadoop
      admin_server = master2.hadoop
    }
    [logging]
    default = FILE:/var/log/krb5kdc.log
    admin_server = FILE:/var/log/kadmind.log
    kdc = FILE:/var/log/krb5kdc.log

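    The three commented-out enctype lines (default_tgs_enctypes, default_tkt_enctypes, permitted_enctypes) are very important on CentOS 6; the "kdc = master1.hadoop" line is the newly added one.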

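  3. On the new node (master1.hadoop), copy the KDC files and krb5.conf from the master, then create the host principal and export it to the keytab (ank and xst are the kadmin aliases for add_principal and ktadd):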
    scp master2.hadoop:/var/kerberos/krb5kdc/kdc.conf /var/kerberos/krb5kdc/
    scp master2.hadoop:/var/kerberos/krb5kdc/kadm5.acl /var/kerberos/krb5kdc/
    scp master2.hadoop:/var/kerberos/krb5kdc/.k5.PG.COM /var/kerberos/krb5kdc/
    scp master2.hadoop:/etc/krb5.conf /etc/
    kadmin
    : ank host/master1.hadoop
    : xst host/master1.hadoop
  4. On the old node (master2.hadoop), do the same for its own host principal:
    kadmin
    : ank host/master2.hadoop
    : xst host/master2.hadoop
  5. Then, back on the new node (master1.hadoop):

    vi /var/kerberos/krb5kdc/kpropd.acl
    and insert these two lines:

    host/master1.hadoop@PG.COM
    host/master2.hadoop@PG.COM

    and then run:

    kdb5_util stash
    kpropd -S
  6. Switch to the old node (master2.hadoop):
    kdb5_util dump /var/kerberos/krb5kdc/kdc.dump
    kprop -f /var/kerberos/krb5kdc/kdc.dump master1.hadoop

    When you see “Database propagation to master1.hadoop: SUCCEEDED”, the database has been copied successfully and the slave KDC can be started.

  7. Last step, on the new node (master1.hadoop):
    service krb5kdc start
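
Step 6 propagates the database only once; without further action, later changes on the master reach the slave only when the dump and kprop are run again. A minimal sketch of a helper that could be called from cron on master2.hadoop follows. The function name propagate_kdb is hypothetical, and the dump path and slave host are simply the ones used in the steps above.

```shell
#!/bin/sh
# Sketch: dump the master KDC database and push it to each slave KDC.
# propagate_kdb is a hypothetical helper name, not part of Kerberos.
# Usage: propagate_kdb DUMP_FILE SLAVE_HOST [SLAVE_HOST...]
propagate_kdb() {
    dump_file="$1"
    shift

    # Dump the current database on the master; give up if that fails.
    kdb5_util dump "$dump_file" || return 1

    rc=0
    for slave in "$@"; do
        # kprop sends the dump to the kpropd running on the slave.
        if kprop -f "$dump_file" "$slave"; then
            echo "propagated to $slave"
        else
            echo "propagation to $slave FAILED" >&2
            rc=1
        fi
    done
    return $rc
}
```

For example, a cron entry on the master could source this file and run: propagate_kdb /var/kerberos/krb5kdc/kdc.dump master1.hadoop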

    Why the three enctype lines in step 2 are commented out:
    CentOS 6.x ships Kerberos 1.10.x, which has a bug that makes the database sync fail when rc4 is the default enctype; kprop does not work with the rc4 encryption type. So you must comment out those lines to avoid the failure.

    https://github.com/krb5/krb5/commit/8d01455ec9ed88bd3ccae939961a6e123bb3d45f

    It was fixed in Kerberos 1.11.1.
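
To check whether a krb5.conf still has an active rc4 default (the situation described above), something along these lines may help. This is only a sketch: the function name active_rc4_enctypes is hypothetical, and matching on "rc4" or "arcfour" in the value is an assumption about how the enctype is spelled in the file.

```shell
#!/bin/sh
# Sketch: print every enctype setting in a krb5.conf that is NOT commented
# out and mentions rc4/arcfour. Exit status 0 means at least one was found.
# active_rc4_enctypes is a hypothetical helper name.
active_rc4_enctypes() {
    conf="${1:-/etc/krb5.conf}"
    # Lines starting with '#' do not match, because the pattern requires
    # the setting name to be the first non-blank text on the line.
    grep -E '^[[:space:]]*(default_tgs_enctypes|default_tkt_enctypes|permitted_enctypes)[[:space:]]*=.*(rc4|arcfour)' "$conf"
}
```

Running active_rc4_enctypes /etc/krb5.conf on both KDCs before step 6 should print nothing if the lines are commented out as in step 2.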

    Finally, of course, restart the kdc and kadmin services.
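
One way to confirm that the slave really answers ticket requests is to point a client at it alone. The sketch below writes a throwaway krb5.conf that lists only one KDC; the function name write_single_kdc_conf and the output path are hypothetical, and the file deliberately contains only the minimum settings taken from the config in step 2.

```shell
#!/bin/sh
# Sketch: write a minimal krb5.conf that lists a single KDC, so that any
# ticket request made with it has to be answered by that KDC.
# write_single_kdc_conf is a hypothetical helper name.
# Usage: write_single_kdc_conf REALM KDC_HOST OUTPUT_FILE
write_single_kdc_conf() {
    realm="$1"
    kdc_host="$2"
    out_file="$3"
    cat > "$out_file" <<EOF
[libdefaults]
default_realm = $realm
dns_lookup_kdc = false
dns_lookup_realm = false
[realms]
$realm = {
  kdc = $kdc_host
}
EOF
}
```

For example: write_single_kdc_conf PG.COM master1.hadoop /tmp/krb5-slave.conf, then run kinit with KRB5_CONFIG=/tmp/krb5-slave.conf set in the environment, using any existing principal. If a ticket is issued, it came from the slave.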

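Trying out the time-series database InfluxDB

Monitoring a Hadoop cluster needs a time-series database. Today I spent half a day evaluating InfluxDB, which has been quite popular lately, and found it really rather good, so here are my notes.

InfluxDB is written in Go and was built specifically for persisting time-series data; because it is written in Go, it runs on basically every platform. Similar time-series databases include OpenTSDB and Prometheus.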

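A few small scripts for Hadoop deployment

I recently gave up deploying Hadoop clusters without ssh connections and went back to ssh key authentication. That brings some hassle: the public key has to be uploaded to every machine. Since I happen to be a very lazy person, I wrote a few small scripts so that the public keys can be distributed from a single machine.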