Kerberos Master/Slave HA configuration

Since we only have one KDC on our cluster, it will be an SPOF (Single Point of Failure), so I have to create a Master/Slave KDC to avoid this problem.

There would be some steps to convert SP to HA.

Description
master2.hadoop is existence KDC previously, master1.hadoop will install a new KDC server

  1. Install KDC on new node(master1.hadoop).
    yum -y install krb5-server
  2. Change config file on origin KDC(master2.hadoop)

    [libdefaults]
    default_realm = PG.COM
    dns_lookup_kdc = false
    dns_lookup_realm = false
    ticket_lifetime = 7d
    renew_lifetime = 30d
    forwardable = true
    #default_tgs_enctypes = rc4-hmac
    #default_tkt_enctypes = rc4-hmac
    #permitted_enctypes = rc4-hmac
    udp_preference_limit = 1
    kdc_timeout = 3000
    [realms]
    PG.COM =
    {
    kdc = master2.hadoop
    kdc = master1.hadoop
    admin_server = master2.hadoop
    }
    [logging]
    default = FILE:/var/log/krb5kdc.log
    admin_server = FILE:/var/log/kadmind.log
    kdc = FILE:/var/log/krb5kdc.log

    Red block are very important on centos 6, orange block is the new line added

  3. On new node(master1.hadoop)
    scp master2.hadoop:/var/kerberos/krb5kdc/kdc.conf /var/kerberos/krb5kdc/
    scp master2.hadoop:/var/kerberos/krb5kdc/kadm5.acl /var/kerberos/krb5kdc/
    scp master2.hadoop:/var/kerberos/krb5kdc/.k5.PG.COM /var/kerberos/krb5kdc/
    scp master2.hadoop:/etc/krb5.conf /etc/
    kadmin
    : ank host/master1.hadoop
    : xst host/master1.hadoop
  4. On old node(master2.hadoop)
    kadmin
    : ank host/master2.hadoop
    : xst host/master2.hadoop
  5. And then back to new node(master1.hadoop)

    vi /var/kerberos/krb5kdc/kpropd.acl
    and insert two lines

    host/master1.hadoop@PG.COM
    host/master2.hadoop@PG.COM

    and then

    kdb_util stash
    kpropd -S
  6. Jump to old node(master2.hadoop)
    kdb_util dump /var/kerberos/krb5kdc/kdc.dump
    kprop -f /var/kerberos/krb5kdc/kdc.dump master1.hadoop

    When see “Database propagation to master1.hadoop: SUCCEEDED”, it means all the work have done well enough, and the slave should be start now.

  7. Last step on new node(master1.hadoop)
    service krb5kdc start

    The meaning of red block in step two is:
    Cenots 6.x with Kerberos 1.10.x had a bug that will cause sync kdb failed, the issue is there is a problem when you use rc4 as the default enctype. So you must comment the to avoid this happen. kprop doesn’t works with rc4 encrypt type.

    https://github.com/krb5/krb5/commit/8d01455ec9ed88bd3ccae939961a6e123bb3d45f

    It fixed on kerberos 1.11.1

    finally: of course you should restart kdc and kadmin services