
Use kerberized Hive in Zeppelin

We deployed Apache Zeppelin 0.7.0 on our Kerberos-secured Hadoop cluster, but a dear colleague of mine could not get it to work: nothing in Zeppelin ran for him except shell commands. So I had to find out why.

I started with Kerberized Hive.

Troubleshooting kerberized hive issues

Today my colleagues wanted to use Hive in Zeppelin. It was the first time anyone had used Hive on this new Kerberized cluster, and unfortunately Hive failed with an authentication error, so I had to debug it.

The Hive client host had hadoop-client and hive installed, all the required keytabs were in the config directories, and their permissions were set correctly, but it still could not connect to the cluster. The log always showed that authentication had failed.

Enable Kerberos secured Hadoop cluster with Cloudera Manager

I created a secured Hadoop cluster for P&G with Cloudera Manager, and this document records how to enable Kerberos security on a cluster with Cloudera Manager. First, we need a cluster that contains a Kerberos KDC and Kerberos clients.

Dr. Elephant MySQL connection error

This is the first time I have tried to write my blog in English, so please don't jeer at my grammar and spelling mistakes.

Because multi-threaded Dr. Elephant puts a very high load on the JobHistoryServer, I stopped it for a stretch of time. Then, last week, a patch that makes it pull from the JHS periodically was merged on GitHub and released. I recompiled Dr. Elephant and deployed the new build on the cluster. It seemed stable, but this Monday morning my team lead told me that Dr. Elephant no longer showed any counters or other information about cluster jobs. So I logged in to the server, checked the log, and found the message below.

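Trying out the time series database InfluxDB

Monitoring a Hadoop cluster requires a time series database. Today I spent half a day evaluating InfluxDB, which has been quite popular recently, and found that it really is good, so here are my notes on what I learned.

Influx is written in Go and was developed specifically for persisting time series data. Because it is written in Go, it is supported on basically every platform. Similar time series databases include OpenTSDB and Prometheus.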

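Dr. Elephant, a monitoring and analysis tool for Hadoop

Our company's infrastructure team wanted to identify slow jobs and find out where resources were being wasted, so I installed Dr. Elephant to take a look. It is a system open-sourced by LinkedIn that analyzes the performance of YARN-based MapReduce and Spark jobs and gives tuning suggestions.

DRE is mostly written in Java, with the Spark monitoring part written in Scala, and it uses the Play full-stack framework. That is a framework similar to Django in Python. Is it based on Java? Scala? I did not look into it closely. It works right after download and requires Java 1.8 or later.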

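Apache Bigtop and selling books to survive

I have not written a blog post in almost a year, and I am finally back. Recently, for work, I needed to build RPMs with custom patches on top of the CDH distribution, so I picked up Bigtop again. It is the tool that compiles Hadoop and packages it as RPMs and DEBs. Since there is basically no material or documentation on it in China, I think it is worth sharing how I read and modified the Bigtop source code.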

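Hadoop operations notes series (17)

Last month, over email, I helped a friend of a friend solve a problem where Cloudera's Spark SQL could not access HBase for data analysis. Here are my notes.

To begin with, they already had Hive access to HBase working, so in principle spark-sql could reach HBase through Hive's metadata. But execution was extremely slow, and the logs showed no errors. Everything was handled by email. I first asked a few questions: whether Kerberos was enabled, whether Hive could access HBase normally, whether the HBase shell could read data normally, and so on. The answers were that Kerberos was not in use, Hive accessed HBase fine, spark-sql read the Hive metadata fine, and the HBase shell worked fine too. Only spark-sql would not run.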

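Hadoop operations notes series (16)

I agreed to help recover a cluster for a telecom operator in China. The failure was severe: the NameNode of an HA cluster had gone down. The exact sequence of events is unclear, but from the victims' scattered remarks I can roughly piece together what happened.

  1. The metadata disk of the active NameNode was full, full, full… The very first sentence was a thunderclap.
  2. After finding the disk full, the operations staff ran echo "" > edit_xxxx-xxxx on the active NameNode's metadata edit log… The second sentence hit like a bolt of lightning.
  3. Then they found that they could not fail over to the standby, and failing over would not have helped anyway, because the standby's metadata and logs were from May… I could hardly bear to look at that result.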


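Replacing the existing scribe service with Flume

Many of our services used to rely on scribe for log collection, but Facebook later stopped developing and supporting it. Compiling scribe on a machine is also far too costly, with all sorts of pitfalls. Conveniently, Flume added support for scribe starting with 1.3.0, so the data that used to come in through scribe can now be collected with Flume instead. I like scribe a lot, but losing official support is still upsetting.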
