We deployed Apache Zeppelin 0.7.0 for the Kerberos secured Hadoop cluster, and my dear colleague cannot use it correctly, so I have to find out why he can’t use anything in Zeppelin, except shell command.

I start with Kerberized Hive

ERROR [2017-05-03 23:49:10,603] ({qtp1128883028-1682} NotebookServer.java[afterStatusChange]:2018) - Error
org.apache.zeppelin.interpreter.InterpreterException: paragraph_1493825452456_506278578's Interpreter hive not found
        at org.apache.zeppelin.notebook.Note.run(Note.java:572)
        at org.apache.zeppelin.socket.NotebookServer.persistAndExecuteSingleParagraph(NotebookServer.java:1626)
        at org.apache.zeppelin.socket.NotebookServer.runParagraph(NotebookServer.java:1600)
        at org.apache.zeppelin.socket.NotebookServer.onMessage(NotebookServer.java:263)
        at org.apache.zeppelin.socket.NotebookSocket.onWebSocketText(NotebookSocket.java:59)
        at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextMessage(JettyListenerEventDriver.java:128)
        at org.eclipse.jetty.websocket.common.message.SimpleTextMessage.messageComplete(SimpleTextMessage.java:69)
        at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.appendMessage(AbstractEventDriver.java:65)
        at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextFrame(JettyListenerEventDriver.java:122)
        at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.incomingFrame(AbstractEventDriver.java:161)
        at org.eclipse.jetty.websocket.common.WebSocketSession.incomingFrame(WebSocketSession.java:309)
        at org.eclipse.jetty.websocket.common.extensions.ExtensionStack.incomingFrame(ExtensionStack.java:214)
        at org.eclipse.jetty.websocket.common.Parser.notifyFrame(Parser.java:220)
        at org.eclipse.jetty.websocket.common.Parser.parse(Parser.java:258)
        at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.readParse(AbstractWebSocketConnection.java:632)
        at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:480)
        at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
        at java.lang.Thread.run(Thread.java:745)

Two things caused this problem:
1. Privilege, Zeppelin must download some jars from repo.maven.org with using interpreters insides, it called dependencies. I packeged zeppelin as an RPM destribution, so the installation path is /usr/lib/zeppelin, and the download path should be /usr/lib/zeppelin/local-repo by default. But this path is owned by root, so chown the local-repo to zeppelin:zeppelin will resolve this issue.
2. There is not hive interpreter any more, in official documentations, the Hive interpreter was deprecated for a long time, it use jdbc instead. My dear colleague do not read any document, and create an interpreter named hive, so it’s gone wrong.

The right way is doing like this, put the config in jdbc area.

And you should use it like
%jdbc(hive)
show databases

Then, I must resolve the Kerberized Hive in Zeppelin, I google-ed, but there are no useful information, it seems no one doing this before.
I add hive-jdbc and hadoop-common as the official document, but it doesn’t work.
Hive interpreter
logs give me a warn

 WARN [2017-05-03 23:57:48,982] ({pool-2-thread-9} NotebookServer.java[afterStatusChange]:2026) - Job 20170503-234910_1513582802 is finished, status: ERROR, exception: null, result: %text org.apache.zeppelin.interpreter.InterpreterException: Could not load shims in class org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge23
java.lang.ClassNotFoundException: org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge23
        at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnection(JDBCInterpreter.java:413)
        at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:561)
        at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:660)
        at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:94)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:489)
        at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
        at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

shims? well, just add shims in jdbc->dependencies area, but, still not work, give me anothor error

ERROR [2017-05-04 00:14:23,725] ({qtp1128883028-1804} NotebookServer.java[onMessage]:355) - Can't handle message
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.cancel(RemoteInterpreter.java:367)
        at org.apache.zeppelin.interpreter.LazyOpenInterpreter.cancel(LazyOpenInterpreter.java:100)
        at org.apache.zeppelin.notebook.Paragraph.jobAbort(Paragraph.java:457)
        at org.apache.zeppelin.scheduler.Job.abort(Job.java:236)
        at org.apache.zeppelin.socket.NotebookServer.cancelParagraph(NotebookServer.java:1536)
        at org.apache.zeppelin.socket.NotebookServer.onMessage(NotebookServer.java:269)
        at org.apache.zeppelin.socket.NotebookSocket.onWebSocketText(NotebookSocket.java:59)
        at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextMessage(JettyListenerEventDriver.java:128)
        at org.eclipse.jetty.websocket.common.message.SimpleTextMessage.messageComplete(SimpleTextMessage.java:69)
        at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.appendMessage(AbstractEventDriver.java:65)
        at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextFrame(JettyListenerEventDriver.java:122)
        at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.incomingFrame(AbstractEventDriver.java:161)
        at org.eclipse.jetty.websocket.common.WebSocketSession.incomingFrame(WebSocketSession.java:309)
        at org.eclipse.jetty.websocket.common.extensions.ExtensionStack.incomingFrame(ExtensionStack.java:214)
        at org.eclipse.jetty.websocket.common.Parser.notifyFrame(Parser.java:220)
        at org.eclipse.jetty.websocket.common.Parser.parse(Parser.java:258)
        at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.readParse(AbstractWebSocketConnection.java:632)
        at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:480)
        at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
        at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_cancel(RemoteInterpreterService.java:291)
        at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.cancel(RemoteInterpreterService.java:276)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.cancel(RemoteInterpreter.java:364)
        ... 21 more

Then I add hive-exec.jar into dependencies, then zeppelin died… remove hive-exec and restart zeppelin…

The final error is a hadoop proxy problem, I use zin as a hive authentication account, and add zeppelin user to all nodes and create a zeppelin keytab file, and then it give me this error

org.apache.hive.service.cli.HiveSQLException: Failed to validate proxy privilege of zeppelin for admin
        at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:241)
        at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:232)
        at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:491)
        at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:181)
        at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
        at java.sql.DriverManager.getConnection(DriverManager.java:571)
        at java.sql.DriverManager.getConnection(DriverManager.java:187)
        at org.apache.commons.dbcp2.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:79)
        at org.apache.commons.dbcp2.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:205)
        at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
        at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
        at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
        at org.apache.commons.dbcp2.PoolingDriver.connect(PoolingDriver.java:129)
        at java.sql.DriverManager.getConnection(DriverManager.java:571)
        at java.sql.DriverManager.getConnection(DriverManager.java:233)
        at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnectionFromPool(JDBCInterpreter.java:351)
        at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnection(JDBCInterpreter.java:385)
        at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:561)
        at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:660)
        at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:94)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:489)
        at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
        at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hive.service.cli.HiveSQLException: Failed to validate proxy privilege of zin for admin
        at org.apache.hive.service.auth.HiveAuthFactory.verifyProxyAccess(HiveAuthFactory.java:402)
        at org.apache.hive.service.cli.thrift.ThriftCLIService.getProxyUser(ThriftCLIService.java:750)
        at org.apache.hive.service.cli.thrift.ThriftCLIService.getUserName(ThriftCLIService.java:384)
        at org.apache.hive.service.cli.thrift.ThriftCLIService.getSessionHandle(ThriftCLIService.java:411)
        at org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:316)
        at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1253)
        at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1238)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:746)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        ... 3 more
Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User: zin is not allowed to impersonate admin

This is so easy, just like hue or other applications, this user is not set as a hadoop proxy user/group in core-site, I think until I add a hadoop proxy user it will work, but I don’t know how to add a new config in FUCKING Cloudera Manager, so I give up, and use an exist user named hive to access zeppelin.
So the zeppelin config is  like this finally.

And with kerberized spark on yarn?
Just add following config in spark-default.conf
spark.yarn.principal user/host@PG.COM
spark.yarn.keytab /etc/hadoop/conf.empty/user.keytab

Finally, I can sleep.



发表评论

*
*

Required fields are marked *