Running Spark Streaming Jobs on a Kerberos-Enabled Cluster

Use the following steps to run a Spark Streaming job on a Kerberos-enabled cluster.

Select or create a user account to use as the Kerberos principal.
This should not be the kafka or spark service account.
Generate a keytab for the user.
Create a Java Authentication and Authorization Service (JAAS) login configuration file; for example, key.conf.
Add configuration settings that specify the user keytab.
The keytab and configuration files are distributed using YARN local resources. Because they reside in the current directory of the Spark YARN container, specify the keytab location as a relative path, such as ./v.keytab.

The following example specifies keytab location ./v.keytab for principal vagrant@EXAMPLE.COM:
```
KafkaClient {
   com.sun.security.auth.module.Krb5LoginModule required
   useKeyTab=true
   keyTab="./v.keytab"
   storeKey=true
   useTicketCache=false
   serviceName="kafka"
   principal="vagrant@EXAMPLE.COM";
};
```
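If the JAAS file has to be produced for more than one principal, it can be generated rather than edited by hand. The following is a minimal Python sketch, not part of the original procedure. The names `render_jaas_config` and `write_jaas_config` are hypothetical helpers; the template simply mirrors the example above and forces the keytab path to the relative form that the YARN container expects.

```python
from pathlib import Path

# Template mirroring the KafkaClient example above; doubled braces are
# literal braces in str.format().
JAAS_TEMPLATE = """KafkaClient {{
   com.sun.security.auth.module.Krb5LoginModule required
   useKeyTab=true
   keyTab="{keytab}"
   storeKey=true
   useTicketCache=false
   serviceName="kafka"
   principal="{principal}";
}};
"""


def render_jaas_config(principal: str, keytab_path: str) -> str:
    """Return JAAS text for the given principal (hypothetical helper).

    Only the file name of keytab_path is kept, prefixed with "./",
    because the keytab is shipped into the container's current directory.
    """
    relative_keytab = "./" + Path(keytab_path).name
    return JAAS_TEMPLATE.format(keytab=relative_keytab, principal=principal)


def write_jaas_config(target: str, principal: str, keytab_path: str) -> None:
    """Write the rendered JAAS configuration to target (hypothetical helper)."""
    Path(target).write_text(render_jaas_config(principal, keytab_path))


if __name__ == "__main__":
    # Values taken from the example in the text.
    print(render_jaas_config("vagrant@EXAMPLE.COM", "/home/vagrant/v.keytab"))
```

Calling `write_jaas_config("key.conf", "vagrant@EXAMPLE.COM", "v.keytab")` would write a file equivalent to the example above into the current directory.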

In your spark-submit command, pass the JAAS configuration file and keytab as local resource files using the --files option, and add the JAAS configuration file setting to the JVM options for both the driver and the executors:

spark-submit \
    --files key.conf#key.conf,v.keytab#v.keytab \
    --driver-java-options "-Djava.security.auth.login.config=./key.conf" \
    --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./key.conf" \
...
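When the command is launched from a script, the same options can be assembled programmatically so that the file names stay consistent across `--files` and both JVM option strings. The sketch below is an illustration only: `build_submit_args` is a hypothetical helper, and `extra_args` stands in for the remaining arguments that the command above elides.

```python
import shlex
from typing import List, Sequence


def build_submit_args(jaas_file: str, keytab_file: str,
                      extra_args: Sequence[str] = ()) -> List[str]:
    """Assemble the Kerberos-related spark-submit arguments (hypothetical helper).

    jaas_file and keytab_file are plain file names in the current directory.
    extra_args is a placeholder for whatever follows in the real command.
    """
    # Both driver and executors read the JAAS file from their working directory.
    jaas_opt = f"-Djava.security.auth.login.config=./{jaas_file}"
    args = [
        "spark-submit",
        # name#alias keeps the same file name inside the YARN container.
        "--files", f"{jaas_file}#{jaas_file},{keytab_file}#{keytab_file}",
        "--driver-java-options", jaas_opt,
        "--conf", f"spark.executor.extraJavaOptions={jaas_opt}",
    ]
    args.extend(extra_args)
    return args


if __name__ == "__main__":
    # Print a shell-quoted command line for inspection; nothing is executed.
    print(shlex.join(build_submit_args("key.conf", "v.keytab")))
```

Returning a list rather than a single string means the result can be handed to `subprocess.run` without an intermediate shell, which avoids quoting problems in the option values.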

Pass any relevant Kafka security options to your streaming application.
For example, the KafkaWordCount example accepts PLAINTEXTSASL as the last option in the command line:
```
KafkaWordCount /vagrant/spark-examples.jar c6402:2181 abc ts 1 PLAINTEXTSASL
```
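How an application consumes that trailing option depends on the application. As a rough Python sketch, the code below assumes the first four arguments follow the usual KafkaWordCount order (ZooKeeper quorum, consumer group, comma-separated topics, thread count) and that an optional fifth argument is forwarded to Kafka as the `security.protocol` property. The names `parse_args` and `kafka_params` are hypothetical, and the actual example may wire the option differently.

```python
from typing import Dict, List, NamedTuple, Optional, Sequence


class WordCountArgs(NamedTuple):
    zk_quorum: str
    group: str
    topics: List[str]
    num_threads: int
    security_protocol: Optional[str]


def parse_args(argv: Sequence[str]) -> WordCountArgs:
    """Parse <zkQuorum> <group> <topics> <numThreads> [securityProtocol]."""
    if len(argv) not in (4, 5):
        raise ValueError(
            "usage: <zkQuorum> <group> <topics> <numThreads> [securityProtocol]")
    zk_quorum, group, topics, num_threads = argv[:4]
    protocol = argv[4] if len(argv) == 5 else None
    return WordCountArgs(zk_quorum, group, topics.split(","),
                         int(num_threads), protocol)


def kafka_params(args: WordCountArgs) -> Dict[str, str]:
    """Build consumer properties, adding security.protocol only when given."""
    params = {
        "zookeeper.connect": args.zk_quorum,
        "group.id": args.group,
    }
    if args.security_protocol is not None:
        # Assumption: the value is passed through unchanged.
        params["security.protocol"] = args.security_protocol
    return params


if __name__ == "__main__":
    parsed = parse_args(["c6402:2181", "abc", "ts", "1", "PLAINTEXTSASL"])
    print(kafka_params(parsed))
```

Leaving the property out when no protocol is supplied keeps the same code usable on a cluster without Kerberos.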


Posted 2019-03-19 14:39 by 大数据从业者FelixZh
