Version: 0.6.0

Submarine Spark Security Plugin

ACL Management for Apache Spark SQL with Apache Ranger, enabling:

  • Table/Column level authorization
  • Row level filtering
  • Data masking

Security is one of the fundamental features for enterprise adoption. Apache Ranger™ offers security plugins for many Hadoop ecosystem components, such as HDFS, Hive, HBase, Solr and Sqoop2. However, Apache Spark™ is not covered yet. When a secured HDFS cluster is used as a data warehouse accessed by various users and groups via different applications written with Spark and Hive, it is very difficult to guarantee consistent data management. Apache Spark users can access the data warehouse only through the storage-based access controls offered by HDFS. This library enables Spark with SQL standard-based authorization.

Build#

Please refer to the online documentation - Building submarine spark security plugin

Quick Start#

Three steps to integrate Apache Spark and Apache Ranger.

Installation#

Place the submarine-spark-security-<version>.jar into $SPARK_HOME/jars.

Configurations#

Settings for Apache Ranger#

Create ranger-spark-security.xml in $SPARK_HOME/conf and add the following configurations to point to the right Apache Ranger admin server.

<configuration>
  <property>
    <name>ranger.plugin.spark.policy.rest.url</name>
    <value>ranger admin address like http://ranger-admin.org:6080</value>
  </property>
  <property>
    <name>ranger.plugin.spark.service.name</name>
    <value>a ranger hive service name</value>
  </property>
  <property>
    <name>ranger.plugin.spark.policy.cache.dir</name>
    <value>./a ranger hive service name/policycache</value>
  </property>
  <property>
    <name>ranger.plugin.spark.policy.pollIntervalMs</name>
    <value>5000</value>
  </property>
  <property>
    <name>ranger.plugin.spark.policy.source.impl</name>
    <value>org.apache.ranger.admin.client.RangerAdminRESTClient</value>
  </property>
</configuration>

Create ranger-spark-audit.xml in $SPARK_HOME/conf and add the following configurations to enable/disable auditing.

<configuration>
  <property>
    <name>xasecure.audit.is.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>xasecure.audit.destination.db</name>
    <value>false</value>
  </property>
  <property>
    <name>xasecure.audit.destination.db.jdbc.driver</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>xasecure.audit.destination.db.jdbc.url</name>
    <value>jdbc:mysql://10.171.161.78/ranger</value>
  </property>
  <property>
    <name>xasecure.audit.destination.db.password</name>
    <value>rangeradmin</value>
  </property>
  <property>
    <name>xasecure.audit.destination.db.user</name>
    <value>rangeradmin</value>
  </property>
</configuration>

Settings for Apache Spark#

You can configure spark.sql.extensions with one of the *Extension classes this plugin provides. For example, spark.sql.extensions=org.apache.submarine.spark.security.api.RangerSparkAuthzExtension
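
The extension can also be set programmatically when the SparkSession is created. The sketch below assumes Spark with Hive support and the plugin jar on the driver classpath; the database and table names are placeholders.

import org.apache.spark.sql.SparkSession

// Build a SparkSession with the Ranger authorization extension enabled.
// Equivalent to setting spark.sql.extensions in spark-defaults.conf.
val spark = SparkSession.builder()
  .appName("ranger-authz-example")
  .config("spark.sql.extensions",
    "org.apache.submarine.spark.security.api.RangerSparkAuthzExtension")
  .enableHiveSupport()
  .getOrCreate()

// SQL issued through this session is checked against the Ranger policies
// of the Hive service configured in ranger-spark-security.xml.
spark.sql("SELECT * FROM default.employee").show()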

Currently, you can set spark.sql.extensions to one of the following options to choose authorization with or without the extra functions.

option | authorization | row filtering | data masking
org.apache.submarine.spark.security.api.RangerSparkAuthzExtension | √ | × | ×
org.apache.submarine.spark.security.api.RangerSparkSQLExtension | √ | √ | √
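
With RangerSparkSQLExtension enabled, row-level filter and data masking policies defined in Ranger are applied to queries transparently. The sketch below is illustrative only; the table, columns and policies are placeholders that depend on what is defined in your Ranger service.

import org.apache.spark.sql.SparkSession

// Same builder as before, but with the extension that also supports
// row filtering and data masking.
val spark = SparkSession.builder()
  .appName("ranger-sql-extension-example")
  .config("spark.sql.extensions",
    "org.apache.submarine.spark.security.api.RangerSparkSQLExtension")
  .enableHiveSupport()
  .getOrCreate()

// If a row filter policy (e.g. country = 'US') is attached to default.customers,
// this query returns only the rows the current user is allowed to see.
// A masking policy on credit_card would return masked values instead of raw data.
spark.sql("SELECT id, name, credit_card FROM default.customers").show()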