Submarine Spark Security Plugin
ACL Management for Apache Spark SQL with Apache Ranger, enabling:
- Table/Column level authorization
- Row level filtering
- Data masking
Security is one of fundamental features for enterprise adoption. Apache Ranger™ offers many security plugins for many Hadoop ecosystem components, such as HDFS, Hive, HBase, Solr and Sqoop2. However, Apache Spark™ is not counted in yet. When a secured HDFS cluster is used as a data warehouse accessed by various users and groups via different applications wrote by Spark and Hive, it is very difficult to guarantee data management in a consistent way. Apache Spark users visit data warehouse only with Storage based access controls offered by HDFS. This library enables Spark with SQL Standard Based Authorization.
Build
Please refer to the online documentation - Building submarine spark security plguin
Quick Start
Three steps to integrate Apache Spark and Apache Ranger.
Installation
Place the submarine-spark-security-<version>.jar into $SPARK_HOME/jars
.
Configurations
Settings for Apache Ranger
Create ranger-spark-security.xml
in $SPARK_HOME/conf
and add the following configurations
for pointing to the right Apache Ranger admin server.
<configuration>
<property>
<name>ranger.plugin.spark.policy.rest.url</name>
<value>ranger admin address like http://ranger-admin.org:6080</value>
</property>
<property>
<name>ranger.plugin.spark.service.name</name>
<value>a ranger hive service name</value>
</property>
<property>
<name>ranger.plugin.spark.policy.cache.dir</name>
<value>./a ranger hive service name/policycache</value>
</property>
<property>
<name>ranger.plugin.spark.policy.pollIntervalMs</name>
<value>5000</value>
</property>
<property>
<name>ranger.plugin.spark.policy.source.impl</name>
<value>org.apache.ranger.admin.client.RangerAdminRESTClient</value>
</property>
</configuration>
Create ranger-spark-audit.xml
in $SPARK_HOME/conf
and add the following configurations
to enable/disable auditing.
<configuration>
<property>
<name>xasecure.audit.is.enabled</name>
<value>true</value>
</property>
<property>
<name>xasecure.audit.destination.db</name>
<value>false</value>
</property>
<property>
<name>xasecure.audit.destination.db.jdbc.driver</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>xasecure.audit.destination.db.jdbc.url</name>
<value>jdbc:mysql://10.171.161.78/ranger</value>
</property>
<property>
<name>xasecure.audit.destination.db.password</name>
<value>rangeradmin</value>
</property>
<property>
<name>xasecure.audit.destination.db.user</name>
<value>rangeradmin</value>
</property>
</configuration>
Settings for Apache Spark
You can configure spark.sql.extensions
with the *Extension
we provided.
For example, spark.sql.extensions=org.apache.submarine.spark.security.api.RangerSparkAuthzExtension
Currently, you can set the following options to spark.sql.extensions
to choose authorization w/ or w/o
extra functions.
option | authorization | row filtering | data masking |
---|---|---|---|
org.apache.submarine.spark.security.api.RangerSparkAuthzExtension | √ | × | × |
org.apache.submarine.spark.security.api.RangerSparkSQLExtension | √ | √ | √ |