
Appendix G: Enabling Spark in the HDFS IDP

The Hadoop Data IDP (HDFS IDP) can now use Spark to run Detection on Spark-enabled clusters. In release 6.5.0, Protection cannot use Spark, even on Spark-enabled clusters. Both Detection and Protection remain supported with MapReduce, as before, whether or not the cluster is Spark-enabled.

To enable Spark integration, perform the following steps.

At Install Time

  1. The installer asks for the location where the HDFS IDP is to be installed.

  2. Select option 6 (“Spark”).

  3. Select the Hortonworks or EMR option.

  4. Proceed with the installation as normal.


In the HDFSIDPConfig.properties File

For Spark with Hortonworks, the installer sets the distro property to spark.

For EMR, the installer sets the distro property to EMR and the s3filesystem property to s3a. To use EMR with Hadoop HDFS instead of S3, set the s3filesystem property to hdfs and restart the IDP.
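The property changes described above can be sketched in Python. This is a minimal illustration, not part of the product: it assumes the file uses plain key=value lines (it does not handle line continuations or the other separators that the Java properties format allows), and the helper names parse_properties and set_property are hypothetical. The sample text stands in for the file contents; the actual location of HDFSIDPConfig.properties depends on the installation.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "!")) or "=" not in stripped:
            continue
        key, value = stripped.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def set_property(text: str, key: str, value: str) -> str:
    """Return the text with key set to value.

    An existing line for the key is replaced in place; if the key is
    absent, a new line is appended. All other lines are kept as they are.
    """
    out = []
    replaced = False
    for line in text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith(("#", "!"))
        if (not is_comment and "=" in stripped
                and stripped.split("=", 1)[0].strip() == key):
            out.append(f"{key}={value}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Assumed sample contents, matching the values the installer is said to set for EMR.
sample = "distro=EMR\ns3filesystem=s3a\n"

# Switch EMR from S3 processing to Hadoop HDFS.
updated = set_property(sample, "s3filesystem", "hdfs")
print(parse_properties(updated))
```

After writing the updated text back to the file, the IDP still needs to be restarted for the change to take effect, as noted above.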

In the jetty-embedded.properties File (Hortonworks only)

Set the appropriate Hortonworks version number as follows (the version number below is an example):

-Dspark.driver.extraJavaOptions=-Dhdp.version=2.4.2.0-258
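If this line is generated by a script, a small check can catch a mistyped version before it reaches the file. The following Python sketch builds the option line from a version string. The helper name hdp_driver_option is hypothetical, and the version pattern is an assumption inferred only from the example above (four dot-separated numbers, a hyphen, then a build number); adjust it if your cluster reports a different format.

```python
import re

# Assumed pattern, inferred only from the example 2.4.2.0-258:
# four dot-separated numbers, a hyphen, then a build number.
HDP_VERSION_RE = re.compile(r"\d+\.\d+\.\d+\.\d+-\d+")


def hdp_driver_option(version: str) -> str:
    """Build the jetty-embedded.properties option line for an HDP version."""
    if not HDP_VERSION_RE.fullmatch(version):
        raise ValueError(f"unexpected HDP version format: {version!r}")
    return f"-Dspark.driver.extraJavaOptions=-Dhdp.version={version}"


print(hdp_driver_option("2.4.2.0-258"))
```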

 
