Using the PK Protect Azure Cloud IDP, users can provision an HDInsight cluster to access Azure data. The Azure Cloud IDP is used for browsing Azure storage and provisioning the cluster. The Azure Data IDP allows PK Protect to run data detection and protection tasks on the Azure repository.
Access the Azure Configuration screen by clicking the Azure Configuration option in the left side pane. The Azure Configuration screen is depicted below:
The top panel displays the cluster configurations along with their status. Clicking a configuration populates the bottom panel with that cluster's details.
Perform the following steps to configure a cluster:
Click Configure New Cluster.
The Add/Edit Configuration dialog box appears.
The fields are described below:
Cluster Name: Enter a name for the cluster.
Cluster Login Username: Specify the username used to log in to the HDInsight cluster.
Cluster Login Password: Specify the password for the cluster login account.
SSH Username: Specify the username used to access the cluster via Secure Shell (SSH).
SSH Authentication Type: You can authenticate SSH with either a password or a public key. To use the Cluster Login Password as the SSH password, select the Use same password as cluster login checkbox.
*Note: The HDInsight cluster can have two user accounts: the HDInsight cluster user account and the SSH user account.
Location: Specify the cluster location.
Virtual Network: Specify the Virtual Network where the HDInsight cluster is deployed.
Subnet: Specify the address of the subnet. Provide this value when the cluster is created on a specific subnet.
Number of worker nodes: Specify the number of worker nodes.
Worker node size: Specify the VM size of the worker nodes.
Head node size: Specify the VM size of the head nodes.
Edge node size: Specify the VM size of the edge nodes.
Resource group: A resource group is a set of cluster resources. Specify the resource group that the cluster will belong to.
Default: Select the Default checkbox to make the account the default storage account.
Azure Cloud IDP: Select the Azure Cloud IDP.
Storage Type: Select the storage type from the dropdown. HDInsight clusters can use the following storage options:
Azure Storage
Azure Data Lake Storage Gen1
Azure Data Lake Storage Gen2
*Note: At least one storage type must be configured. You can select any of the storage types listed above as the primary storage, but any additional (secondary) storage must be Azure Storage.
Storage Accounts: Select the storage account that will be used for the cluster logs.
Container: A storage account can contain multiple containers (folders). Specify the name of the container that will be used to store logs.
Add: Click Add to add the storage account.
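The field rules above can be easier to follow as a small data model. The following is a minimal Python sketch, not PK Protect code: the class names, field names, and the validate function are hypothetical, and it covers only a subset of the dialog fields. It assumes that the storage entry marked Default is the primary storage and that exactly one entry carries that mark; the guide does not state this explicitly.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Labels follow the Storage Type field; the constant names are hypothetical.
AZURE_STORAGE = "Azure Storage"
ADLS_GEN1 = "Azure Data Lake Storage Gen1"
ADLS_GEN2 = "Azure Data Lake Storage Gen2"
STORAGE_TYPES = {AZURE_STORAGE, ADLS_GEN1, ADLS_GEN2}


@dataclass
class StorageEntry:
    storage_type: str
    account: str
    container: str
    default: bool = False  # mirrors the Default checkbox


@dataclass
class ClusterConfig:
    cluster_name: str
    login_username: str
    login_password: str
    ssh_username: str
    ssh_auth_type: str = "password"  # "password" or "public_key"
    ssh_password: Optional[str] = None
    ssh_public_key: Optional[str] = None
    same_password_as_login: bool = False  # mirrors the SSH checkbox
    storage: List[StorageEntry] = field(default_factory=list)


def validate(config: ClusterConfig) -> List[str]:
    """Return a list of problems; an empty list means the sketch's rules pass."""
    errors: List[str] = []

    # SSH can be authenticated with a password or a public key.
    if config.ssh_auth_type == "password":
        if not (config.same_password_as_login or config.ssh_password):
            errors.append("SSH password is missing")
    elif config.ssh_auth_type == "public_key":
        if not config.ssh_public_key:
            errors.append("SSH public key is missing")
    else:
        errors.append("unknown SSH authentication type")

    # At least one storage type must be configured.
    if not config.storage:
        errors.append("at least one storage entry is required")
        return errors

    for entry in config.storage:
        if entry.storage_type not in STORAGE_TYPES:
            errors.append(f"unknown storage type: {entry.storage_type}")

    # Assumption: the entry marked Default is the primary storage.
    defaults = [e for e in config.storage if e.default]
    if len(defaults) != 1:
        errors.append("exactly one storage entry should be marked Default")

    # Any additional (secondary) storage must be Azure Storage.
    for entry in config.storage:
        if not entry.default and entry.storage_type != AZURE_STORAGE:
            errors.append(f"secondary storage must be Azure Storage: {entry.account}")
    return errors
```

Calling validate on a configuration with a Gen2 primary and an Azure Storage secondary returns an empty list, while a Gen1 secondary is reported as an error. Returning all problems at once, rather than raising on the first, mirrors how a dialog would typically flag several fields together.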
Perform the following steps to provision a cluster with the given configuration:
Select the desired configuration.
Click Provision Compute Cluster.
Perform the following steps to edit a cluster:
Select the configuration you want to edit.
Click Edit Cluster.
Click Save after making the desired changes.
Perform the following steps to destroy a cluster:
Select the configuration you want to destroy.
Click Destroy Compute Cluster.