To create a task in HDFS, go to Hadoop > HDFS > Tasks. This displays the Task Definition screen. To create a new task, click the Add New Task Definition tab. The following screenshot shows the user interface for creating a task.
Enter a meaningful task name in the Task Name field and a description in the Task Description field. The task name must be unique. It can be up to 256 characters long and may contain letters, numbers, and certain symbols (! @ # $ _), but no spaces. The description can contain any combination of numbers, letters, and symbols.
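The naming rule above can be checked before a name is typed into the form. The following is a minimal sketch only: it assumes that "letters" means ASCII letters and that the listed symbols (! @ # $ _) are the complete allowed set. The function name `is_valid_task_name` is hypothetical and is not part of the product. Uniqueness is not checked here, because it depends on the tasks already defined.

```python
import re

# Assumed reading of the rule: 1 to 256 characters drawn from ASCII letters,
# digits, and the symbols ! @ # $ _ with no spaces.
TASK_NAME_PATTERN = re.compile(r"[A-Za-z0-9!@#$_]{1,256}")


def is_valid_task_name(name: str) -> bool:
    """Return True if the name satisfies the assumed naming rule."""
    return TASK_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("pii_scan_2024", "pii scan", "scan-01"):
        print(candidate, is_valid_task_name(candidate))
```

A name such as pii_scan_2024 passes, while names containing a space or a hyphen are rejected under this reading of the rule.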
Select the attribute name from the Task Attribute drop-down. This option lets you add tags to the created task.
Choose a Task Type. You can create seven types of task.
The Manage Scan Locations panel lets you select the directories for scanning. This panel has two tabs: Include In Scan and Exclude From Scan.
*Note: To decrypt tasks, ensure that the appropriate roles have been assigned before executing the encrypting task. For more information, see the Role Management section.
The Include Files that Failed Previously checkbox is greyed out for a fresh scan. Select this option if some elements of a previously executed task were skipped or the task completed with errors. This option is available only for detection tasks.
The Delete Input Files on Job Completion option is available for the Masking/Field Encryption and Row Encryption task types. Select this checkbox to delete the input files after the task has executed successfully.
Job Configurations: Select this checkbox to update the cluster-based properties for a task. Click Job Configuration, next to the checkbox, to set up the parameter list. Each job configuration entry consists of a pre-defined key and its value. If you do not specify any job configuration, the default parameter list is used.
*Note: This option is not applicable when Metadata Discovery is selected in the Task Type.
To configure a parameter, enter the key and value, and then click the Add button. The key and value are added to the list. Click the Save button to save the configuration details, or click Cancel to discard them.
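The relationship between the default parameter list and the key/value pairs entered in this dialog can be pictured as follows. This is a sketch under stated assumptions, not the product's implementation: the section does not say whether entered pairs are merged with the defaults or replace them, so merging is assumed here. The two keys are standard Hadoop MapReduce properties used for illustration, the default values are hypothetical, and `effective_job_config` is a hypothetical helper name.

```python
# Hypothetical defaults; the product's actual default parameter list is not
# given in this section. The keys are standard Hadoop MapReduce properties.
DEFAULT_JOB_CONFIG = {
    "mapreduce.job.queuename": "default",
    "mapreduce.map.memory.mb": "1024",
}


def effective_job_config(overrides=None):
    """Return the defaults with any user-entered pairs applied (assumed merge)."""
    config = dict(DEFAULT_JOB_CONFIG)
    for key, value in (overrides or {}).items():
        key = key.strip()
        if not key:
            raise ValueError("Job configuration key must not be empty")
        config[key] = str(value)
    return config


if __name__ == "__main__":
    # No pairs entered: the default parameter list applies unchanged.
    print(effective_job_config())
    # One pair entered: only that key differs from the defaults.
    print(effective_job_config({"mapreduce.map.memory.mb": 2048}))
```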
The Include In Scan tab lists all directories that have been selected for scanning. To delete a directory, click the Trash icon in the Actions column. To include a directory for scanning, perform the following steps:
Click Select Directories to choose the files or folders for scanning. This opens the Hadoop File Browser, which lists all directories and folders available for selection.
The Namespace field lets you enter the URL of the data node. The browser then lists all directories of the selected namespace. The Namespace field appears only when Detection is selected as the Task Type.
Select a directory in the left section of the panel. The right section then lists all the files and folders in that directory. To select a file or folder, select its checkbox in the Type column.
Click the Add button to include the selected files or folders in the bottom section of the panel. To remove a selected file, click the Trash button in the Actions column.
Selecting the checkbox in the Homogenous Type column enables the drop-down in the File Type column, which offers two file formats: JSON and XML.
Click the Done button to apply the changes, or click Cancel to discard them.
Exclude from Scan: The Exclude from Scan tab lists all selected scan locations and file extensions that are to be excluded from scanning. The tab is enabled when Detection or Masking/Field Encryption is selected in the Task Type field.
To add a file extension, enter it in the text box and click the Add button. The extension is added to the panel below. To delete a file extension, click the Trash icon in the Actions column.
To add a scan location, click the + Select Directories or Browse button, and then perform the following steps to select a file:
The Hadoop File Browser opens. Select a file or a directory in the left section. The files and folders in the selected directory are then listed.
To select a file or folder, select its checkbox in the Type column and click the Add button. The selected file or folder is added to the bottom panel of the browser.
Click the Done button to apply the changes, or click Cancel to discard them.
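The combined effect of excluded file extensions and excluded scan locations can be sketched as a simple path filter. This is an illustration only and makes several assumptions that the section does not state: extension matching is case-insensitive, an excluded directory also excludes everything beneath it, and either rule alone is enough to exclude a path. The function name `is_excluded` and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath


def is_excluded(path, excluded_extensions, excluded_locations):
    """Return True if the path matches an excluded extension or location."""
    p = PurePosixPath(path)

    # Normalize extensions so that "log", ".log", and "LOG" are equivalent.
    extensions = {e.lower().lstrip(".") for e in excluded_extensions}
    suffix = p.suffix.lower().lstrip(".")
    if suffix and suffix in extensions:
        return True

    # A location excludes itself and, by assumption, everything under it.
    for location in excluded_locations:
        loc = PurePosixPath(location)
        if p == loc or loc in p.parents:
            return True
    return False


if __name__ == "__main__":
    candidates = ["/data/in/users.csv", "/data/in/app.log", "/data/tmp/x.csv"]
    kept = [c for c in candidates if not is_excluded(c, ["log"], ["/data/tmp"])]
    print(kept)
```

Comparing whole path components rather than string prefixes keeps a directory such as /data/tmpfiles from being excluded by a rule for /data/tmp.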
The Select Policy panel lists all available compliance policies, both Pre-Defined and User Defined. To add a policy, click the + Add Policy button. For more information about policies, see the Policy section. To select one or more policies, select the checkboxes next to the policy names.
The Pre-Defined and Custom Sensitive Types panel lists all pre-defined and custom sensitive types. To select a sensitive type, select the checkbox next to its name. To add a sensitive data type, click the + Add New Sensitive Data Type button.
Row Encryption uses the default row encryption configuration for masking. It masks all the entries of the row and is best suited to unstructured data types such as text files.
FP Encryption uses the default encryption configuration and preserves the original data format. This option is best suited to structured data types.
FP Decryption can only be executed on data that has been encrypted using FP Encryption.
Decryption can be executed on data that has been encrypted using FP Encryption or Field Encryption.
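The two decryption rules above amount to a small compatibility table, sketched below. The mapping comes directly from the statements in this section. The string labels and the helper name `can_decrypt` are illustrative and are not product identifiers.

```python
# Which encryption methods each decryption task type can reverse,
# as stated in this section.
DECRYPTABLE_BY = {
    "FP Decryption": {"FP Encryption"},
    "Decryption": {"FP Encryption", "Field Encryption"},
}


def can_decrypt(decryption_task, encryption_used):
    """Return True if the decryption task type applies to the encryption used."""
    return encryption_used in DECRYPTABLE_BY.get(decryption_task, set())


if __name__ == "__main__":
    print(can_decrypt("FP Decryption", "Field Encryption"))
    print(can_decrypt("Decryption", "Field Encryption"))
```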
If you select Masking/Encryption as the Task Type, Protection Option and Consistent fields are also available. Select the required Protection Option for the selected sensitive types. For details about all the masking options available in HDFS, refer to Masking Options.
If you select Metadata Discovery as the Task Type, the Metadata File Types panel is displayed. Select the checkbox next to each file type that you want to include in the scan.
Click the Save button to save the task. To execute the task immediately after saving, click the Save and Execute button. Click the Save As button to save the task with the same configuration under a different name.
To edit an existing task, select the task from the Task panel of the Task Definition screen and click the edit (pen) icon in the Actions column.