To create an RC/ORC file structure, perform the following steps:
The Assign Sensitive Data Type panel allows you to browse and select the columns that need to be defined before masking is applied. You can add a column in two ways:
Manually – To enter a column name manually, perform the following steps:
Enter the name of the column. This is a mandatory field.
Enter the number of the column. This defines the position of the column in the RC/ORC file. This is a mandatory field.
Select the Sensitive Data Type from the drop-down. The drop-down lists all Pre-defined and User-defined Sensitive Data Types.
Click the Add button. This adds the column details to the list panel.
To delete all the added column details from the panel, select the checkbox in the Column Number header and click Delete. The Delete button is enabled only when at least one record is selected. You can also delete an individual record by clicking the Trash icon under the Actions column.
Similarly, to edit the details of a column, click the Pen icon under the Actions column.
To filter or search the entries in the list, click the Search field. This displays the headers on which a filter can be applied.
To remove all the filters, click the x Clear Filters button. To remove individual filters, click the Close button next to the applied filter.
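The manual entry rules above (a mandatory name, a mandatory column number, and a Sensitive Data Type) can be pictured with a minimal Python sketch. The class and function names, the 1-based numbering, and the "Email" type are assumptions made for illustration only; they are not part of the product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnDefinition:
    """One row of the column list (hypothetical model, not the product's schema)."""
    name: str                                  # mandatory field
    number: int                                # mandatory field: position in the file
    sensitive_data_type: Optional[str] = None  # value picked from the drop-down

    def __post_init__(self):
        if not self.name.strip():
            raise ValueError("Column name is mandatory")
        # Assumption: column positions start at 1.
        if self.number < 1:
            raise ValueError("Column number must be a positive position")


def add_column(columns, name, number, sensitive_data_type=None):
    """Mimic the Add button: validate the entry, then append it to the list."""
    column = ColumnDefinition(name, number, sensitive_data_type)
    columns.append(column)
    return column


columns = []
add_column(columns, "customer_email", 3, "Email")  # "Email" is an illustrative type name
```

An entry with a blank name or a non-positive number raises `ValueError` and is not added to the list.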
Import File Structure – This option imports the column details automatically. To import the column details, perform the following steps:
Click the Browse File button.
This opens a side window that displays the list of all the objects or files in the selected cluster for the selected module. The window is divided into two panels:
The top panel displays the information for the selected module and cluster. To change the module, click the Select Module drop-down. This lists all the modules of the application. There are five modules: File, Hadoop, AWS, Azure, and GCS. Similarly, to change the cluster, click the Select Cluster drop-down.
In the same panel, you can also view the IDP status for each selected module.
The bottom panel, the File Browser, displays the list of all the directories and objects for the selected module and cluster. This panel is itself divided into two panels:
Note: The browser name changes based on the module selected in the Select Module drop-down.
The left panel displays the list of all the directories associated with the selected cluster. You can also search for a directory by typing its name in the field next to the Expand button.
The Expand button is enabled only when you enter a search term in the text box.
The right panel displays the list of all the files and folders stored in the selected directory. To select a file or object, click its name.
To refresh this panel with the updated information, click the Refresh button in the top right corner of the panel. If the Is Recursive feature is enabled, the search covers the parent directory and its subdirectories. If the feature is disabled, the search covers only the parent directory.
You can also configure the columns in this panel. Using this option, you can re-arrange the columns as per your requirement. To know more, visit Common Controls.
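The difference between the two Is Recursive settings can be illustrated with a short Python sketch against a local file system. This is only an analogy: the function name `search_files` is hypothetical, and the product's browser works against the selected module and cluster, which the sketch does not model.

```python
from pathlib import Path


def search_files(parent_dir, pattern, is_recursive):
    """Return the files under parent_dir whose names match a glob pattern."""
    root = Path(parent_dir)
    # rglob descends into subdirectories; glob stays in the parent directory.
    matches = root.rglob(pattern) if is_recursive else root.glob(pattern)
    return sorted(p for p in matches if p.is_file())
```

With `is_recursive=False`, a file stored in a subdirectory is not returned even if its name matches the pattern.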
After selecting the object, click the Select button. This opens a new panel in which you enter the number of rows to use for sampling and set the Show Sample Rows option.
Enter the number of rows to sample in the Rows To Sample field. Before the columns are imported, the data is sampled using the number of rows specified in this field.
The Show Sample Rows field lets you view the sample values stored in the columns of the selected file. If the value is set to True, the sample values are displayed for each imported column. If the value is set to False, the sample values are not displayed for that structure.
Now, click the Import Columns button. This imports all the columns of the selected file or object into the Filtered Columns panel.
For each column that needs to be added to the structure, select the sensitive type from the Sensitive Data Type drop-down. The drop-down lists all Pre-defined and User-defined sensitive types.
To add the filtered column details, select the checkbox for each column that you want to add.
Click the Save button. This adds the selected column details to the Assign Sensitive Data Type panel.
Click the Save button. This saves the structure details, which are then displayed on the Structure List screen. Click the Cancel button if you do not want to save the changes.
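As an illustration of how the Rows To Sample and Show Sample Rows settings interact during the import, the following Python sketch collects column names from a limited number of rows. It rests on assumptions: the guide does not state how rows are chosen for sampling, so the sketch simply takes the first rows, and the function name `import_columns` and the sample data are hypothetical.

```python
from itertools import islice


def import_columns(rows, rows_to_sample, show_sample_rows):
    """Collect column names from the first rows_to_sample rows.

    rows is any iterable of dicts that map a column name to a value.
    Returns a dict that maps each column name to its list of sample values;
    the lists stay empty when show_sample_rows is False.
    """
    columns = {}
    for row in islice(rows, rows_to_sample):
        for name, value in row.items():
            samples = columns.setdefault(name, [])
            if show_sample_rows:
                samples.append(value)
    return columns


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": "c@example.com"},
]
with_samples = import_columns(rows, rows_to_sample=2, show_sample_rows=True)
without_samples = import_columns(rows, rows_to_sample=2, show_sample_rows=False)
```

In both calls the same columns are found; only the presence of the sample values differs.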