Detection - GCS

Detection tasks scan the target source system for sensitive data elements.

*Note: Detection can also be performed on image files.

Click the + Show Advanced Options button to set the advanced settings for the task.

  1. (Re)Scan All Files: Check this checkbox to scan all the available objects for a given connection between the dates specified in the Files Modified After and Files Modified Before drop-downs. These fields appear only when the (Re)Scan All Files option is checked.

  2. Skip Files with No Extension: Check this checkbox to skip files that have no extension. After the task executes successfully, the skipped files are listed under the Skipped Files tab of the By Task screen in the Results module. The Skipped Reason column cites the reason as Object Name without extension.

  3. Unstructured Input Files: Check this checkbox if you know that the scan location contains only text files. This option lets the system bypass the file type detection process and finish the task in less time.

    *Note: To perform sampling for unstructured files under discovery, set the property enable.discovery.unstructured.sampling=True in the HDFSAgentConfig.properties file, located at .../installation directory/Agents/<IDP>/expandedArchive/WEB-INF/classes, and then restart the IDP. By default, this property is set to false.

  4. Don’t Report File Size: Check this checkbox if you do not want the size of the object to be reported after scanning completes.

  5. Create Metadata Info File: Check this checkbox to write the metadata of the scanned files to a separate file named Metadata.txt. It is recommended to first run a task with this flag on a few files to understand the metadata, because for a large number of files it takes a long time.

    *Note: By default, the Metadata.txt file is available at location: ‘resultsDirectory/task_name/task_id’.

  6. Read Files: Choose to read the entire file or a part of the file at random.

    1. Entire File: Choose this option if you want to read all the content of the selected files.

    2. Part of Files: Choose this option if you want to read the content of a structured file at random. When you select this option, the Exit on first hit option becomes visible.

      To scan an unstructured file partially, set the enable.discovery.unstructured.sampling property to True in the HDFSAgentConfig.properties file, located at “.../installation directory/Agents/<IDP>/expandedArchive/WEB-INF/classes”.

  7. Exit on first hit: Check this checkbox to stop scanning a file as soon as the first sensitive record in it is detected. This option is visible only when you select Part of Files under Read Files.

  8. Header Row Number: This field specifies the row number where the column headers are defined in the file. By default, the value is -1, which means that the file does not contain any column headers. Any other value is treated as the row number where the column headers are placed.

    For example, if the header row number is set to 5, the column headers are defined in the fifth row.

  9. Filename Regex: Enter a regular expression for the object names. Only objects whose names match the regex entered in this field are scanned.

  10. Auto Batch Size: This option controls how files are grouped into batches during the scanning process.

    1. Batch Size (Files): Specifies the number of files scanned per batch during the detection process. By default, the batch size is 30 files. This option is visible when the Auto Batch Size checkbox is not checked.

    2. Auto Batch Size (MB): Lets you enter the minimum batch size in the Batch min size (MB) field. This option is visible when the Auto Batch Size checkbox is checked. By default, the batch size is 512 MB.

  11. Sampling Configuration: PK Protect supports data sampling to limit the area of the scan, which reduces the time taken for detection. By default, two options are available for sampling data from files:

    1. Top 1000 Rows - Samples approximately 1000 records from the top of the file.

    2. Read top 5% of data - Samples 5 percent of the data from the top of the file.

      By default, the Top 1000 Rows option is selected.

      To create a sample, go to Google Cloud > GCS > Tasks > Add/Edit Sampling Configuration tab and click the + Add Configuration button. You can also create a sample by clicking the + Add button next to the Sampling Configuration drop-down on the Add New Task Definition screen.

      Perform the below steps to create a sample:

      1. Enter the name and description of the sampling configuration in the Name and Description text boxes.

      2. Check the option Set Sampling Config as Default to set the Sampling Configuration as the default configuration for all tasks.

      3. Check the option Show Advance Sampling Details to set the advanced settings for sampling. The advanced settings are:

        1. File Size Range (Bytes): Enter the range for the sample in bytes.

        2. To: Enter the ending value of the range for the sample.

        3. By: Specify how data is picked for sampling from the source system. There are two ways:

          1. Rows: Select ‘Rows’ from the drop-down to sample data based on the number of rows.

          2. Percent: Select ‘Percent’ from the drop-down to sample a percentage of the data.

        4. Value: Enter a numeric value. It specifies the total number of records to process if By is set to Rows, or the percentage of data to sample if By is set to Percent.

  12. After setting up the required configuration, click Add to add the user-defined sampling configuration to the list. Click the Save button to apply the changes, or click the Cancel button to discard them.
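
To illustrate how the Rows and Percent choices in the By field differ, the following Python sketch selects a leading slice of records. It is not PK Protect code. The function name sample_rows, its parameters, and the rule of rounding percentages up are assumptions made only for this example; the product itself samples approximately, so its counts may differ.

```python
import math


def sample_rows(rows, by="Rows", value=1000):
    """Return the leading records that a sampling setting would select.

    by="Rows":    keep the first `value` records.
    by="Percent": keep the first `value` percent of records.
    Rounding up for percentages is an assumption of this sketch.
    """
    if by == "Rows":
        limit = value
    elif by == "Percent":
        limit = math.ceil(len(rows) * value / 100)
    else:
        raise ValueError(f"unsupported sampling mode: {by!r}")
    return rows[:limit]


# Hypothetical data set of 50,000 records.
records = [f"record-{i}" for i in range(50000)]

print(len(sample_rows(records)))                         # 1000
print(len(sample_rows(records, by="Percent", value=5)))  # 2500
```

For small files the two default options can behave very differently: a file with fewer than 1000 records is read in full by a row-based sample, while a percentage-based sample still reads only a fraction of it.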

For the remaining steps, refer to step 4 of Create a GCS task.
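
Before entering a pattern in the Filename Regex field, it can help to try it against a few object names. The Python sketch below uses Python's re module as a stand-in. The regex dialect used by the product, and whether it matches the whole object name or only part of it, are not stated on this page, so full-name matching is an assumption here. The function select_objects and the object names are hypothetical.

```python
import re


def select_objects(object_names, pattern):
    """Return the object names that match the pattern in full.

    Full-name matching is an assumption of this sketch; the product
    may apply the regex differently.
    """
    regex = re.compile(pattern)
    return [name for name in object_names if regex.fullmatch(name)]


# Hypothetical object names in a bucket.
names = [
    "reports/2023/customers.csv",
    "reports/2023/readme.txt",
    "logs/app.log",
]

print(select_objects(names, r".*\.csv"))
# ['reports/2023/customers.csv']
```

Testing a pattern this way on a handful of names is a quick check that it is neither too broad nor too narrow before a full detection task is run.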
