
Cassandra (NoSQL)

To create a task in Cassandra, go to NoSQL > Cassandra > Tasks. This will display the Task Definitions screen. To create a new task, click the Add New Task Definition tab. The following screenshot depicts the user interface for creating a task:

  1. Enter a unique task name in the Task Name field. This field supports numeric and character values.

  2. Enter a Task Description of up to 254 characters. This field supports numeric and character values.

  3. Select the attribute name from the Task Attribute drop-down. This option lets you add tags to the created task.


    *Note: The Task Attribute field is not applicable to NoSQL tasks.

  4. Enter a numeric value for Duplicate Damping Factor. It specifies the maximum hit count for any given sensitive type.

  5. PK Protect supports data sampling, which limits the scan area and reduces the time taken for detection. By default, there are two options for sampling data from the database:

    1. Random 1000 rows/documents - Samples 1000 rows from the data at random.

    2. Read random 5% of data - Samples approximately 5 percent of the data. When a percentage is selected, true random sampling is performed: data is fetched from multiple partitions of the table distributed across the cluster.

      Select the Sampling Configuration from the drop-down.


      *Note: For Cassandra, sampling by “percentage” scans an approximate percentage of rows.

      By default, the Random 1000 rows/documents option is selected. To define a sampling configuration, go to the NoSQL > Cassandra > Tasks > Sampling Configuration tab and click the +Add button. To create a sample from the Add New Task Definition screen, click the +Add button next to the Sampling Configuration drop-down.


      The Sampling Configuration screen is depicted below:


      Perform the following steps to create a sample:

      1. Enter the name and description of the sampling configuration in the Name and Description text boxes. These fields accept both numeric and character values.

      2. Check the option Set Sampling Config As Default to set the Sampling Configuration as the default configuration for all your tasks.

      3. By: Specify how data is picked from the database for sampling. There are two options:

        1. Rows/Documents: Select Rows/Documents from the drop-down to sample data based on the number of rows/documents.

        2. Percent: Select Percent from the drop-down to sample a percentage of the data.


          *Note: Set the property ‘cassandra.count.disabled’ to true. It is located under NOSQL IDP properties on the IDP Management > IDPs screen in Admin. It enables the total rows count operation on a table.

      4. Value: Enter a numeric value. It specifies the total number of records to process if Rows/Documents is selected in the By field, or the percentage of data to sample if Percent is selected.

      5. Type: Select the sampling configuration type from the Type option.

        1. Random: Scans random entries in the database. The number of entries scanned is based on the value entered in the Value field.

        2. Complete: Select Complete to use the complete data for sampling.

      6. After setting up the required configuration, click Add to add the user-defined sampling configuration to the list. Click the Save button to save the changes.
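The way the By, Value, and Type fields combine can be illustrated with a short Python sketch. This is not PK Protect code: the function name `rows_to_scan` and its parameters are hypothetical, the string values simply mirror the field labels above, and the percentage case is computed exactly here even though percentage sampling on Cassandra is only approximate.

```python
import math


def rows_to_scan(total_rows, by, value, sample_type="Random"):
    """Estimate how many rows a sampling configuration would cover.

    Hypothetical helper for illustration only; the real product
    samples inside the database and may behave differently.
    """
    if total_rows < 0:
        raise ValueError("total_rows must not be negative")
    # Type = Complete: the whole data set is used, Value is ignored.
    if sample_type == "Complete":
        return total_rows
    if value <= 0:
        raise ValueError("Value must be a positive number")
    # By = Rows/Documents: Value is an absolute row count.
    if by == "Rows/Documents":
        return min(int(value), total_rows)
    # By = Percent: Value is a percentage of the total row count,
    # which is why the total rows count must be available.
    if by == "Percent":
        if value > 100:
            raise ValueError("Percent must be between 0 and 100")
        return min(total_rows, math.ceil(total_rows * value / 100))
    raise ValueError(f"Unknown By option: {by!r}")
```

Under these assumptions, a table of 250,000 rows yields 1,000 rows for the default "Random 1000 rows/documents" option and roughly 12,500 rows for "Read random 5% of data".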

  6. The Select Connections panel lists all the available connections. Any number of connections can be selected for a task. You can also create a new connection by clicking the +Add New Connection button. To learn how to create and manage connections, refer to the Connection Manager section.


    To select a connection, select the checkbox next to the connection name. The list of connections can be grouped by the value chosen in the Select Group drop-down.


    Connections can be grouped by any of the following five options:

    1. Connection IDP: Categorizes the available connections based on the types of IDPs available, i.e., Detection and Masking.

    2. Connection Type: Categorizes the available connections based on the type of server connected to, e.g., Oracle, Teradata, SQL Server.

    3. Host Name: Categorizes the list of available connections based on Host Names.

    4. Location: Categorizes the available connections based on the location of the target source system server, i.e., On-Premises and Cloud.

    5. User Name: Categorizes the list of available connections based on the Usernames.

      The Select Group Value drop-down displays values based on the selection made in the Select Group drop-down. The panel is populated according to that selection.


      Select the checkbox next to the connection you want to use. The Test and Database Object Filters options are enabled only after a connection is selected.


      The Test button lets you test the connection before executing the task. A pop-up appears when the test completes successfully.


      Click Database Object Filter to filter tables and/or columns. Once filters are defined, only the databases/tables/columns that match them are scanned.


      Select the connection from the Connection List or search for the Schema/Keyspace name in the Filter by Schema/Keyspace name text box. This panel displays the list of all available connections. To refresh this section and update the information, click the Refresh button next to the text box.


      Apply the filter in the right section of the panel by specifying the Operator, Collection Operator and Collection Filter name. Click the + (add) sign next to the Collection Filter drop-down to add the filter to Selected Filters.


      The Selected Filters section lists all the filters that have been added. To edit a filter, click the Pen icon in the Edit column. To delete a filter, select the checkbox next to the Edit column and click the Trash button.


      Eight operators are available for selecting Column Family names:

      1. Equals: Checks whether the given column family name exists in the selected connection and returns the matching records.

      2. Not Equal to: Returns all records except the given column family name.

      3. Contains: Returns only the column families whose names contain the given value.

      4. Does not contain: Returns all column families whose names do not contain the given value. Like Not Equal to, it excludes matches rather than selecting them.

      5. Starts with: Returns all column families whose names start with the given value.

      6. Does not start with: Returns all column families whose names do not start with the given value. Like Does not contain and Not Equal to, it excludes matches.

      7. Ends with: Returns all tables/columns whose names end with the given value.

      8. Does not end with: Returns all tables/columns whose names do not end with the given value. Like Does not contain and Not Equal to, it excludes matches.

        To test a filter, select the checkbox for that filter next to the Edit column, then click the Test button in the Filtered Data section. The Filtered Data section displays the result of applying the filter to the Keyspace name and the Column Family.


        Click the Save button to apply the changes; otherwise, click Cancel.
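The eight filter operators described above can be sketched as plain string predicates in Python. This is an illustration, not the product's implementation: the `OPERATORS` table and the `apply_filter` function are hypothetical names, and the sketch assumes case-sensitive matching against the full column family name, which PK Protect may or may not do.

```python
from typing import Callable, Dict, List

# Each operator maps to a predicate over (column family name, filter value).
# Case-sensitive comparison is an assumption made for this sketch.
OPERATORS: Dict[str, Callable[[str, str], bool]] = {
    "Equals": lambda name, value: name == value,
    "Not Equal to": lambda name, value: name != value,
    "Contains": lambda name, value: value in name,
    "Does not contain": lambda name, value: value not in name,
    "Starts with": lambda name, value: name.startswith(value),
    "Does not start with": lambda name, value: not name.startswith(value),
    "Ends with": lambda name, value: name.endswith(value),
    "Does not end with": lambda name, value: not name.endswith(value),
}


def apply_filter(names: List[str], operator: str, value: str) -> List[str]:
    """Return the names that satisfy the operator, preserving input order."""
    try:
        matches = OPERATORS[operator]
    except KeyError:
        raise ValueError(f"Unknown operator: {operator!r}") from None
    return [name for name in names if matches(name, value)]
```

For a keyspace containing the hypothetical column families `users`, `user_events`, `orders`, and `audit_users`, `apply_filter(names, "Starts with", "user")` keeps `users` and `user_events`, while `apply_filter(names, "Does not contain", "user")` keeps only `orders`.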

  7. The Compliance Policies panel displays all the Pre-Defined and Customized Policies. You can select any number of policies while creating or editing a task. Sensitive types associated with the selected policy can be viewed in the Pre-Defined and Custom Sensitive Types panel. Selecting a policy is not mandatory; you can instead select individual sensitive types. You can also add a new policy by clicking the +Add Policy button. To learn more, refer to the Policy section.

  8. The Pre-Defined and Custom Sensitive Types panel lists all the Sensitive Types. Sensitive types associated with a selected policy are selected automatically in this panel and cannot be removed; however, any number of additional sensitive types can be added. You can also add a new sensitive type by clicking the +Add New Sensitive Data Type button. To learn more, refer to the Sensitive Types section.

  9. Click the Save button to save the task. To execute the task immediately after saving, click the Save & Execute button. Click the Save As button to save the task with the same configuration under a different name.

  10. Click Cancel if you do not want to save the changes.
    To edit an existing task, select the required task from the Task Definitions screen and click the Pen icon in the Actions column.
