
Setting Up the Confidence Factor for a Sensitive Type

To access the Sensitive Types tab, click Policy > Sensitive Type Manager > Sensitive Types tab. This screen displays the list of all Pre-defined and Custom Defined Sensitive Types.

To set the value of the confidence factor for a single sensitive type, follow the steps below.

  1. Click the Setting icon in the Actions column. A Sensitive Type window pops up, in which you can edit the confidence factor values for the sensitive type.

  2. Edit the values of the components to increase or decrease the confidence factor. Hovering over each text box displays information about the component and its range.


    The fields are described below:

    1. Sensitive Type Strength:
      Several sensitive types can be linked to patterns. Attribute sensitivity is a measure of how many hits are needed to be highly confident that any given cell in the column is sensitive. A high value indicates that the Sensitive Type is strongly linked to the pattern, so a small number of hits is sufficient to reach a high confidence value. The regular expression for a sensitive type is characterized by its strength.

      The strength indicates how sure we are that a match of this regular expression in a cell means that the cell is of the given sensitive type.


      For example, a single pattern match with a Telephone Number (Digits only) is not sufficient to claim that the number is indeed a phone number; it could be any numeric identifier. For such a Sensitive Type, the Sensitive Type Strength should be kept low.

      *Note: The value range lies between 0-1, 1 is the highest.

    2. Header Match Weightage:
      Columns in structured data may have a header that indicates what the column contains, which provides information about the sensitive type contained in that column. If the header matches a known string for the sensitive type, then we say that there is a header match.

      A higher weightage value tells the calculation to increase the score more when there is a match. In other words, a match in the header name increases the probability of concluding that the data is truly of the matched Sensitive Type.

      *Note: The value range lies between 0-1, 1 is the highest.

    3. Header Mismatch Weightage:
      If the header does not match a known string for the sensitive type, then we say there is a header mismatch. A higher weightage value tells the calculation to decrease the score if the header name does not match.

      This should be set to a high value only if header names are expected to be present and to be indicative of the data contained under them. If the headers are known to be unreliable, this value should be kept low, so that a mismatched header does not cause genuinely sensitive data to be scored down and missed.

      *Note: The value range lies between 0-1, 1 is the highest.

    4. Null Count Weightage:
      This value tells the calculation to what extent nulls should affect the score, i.e., how strongly the number of null values can decrease the confidence factor score. It is the Null Count equivalent of the Mis Hits Count Weightage. This should be set to a higher value if a null is a strong indicator that the column may not be sensitive. If a null for a Sensitive Type does not indicate much, then this value should be kept low.

      *Note: The value range lies between 0-1, 1 is the highest.

    5.  Mis Hits Count Weightage:
      This value tells the calculation to what extent mishits should affect the score, i.e., how strongly the number of mishits can decrease the confidence factor score. This should be set to a higher value if a mishit is a strong indicator that a value may not be sensitive. If a mishit for a Sensitive Type does not indicate much, then this value should be kept low.

      For example, a value may not be picked up as a Full/Part Name because it is missing from the reference lists. The fact that the value was not flagged as a Name does not establish with certainty that it is not one. Such a Sensitive Type should have a low Mis Hits Count Weightage, as a mishit does not guarantee that the value is not sensitive.

      *Note: The value range lies between 0-1, 1 is the highest.

    6.  Null Count Decay Rate:
      This value exponentially decreases the score based on how many nulls exist. This is the Null Count equivalent of the Mishits Count Decay Rate. As the rate is exponential, it should be kept moderately low.

      *Note: The value range lies between 0-1, 1 is the highest.

    7. Mis Hits Decay Rate:
      This value exponentially decreases the score based on how many mishits exist. A higher value for the rate decreases the score to a large extent if a high number of mishits exist. A lower value for the rate decreases the score to a smaller extent even if a high number of mishits exist. This is not to be confused with the Mishit Count Weightage.


      The weightage determines how much mishits are worth, whereas this determines the rate at which the score is affected as they aggregate. As the rate is exponential, it should be kept moderately low.

      *Note: The value range lies between 0-1, 1 is the highest.

    8. Type Sensitivity:
      A higher value increases the score even when the hit count is low. This can also be thought of as the “Hit Count Weightage”. It determines how high the hit count needs to be for the score to increase.


      A high Type Sensitivity increases the score even for a small number of hits, whereas a low Type Sensitivity requires more hits for the score to increase. This should be set high for Types that are unlikely to return false positives, and low for Types that are likely to resemble others.


      For example, a nine-digit number can be a Social Security number, a nine-digit full Zip code or a custom identifier; such Types should have a low Type Sensitivity as they ideally require a high hit count to be correctly classified as a Sensitive Type.

      *Note: The value range lies between 0-1, 1 is the highest.

    9.  Mis Hit Severity Vs Null:
      This determines which of the two decay rates, the Mishits Count Decay Rate (MCDR) or the Null Count Decay Rate (NCDR), should take precedence. The value ‘1’ implies a preference for the Mishits Count Decay Rate, and ‘0’ implies a preference for the Null Count Decay Rate.


      To position both as equally preferred, this value should be set to ‘0.5’. The decision of which should be preferred depends on whether mishits or nulls are a better indicator of the data being non-sensitive.


      For example, consider a Sensitive Type that corresponds to a primary key. In such a case, nulls would be a better indicator that the column does not contain data matching that Sensitive Type.

      Thus, the value should be set to ‘0’. For cases where mishits are a better indicator of a value being non-sensitive, the value should be set to ‘1’.

      *Note: The value range lies between 0-1, 1 is the highest.

    10.  Confidence Threshold:

      The last configuration for the Confidence Factor is the Confidence Threshold. This is a crucial setting: it is the minimum value that the Confidence score must reach for the data to be flagged as sensitive and displayed on the Results tab.


      When beginning to scan new environments, this value should be kept low (<10%) to avoid missing sensitive data. Once the environment has become more familiar, and false positives have been reduced via tuning of the other configurations discussed earlier, this threshold can be increased in small steps.


      After every increase in the Threshold, the Task Results should be analysed to ensure that no sensitive data has been dropped from the Results. If any has been dropped, the Threshold should be lowered again.

      *Note: The value range lies between 0-100.

   3. Click the Save button to save the configuration, or click the Cancel button to discard the changes.
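
The value ranges described above can be checked before the values are entered in the window. The following is a minimal sketch in Python, not part of the product: the class name ConfidenceFactorSettings, its field names, and the helper is_flagged are hypothetical and chosen only to mirror the labels in this section. It checks that the nine component values lie between 0 and 1 and that the Confidence Threshold lies between 0 and 100. It also assumes that data is flagged when its score is at least the threshold, which is how "minimum value" is read here.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ConfidenceFactorSettings:
    """Hypothetical record mirroring the fields of the Sensitive Type window."""
    sensitive_type_strength: float
    header_match_weightage: float
    header_mismatch_weightage: float
    null_count_weightage: float
    mis_hits_count_weightage: float
    null_count_decay_rate: float
    mis_hits_decay_rate: float
    type_sensitivity: float
    mis_hit_severity_vs_null: float
    confidence_threshold: float  # documented range is 0-100, not 0-1

    def validate(self) -> List[str]:
        """Return one message per field that is outside its documented range."""
        errors = []
        for f in fields(self):
            value = getattr(self, f.name)
            upper = 100.0 if f.name == "confidence_threshold" else 1.0
            if not 0.0 <= value <= upper:
                errors.append(f"{f.name}={value} is outside 0-{upper:g}")
        return errors


def is_flagged(score: float, settings: ConfidenceFactorSettings) -> bool:
    """Assumption: a score (0-100) is flagged when it reaches the threshold."""
    return score >= settings.confidence_threshold


# Example values only; suitable settings depend on the data being scanned.
settings = ConfidenceFactorSettings(
    sensitive_type_strength=0.3,
    header_match_weightage=0.8,
    header_mismatch_weightage=0.2,
    null_count_weightage=0.1,
    mis_hits_count_weightage=0.1,
    null_count_decay_rate=0.2,
    mis_hits_decay_rate=0.2,
    type_sensitivity=0.4,
    mis_hit_severity_vs_null=0.5,
    confidence_threshold=10,
)
for message in settings.validate():
    print(message)
```

With a starting threshold such as 10, a call like is_flagged(score, settings) can be used to see which scores from an earlier scan would drop out of the Results if the threshold were raised, before the change is made in the window.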
