Confidence Factor

The Confidence Factor tells you how sure you can be of the actual outcome. It is defined as a measure of the effective probability that any given cell in the column for an attribute is sensitive. The value is expressed as a percentage and indicates how often the true percentage of the sensitive type in the database lies within the confidence interval, which is defined using other key components.

The word “effective” is used here because the base probability is modified to reflect header match, attribute sensitivity, expression strength, and a mishit-to-null severity factor. The Confidence Factor ranges from 0 to 100. A value below 50 indicates a lower probability that the data is sensitive, and a value above 50 indicates a higher probability that the data is sensitive. The confidence calculation is subdivided into the base confidence factor and the composite confidence factor, which is considered when there has been a header match or mismatch.

The data being analysed can fall into three categories: Hit, Mishit, and Null. If the regular expression for a sensitive type matches the entry in a cell, the cell is called a “Hit”, and the number of such cells is the “Hit Count”. If the regular expression does not match the entry in the cell, the cell is called a “Mishit”, and the number of such cells is the “Mishit Count”. A cell in the table column may also contain no information, or its value may be null. Any such cell is called a “Null Cell”, and the number of such cells is the “Null Count”.
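
The three categories can be illustrated with a short Python sketch. It is only an illustration, not the product's implementation: the function name count_cells, the sample SSN-style pattern, and the choices to treat empty or whitespace-only strings as null and to require the whole cell to match are all assumptions made for this example.

```python
import re

def count_cells(cells, pattern):
    """Classify each cell as hit, mishit, or null and return the counts.

    Assumptions (not from the source): None and empty or whitespace-only
    strings count as null, and a hit requires the whole cell to match.
    """
    regex = re.compile(pattern)
    counts = {"hit": 0, "mishit": 0, "null": 0}
    for cell in cells:
        text = "" if cell is None else str(cell).strip()
        if text == "":
            counts["null"] += 1      # Null Cell
        elif regex.fullmatch(text):
            counts["hit"] += 1       # Hit
        else:
            counts["mishit"] += 1    # Mishit
    return counts

# Hypothetical column values and a hypothetical SSN-like pattern.
column = ["123-45-6789", "n/a", None, "", "987-65-4321"]
print(count_cells(column, r"\d{3}-\d{2}-\d{4}"))
```

For this sample column the sketch reports two hits, one mishit, and two null cells.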

In the case of structured data, a column has a header indicating what type of data is present in that column. For organizations that follow a strict standard in header/column nomenclature, this metadata can help in deciding whether a flagged column is truly sensitive or is a false positive. If the name of the column header matches the regex defined for a sensitive type, it is called a “Header Match”. Otherwise, it is called a “Header Mismatch”.
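
A header check of this kind might look like the following Python sketch. The function name header_status, the sample header pattern, and the decision to match case-insensitively anywhere in the header name are assumptions made for illustration; the source does not specify how the header regex is applied.

```python
import re

def header_status(header, header_pattern):
    """Return "Header Match" if the column name matches the pattern.

    Assumption (not from the source): the match is case-insensitive and
    may occur anywhere in the header name.
    """
    if re.search(header_pattern, header, flags=re.IGNORECASE):
        return "Header Match"
    return "Header Mismatch"

# Hypothetical header regex for a social security number type.
ssn_header = r"ssn|social.?security"
print(header_status("Customer_SSN", ssn_header))
print(header_status("notes", ssn_header))
```

With these sample inputs the first call reports a Header Match and the second a Header Mismatch.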

