Client-side rule-based matching of data to discover PII* and other sensitive information

A long title for this new post. We're not sure it fits the online marketing rules, but it does cover the topic, so here we go. (By the way, this is a translation of our recent Dutch post at https://www.basaltaura.nl/2022/07/19/client-side-regel-gebaseerd-matchen-van-data-op-pii-of-andere-gevoelige-informatie, written especially for our international audience 🤓.) Let’s start by explaining what PII actually is:

*) PII is all information, managed actively or passively within an organisation, that can potentially be used to reveal the identity of a person. This can be data such as a name, a Dutch Citizen Service Number (BSN, burgerservicenummer), or date and place of birth. It also includes information about relatives, biometric data and any other information that is or can be linked to a person, including but not limited to medical, educational, financial and employee information.

The PII explanation above is an interpretation of the information at https://csrc.nist.gov/glossary/term/pii. At https://autoriteitpersoonsgegevens.nl/nl/over-privacy/persoonsgegevens/wat-zijn-persoonsgegevens you can also find information in Dutch on personal data, although the term PII is not used there.

The label ‘sensitive information’ covers all other data, not necessarily linked to persons, that is relevant to a company or organisation in the context of internal and/or external (confidential) processes, intellectual property (IP), competition, etcetera.

To start with the end goal: organisations need actionable findings, produced by the rules applied during a scan with the DataFitness Agent, that reveal this PII or other sensitive information. These findings are automatically delivered to the relevant persons (or systems…) for traceable follow-up. To support understanding of the data and to monitor progress, the online DataFitness customer environment provides access to this information in dashboards for further analysis and visual presentation. The screenshot below shows an example of such a dashboard. After that, we explain the ‘why’ and how to get to these results.

Why client-side rule processing with the DataFitness ControlOne Agent?

Consider the following case: in the current (and future!) situation there is a lot of unstructured, file-based data in your organisation, such as Office documents, PDF files, images and other media, project information and more. These files can contain sensitive information like PII, but you don’t know which files. Running the DataFitness content scan extracts the content from these files and sends it to your customer cloud environment for further analysis.

Now what if none of these files, nor their content, may be moved from your organisation’s IT to an external environment, precisely because they can contain PII? You still need to know which files are involved and what type of sensitive information they contain. In that case the client-side content check runs your determination rules locally and reports only the findings to the DataFitness Cloud environment. This gives you the data you need for reporting, deciding on and monitoring follow-up actions.

So if your organisation has compliance requirements for PII and other sensitive data in place, to ensure that only information meeting certain criteria is used (or that such information is excluded), then the content check is the right option. (Note that the check can also be combined with the meta scan, doing a meta check, but a check on content will result in more findings.)

How does it work?

Imagine you have some random data that might contain PII or other sensitive information…

…and, in addition, concrete and/or hypothetical rules you want to use to discover any matches for this PII or organisation-specific information. The code below shows some sample rules for Waterschappen (regional Water Authorities in the Netherlands), with concept rules for dossier and project numbers derived from public documents. It also includes some basic PII examples such as BSN, IBAN, passport numbers and others. These rules are not complete (a BSN match, for example, also needs an additional calculation to be validated), but more on that in a future blog post.
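As an illustration, here is a minimal Python sketch of what such rules could look like, assuming they are plain regular expressions. The rule names (`BSN`, `IBAN_NL`), the patterns and the helper functions `is_valid_bsn` and `find_bsn` are assumptions for this example, not the DataFitness rule format. The sketch also shows the kind of additional calculation mentioned above for BSN: the Dutch 11-test (elfproef), applied after the regex has produced a candidate.

```python
import re

# Illustrative rules only: names and patterns are assumptions for this sketch,
# not the DataFitness default rule set or its rule format.
RULES = {
    # Nine consecutive digits: a BSN candidate that still needs the 11-test.
    "BSN": re.compile(r"\b\d{9}\b"),
    # Dutch IBAN: "NL", 2 check digits, 4-letter bank code, 10-digit account
    # number, optionally written in groups separated by single spaces.
    "IBAN_NL": re.compile(r"\bNL\d{2} ?[A-Z]{4} ?\d{4} ?\d{4} ?\d{2}\b"),
}


def is_valid_bsn(candidate: str) -> bool:
    """Dutch 11-test (elfproef): weights 9..2 for digits 1-8, -1 for digit 9."""
    if len(candidate) != 9 or not candidate.isdigit():
        return False
    digits = [int(c) for c in candidate]
    total = sum(w * d for w, d in zip(range(9, 1, -1), digits[:8])) - digits[8]
    return total % 11 == 0


def find_bsn(text: str) -> list[str]:
    """Regex match first, then keep only candidates that pass the 11-test."""
    return [m for m in RULES["BSN"].findall(text) if is_valid_bsn(m)]
```

Matching first and validating second keeps the pattern simple. For example, `find_bsn("BSN 111222333, ref 123456789")` keeps only `111222333`, because `123456789` fails the 11-test.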

A default set of rules is available for all DataFitness users, and additional rules can be added by the customer, or obtained from other parties.

During the scan the rules are requested from the customer Cloud environment and applied locally. Under the hood, a finding is added to the results for every rule that matches the content.
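The Python sketch below illustrates this flow under stated assumptions. Rules arrive as JSON, and the `name`/`pattern` format is hypothetical. They are compiled locally, and each file's text is then checked against them. The `Finding` type and the `load_rules` and `check_content` functions are illustrative names, not the Agent's actual API. Only the rule name and the match count end up in a finding; the matched text itself stays on the local machine.

```python
import json
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    file: str   # path of the scanned file
    rule: str   # name of the rule that matched
    count: int  # number of matches in the file


def load_rules(payload: str) -> dict[str, re.Pattern]:
    """Compile rules from a (hypothetical) JSON list of {"name", "pattern"} objects."""
    return {r["name"]: re.compile(r["pattern"]) for r in json.loads(payload)}


def check_content(path: str, text: str, rules: dict[str, re.Pattern]) -> list[Finding]:
    """Apply every rule locally; return one finding per rule that matched."""
    findings = []
    for name, pattern in rules.items():
        count = sum(1 for _ in pattern.finditer(text))
        if count:
            findings.append(Finding(path, name, count))
    return findings


if __name__ == "__main__":
    rules = load_rules('[{"name": "IBAN_NL", "pattern": "NL[0-9]{2}[A-Z]{4}[0-9]{10}"}]')
    # Prints a single IBAN_NL finding with count=1.
    print(check_content("memo.txt", "Pay NL91ABNA0417164300 today", rules))
```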

The results are then sent to the customer’s own DataFitness Cloud environment. Depending on the configuration settings chosen by the customer, this can be the content, the findings, or both.
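To make this configuration choice concrete, here is a hypothetical Python sketch of how a payload could be assembled per file. The `SendMode` values and the payload fields (`file`, `content`, `findings`) are assumptions for illustration, not the actual DataFitness upload format. In `FINDINGS` mode the extracted content is never part of what is sent.

```python
import json
from enum import Enum


class SendMode(Enum):
    """Hypothetical configuration setting: what leaves the organisation."""
    CONTENT = "content"
    FINDINGS = "findings"
    BOTH = "both"


def build_payload(path: str, text: str, findings: list[dict], mode: SendMode) -> str:
    """Assemble the JSON sent to the cloud for one file, honouring the send mode."""
    payload: dict = {"file": path}
    if mode in (SendMode.CONTENT, SendMode.BOTH):
        payload["content"] = text        # extracted file content
    if mode in (SendMode.FINDINGS, SendMode.BOTH):
        payload["findings"] = findings   # rule matches only, no content
    return json.dumps(payload)
```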

The findings are analysed automatically and presented in one or more dashboards. Actions are linked to the findings, such as sending notifications to relevant persons or departments in the organisation that highlight the priority and the proposed follow-up.
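Linking findings to actions can be thought of as a routing table from rule to recipient and priority. The Python sketch below assumes such a table. The addresses, priorities, `Action` type and `plan_actions` function are all hypothetical. Actually sending the notifications (e-mail, a ticketing system, etc.) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    recipient: str
    priority: str
    message: str


# Hypothetical routing table: rule name -> (recipient, priority).
ROUTING = {
    "BSN": ("privacy-officer@example.org", "high"),
    "IBAN_NL": ("finance@example.org", "medium"),
}
# Fallback for rules without an explicit route, so no finding is dropped.
DEFAULT_ROUTE = ("data-steward@example.org", "low")
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def plan_actions(findings: list[tuple[str, str, int]]) -> list[Action]:
    """Turn (file, rule, count) findings into notifications, highest priority first."""
    actions = []
    for path, rule, count in findings:
        recipient, priority = ROUTING.get(rule, DEFAULT_ROUTE)
        message = f"{count} match(es) for rule {rule} in {path}; please review and follow up."
        actions.append(Action(recipient, priority, message))
    return sorted(actions, key=lambda a: PRIORITY_ORDER[a.priority])
```

Rules without an explicit route fall back to a low-priority default, so every finding still reaches someone.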

The interactive dashboards offer options to filter information to quickly obtain specific insights.

And the result?

Using client-side rule-based matching to discover PII and other sensitive information gives concrete insight into, and control over, your data. This enables you to comply with privacy legislation and other external regulations, as well as with internal data management structures. It also makes it possible to set up responsible, ongoing monitoring of progress on unstructured data.

Interested in testing this in your organisation, or do you have any other questions? Contact us at info@dataether.nl

“Any questions?” “Can I now finally add my own rules?”
