Personally Identifiable Information (PII) should not be stored on non-production environments. We therefore need a mechanism to mask the data for non-prod while ensuring it's still meaningful.

What is the best approach to deal with this in Drupal?

Originally posted by @Dubs in #2848974-9: Privacy Concerns as GDPR Compliance

Comments

mgifford created an issue. See original summary.

gisle’s picture

Isn't this a duplicate of #2971800: Pseudonymisation - Separating PII data from non-PII data?

Can they be merged and one of them closed?

mgifford’s picture

Possibly. I'm fine with going that way.

I see this issue as being more about the process of moving content between production and other environments. I see #2971800: Pseudonymisation - Separating PII data from non-PII data as being more about identifying which information includes PII and which doesn't.

They are certainly related.

gisle’s picture

It was only a suggestion - it is your call.

My thinking was this: If you have implemented #2971800: Pseudonymisation - Separating PII data from non-PII data, there will be no problem exporting data from your production environment. It is already pseudonymised, so you can safely export it (provided the database with the PII data is secure, and not exported along with the pseudonymised data).

I.e.: Moving data between production and other environments is just a special case of the privacy issues that pseudonymisation is intended to solve.

mgifford’s picture

It will be encrypted on production site, but pseudonymised when exported to staging/dev sites, right?

Maybe I just don't understand the process. But I see them as two different things.

How you export the data so that you have a meaningful replication of production is different than the stage of identifying & encrypting PII.

But ya, maybe I'm getting this wrong.

gisle’s picture

"It will be encrypted on production site, but pseudonymised when exported to staging/dev sites, right?"

Well, at least that is not how I think about these things.

Here is a brief description of what our current system does:

On the production site we simply replace all items classified as Personally Identifiable Information (PII, i.e. names, phone numbers, addresses, credit-card numbers, etc.) with a 128-bit Universally Unique IDentifier (UUID). Note that the UUID is a pseudonym; it is not an encrypted version of the PII.

Then, in a second production database, we store records that link each UUID in the system back to its corresponding cleartext PII. To add another layer of security, you could encrypt this second production database, but according to our DPIA this is not necessary for us (YMMV), so we don't use encryption.

To export the data (for all purposes) we simply export the pseudonymised database we use in production. The second database (the one linking UUIDs to PII) is not exported, but kept in a very secure location. This means that no-one with access to the exported data will be able to go from the UUIDs they have access to, to PII.

As a nice bonus, this arrangement also lets us comply with the right to erasure. When a data subject exercises his or her rights under Article 17 of the GDPR, we just delete the single record in the second database where the relation between the PII and the UUID is stored. Everywhere else in our system, what remains is just the UUID, which is no longer a pseudonym because it can no longer be linked to the PII, so it is no longer personal data.
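The arrangement described above could be sketched roughly as follows. This is only an illustration in Python, not the actual system and not Drupal code: the class `PseudonymStore`, the helper `store_record`, the field names in `PII_FIELDS`, and the sample record are all hypothetical, and the sketch assumes one UUID per PII value, with in-memory dictionaries standing in for the two databases.

```python
import uuid

# Hypothetical field names; which fields count as PII is an assumption.
PII_FIELDS = ("name", "phone")


class PseudonymStore:
    """Stands in for the second, non-exported database (UUID -> cleartext PII)."""

    def __init__(self):
        self._by_uuid = {}

    def pseudonymise(self, value):
        # A random UUID is a pseudonym: it is not derived from the value,
        # so it cannot be reversed without the mapping record.
        token = str(uuid.uuid4())
        self._by_uuid[token] = value
        return token

    def resolve(self, token):
        # Returns the cleartext PII, or None if the mapping was erased.
        return self._by_uuid.get(token)

    def erase(self, token):
        # Right to erasure: drop only the mapping record.
        self._by_uuid.pop(token, None)


def store_record(record, store):
    """Return a copy of record with PII fields replaced by UUID pseudonyms."""
    return {
        key: store.pseudonymise(val) if key in PII_FIELDS else val
        for key, val in record.items()
    }


store = PseudonymStore()
main_db = [
    store_record(
        {"id": 1, "name": "Ada Example", "phone": "555-0100", "plan": "basic"},
        store,
    )
]

# Export for staging/dev: copy the main database only, never the store.
exported = [dict(row) for row in main_db]

# Erasure request for the name: afterwards the token in main_db and in
# any export can no longer be linked back to the cleartext value.
name_token = main_db[0]["name"]
store.erase(name_token)
```

In this sketch the exported rows keep their non-PII columns (such as `plan`) intact, so they remain usable for testing, while `store.resolve()` is only possible on the system that holds the mapping. If the real system keeps one mapping record per data subject rather than per value, `erase` would take the subject's UUID instead.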

"How you export the data so that you have a meaningful replication of production is different than the stage of identifying & encrypting PII."

It may be different, or it may be the same, depending on what means you use to solve the problems.

mgifford’s picture

Status: Active » Closed (duplicate)

Let's mark this one as a duplicate then. Thanks for the description of how you've done it.