I'm implementing #609892: Multiple custom variables, and for privacy reasons I'm trying to block all tokens that contain personally identifying information in the form validation. I have collected a good number, but I think maintaining a list like this will become a never-ending story:
$token_blacklist = array(
  '[current-user:edit-url]',
  '[current-user:name]',
  '[current-user:mail]',
  '[current-user:uid]',
  '[current-user:url]',
  '[current-user:path]',
  '[node:author]',
  '[node:author:edit-url]',
  '[node:author:mail]',
  '[node:author:name]',
  '[node:author:path]',
  '[node:author:url]',
  '[node:author:uid]',
);
I'm really asking myself how to make this generic... My first idea was to block all node:author:* tokens, but this may leave other tokens out. You may argue that the admin/marketing people need to decide which tokens may be used, but I do not think so. I know users like to collect personally identifying information, and I do not want to help or support them in any way in achieving their illegal goal.
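One way to make such a list more generic is to allow wildcard entries such as [node:author:*] next to exact tokens. The following is only a minimal sketch in plain PHP; the function name find_blocked_tokens() and the ':*]' wildcard syntax are assumptions made for illustration and are not part of the Token or Google Analytics modules.

```php
<?php
/**
 * Returns all tokens in $text that match a blocked pattern.
 *
 * Hypothetical helper: a pattern is either an exact token or ends in
 * ':*]', which matches any token starting with that prefix.
 */
function find_blocked_tokens($text, array $patterns) {
  $blocked = array();
  // Collect everything that looks like a token, e.g. [node:author:mail].
  preg_match_all('/\[[^\s\[\]:]+(?::[^\s\[\]]+)+\]/', $text, $matches);
  foreach (array_unique($matches[0]) as $token) {
    foreach ($patterns as $pattern) {
      if (substr($pattern, -3) === ':*]') {
        // Wildcard: compare against everything before the '*'.
        $prefix = substr($pattern, 0, -2);
        $hit = strpos($token, $prefix) === 0;
      }
      else {
        $hit = ($token === $pattern);
      }
      if ($hit) {
        $blocked[] = $token;
        break;
      }
    }
  }
  return $blocked;
}

$patterns = array('[current-user:*]', '[node:author]', '[node:author:*]');
$text = 'Author [node:author:mail], title [node:title]';
// Only [node:author:mail] is reported; [node:title] matches no pattern.
print_r(find_blocked_tokens($text, $patterns));
```

Note that a wildcard entry does not cover the bare parent token, so [node:author] still has to be listed next to [node:author:*]. As said above, a prefix list like this can still miss tokens that other modules name differently.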
Comments
Comment #1
Dave Reid commented: I'm not sure how this could be accomplished.
Comment #2
hass commented: This is very bad... Am I the only person who cares about personal data? For now I'm doing this with the code below, but this is not safe and false positives may occur, too.
We should think about implementing something like a prefix or an info property that defines whether a value could contain personally identifying information. This is becoming more and more of an issue, and so many people do not follow the data protection rules. Additionally, we need a way to return only tokens *without* personally identifying data in the token_tree view.
Comment #3
Dave Reid commented: Did I ever say I don't care about personal data? I just have no idea how this can be solved currently. The only thing I can think of is for you to perform your own validation on the text fields in which users can enter tokens and raise form errors if specific tokens are found. And how do we define which tokens are or are not personally identifying, and how do we enforce that standard in other contrib modules?
Comment #4
hass commented: Well, nobody said you do not care, but the simple fact that I seem to be the very first person asking for this clearly shows that nobody cared about it - until now :-)
I'm not sure about the design of such a feature. Given the placeholder structure, it's very difficult to maintain a list. Every module names its variables however it likes, and there is no generic naming scheme that could be caught. It may also be a good idea to simply rename some placeholders to be generic and maybe prefix them in a way that makes it possible to run a regex on them. Today this is not easy, if not impossible.
Comment #5
nicksanta commented: This is a terrible, terrible idea. Why are you making the assumption that :author tokens contain personally identifiable information? Drupal user accounts aren't necessarily people, nor do they necessarily contain personal information.
There should absolutely be a way to override this, or alter the list to the developer's requirements.
Comment #6
hass commented: You are completely wrong!!! A Drupal user account makes an individual personally identifiable (IP, email, names, timestamps, etc.). In addition, usernames very often contain real names. You are NOT allowed to push this data to the Google Analytics system.
Comment #7
nicksanta commented: A website I'm maintaining (an online publication with 25+ writers) uses user accounts to associate nodes with their writers. The accounts are never used to log in; the editorial team simply changes the node author.
The analytics team wants the author of the current node to be set as a custom variable in the page scope to track whether people choose to read articles by the same author.
For the most part the exclude list works perfectly fine, and stops laymen from abusing personal information and violating Google's T&Cs. But if someone has the tech skills to write an alter hook and change that exclude list for evil, then they have the ability to write their own custom GA tracking code to do it anyway.
In the end, it's not up to the Drupal-integration maintainer to enforce Google's terms and conditions. I think this patch is a reasonable request all things considered, because you shouldn't be making assumptions about how sites have been put together.
Comment #8
nicksanta commented: Bumping this.
I still do not understand why it is the responsibility of this module to ensure that people do not violate their agreement with Google.
Comment #9
nicksanta commented: If the maintainer of this module insists on including this functionality, then I implore him to include a patch from this ticket: http://drupal.org/node/1307452
Comment #10
rcross commented: I think the idea of form validation is better than trying to blacklist these tokens. Anyone who wanted to circumvent this could just rewrite some tokens to expose the details. It would be better to have a notice that tells users about the privacy risk and links to Google's ToS.
Comment #11
hass commented: @Dave: Can I get your support for adding a personal-identifying-data = TRUE flag (or any other named flag) to the token info, along with the follow-up requirements? If there is no explicit === FALSE, we default to TRUE. I hope you can share your ideas so I'm not implementing something that has no chance of getting in.
Idea:
- The personal-identifying-data flag means the token contains data that is privacy relevant.
- If personal-identifying-data = FALSE is explicitly set, we expect that this token can be used.
- personal-identifying-data = FALSE items
- hook_token_element_validate() and/or token_scan() need to be able to check for this boolean value, or token_scan() needs to just refuse it as an invalid token.
Example:
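The proposed default could be sketched roughly as follows. This is a hypothetical illustration in plain PHP, not code from the Token module: the personal-identifying-data key exists only in the proposal above, the function name token_is_privacy_safe() is made up, and the $info array merely imitates the structure that hook_token_info() returns.

```php
<?php
/**
 * Checks whether a token may be used under the proposed flag.
 *
 * Hypothetical: a token is allowed only if its info explicitly sets
 * 'personal-identifying-data' to FALSE; a missing flag counts as TRUE.
 */
function token_is_privacy_safe(array $info, $type, $name) {
  if (!isset($info['tokens'][$type][$name])) {
    // Unknown tokens are refused as well.
    return FALSE;
  }
  $token = $info['tokens'][$type][$name];
  return isset($token['personal-identifying-data'])
    && $token['personal-identifying-data'] === FALSE;
}

// Sample data imitating hook_token_info() output (assumed values).
$info = array(
  'tokens' => array(
    'node' => array(
      'title' => array(
        'name' => 'Title',
        'personal-identifying-data' => FALSE,
      ),
      // No flag set, so this token is treated as personal data.
      'author' => array('name' => 'Author'),
    ),
  ),
);

var_dump(token_is_privacy_safe($info, 'node', 'title'));
var_dump(token_is_privacy_safe($info, 'node', 'author'));
```

With this default, tokens from contrib modules that never declare the flag would be refused until their maintainers opt in explicitly.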