Change regex on Term ID privacy warning / Allow taxonomy tokens in dimensions [#2037595]

My goal is to read the analytics output at the end of this month and see how many people visited articles about "cats" versus articles about "dogs". I've inserted values into the custom variables section of the configs with what I think should be a correct way of tracking the taxonomy term name of each node visited.

In the custom variable fields, I set the NAME of one variable to be "Animal" and the VALUE to be "[node:field_animal:1:name]". If I understand right, this means that Google Analytics will track the first term stored in each field_animal field of each node visited. However, I get a warning from the module that says,

The Custom variable value #1 is using the following forbidden tokens with personal identifying information: [node:field_animal:1:name].

I think what is happening is that the module is poorly detecting the string "name" in the value as potentially personal information about a user, rather than as the name of the taxonomy term in a node. It sees "name" in the field and prevents me from saving the config.

If I pick a different token from the same field, like "[node:field_animal:1:tid]" (tid instead of name) then I can store the config and assume I'll be getting my information in tomorrow's analytics report.

Why does the term ID pass the tests while the term name fails?

Comment	File	Size	Author
#34	Issue-2037595-by-osopolar-hass-Change-regex-on-Term-D8.patch	3.67 KB	hass
#34	8.x-2.x: PHP 5.5 & MySQL 5.5, D8.2 20 pass, 1 fail PHP 5.5 & SQLite 3.8, D8.2 20 pass, 1 fail PHP 5.6 & MySQL 5.5, D8.2 20 pass, 1 fail PHP 7 & MySQL 5.5, D8.2 20 pass, 1 fail PHP 7 & SQLite 3.14, D8.2 20 pass, 1 fail
#24	Issue-2037595-by-osopolar-hass-Change-regex-on-Term-.patch	1.28 KB	hass
#24
#18	google_analytics-tokens-allow-term-name-2037595-18.patch	408 bytes	osopolar
#18	7.x-2.x: PHP 5.3 & MySQL 5.5, D7 15 pass PHP 5.4 & MySQL 5.5, D7 15 pass PHP 5.5 & MySQL 5.5, D7 15 pass PHP 5.6 & MySQL 5.5, D7 15 pass
#13	google_analytics-term_name_privacy-2037595-13.patch	987 bytes	eelkeblok
#13
#5	google_analytics-term_name_privacy-2037595-5.patch	519 bytes	eelkeblok
#5

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Comment #1

hass CreditAttribution: hass commented 22 July 2013 at 21:26

Status:

Active

» Fixed

This one of the known unwanted side effects. We need to change the regex for this. Please try to use any other variable.

Comment #2

seaneffel CreditAttribution: seaneffel commented 25 July 2013 at 15:15

Title:	Term ID token trips the privacy warning	» Change regex on Term ID privacy warning
Category:	support	» task
Status:	Fixed	» Active

Ok, instead of dead-ending the issue, let's change the scope and make it a task.

Also, if anyone can recommend a workaround to track taxonomy terms then it will take some pressure off these otherwise busy module maintainers.

Comment #3

hass CreditAttribution: hass commented 25 July 2013 at 20:05

Category:	task	» support
Status:	Active	» Fixed

If you can share a patch, feel free to reopen.

Comment #4

8 August 2013 at 20:11

Status:

Fixed

» Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Comment #5

eelkeblok

him/his

Dutch

Netherlands 🇳🇱

CreditAttribution: eelkeblok commented 24 July 2014 at 13:00

Version:	7.x-1.3	» 7.x-2.x-dev
Issue summary:	View changes
Status:	Closed (fixed)	» Needs review

File	Size
google_analytics-term_name_privacy-2037595-5.patch	519 bytes

This is the exact issue I am running into. Previously, I've made use of the patch in #1307452, but it no longer applies to the 2.0 version of this module.

Can we make the :name] entry in the blacklist more specific? I suppose it is like it is to be generic and catch any token representing a person's name, but the way it is, it is too generic. I think it is a commendable idea to try and protect users from themselves, but it seems too restrictive, the way it is. The safety net is now preventing perfectly legitimate uses as well.

At least for this particular issue, I propose the accompanied patch. Maybe in general, you should not try and make the list exhaustive, but instead try and catch the most obvious violations. Or alternatively, I'd even suggest to go as far as offering a setting to disable the blacklisting completely and maybe displaying a message whenever it is set, warning the user that the safety net is gone and they are making any edits at their own risk.

Comment #6

eelkeblok

him/his

Dutch

Netherlands 🇳🇱

CreditAttribution: eelkeblok commented 24 July 2014 at 13:01

Category:

Support request

» Task

Comment #7

hass CreditAttribution: hass commented 24 July 2014 at 18:48

Status:

Needs review

» Needs work

This does not work as you leave a lot of alternate user name tokens outside. That is why the name key is inside. I think we can only solve this with a whitelist.

Comment #8

eelkeblok

him/his

Dutch

Netherlands 🇳🇱

CreditAttribution: eelkeblok commented 25 July 2014 at 11:29

The token for which this issue is occuring is one for the name of a taxonomy term in a field. For an example, see the issue summary. So, whitelisting will not work, you'd need a whitelist entry for any taxonomy field (*could* be generated automatically, but I really feel this goes a bit too far for a courtesy function). What user name tokens are we leaving out, this way?

Comment #9

hass CreditAttribution: hass commented 25 July 2014 at 16:23

current-user:name and tons of other as user:name is inherited to deeper structures. See token browser... you may need entity api to see these... not sure. I have seen tons of them.

Comment #10

eelkeblok

him/his

Dutch

Netherlands 🇳🇱

CreditAttribution: eelkeblok commented 31 July 2014 at 07:35

I have entity API, which indeed adds a sh*tload of tokens. I think my changes covers most of the intended behaviour of the filter. I did find a class that is not covered, which is original variants (e.g. current-user:original:name). However, I also found many examples of tokens that are harmless, would be very useful as a GA tracking metric, but are currently being prevented from selection. For example:

Filenames
Taxonomy terms (which led to the opening of this issue)
Content types
(Domain names, although those would not make a likely candidate to add as a custom metric, unless you are doing something wrong, I suppose)

As an aside, an important area the module is not covering is Real Name module. An administrator selects arbitrary fields to serve as parts of the real name of a user. As it is, the GA module does nothing to prevent an administrator from adding those. All in all, I think this feature is very much missing its target, as it is now.

Things to consider:

Does it really need to be this forceful (i.e. validation error, no way to add the token, even if the user knows what they are doing)? (Who's responsibility, ultimately, is it if a site violates Google's terms of use? Wouldn't it be enough to display a warning?)
How many false positives does the solution generate (as it turns out, quite a lot, especially taxonomy terms would seem to be a popular metric).
How many situations are there that are *not* covered by the filtering? (false negatives, basically; I think the most important one is introduced by Real Name module), but you can never take into account aal custom code that is out there).

Comment #11

hass CreditAttribution: hass commented 31 July 2014 at 09:00

I'm happy if we are able to remove the false positives, but not at the full removsl costs, please. Above patch does not cover what you describe and would be a lot better, as I know.

Comment #12

hass CreditAttribution: hass commented 13 August 2014 at 21:57

Status:

Needs work

» Closed (fixed)

If you can share a working patch, feel free to reopen.

Comment #13

eelkeblok

him/his

Dutch

Netherlands 🇳🇱

CreditAttribution: eelkeblok commented 8 September 2014 at 10:12

Title:	Change regex on Term ID privacy warning	» Do not prevent tokens potentially containing personally identifying tokens to be added
Status:	Closed (fixed)	» Needs review

File	Size
google_analytics-term_name_privacy-2037595-13.patch	987 bytes

Alright. Since getting the actual check exactly to everyone's liking doesn't seem to be feasible right now, I propose the validation error (a.k.a. brick wall) is replaced with a warning message, that doesn't physically prevent tokens that trigger the filtering code from being added. I do firmly believe adding personally identifying information is mostly the user's reponsibility, not the module's. This change is implemented by the accompanying patch.

I updated the issue title to reflect this change in approach.

Comment #14

rwohleb

he/him

English

CreditAttribution: rwohleb commented 9 September 2014 at 19:29

I think the hard block is a bit too big-brother, but if the maintainers don't like the idea of removing the hard block, how about we add a hidden bypass variable that can only be set via the settings file. This way a developer can bypass the hard block if they know what they are doing, but the casual users won't be able to get into trouble by accident. Even if the bypass variable is enabled, I still think warning messages are a good idea.

Comment #15

hass CreditAttribution: hass commented 9 September 2014 at 21:33

Title:	Do not prevent tokens potentially containing personally identifying tokens to be added	» Change regex on Term ID privacy warning
Status:	Needs review	» Needs work

#1307452: Provide developers a way to bypass sometimes overzealous forbidden token checking is a dup and as noted more than once - I'm happy to change/fix the regex. You only need to share a patch for this and not any other stuff.

Comment #16

rwohleb

he/him

English

CreditAttribution: rwohleb commented 9 September 2014 at 21:35

Yeah, I was just about to retract my comment after seeing the other ticket :P

Comment #17

hass CreditAttribution: hass commented 4 October 2014 at 09:05

Status:

Needs work

» Closed (won't fix)

Looks like nobody is interested in fixing the false alarms.

Comment #18

osopolar

German

🇩🇪 GER 🌐

CreditAttribution: osopolar commented 4 March 2016 at 18:38

File	Size
google_analytics-tokens-allow-term-name-2037595-18.patch	408 bytes
7.x-2.x: PHP 5.3 & MySQL 5.5, D7 15 pass PHP 5.4 & MySQL 5.5, D7 15 pass PHP 5.5 & MySQL 5.5, D7 15 pass PHP 5.6 & MySQL 5.5, D7 15 pass

Yes, I am interested in fixing this false alarms. It seems to be possible with Negative Lookaheads.

Comment #19

osopolar

German

🇩🇪 GER 🌐

CreditAttribution: osopolar commented 16 March 2016 at 03:28

Status:

Closed (won't fix)

» Needs review

Comment #20

hass CreditAttribution: hass commented 16 March 2016 at 08:28

Great if it is that simple. We should check some more false positives I think.

Comment #21

hass CreditAttribution: hass commented 20 March 2016 at 00:43

#10 in this case here has some other useful tokens that we could whitelist.

Comment #22

hass CreditAttribution: hass commented 3 April 2016 at 11:16

Status:	Needs review	» Needs work
Issue tags:		+Needs tests

Can you add a test, please? Not sure what modules it requires to use [term:name]. I have taxonomy module enabled and some terms exists for sure, but token validation always fails with invalid token. Also tested with entity api, but no success.

Comment #23

hass CreditAttribution: hass commented 3 April 2016 at 11:34

Two things are missing in this patch and until this is done I have no idea how you tested your patch or made it working. In both form elements the 'term'm type is missing and therfore the validation fails and the insert is not possible in token tree.

  $form['googleanalytics_custom_dimension']['indexes'][$i]['value'] = array(
    '#token_types' => array('node', 'term'),
  );

  $form['googleanalytics_custom_dimension']['googleanalytics_token_tree'] = array(
    '#token_types' => array('node', 'term'),
  );

Comment #24

hass CreditAttribution: hass commented 3 April 2016 at 11:38

File	Size
Issue-2037595-by-osopolar-hass-Change-regex-on-Term-.patch	1.28 KB

Comment #25

hass CreditAttribution: hass commented 3 April 2016 at 12:14

Bug: user:original] is not blocked. So I'm adding this here to remind me on this before next release.

Your change introduces new issues. With your change the :name tokens are no longer properly detected. e.g. [user:name] is detected, but [current-user:name] not.

Test tokens I used:

[term:name] [term:tid]
[current-user:name]
[current-user:edit-url]
[current-user:original]
[user:name]

Comment #26

hass CreditAttribution: hass commented 3 April 2016 at 12:13

Title:

Change regex on Term ID privacy warning

» Change regex on Term ID privacy warning / Allow taxonomy tokens in dimensions

Comment #27

osopolar

German

🇩🇪 GER 🌐

CreditAttribution: osopolar commented 4 April 2016 at 21:17

@hass: please see useless last patch from osopolar in #231451: Add hook to alter data before sending it to browser #88 and the example there, where with implementation of hook_googleanalytics_admin_settings_form_alter() more token types were made available.

Comment #28

hass CreditAttribution: hass commented 4 April 2016 at 20:25

Does not make the patch here a correct one.

Comment #29

osopolar

German

🇩🇪 GER 🌐

CreditAttribution: osopolar commented 4 April 2016 at 22:05

I never claimed that! But it may help to test this issue and may help others to solve problems.

I can't see, why [current-user:name] should not be detected. I use Reggy on OS X to test regular expressions. There are also online solutions like regex101.com. I prepared the regex for privacy detection here: https://regex101.com/r/mE2gG9/2 and it detects every :name token except he term:name.

Edit: I'm not sure how you test the issue, make sure do not miss the /g modifier (match all) in the regex tester.

Comment #30

hass CreditAttribution: hass commented 16 April 2016 at 12:11

I'm not sure how you test the issue, make sure do not miss the /g modifier (match all) in the regex tester.

I have simply inserted the placeholder in the module settings page and pressed save settings button.

Comment #31

25 June 2016 at 14:03

hass committed 6827b21 on 8.x-2.x

Issue #2037595 by hass: Tests if Custom Dimensions token form validation...

Comment #32

25 June 2016 at 14:04

hass committed d85c28e on 8.x-2.x

Issue #2037595 by hass: Tests if Custom Dimensions token form validation...

Comment #33

25 June 2016 at 15:07

hass committed edee98b on 8.x-2.x

Issue #2037595 by hass: Login is required for test

Comment #34

hass CreditAttribution: hass commented 25 June 2016 at 15:15

Status:	Needs work	» Needs review
Issue tags:	-Needs tests

File	Size
Issue-2037595-by-osopolar-hass-Change-regex-on-Term-D8.patch	3.67 KB
8.x-2.x: PHP 5.5 & MySQL 5.5, D8.2 20 pass, 1 fail PHP 5.5 & SQLite 3.8, D8.2 20 pass, 1 fail PHP 5.6 & MySQL 5.5, D8.2 20 pass, 1 fail PHP 7 & MySQL 5.5, D8.2 20 pass, 1 fail PHP 7 & SQLite 3.14, D8.2 20 pass, 1 fail

3 files were hidden/shown/deleted

File	Size
google_analytics-term_name_privacy-2037595-5.patch	519 bytes

google_analytics-term_name_privacy-2037595-13.patch	987 bytes

google_analytics-tokens-allow-term-name-2037595-18.patch	408 bytes
7.x-2.x: PHP 5.3 & MySQL 5.5, D7 15 pass PHP 5.4 & MySQL 5.5, D7 15 pass PHP 5.5 & MySQL 5.5, D7 15 pass PHP 5.6 & MySQL 5.5, D7 15 pass

This one should fail and show the bug in this patch.

Comment #35

25 June 2016 at 15:16

Status:

Needs review

» Needs work

The last submitted patch, 34: Issue-2037595-by-osopolar-hass-Change-regex-on-Term-D8.patch, failed testing.

Issue #2037595 by hass: Tests if Custom Dimensions token form validation...

Change regex on Term ID privacy warning / Allow taxonomy tokens in dimensions

Comments

Child issues

Referenced by