Problem/Motivation

The ability to disable IP logging is important for sites that have already disabled IP logging in their server environment (in order to help protect users from government attempts to identify them by seizing servers or subpoenaing data).

See also:

Proposed resolution

Add a new experimental module to obfuscate IP addresses in logs. This could be based on one of the contributed projects such as ip_anon or cryptolog.

I'd expect the module to be a zero-configuration module similar to BigPipe, in the sense that it just changes behaviour.

Remaining tasks

User interface changes

API changes

Data model changes

---

Background: The United States government, at least, has demonstrated a desire and willingness to capture large numbers of IP addresses from ISPs or demand swathes of IP address data from web service providers, and also to misuse IP address information to raid people's homes. Most disturbingly, at least one city government has issued a subpoena in an attempt to get IP addresses to identify activists and journalistic sources.

I understand from this thread that there are more places than the watchdog that record IP address information, but with watchdog now a required part of core, it must establish best practice for allowing or assisting the disabling of IP logs.


Comments

djnz’s picture

The following code in settings.php should make sure the IP address cannot be logged: I can't see that it would cause any problems.

// Overwrite the client address before anything else reads it.
$_SERVER['REMOTE_ADDR'] = '0.0.0.0';

Given the simplicity and effectiveness of this hack, is it worth developing, testing and maintaining code to do the same thing?

Crell’s picture

Perhaps it should be added to the settings.php file but commented out (lots of things are) with a "to do X, uncomment this line" comment?

mlncn’s picture

Will do and will report if there are problems, so the line could be added.

It would be nice to stop the *logging* of IP addresses without stopping their *use*. (This is particularly the case with the ability to use IP addresses to guess at location, as done by FolkJam.org).

One question is why is the IP address logged in a bunch of places if it isn't used. Aside from the contact form (moving the number of submissions per hour to a high number is a workaround for its use of IP addresses), where else should we be attentive to possible side effects of setting $_SERVER['REMOTE_ADDR']='0.0.0.0' ?

~ ben :: Agaric Design Collective :: http://AgaricDesign.com

Crell’s picture

Hm. Here's an idea. Could a contrib module set the IP address to an md5 of itself? That way it's still unique, so flood control works, but it's hard/impossible to track back to the original person.

Someone with a more paranoid security mind, would that work? :-)

mlncn’s picture

This is still important.

I hear from the Indymedia Worcester group that Akismet, for one, gets flaky without IP addresses, so for a site that has to allow anonymous content but cannot allow IP logging of its users (the classic Indymedia setup), the settings.php hack to set all IP addresses to naught is not a practical solution.

Constant scrubbing looks like the only current approach to help protect users from intrusion of their privacy.

Really, though, it should be an option to simply tell Drupal core, at least, not to log IP addresses in the first place.

Any thoughts on this or Crell's idea to use one-way encryption of IP addresses?

~ben

People Who Give a Damn :: http://pwgd.org/ :: Building the infrastructure of a network for everyone
Agaric Design Collective :: http://AgaricDesign.com/ :: Open Source Web Development

Christefano-oldaccount’s picture

subscribing

nlindley’s picture

I don't think a one-way encryption would stop the government in this case. The problem is there are only 2^32 IP addresses to go through, so even somebody with a below-average computer could calculate and compare all possible values within a few hours, especially if they're targeting a single user. Maybe there's somebody with more experience with encryption that has a good idea.

By the way, I'm able to calculate md5sums (not doing any comparisons or output) of a /8 subnet in about 3 minutes running on a celeron processor under a xen vm with a PHP script. Obviously those are not ideal conditions for cracking. It's just to point out the government wouldn't take long to map IP addresses to hashes.
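
To make the point above concrete, here is a minimal sketch in plain PHP that recovers an address from its unsalted MD5 by enumerating a range. The function name recover_ip_from_md5() is hypothetical, and the search is limited to a /16 (65,536 candidates) so that it runs quickly; the full IPv4 space is the same loop over 2^32 values.

```php
<?php
// Hypothetical helper (not part of Drupal): recover an IPv4 address from its
// unsalted MD5 hash by hashing every address in a network and comparing.
function recover_ip_from_md5(string $target_hash, string $network, int $prefix_bits): ?string {
  $start = ip2long($network);
  $count = 1 << (32 - $prefix_bits);
  for ($i = 0; $i < $count; $i++) {
    $candidate = long2ip($start + $i);
    if (md5($candidate) === $target_hash) {
      return $candidate;
    }
  }
  // No address in the range produced the target hash.
  return NULL;
}

// Pretend this hash was found in a seized log.
$logged = md5('192.168.37.201');
$recovered = recover_ip_from_md5($logged, '192.168.0.0', 16);
```

Nothing here depends on MD5 specifically; any unkeyed hash of the bare address can be enumerated the same way.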

Crell’s picture

What about using a semi-random fudge factor? e.g., sha1(floor(time() / 3600) . $ip_address)? That would keep the hashed address changing every hour, which would only marginally impact the flood control.

The base problem is that if the site is tracking users (flood control), it has to do so in some unique way. If it's done in a unique way, it's potentially trackable. You'd have to completely disable flood control and a few other things if you wanted a completely anonymous site.
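
A small sketch of the hourly fudge factor idea, assuming a hypothetical helper hourly_ip_token() and a timestamp passed in explicitly so the behaviour is repeatable. It only illustrates that the token is stable within an hour and changes in the next one.

```php
<?php
// Hypothetical helper illustrating the hourly "fudge factor" idea.
// The hour bucket is the number of whole hours since the Unix epoch.
function hourly_ip_token(string $ip_address, int $timestamp): string {
  $hour_bucket = intdiv($timestamp, 3600);
  return sha1($hour_bucket . $ip_address);
}

$t = 1700000000;
$same_hour_a = hourly_ip_token('203.0.113.7', $t);
$same_hour_b = hourly_ip_token('203.0.113.7', $t + 60);
$next_hour = hourly_ip_token('203.0.113.7', $t + 3600);
```

Note that the hour bucket is not a secret: anyone who knows roughly when a log line was written can still enumerate all IPv4 addresses for that hour, so this changes the tokens over time but does not by itself prevent the brute-force attack described earlier.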

mfb’s picture

Version: 6.x-dev » 7.x-dev

I created an IP anonymizer module -- http://drupal.org/project/ip_anon -- to scrub logged IP addresses on each cron run. The retention period is configurable per table so e.g. you can clear out session IPs immediately but leave IPs in the flood table for an hour.

Since the IPs are still recorded in the database at least temporarily, forensic methods might still be able to recover them from the hard disk. Ideally in Drupal 7 there could be an option to disable IP logging in the sessions, comments, accesslog and watchdog tables. In flood and poll_votes tables IPs are actually useful so I'm not so concerned.
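
As a rough illustration of per-table retention periods, here is a sketch that uses plain arrays in place of database tables. The function scrub_ips(), the column names and the '0.0.0.0' placeholder are assumptions for the example, not the module's actual code.

```php
<?php
// Hypothetical stand-in for a cron-time scrub: rows older than the retention
// period get their hostname replaced with a placeholder value.
function scrub_ips(array $rows, int $retention_seconds, int $now): array {
  foreach ($rows as &$row) {
    if ($now - $row['timestamp'] >= $retention_seconds) {
      $row['hostname'] = '0.0.0.0';
    }
  }
  unset($row);
  return $rows;
}

$now = 1700000000;
$sessions = [
  ['hostname' => '198.51.100.4', 'timestamp' => $now - 10],
];
$flood = [
  ['hostname' => '198.51.100.4', 'timestamp' => $now - 10],
  ['hostname' => '203.0.113.9', 'timestamp' => $now - 7200],
];

// Retention of 0 seconds scrubs session IPs on the next run.
$sessions = scrub_ips($sessions, 0, $now);
// Flood entries keep their IPs for an hour.
$flood = scrub_ips($flood, 3600, $now);
```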

Owen Barton’s picture

What about an optional wrapper function, along the lines of custom_url_rewrite - this has minimal overhead and would allow various approaches in contrib. I don't think a single core approach is possible, because it depends a lot on what the privacy requirements are and how much traffic the site gets (sites with little traffic are harder to anonymize).

We should probably have wrappers for datetimes as well as IPs, because they can also be used to identify users/activities (by matching with ISP traffic logs, for example).

mfb’s picture

Generic hooks would also work, something like comment_invoke_comment($edit, 'preinsert'), which would allow the timestamp and hostname to be altered before it's inserted. Also would be needed for session, accesslog, watchdog tables.

Crell’s picture

Hooks and conditional functions can have unpleasant overhead along the critical path. A handlers-style approach is probably going to be more performant: http://www.garfieldtech.com/blog/drupal-handler-rfc and http://drupal.org/project/handler

Yeah, not in core yet, but for swappable systems that is the way to go.

mlncn’s picture

@Crell

Do you think this is a good feature to use to try to get handler-style swappability in core, or is there another issue out there that should be the standard-bearer for the introduction of this pattern to D7?

Crell’s picture

pwolanin is trying to get a very simplified version of handlers (honestly not handlers, but a factory approach which is the basic idea of handlers) working for #259103: make pluggable password hashing framework more generic and use class auto-loading.

For right now, we could probably get away with the same sort of approach. The benefit is that if you make the logging doohickey pluggable as an object, you instantiate the object once and just call a method on it each time. You can easily swap out the class (via a variable_get()?), and once you pay the cost of creating the object once into a static variable, every subsequent call to a method of it costs virtually the same as calling a function.
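
A minimal sketch of that factory idea, with every name made up for illustration (IpLogPolicy, ZeroIpPolicy, ip_log_policy() and the settings array standing in for variable_get() are all hypothetical, not an existing Drupal API):

```php
<?php
// Hypothetical pluggable policy for what gets written to logs.
interface IpLogPolicy {
  public function filter(string $ip): string;
}

// Default behaviour: log the address unchanged.
class PassThroughIpPolicy implements IpLogPolicy {
  public function filter(string $ip): string {
    return $ip;
  }
}

// Privacy behaviour: never log a real address.
class ZeroIpPolicy implements IpLogPolicy {
  public function filter(string $ip): string {
    return '0.0.0.0';
  }
}

// Stand-in for variable_get(): a plain array of site settings.
$GLOBALS['example_settings'] = ['ip_log_policy_class' => 'ZeroIpPolicy'];

function ip_log_policy(): IpLogPolicy {
  // The object is created once and kept in a static variable, so later
  // calls only pay for a method call.
  static $policy = NULL;
  if ($policy === NULL) {
    $class = $GLOBALS['example_settings']['ip_log_policy_class'] ?? 'PassThroughIpPolicy';
    $policy = new $class();
  }
  return $policy;
}

$logged = ip_log_policy()->filter('198.51.100.23');
$same_instance = ip_log_policy() === ip_log_policy();
```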

Owen Barton’s picture

Here are a couple of patches that implement very simple (and heavy-handed) IP address masking for Drupal 7 by randomizing REMOTE_ADDR, which sites that need this functionality now might use: one patch defaults to on and the second defaults to off. I don't think these are viable approaches for core, though, so I'm not setting this to CNR. Pushing to Drupal 8.x in the next comment.

Owen Barton’s picture

Version: 7.x-dev » 8.x-dev

We have some options for pluggable systems in core now, so I think we can think about the interface here in a bit more detail. At its most minimal I think this looks something like:

interface PrivacyMasking {
    public function mask_ip($ip);
    public function mask_datetime($timestamp);
}

It seems like this could potentially be extended to solve the related problem of anonymizing personal information when generating database dumps for use on local sandboxes (drupal.org has a routine to handle this, as does Drush sql-sync). This would potentially add functions for e-mail addresses, "strings" (e.g. names) and numbers, and perhaps other things. I imagine privacy sensitive sites would maintain one class representing their policy on what can be stored on the live site (with passthrough functions for e-mail addresses etc), and a second more restrictive one for what can be distributed outside of that environment.
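
For illustration only, one possible policy class behind that interface might look like the sketch below. The class name CoarsePrivacyMasking and the choice of truncation (zeroing the last IPv4 octet, rounding timestamps down to the start of the UTC day) are assumptions for the example, not something proposed in this issue.

```php
<?php
// Interface as proposed in the comment above.
interface PrivacyMasking {
  public function mask_ip($ip);
  public function mask_datetime($timestamp);
}

// Hypothetical policy: keep only coarse information.
class CoarsePrivacyMasking implements PrivacyMasking {
  // Zero the last octet of an IPv4 address; anything unparseable becomes 0.0.0.0.
  public function mask_ip($ip) {
    $packed = ip2long($ip);
    if ($packed === FALSE) {
      return '0.0.0.0';
    }
    return long2ip($packed & 0xFFFFFF00);
  }

  // Round a Unix timestamp down to the start of its UTC day.
  public function mask_datetime($timestamp) {
    return $timestamp - ($timestamp % 86400);
  }
}

$masking = new CoarsePrivacyMasking();
$masked_ip = $masking->mask_ip('203.0.113.77');
$masked_day = $masking->mask_datetime(1700000000);
```

A stricter class for data leaving the live environment could implement the same interface and simply return fixed values.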

mfb’s picture

FYI, I just uploaded Cryptolog module - https://drupal.org/project/cryptolog - which replaces $_SERVER['REMOTE_ADDR'] with an HMAC of the IP address, using a random salt which is stored in memory and regenerated each day.

It of course needs to be executed in early bootstrap, but I packaged it as a module so it can be updated. It is enabled by adding $conf['cache_backends'][] = 'sites/all/modules/cryptolog/cryptolog.inc'; at the bottom of settings.php.
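
The keyed-hash idea can be sketched roughly as follows. This is not Cryptolog's code: the class EphemeralIpMasker is hypothetical, and it keeps the salt in an object property, so it only holds within a single PHP process, whereas a real site would need a shared in-memory store.

```php
<?php
// Hypothetical sketch: mask an IP with an HMAC whose key is random and is
// replaced whenever the day changes.
class EphemeralIpMasker {
  private $salt = NULL;
  private $salt_day = NULL;

  public function mask(string $ip, int $timestamp): string {
    $day = intdiv($timestamp, 86400);
    if ($this->salt === NULL || $this->salt_day !== $day) {
      // Generate a fresh key; the previous one is discarded and not stored anywhere.
      $this->salt = random_bytes(32);
      $this->salt_day = $day;
    }
    return hash_hmac('sha256', $ip, $this->salt);
  }
}

$masker = new EphemeralIpMasker();
$t = 1700000000;
$today_a = $masker->mask('203.0.113.7', $t);
$today_b = $masker->mask('203.0.113.7', $t + 60);
$other_ip = $masker->mask('203.0.113.8', $t);
$tomorrow = $masker->mask('203.0.113.7', $t + 86400);
```

Within one day the same address always maps to the same identifier, which is what flood control needs; after the key is replaced, the old identifiers can no longer be matched against candidate addresses.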

yannickoo’s picture

I will write a small module which sets the $_SERVER['REMOTE_ADDR'] to '0.0.0.0'.

Oh, it is more complicated than I thought. I think we just should customize the ip_address function. That would be just a settings check and depending on the setting we would use the $_SERVER['REMOTE_ADDR'] or just 0.0.0.0

What is a right a place for the setting? Site information?

Crell’s picture

In Drupal 8, this may be more difficult as we'd need to modify the request object early enough, and I don't know if that information is modifiable. Rather, we should just make all IP-tracking-functionality optional to begin with.

yannickoo’s picture

What do you think about the patches from #15 for Drupal 7?

mfb’s picture

@Crell: settings.php loads before the $request object is initialized, so I was able to port my Cryptolog module to Drupal 8.

yannickoo’s picture

But why do we need that module? We could do sth. like in #15 right?

mfb’s picture

@yannickoo rather than setting $_SERVER['REMOTE_ADDR'] to a static IP address or a completely random IP address, Cryptolog sets it to a unique identifier per IP address for a 24-hour period. This allows Drupal's flood control mechanisms for user login and contact forms to function as normal, and makes it possible to do some log analysis such as counting unique visitors per day or tracking down a repeat offender abusing the site.

yannickoo’s picture

But when the IP address is encrypted, it behaves the same as the real IP address. The only difference is that you cannot see the real IP address, right?

mfb’s picture

Cryptolog does not encrypt the IP address per se, because it cannot be decrypted. It generates a keyed-hash of the IP address.

The only way to recover an IP address would be if you have the key; then you could use brute force to build a rainbow table of all IPv4 addresses. However, Cryptolog stores the key in memory and regenerates it every 24 hours, so older IP addresses basically cannot be recovered.

yarco’s picture

This patch (patch in #15) disabled $_SERVER['REMOTE_ADDR'], but $real_remote_addr_do_not_store is really useless.
If you want to get the real ip, you need to make this variable global.

So here is the patch's patch...

==== update ====
Sorry, the patch should be : https://gist.github.com/raw/4274844/d64c41a32c91c9722f304fb50dd510d1a731...
Or you can remove /html in the attachment file.

Owen Barton’s picture

@yarco - settings.php runs in the global context (i.e. not in any class/function), so $real_remote_addr_do_not_store is automatically global. That said, for core coding standards, using an explicit GLOBALS is preferred for the sake of clarity (although exceptions are made for other settings.php variables) - but in this case it is semantically and functionally equivalent.

yarco’s picture

@Owen Barton

I think it is in: http://api.drupal.org/api/drupal/includes%21bootstrap.inc/function/drupa...
We are using D7, but I can't get this value unless I set it explicitly through GLOBALS (not sure about D8, but according to the API it includes the same function).
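
The scoping behaviour being discussed can be reproduced in plain PHP. This sketch assumes, as suggested above, that the settings file is included from inside a function; the function name example_settings_initialize() and the variable names are made up for the demonstration.

```php
<?php
// Write a stand-in settings file that sets one plain variable and one
// explicit $GLOBALS entry.
$file = tempnam(sys_get_temp_dir(), 'settings');
file_put_contents($file, '<?php $plain_var = "local"; $GLOBALS["explicit_var"] = "global";');

function example_settings_initialize(string $file): void {
  // Code pulled in by include runs in the scope of the including function,
  // so $plain_var becomes a local variable of this function.
  include $file;
}

example_settings_initialize($file);
unlink($file);

$plain_is_global = array_key_exists('plain_var', $GLOBALS);
$explicit_is_global = array_key_exists('explicit_var', $GLOBALS);
```

When the same file is included at the top level of a script instead, both variables end up global, which is the case described in the previous comment.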

Dave Reid’s picture

Issue tags: +Privacy improvements

catch’s picture

Component: watchdog.module » base system
Category: Feature request » Task
Priority: Normal » Major
Issue summary: View changes
Issue tags: +Security improvements

I think we should disable this by default, and also need to make the same change for comments (or remove comment.hostname altogether from core).

We permanently store IP addresses for both anonymous and auth comment authors at the moment. This circumvents any effort to anonymize server logs which this issue was originally about. More importantly if a database gets exposed, then you have every IP address any user has ever posted a comment from.

mgifford’s picture

I do like some of @Crell's idea from 7 years ago of how this would be implemented "to do X, uncomment this line".

Security/Privacy is so much of a bigger issue than it was then. I'd hope with D8 we'd be able to make this easier and give folks more confidence that their IP addresses can be properly & completely masked in Core.

yannickoo’s picture

We absolutely should do something here. This is not a big issue and can be solved by adding sth. like $_SERVER['REMOTE_ADDR']='0.0.0.0'; to settings.php.

Does no core developer think this issue is important? :)

Crell’s picture

Version: 8.0.x-dev » 8.1.x-dev

Not happening until 8.1 at this rate.

The trend lately seems to be moving stuff to a separate core module and disabling it. We can probably do the same here? (Mangling globals is a bad idea.)

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.0-beta1 was released on March 2, 2016, which means new developments and disruptive changes should now be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

dawehner’s picture

Status: Active » Needs review

We can do more, like user agent.

Status: Needs review » Needs work

The last submitted patch, 35: 126197-35.patch, failed testing.

mfb’s picture

for the "privacy" module that generates a random IP, it would be important to flag that flood control will no longer work (for user login, contact form, basic auth and anything else that uses it).

along with https://www.drupal.org/project/cryptolog I also released https://www.drupal.org/project/ip_anon for drupal 8

catch’s picture

I like the hmac approach in cryptolog. I've worked on sites where 'sockpuppeting' was an issue, and that would allow for some protection against that as well as regular flood control etc.

Wim Leers’s picture

@mfb: wow, very cool! I didn't know either of those modules! Very cool.

P.S.: 8.x-1.0 of ip_anon is marked yellow because the "Supported" checkbox is not checked at https://www.drupal.org/node/240685/edit/releases.
P.P.S.: I suspect you may want to add the Electronic Frontier Foundation as a supporting organization on either or both?

mfb’s picture

@Wim Leers thanks for the tip.

I wrote cryptolog for EFF but not ip_anon; it would violate our privacy policy to log IPs temporarily :)

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

catch’s picture

Tagging for product manager review. This would be a new (experimental?) module for core so needs sign-off.

I think we should have a separate setting in comment module itself to stop storing IP addresses, afaik that's the only place we store them permanently, and it's directly tied to e-mail addresses and content. Since that's a simple change that's complementary to this one, I've opened #2828793: Stop logging comment IP addresses by default with a patch.

HongPong’s picture

Issue summary: View changes

Added the ip_anon to issue summary

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

yoroy’s picture

Issue summary: View changes
Issue tags: +Needs issue summary update

This would be a new (experimental?) module for core so needs sign-off

It's not clear from the issue summary what "this" would be. I added the issue summary template to help clarify things.

catch’s picture

Issue summary: View changes
Issue tags: -Needs issue summary update

I added a link to cryptolog in the summary and updated the proposed resolution.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

webchick’s picture

We discussed this on today's product management meeting. We are not sure why this needs to be in core. It seems like if we went with a more generic API-based approach in core as outlined in #10 and similar comments, this would allow contributed projects to take a variety of approaches to this problem space for those who need it.

Removing the tag for now, feel free to add it back when further feedback is needed.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

C-Logemann’s picture

Issue tags: +GDPR

As long as this can be realized, I'm fine with a contrib option. This privacy improvement is something we need in some cases because of special needs of our customers and because of law. So I am adding the GDPR tag for this.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

mfb’s picture

Seems like this issue could/should be closed if there's not much interest in adding it to core?

I found that Cryptolog middleware has to do some sorta-interesting things to work because of the way that Symfony HttpFoundation dynamically builds all the basic request info - like the HTTP host and request scheme (HTTP or HTTPS), by looking at the client IP address and the HTTP headers coming from trusted reverse proxies. This stuff isn't just figured out and statically cached somewhere in the request, it's dynamically generated on demand. So then if you change the client IP address, this is all short-circuited! HttpFoundation no longer trusts the HTTP headers from the reverse proxy and, if it was relying on any info in the HTTP headers, things could be borked.

However, a middleware like Cryptolog can detect this happening and try to fix things before it passes the request on to other middlewares, so it seems to generally work. It's also possible to set up middlewares in functional tests to check various scenarios.
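
The failure mode described above can be illustrated without Symfony at all. The function request_scheme() below is a deliberately simplified, hypothetical stand-in for trusted-proxy handling, not the HttpFoundation API: forwarded headers are honoured only while the client address is in the trusted list.

```php
<?php
// Simplified stand-in for trusted-proxy logic (not the Symfony API): the
// X-Forwarded-Proto header is only believed if the direct client is trusted.
function request_scheme(array $server, array $trusted_proxies): string {
  $from_trusted_proxy = in_array($server['REMOTE_ADDR'], $trusted_proxies, TRUE);
  if ($from_trusted_proxy && isset($server['HTTP_X_FORWARDED_PROTO'])) {
    return $server['HTTP_X_FORWARDED_PROTO'];
  }
  return (($server['HTTPS'] ?? 'off') === 'on') ? 'https' : 'http';
}

$trusted = ['10.0.0.5'];
$server = ['REMOTE_ADDR' => '10.0.0.5', 'HTTP_X_FORWARDED_PROTO' => 'https'];

// Computed on demand from the current client address.
$before = request_scheme($server, $trusted);

// Masking the address afterwards changes the answer on the next lookup.
$server['REMOTE_ADDR'] = '0.0.0.0';
$after = request_scheme($server, $trusted);
```

Because the value is derived each time rather than cached, anything that masks the address has to account for the proxy-derived information as well.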

catch’s picture

Status: Needs work » Closed (duplicate)

I think this can probably be closed as duplicate now that #2828793: Stop logging comment IP addresses by default is fixed in comment module.

mfb’s picture

Idk if there's a privacy / data retention section of the documentation for Drupal 10/9/8? Here's a summary:

Client IP addresses are critical for two things to work in Drupal:

  1. IP address-based flood control, also known as rate limiting; and
  2. If $settings['reverse_proxy'] is enabled, determining if a request came from a trusted reverse proxy and, if so, determining basic information about the request from trusted headers (e.g. the actual client IP address, scheme, host, port).

In addition to requiring access to the client IP address, Drupal logs this IP address in several situations:

  1. Session activity (sessions database table). Session data, including IP addresses, is deleted according to the session.storage.options gc_maxlifetime parameter in services.yml.
  2. Log events (watchdog table for database log module, or syslog for syslog module). Database log events are deleted according to dblog.settings row_limit configuration, which is not a time but rather a number of events in the log.
  3. Comments posted, if comment.settings log_ip_addresses configuration is enabled, in which case IP addresses are stored indefinitely.
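
As a sketch, two of the configuration values named above could be pinned from settings.php using configuration overrides. The values shown are examples only, and this assumes the site does not already manage these settings some other way; the session lifetime is a services.yml parameter and is not covered here.

```php
// settings.php fragment (example values, adjust to the site's policy).

// Do not store IP addresses with posted comments.
$config['comment.settings']['log_ip_addresses'] = FALSE;

// Keep at most this many database log events; older events are discarded.
$config['dblog.settings']['row_limit'] = 1000;
```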

Contributed modules may also store client IP addresses indefinitely, e.g. as part of a Webform submission or Commerce order.

Several contributed modules can help site administrators implement a data retention policy for IP addresses:

  • IP Anon module scrubs IP addresses from database tables according to configurable retention periods.
  • Cryptolog module provides a middleware that masks IP addresses with ephemeral identifiers, which are generated using a salt that rotates each day. This prevents IP addresses from ever touching Drupal logs and database tables, while allowing flood and trusted reverse proxy functionality to continue working as designed. Once the salt has been discarded, IP addresses cannot be reverse-engineered from the logged identifiers.