I'm starting to work on distance searching/Views integration in Geofield (see #1469956: [Meta] - Improve Views-powered Geofield proximity searches), and it occurred to me that an exposed filter doing geocoded Views searches is likely to run into API limits pretty quickly. Under our regular use (saving geocoded data to a geofield), this isn't really an issue, but with Views filters I don't really have a place to store the geocoded data. Drupal's caching system should make this relatively easy to solve.

Ideally, I think we could add an option that controls whether or not we cache a result. The data should probably be segmented into its own table (cache_geocode?). For fields it's probably not worth the effort, but it makes sense if we need it for other purposes.

One thing that might be an issue is the various providers' Terms of Service. Looking at Google's, for example (https://developers.google.com/maps/terms#section_10_1_3), they forbid caching results unless you're doing it for performance reasons, which I think could be reasonably argued for something like a Views filter search.

Comments

phayes’s picture

Good idea.

What we should do is base the cache keys on the md5 hash of the incoming data (that way it works for all data, including KML files and the like). The results should then be stored as gzipped, serialized geoPHP objects.

The geocoder function should take a $cache parameter, and an option should be added to the geocoder widget settings.
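
A minimal sketch of that keying and storage scheme (the function names here are hypothetical illustrations, not actual geocoder API):

```php
<?php

// Build a cache ID from the raw incoming data, so it works for any
// input type, including KML strings and the like.
function geocoder_example_cache_id($handler, $data) {
  return 'geocoder:' . $handler . ':' . md5($data);
}

// Store results as gzipped, serialized geometry to keep cache rows small.
function geocoder_example_pack($geometry) {
  return gzcompress(serialize($geometry));
}

function geocoder_example_unpack($blob) {
  return unserialize(gzuncompress($blob));
}
```

A round trip through pack/unpack yields an equal object, so a cached geometry can be rebuilt without hitting the provider again.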

phayes’s picture

Status:Active» Needs work

I've added static caching; we still need to add persistent caching and make it configurable in the UI.

sdboyer’s picture

i'll just weigh in quickly and say +1 to having a separate cache bin, e.g., cache_geocode. in fact, i might even go so far as to recommend *multiple* separate bins, given that different providers have different terms of use. that way you can easily create tailored clearing logic for each provider.

you COULD utilize key prefixing and wildcard clears...but wildcard clears aren't necessarily the sanest thing on all cache backends (*COUGH* memcache), so bins are still the cleanest way to segment. that or cache tags, but you can't rely on those until D8 - they're now in core, iirc.

phayes’s picture

Relevant IRC discussion cut and paste:

[09:19am] sdboyer: and if you end up agreeing with my point in http://drupal.org/node/1515372#comment-5970438, there'd like be even more
[09:19am] Druplicon: http://drupal.org/node/1515372 => Caching Results => Geocoder, Code, normal, needs work, 3 comments, 2 IRC mentions
[09:19am] sdboyer: ah well, guidance i can do, but i can't promise a patch
[09:19am] sdboyer: sorry :
[09:19am] sdboyer: :(
[09:21am] phayes: Hi sdboyer - interesting idea about having different cache bins - I was thinking we would just do prefixing...
[09:22am] sdboyer: yeah prefixing has its limitations
[09:22am] sdboyer: cache tags really are the ideal solution to your issue here, but as i said, can't rely on em till 8
[09:22am] sdboyer: you might get a different answer from someone else, but this to me looks like a valid case for multiple cache bins
[09:23am] phayes: Makes the code much tricker - especially given that additional handlers can be added dynamically by other modules...
[09:23am] sdboyer: might seem a little weird, but if it helps users play nice with a providers TOS with minimum interaction...then yeah
[09:23am] sdboyer: well
[09:23am] sdboyer: if you wanted to do multiple cache bins, my approach would be
[09:23am] sdboyer: have a cache_geocode which is used if no other logic is provided
[09:24am] phayes: Here's what we should do -- allow plugins to define a cache-bin  -- if non is defined in the plugin, we use a general cache_geocode
[09:24am] sdboyer: when cache clearing routines run, you check to see if the logic for that provider has the appropriate function for cache management. if not, use generic logic
[09:24am] sdboyer: yep
[09:24am] sdboyer: exaclty
[09:25am] phayes: That way all the "local" handlers (kml, gpx, etc.) can just use cache_geocode
[09:25am] sdboyer: yep
[09:25am] sdboyer: it's the kind of thing that is more elegantly expressed if you have an abstract parent for the generic logic (which is part of why i bring up the OO question)
[09:25am] sdboyer: but it's eminently doable in procedural as well
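
The plan sketched in that exchange, plugins optionally declaring their own bin with a generic fallback, is small in code terms. A hypothetical helper, assuming handler definitions are plain arrays with an optional cache_bin key:

```php
<?php

// Return the cache bin for a handler: the bin the plugin declares,
// or the generic geocoder bin when it declares none. Per-provider
// bins make tailored clearing logic (per ToS) straightforward.
function geocoder_example_cache_bin(array $handler) {
  return !empty($handler['cache_bin']) ? $handler['cache_bin'] : 'cache_geocoder';
}
```

"Local" handlers (KML, GPX, etc.) would simply not declare a bin and land in the generic one.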

jenlampton’s picture

I was just testing the static cache, and wasn't able to get persistent caching working - but it looks like that's just because it's not done yet. :)
Let me know what I can do to help move this forward.

jenlampton’s picture

I'm going to take a stab at setting up a cache_geocoder table and storing results in there for views-exposed-filter searches.

*edit* fixed typo, yes cache_geocoder not cache_geocode :)

phayes’s picture

Assigned:Unassigned» jenlampton

@jenlampton++ !

Oh, be sure to call it cache_geocoder (not cache_geocode)

steinmb’s picture

We are also interested in some kind of cache. We are currently working on migrating data from an older system into Drupal. One of the external systems contains addresses that we store on the Drupal side in addressfields, and we then let geocoder turn these into geofield values. We rather quickly ran into 'OVER_QUERY_LIMIT' at Google.

The migration (migrate module) gets rerun every time we change the code, but the addresses are 'always' the same, so hitting Google this hard does not make any sense. I thought about putting a proxy cache between us and Google with a long static cache, but solving it here would be a much better and smarter solution.

emilyf’s picture

I'm in the same situation as steinmb: importing loads of content via a feed importer, not even including a location field in the import, but there is an addressfield and a geofield in the node type that I'm importing into. Every import triggers 10,000+ OVER_QUERY_LIMIT errors...

Let me know if there is anything I can test.

nd987’s picture

Big +1 on this. Is there any progress towards persistent caching? Seems like a must-have feature to me, especially for those of us who do batch imports daily. Once you have the address geocoded, it ain't going anywhere, so you can cache it indefinitely.

Dean Reilly’s picture

I would like to see some discussion of using a proxy to cache responses from services instead of implementing this in geocoder. It seems to me that this approach might have some advantages:

  1. It would be less code to write and maintain, which is always a good thing.
  2. The HTTP cache system is mature and flexible. It respects whatever caching strategy the service suggests in its headers, can be overridden in most cases by the user, handles expiry of stale caches, and, if the service supports ETags, can even avoid re-requesting data which hasn't changed.
  3. It makes the cache more easily usable by other applications or modules which need to integrate with the service API.

nd987’s picture

The problem with relying on a proxy to cache geocode results is... you have to have a proxy. While the advantages @Dean Reilly posted are valid, most developers will not have, or be able to set up (due to knowledge, hosting, business, or other limitations), an HTTP proxy for this purpose. It could definitely be supported, but we need something that is easier and more trouble-free for the average developer.

Dean Reilly’s picture

Hosting limitations could be an issue. However, I'm not sure how many people would be running a site that would be making 50,000 geocoding requests a day (Yahoo!'s limit), and be running on a service where they don't have the necessary access to set up a proxy.

Exceptions might be sites hosted on a PaaS such as Acquia and Pantheon. But these are hosted on the Amazon and RackSpace clouds respectively. The user can always spin up an instance in the same AZ and configure that to act as the proxy.

There's also an argument to be made for the performance considerations of this module on smaller sites without proxies. I'm not sure the technical costs justify the benefits in this case either, though. Smaller sites on shared hosting probably have bigger performance bottlenecks than a geocoding api. (Although that is just a gut feeling and I have no numbers to back up that statement.)

I also believe the average developer could go away, learn how, and set up a proxy server fairly quickly.

-------------------------------

All things considered, I'm not terribly opposed to the idea, but I do want to point out that there are alternatives, each with pros and cons.

I also think the issue of hitting the API limits suggests that a queuing system which can spread out the load is also required, as caching will only take us so far.

nd987’s picture

Keep in mind that Google's API limit is only 2500 per day (for non-business clients). Of course developers could setup a proxy if they really had to, but why make them? Adding a regular caching layer would involve:

  1. Create a schema for the cache table in the .install file (just copy the standard cache table schema)
  2. When an address is geocoded, wrap the call in cache_get() / cache_set(), using the above table name as the bin and a hash of the address as the cache key
  3. There is no step 3
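
In Drupal 7 terms, those two steps could look roughly like this (a sketch against the D7 cache API; geocoder_example_cached is a hypothetical wrapper name, not the module's actual function):

```php
<?php

/**
 * Step 1: reuse the standard cache table layout for our own bin.
 */
function geocoder_schema() {
  $schema['cache_geocoder'] = drupal_get_schema_unprocessed('system', 'cache');
  return $schema;
}

/**
 * Step 2: wrap a geocoding call in cache_get() / cache_set().
 */
function geocoder_example_cached($handler, $address, $callback) {
  $cid = $handler . ':' . md5($address);
  if ($cached = cache_get($cid, 'cache_geocoder')) {
    return $cached->data;
  }
  $geometry = $callback($address);
  // Addresses don't move, so cache indefinitely.
  cache_set($cid, $geometry, 'cache_geocoder', CACHE_PERMANENT);
  return $geometry;
}
```

Repeated batch imports of the same addresses would then hit the provider once per address instead of once per run.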

Dean Reilly’s picture

Step 3 would be to implement different caching rules to comply with the different terms and conditions of the various services. See replies #3 and #4.

------------------

To switch sides quickly, I'd like to give some of the reasons I think it would be a good idea for geocoder to cache results even if a proxy is being used.

I think the big advantage of having caching happen at the API level is that it's agnostic of the service. So we could make use of multiple service endpoints and have the cached results of one prevent requests to another.

We can also be smarter with the cache. We can provide caching up to a variable level of resolution. E.g. are we only plotting addresses up to the level of a city to provide anonymity to users? Then we don't need to geocode two different addresses from the same city for different users.

Finally, APIs aren't always brilliantly implemented. The ToS may be much more permissive towards caching than the HTTP headers suggest. Sure, you can configure the proxy to override these headers, but that means lots of people doing the same job. It's easier to have it done by geocoder.

------------------------------

It's probably also worth keeping an eye on the pluggable HTTP client issue for D8. This would allow us to use a tool like Guzzle, which has support for HTTP caching built in, and let people get the benefits of a proxy without having to install one.

jpstrikesback’s picture

What if step 3 was simply an opt in on a config page?

drunken monkey’s picture

I've now taken a first shot at implementing this, for now without a config page, but with the caching variant overridable (in code) for each handler. The only problem is: it doesn't work. Seemingly, the PHP serialization functions have some problems with Geometry objects (or at least Points). Direct serialization and unserialization like in

<?php
$point = unserialize(serialize($point));
?>

seems to work, but as soon as there is some external storage involved (be it a database, a file, or the clipboard), it fails and only returns a __PHP_Incomplete_Class object. Base64-encoding and similar tricks also don't work, as far as I can see.

So, to get this even basically started, without even considering the ToS and configuration problems, we will have to solve this. As far as I can see, we have the following options there:

  1. Someone smarter than me knows of a way this can be done, or discovers an error in my tests.
  2. We implement our own serialization layer for Geometry objects.
  3. We fix the problem elsewhere (i.e., in geoPHP, probably).

Any ideas or comments? What do you think of my approach to the other problems? What we could easily add, too, is some options on the config page to disable caching for some (or all) handlers.

jpstrikesback’s picture

I'm gonna test this and play the moment I get a second, unfortunately that could be 2 weeks from now, but dang it's nice to see this thread lives!

attiks’s picture

Status:Needs work» Needs review
Attached: new patch (4.22 KB)

This should work: I added geophp_load() to geocoder_cache_get() so the classes are known for the unserialize call.

drunken monkey’s picture

Attached: new patch (4.2 KB)

Wow, great! Thanks for your help, I was completely lost with that one.

I don't think it's necessary to serialize the objects ourselves, though. Just move the geophp_load() call above the cache_get() and it's fine.

Other reviews would also be very welcome, suggestions how to proceed with the configuration, etc.
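
Concretely, the fix in #19/#20 comes down to class availability at unserialize time. A simplified sketch of the idea (not the actual patch code):

```php
<?php

function geocoder_cache_get($cid) {
  // Load geoPHP before reading the cache: unserialize() can only
  // rebuild Geometry objects if their classes are already defined;
  // otherwise it returns a __PHP_Incomplete_Class instead.
  geophp_load();
  $cached = cache_get($cid, 'cache_geocoder');
  return $cached ? $cached->data : FALSE;
}
```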

attiks’s picture

Status:Needs review» Reviewed & tested by the community

#20 works as well

drunken monkey’s picture

Attached: new patch (4.2 KB)

Re-roll of #20.
(Only whitespace changed, so I don't think we need a new review.)

drunken monkey’s picture

Status:Reviewed & tested by the community» Needs review
Attached: new patch (4.54 KB)

Improved the patch in several ways regarding the cache_function callbacks.

  • Since there is only one callback for both get and set, we should also pass a $geometry argument when getting the cache value (in this case, NULL).
  • Pass additional parameters $processor, $data and $options to the caching callback. We don't know whether custom cache implementations will use the same cache ID we create.
  • Don't return values from geocoder_cache_set().
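
Under those changes, a custom cache callback might have a shape like the following (a hypothetical signature matching the bullet points above, not the committed code; a NULL $geometry distinguishes a get from a set):

```php
<?php

/**
 * Example cache callback: one function handles both get and set.
 * $geometry is NULL on a get, and the geometry to store on a set.
 * $processor, $data and $options are passed through so custom
 * implementations can derive their own cache IDs.
 */
function mymodule_geocoder_cache($cache_id, $geometry = NULL, $processor = NULL, $data = NULL, $options = array()) {
  if ($geometry === NULL) {
    // Get: return the cached geometry, or FALSE on a miss.
    $cached = cache_get($cache_id, 'cache_geocoder');
    return $cached ? $cached->data : FALSE;
  }
  // Set: store and return nothing.
  cache_set($cache_id, $geometry, 'cache_geocoder');
}
```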

Sinan Erdem’s picture

Is anybody working on this issue? It would be nice to see the patch committed to a release.

drunken monkey’s picture

From the looks of it, I'd say nobody is working on this module.
See also #1563648: Taking a break from geocoder development. Some new developer would be needed – otherwise, you'll just have to hope phayes returns soon.

mikey_p’s picture

Just a note that, while this patch looks great, the widget that geocoder exposes for geofield and location doesn't use this function and thus doesn't benefit from this caching.

I'm not sure what the simplest solution to this is, outside of re-thinking the plugin system and possibly moving to an object oriented structure.

mikey_p’s picture

Issue summary: View changes
Attached: new patch (11.85 KB)

Here's a slightly hacky patch that allows geocoding through the field widget to be cached as well.

arbee’s picture

I just want to thank you all for your excellent work. So glad I stumbled upon this, I was having query limit problems with a Views proximity search and this looks like it's going to take care of it.

SocialNicheGuru’s picture

I am getting this now though:
Notice: Undefined index: cache in geocoder_widget_get_field_value() (line 283 of geocoder/geocoder.widget.inc).
Notice: Undefined index: cache in geocoder_widget_get_field_value() (line 294 of geocoder/geocoder.widget.inc).

quicksketch’s picture

Assigned:jenlampton» Unassigned

Let's unassign @jenlampton. :)

SocialNicheGuru’s picture

If I apply the patch from #27, the cache tables for geocoder are not created when I install using Aegir.

rudiedirkx’s picture

Status:Needs review» Needs work

Doesn't apply to dev correctly anymore. Dev fixed a bug with multiple value fields, but this patch reintroduces it, because of

if (!isset($geometry)) {

so that part should be patched/merged manually.

rudiedirkx’s picture

Status:Needs work» Needs review

Re-roll of #27. With caching, without the multiple values bug.

Jelle_S’s picture

@rudiedirkx I think you forgot something ;-)

rudiedirkx’s picture

What's that?

Jelle_S’s picture

I think you forgot to attach the patch in #33 ;-)

rudiedirkx’s picture

Attached: new patch (10.75 KB)

Haha crap. Eeeeeeh, let me find that one...

Usually I remove my local patches, because "they're on d.o anyway!". Not this time, thank god!

kerasai’s picture

Hi there, sorry to step on anyone's toes, but this seems like an exceptionally complicated way to solve a fairly simple problem.

For a recent project build, I rolled a module that provides a geocoder_handler plugin that is essentially a wrapper for the existing Google plugin and adds caching at that layer. No need to touch any other part of Geocoder, and the integration with fields and Views just keeps on doing its thing.

https://www.drupal.org/sandbox/bmoresafety20/geocoder_google_adv

Take a look, please ping me with questions or input.

rudiedirkx’s picture

@kerasai But you don't want to cache just Google, or any other single handler. You want to cache ALL handlers, depending on a field's config. I agree it's overcomplicated, but it works perfectly and we REALLY need geocoder caching, so let's get it in!

(It's probably not a good idea to serialize and persist Exception objects. I don't KNOW, but it seems wrong... And why do you save into geocoder_google_adv and into cache_geocoder_google_adv?)

sheijtel’s picture

The patch from #23 works fine when just using the geocoder API, great! The patch in #37 does not include the geocoder.install file, which contains the schema.

rudiedirkx’s picture

Attached: new patch (12.23 KB)

#23 has the bug described in #32

Attached patch should include everything. Thanks @Steven. Please try that one.

blacklabel_tom’s picture

Hi Rudie,

The patch in #41 won't apply for me against the latest dev. I applied it manually and I couldn't get any results through from the module.

At first glance, the cache check here is a bit too aggressive. By default $cache_reset is FALSE and $cache_type is 2, meaning ALL requests will look in the cache instead of doing the lookup, regardless of whether the query has ever been run before.

<?php
if (!$cache_reset && $cache_type > 1) {
  $geometry = geocoder_cache_get($cache_id, $processor, $data, $options);
}
else {
  $geometry = call_user_func($processor['callback'], $data, $options);
  if ($cache_type > 1) {
    geocoder_cache_set($cache_id, $processor, $geometry, $data, $options);
  }
}
?>

I'm assuming this check needs to look at whether this location has already been cached, rather than at the caching settings the function defaults to?
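
A cache-miss fallback along those lines could look like this (a sketch of the intent only; the wrapper name is hypothetical and the real patch structures this differently):

```php
<?php

function geocoder_example_with_cache(array $processor, $data, array $options, $cache_id, $cache_type, $cache_reset = FALSE) {
  $geometry = FALSE;
  if (!$cache_reset && $cache_type > 1) {
    $geometry = geocoder_cache_get($cache_id, $processor, $data, $options);
  }
  // On a cache miss (or when caching is disabled), do the live lookup.
  if ($geometry === FALSE) {
    $geometry = call_user_func($processor['callback'], $data, $options);
    if ($cache_type > 1) {
      geocoder_cache_set($cache_id, $processor, $geometry, $data, $options);
    }
  }
  return $geometry;
}
```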

Cheers

Tom

blacklabel_tom’s picture

Status:Needs review» Needs work

rudiedirkx’s picture

That's weird. Applies fine for me:

rudie@home:geocoder$ patch -p1 --dry-run < patches/geocoder-1515372-41.patch
patching file geocoder.install
patching file geocoder.module
patching file geocoder.widget.inc

0 lines offset even.

I don't know about the actual code. It's weird that there are levels of cache checking, I agree, but I haven't scrutinized the code. I skimmed it, tested it and it worked. It's not perfect, but it's been over 3 years now and we really need caching.

You're very welcome to improve on it, or even start over, I'll review and test and RTBC when I'm happy =) but I don't think there'll be anything committed in 2015 then =(

blacklabel_tom’s picture

Hi Rudie,

I'm going to leave this one well alone as I'm tied to the 1.2 branch at the moment.

Cheers

Tom

Pol’s picture

Status:Needs work» Reviewed & tested by the community

Patch #41 applied and working flawlessly.

pol@x230 ~/git/geocoder $ wget https://www.drupal.org/files/issues/geocoder-1515372-41.patch
--2015-07-27 09:02:14--  https://www.drupal.org/files/issues/geocoder-1515372-41.patch
Resolving www.drupal.org... 93.184.220.99
Connecting to www.drupal.org|93.184.220.99|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12519 (12K) [text/plain]
Saving to: ‘geocoder-1515372-41.patch’

2015-07-27 09:02:15 (26.2 KB/s) - ‘geocoder-1515372-41.patch’ saved [12519/12519]

pol@x230 ~/git/geocoder $ git apply geocoder-1515372-41.patch
pol@x230 ~/git/geocoder $

I have a question though: are we obliged to use the static cache?