I was finding that valid addressfield data was systematically failing to return a geocoded location in some countries (France and Republic of Ireland especially, and some other European countries by the look of it).

The reason is that this module takes a naive approach to producing the address string for the geocoding service - it simply concatenates selected fields, and resolves the country code. But for some countries the order is wrong, and where codes are used for administrative-area, they are not expanded.

However, the point of the addressfield module is to be able to produce correctly formatted addresses on a country-specific basis. Geocoder should use this functionality to create the address string to send for geocoding. Doing this is in fact only a few lines of code, and much simpler than the code already in geocoder!

A second problem I was having is that if an address is too specific, it is not recognised by Google (Google is the only geocoding service I've used here), but if the street address component is removed then a location is usually returned, based on the town and postal code. I've also patched the Google geocoder routine to do this retry if the full address fails.

Patch coming up.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Rick J created an issue. See original summary.

RickJ’s picture

RickJ’s picture

Status: Needs work » Needs review
wranvaud’s picture

Title: Geocoding from addressfield fails in some countries » Geocoding from addressfield fails under some circumstances
Status: Needs review » Reviewed & tested by the community

This also applies to US addresses with glitches like a missing/malformed Street, City, State or a ZIP+4 code instead of just 5 digits. Sometimes it would find the location sometimes it would fail depending on an unpredictable combination of the addressfield fields. A correctly formatted address from addressfield seems to solve most of the issues.

In case some geocoding issues still fall through a validation that notifies the user about the field not being geolocated upon node save would be useful as mentioned in https://www.drupal.org/project/geocoder/issues/2860409

donquixote’s picture

Status: Reviewed & tested by the community » Needs work

I agree with the problem description.
E.g. addresses in Spain fail.

However, the patch (for me) produces an address string without commas.
This has the potential to be ambiguous and produce wrong results.

donquixote’s picture

Priority: Normal » Major

And I would say this is major to critical, if it means that for some countries (e.g. Spain) ALL addresses fail to be geocoded.

donquixote’s picture

Also the patch would create a dependency to addressfield, which afaik does not exist until now.
At least some module_exists() would be needed.

RickJ’s picture

The addressfield formatting code is only invoked on a field of type addressfield, so the addressfield module must by implication be present. There is in fact a module_exists() check as well (see e.g. plugins/geocoder_handler/google.inc). I didn't change any of that logic in this patch.

The purpose here is to geocode using the address as formatted by addressfield, which is presumed to be correct for each country. It's not the job of geocoder to incorporate country-specific rules, it's best they are contained in one place - i.e. addressfield.

However, the output of addressfield's formatter is HTML with line breaks, the tags are stripped so just plain text is handed to the geocoding service. Are you suggesting that the line breaks should in all cases be replaced by commas, would that be more reliable for geocoding?

donquixote’s picture

The addressfield formatting code is only invoked on a field of type addressfield, so the addressfield module must by implication be present. There is in fact a module_exists() check as well (see e.g. plugins/geocoder_handler/google.inc). I didn't change any of that logic in this patch.

Thanks for pointing this out! So this point I made is invalid.

The purpose here is to geocode using the address as formatted by addressfield, which is presumed to be correct for each country.

I don't know. What is "correct" for a human visitor might not be "correct" for Google geocoding API, or some other geocoding service.
Addressfield module does not advertise the output as being suitable for an API. It could very well be, but not by design.

E.g. one difference I would make is to send a country code (e.g. "DE") whereas the visitor wants to see a country name in their own language, e.g. "Deutschland" or "Germany" etc. Currently we send the full country name to the API, but I think we shouldn't.

Besides with the invocation of drupal_render(), the output might now look different per theme. A theme override that is meant to customize the visible output would potentially break geocoding.

It's not the job of geocoder to incorporate country-specific rules, it's best they are contained in one place - i.e. addressfield.

Possibly yes. But ideally, addressfield would have a dedicated function that produces strings for consumption by APIs.
But we could also have a different philosophy: Because different APIs might require different input format, it should be the job of geocoder to produce these strings.

However, the output of addressfield's formatter is HTML with line breaks, the tags are stripped so just plain text is handed to the geocoding service. Are you suggesting that the line breaks should in all cases be replaced by commas, would that be more reliable for geocoding?

For sure this would be an improvement.
RIght now with your patch, we get strings like "Calle Espartero 1035010 Las Palmas de Gran Canaria Las PalmasSpain".
(actually I did not apply the patch 1:1 but reproduced the logic in some other place, but I strongly assume that it will have the same result)
You see there are clear problems due to missing spaces, but also some level of ambiguity due to lack of commas.

Btw, in Google Geocoding API documentation examples they typically use "+" (plus) instead of " " (space).
See also my comments starting at #2135849-17: Incorrect order of fields for geocoding.

RickJ’s picture

My experiments with the Google API (the only one I've worked with) indicate that it does absolutely expect a human-readable address. The API returns the same result as entering the same string in the Google maps search box. There is nothing on Google's page on using the API to suggest any kind of coding of the data is required.

The use of + signs in the example is simply that + is the URL encoding of a space. They will be converted to spaces by the web server before the string even gets to the API itself.

I strongly believe that using the country-specific logic in addressfield is the right way to do this. There is a huge amount of code in there to cater for every country world-wide, and there's a lot of different ways the fields are re-arranged, and codes expanded. Initially Ireland was a particular problem for me, and you can't resolve that one just by tinkering with the order of fields. Hacking code into geocoder on a case-by-case trial and error basis would become a maintenance nightmare.

The slight hitch with using addressfield is that it has no function that returns a formatted address as, say, an array of plain-text strings. It formats directly into HTML in the render function, which is why I had no option but to get the encoded string and strip it. The rendering function is not trivial and would require considerable work to split it.

However, I've added a tweak to put commas in the address string. The rendered address actually has new-line separators, even after stripping, but they probably get lost through the HTML query to Google. So I've added a str_replace() to turn them into commas.

Let me know if this is any improvement.

donquixote’s picture

Sounds good, mostly.

It somehow feels wrong to produce themed html output and then turn this back into a flat string for API consumption.
But if it works, and produces good results..

The rendered address actually has new-line separators,

Does this not depend on theme implementations?
There is a render array, so I assume there is a theme hook involved somewhere, or not?
So the line breaks might not be 100% reliable.

There is a huge amount of code in there to cater for every country world-wide, and there's a lot of different ways the fields are re-arranged, and codes expanded.

Yeah, I had a look at this..
It looked a bit weird: To output one administrative area name, they create an element with #options, where the #options are the available names, but it has no #type like 'radios' or 'select'. The motivation, I think, is to be able to use the same logic for form and display.
It is weird, but I guess it works.

The slight hitch with using addressfield is that it has no function that returns a formatted address as, say, an array of plain-text strings.

Maybe we should post a feature request in addressfield?
I suspect this won't give us quick results, so we should continue with the current approach. But it seems like a reasonable request, if our philosophy is that this is a responsibility for addressfield.

cater for every country world-wide, and there's a lot of different ways the fields are re-arranged, and codes expanded. Initially Ireland was a particular problem for me, and you can't resolve that one just by tinkering with the order of fields. Hacking code into geocoder on a case-by-case trial and error basis would become a maintenance nightmare.

Can you be specific? Can you post some example address strings for different countries?
(I hope you can come up with some artificial addresses, or addresses of public places, that don't violate anyone's privacy but still demonstrate the differences.)

Perhaps you are right and there are too many different cases to cover them all.
But it will help to have a list of some cases, to verify if our solution works.

My experiments with the Google API (the only one I've worked with) indicate that it does absolutely expect a human-readable address. The API returns the same result as entering the same string in the Google maps search box. There is nothing on Google's page on using the API to suggest any kind of coding of the data is required.

Yes, but.
It also does not say anywhere that it will "always work". Just that it will try.
If you enter something in the google maps search box, it can happen that Google gets it wrong due to ambiguity.
What we need to try here is to minimize the chance for ambiguity and wrong results.
So even if the API does happily accept human-readable text, and delivers good results most of the time, does not mean this is the best possible format with the lowest failure rate.

One concern I have about human-readable country names is the language..
I would imagine that in some languages, some countries have names that could lead to ambiguities somewhere.
We can send a language with the request, but this is meant for the response not for the request. Or is it?
Or maybe just send all country names in English then?

Maybe you already thought this through, and my concern is unjustified.
So far, all of this, from my side, is armchair diagnostic :)

RickJ’s picture

It somehow feels wrong to produce themed html output and then turn this back into a flat string for API consumption.
But if it works, and produces good results..

>> The rendered address actually has new-line separators,

Does this not depend on theme implementations?
There is a render array, so I assume there is a theme hook involved somewhere, or not?
So the line breaks might not be 100% reliable.

The new-lines are deliberate - this is the comment and code in addressfield.module:

    // Add a linebreak to the HTML after a div. This is invisible on the
    // rendered page but improves the appearance of address field output when
    // HTML tags are stripped, such as by Views Data Export.
    if ($element['#tag'] == 'div') {
      $output .= PHP_EOL;
    }

I use Views Data Export as well (outputs a CSV file from a view), it produces all its output by taking views-rendered fields and stripping them to plain text. AFAIK there's no other generic way to do it. Address fields come out perfectly, with one CSV cell containing a correctly formatted multi-line address. This is ideal for mail-merge, a single field in a document yields a complete address, correct for any country.

In my patch I'm essentially levering this feature to yield an address suitable for a geocoding service that decodes standardised human-readable addresses. Though it's described as an API it offers no features for submitting data in a structured form, in the end it's just Google Maps search.

Country names are resolved using drupal core includes/iso.inc. Google is pretty adaptable at detecting what language you are talking to it in, and I'm sure internally it works in English by default!

In the end, this kind of geocoding is not an exact science. Google can't locate my own house precisely, which seems to be because its street-view car has never driven along the small side road I live in. It gives the location of a house about 100m away! I'm in a built-up area with thousands of houses, but geocoding to the location of a building appears inextricably linked to street view.

I'm happy that using addressfields's formatting feature is the best that's likely to be possible. I don't personally plan to try to replicate what it does within geocoder.

Pol’s picture

Hello there,

What is the status of this ?
I've a bit put aside the Drupal 7 development to be full on Drupal 8.
Plus, I had some personal projects that took me 3 years to complete and they are now done :) \o/
So, I would like to have a bit of a status on this patch, I will release a new 7.x version very soon.

Thanks!

RickJ’s picture

Status: Needs work » Needs review

Hi Pol

Personally, having written the patch, I'm happy with how it works, and it's a big improvement on what was there before. I can't see a better way of using addressfield data than getting the addressfield module to format it for sending to a geocoder.

I'd like to see it RTBC and committed.

Pol’s picture

Cool, sure, let's get this in then!

I'm doing a cleanup of all the issues, so my previous message :-)

Pol’s picture

Could you please re-roll a patch for the current dev-version ?

I'll commit it asap.

Thanks!

Pol’s picture

Status: Needs review » Needs work

Rick,

Could you please re-roll the patch, I'm unable to apply it as it is.

Thanks!

donquixote’s picture

I would say maybe we should implement some tests?
With example addresses from different countries?
Then we can compare before/after.

I opened #2950419: Address to string, for consumption by APIs in addressfield module.

RickJ’s picture

Status: Needs work » Needs review
FileSize
3 KB

No probs, re-rolled for dev build.

donquixote’s picture

hi,
I want to share a little observation that I made.
It does not change anything for this issue, but is interesting to know.

If you use place names in other languages in the &address parameter in Google, it will guess the language and is right most of the time.
The &language parameter seems to have no effect on this.

https://maps.googleapis.com/maps/api/geocode/json?address=52066,Allemagn...
https://maps.googleapis.com/maps/api/geocode/json?address=Spanyolorsz%C3...
(omitting the &key=, for obvious reasons)
(Spanyolorsz is Hungarian for Spain, I looked it up)

If you put a foreign country name in the &components parameter, then the &language defines if you get results or not.

https://maps.googleapis.com/maps/api/geocode/json?components=postal_code...|country:Allemagne&language=fr
https://maps.googleapis.com/maps/api/geocode/json?components=postal_code...|country:Allemagne&language=de

If you use abbreviated country code in &components=, the language does not matter.

https://maps.googleapis.com/maps/api/geocode/json?components=postal_code...|country:DE&language=fr
https://maps.googleapis.com/maps/api/geocode/json?components=postal_code...|country:DE&language=de

--------

I still suspect that, even if the API is generally designed to understand human language, some formats might provide a higher success rate than others. But I cannot come up with a good example to make it fail.

RickJ’s picture

Just realised that last patch #19 was buggy, the resulting code contained a syntax error (not sure how that happened!).
Here's a corrected version.

Summit’s picture

Hi, would love to test this. What sort of geocoding should work, and what should fail?
greetings, Martijn

earthmanweb’s picture

Hi. Did this get reviewed / approved?

Does it work as it is?

RickJ’s picture

Hi

It looks like @Pol was ready to commit it a couple of years ago, subject to a re-rolled patch, which I provided. Nothing's happened since though!

The patch (#21) has been working fine for me since then. I suggest giving it a try. Please feed back any problems, as well as if it's successful.

Thanks

RickJ’s picture

FileSize
3.03 KB

I've re-rolled the patch for compatibility with 7.x-1.7.

Any chance of committing this? I thought it was going to be done some while back.
Thanks.

dqd’s picture

Well made issue report and provided patches @RickJ. +1 and Thanks to @Pol keeping an eye on this important issue.

Sadly I am afraid this issue has not been reviewed enough the last year again nor added to the 8.x dev branch yet?

I wondered why so many users asked me about that their geocoding fails randomly over the last time. I thought it was caused by wrong input like UA, API keys etc. Never looked at a possible conversion issue between address(field) and geocoder until now. *facepalm*

Summit’s picture

Hi,
Would love to use this also in my migrated drupal 9 site!
Thanks for efforts until now!
greetings, Martijn

dqd’s picture

Apart from that we always should be aware of that we can't influence how geocoding libraries and their service providers handle the requests. It is like hunting the white rabbit. I realized often very inexact results. Manually adding data is mostly more exact.

dqd’s picture

Title: Geocoding from addressfield fails under some circumstances » Geocoding from address field fails under some circumstances
Version: 7.x-1.x-dev » 8.x-3.x-dev
Assigned: RickJ » Unassigned
Status: Needs review » Needs work
Issue tags: +Needs backport to D7

If this issue still persist we should move it forward. Since Drupal 8 EOL is up to and Drupal 7 delayed EOL is following in the next year we should set this issue to latest dev and port back in case if it is still required to fix this in Drupal 7 also. This way the chance is higher that people are motivated to work on this.

I am also not sure - after thinking about it a little bit more - if the assumptions made in the issue summery is correct:

The reason is that this module takes a naive approach to producing the address string for the geocoding service - it simply concatenates selected fields, and resolves the country code. But for some countries the order is wrong, and where codes are used for administrative-area, they are not expanded.

Sadly this issue is 3 years old and the reporter and patch provider is possibly not interested no more in this. But I would like him to elaborate a little bit more on if this patch is just fixing it for him and "his countries" and his use cases and makes it possibly troublesome for others or vis versa. Apart from that I would like to invite others to chime in, so I unassigned him until he comes back.

RickJ’s picture

Hi all, I'm the OP. I certainly am still interested in this issue, the module, with my patch, is a key elelement of my site. It's been working fine since I posted, I've not needed to make any changes.

Re. diqidoq's query on whether this was just a quick fix for me, the answer is definitely not. In fact it's the original code that's a quick hack that only works for certain countries.

I wanted to find a generic solution, and it was clear that all the rules for address formatting by country are embedded in the address-field module. Any attempt to replicate these rules elsewhere is not only pointless, but also a maintenance headache. Formatting requirements change from time to time, and it's reasonable to assume the maintainers of address-field will be on the ball.

I obviously haven't tried every country as I don't have the data, but I've tried several European ones where I do, and it all worked. Address formatting across Europe is more varied than you might think!

dqd’s picture

Thanks for coming back on this. As I sad in my comment some days ago above: Well made issue report and provided patches @RickJ. +1

I just had to make sure that we review this patch in 2021 and to get some thoughts of you on this now 3 years later. I would appreciate a reroll against latest dev if you have any chance. This could possible cause interest in other users to chime in for review and maintainers too.

RickJ’s picture

Hi @diqidoq

I originally raised this issue against D7, but I see the issue version has been changed to D8. My patch applies to 7.x-1.7, which is also the latest 7.x dev build.

I'm still running my site on D7, using a lot of modules and custom code, which is going to make upgrading a major headache that I'm still ducking! So I'm not in a position to create a patch for a D8/9 build - sorry! If the D8/9 code is directly derived from D7, the code changes are in fact quite simple.