The company MaxMind.com[1] has agreed to release their cities of the world database under the GPL. The database contains locations by country, city, latitude and longitude. There are over 3,047,000 records in the database. For those of you who have tried the location.module with the zipcodes database from CivicSpace, you will recognize how cool it is and how well this fits with that project and therefore Drupal.

In the sample below you will see the main characteristics of the data. There are two variants of city, one with lowercase letters and only ASCII characters. I assume this is intended for search purposes. The city_accented column contains full city names including accents and special characters.

How we use this data is open for discussion. It is different in nature than the zipcodes database due to the absence of zipcodes, which are used as primary keys in the Civicspace implementation. Also missing are timezones.

Questions like how we fill out the data to include these missing elements, how we maintain this data, how we structure it in respect to supporting only subsets of it for regions or groups of countries, these are all topics for discussion. In the end, modelling location data is tricky.

With data like this a whole new class of applications becomes possible. Thank you MaxMind for the generous contribution!

UPDATE: I have removed the cities.zip from my sandbox. Please obtain the data directly from MaxMind.com:
http://www.maxmind.com/download/worldcities/

Sample data:

# Cities of the world database
#
# Graciously donated to the public by MaxMind.com
# http://www.MaxMind.com
#
# Table structure for table `cities`
#

CREATE TABLE cities (
  country char(2) NOT NULL default '',
  city varchar(30) NOT NULL default '',
  city_accented varchar(30) NOT NULL default '',
  latitude decimal(10,7) NOT NULL default '0.0000000',
  longitude decimal(10,7) NOT NULL default '0.0000000'
) TYPE=MyISAM;

INSERT INTO cities VALUES ('ad', 'aixas', 'Aixàs', '42.4833333', '1.4666667');
INSERT INTO cities VALUES ('ad', 'aixirivali', 'Aixirivali', '42.4666667', '1.5');
INSERT INTO cities VALUES ('ad', 'aixirivall', 'Aixirivall', '42.4666667', '1.5');
INSERT INTO cities VALUES ('ad', 'aixirvall', 'Aixirvall', '42.4666667', '1.5');
INSERT INTO cities VALUES ('ad', 'aixovall', 'Aixovall', '42.4666667', '1.4833333');
INSERT INTO cities VALUES ('ad', 'andorra', 'Andorra', '42.5', '1.5166667');
INSERT INTO cities VALUES ('ad', 'andorra la vella', 'Andorra la Vella', '42.5', '1.5166667');
INSERT INTO cities VALUES ('ad', 'andorra-vieille', 'Andorra-Vieille', '42.5', '1.5166667');
INSERT INTO cities VALUES ('ad', 'andorre', 'Andorre', '42.5', '1.5166667');
INSERT INTO cities VALUES ('ad', 'andorre-la-vieille', 'Andorre-la-Vieille', '42.5', '1.5166667');
INSERT INTO cities VALUES ('ad', 'andorre-vieille', 'Andorre-Vieille', '42.5', '1.5166667');

[1] http://www.maxmind.com

Comments

robertDouglass’s picture

I just read that the folks at Civicspace[1] are also aware of the huge potential offered by MaxMind's databases. In particular, they have thier eyes on the GeoIP database[2] that would allow you to know which country your visitor is in based on IP (with caveats). After reading Vincent de Lau's HOWTO[3] for importing the GeoIP data into MySQL I realize that the SQL code in my sandbox is not needed so I'll be deleting it.

One thing is clear; with Drupal's recent move towards elegantly handling location data and this immense body of free data from MaxMind, real power is soon to be released to the Drupal community!

- Robert Douglass

[1] http://civicspacelabs.org/home/node/12117
[2] http://www.maxmind.com/app/csv
[3] http://www.delau.net/php/geoip.html

-----
www.robshouse.net
www.webs4.com

Steven’s picture

It's a pity that the supposedly 'native' names are limited to the ISO-8859-1 (Latin-1) character set and thus contain transliterations for quite a lot of entries as well as missing a bunch of accents for eastern European names.

You'd also need to convert to UTF-8 to use this in Drupal/CS.

Still a nice db, but it could be better ;).

--
If you have a problem, please search before posting a question.

harald.walker’s picture

What if based on GPS coordinates (more and more people have GPS devices) one could display the nearest town or a node with more information (based on a matching taxonomy term)?

Some people use GPS for photo tracking and Geocaching is also very popular.

A Brief Introduction to GPS Photo Linking

Geocaching

ZIP codes are unreliable because they might change over time. (anyone care for ZIP codes from former East-Germany?!)

Based on longitude and latitude you can create a map with linked data like this:
http://geoimages.berkeley.edu/wwp1204/map/index.html

If you don't have GPS and would like to know a geographic location, you can use an online mapping service like: http://www.multimap.com

I am
here.
:-)

killes@www.drop.org’s picture

I agree that ZIP codes aren't the best solution from a technical point of view. From a user's point of view they make a lot of sense, though. Most people know their ZIP code. The change of the German ZIP codes (both east and west) about ten years ago will probably be the last untill a unified European system is introduced in the distant future.
--
If you have troubles with a particular contrib project, please consider filing a support request. Thanks. And, by the way, Drupal 4.6 will support PHP 5.

kbahey’s picture

Just recently, Google added satellite images to their excellent http://maps.google.com

For example here is a satellite picture of Niagara Falls. Click on the "Map" link, and you get a normal map. You can also get directions, or search for nearby restaurants, ...etc.

It only covers North America, and some areas have hi-res satellite imagery, others do not. But still really cool.

--
Consulting: 2bits.com
Personal: Baheyeldin.com

--
Drupal performance tuning and optimization, hosting, development, and consulting: 2bits.com, Inc. and Twitter at: @2bits
Personal blog: Ba

iraszl’s picture

Wow! Excellent news. Congrats! Can't wait to see what can be done with this.
---
http://creativebits.org

bertboerland’s picture

I am interested to know why maxmind.com is releasing this data under the GPL. This data -combined with subject data- is worth something on the market. Also I would like to know if they really own all this data, the can offcourse only make this public if it was theirs in the first place. They own all the data and no data was bought under some restricting conditions?

Having asked that, I do appreciate their contribution and it releases a great potential since location (based services) might be YAKA4D (yet another killer app for drupal)

Can we update the data as well? And arent the two option listed on http://www.maxmind.com/app/geoip_country "illegal". Since the "free" data seems to be the core of the paid data, the paid data should be under the same license hence redistribution?

Dont get me wrong, I do think this is a *great* service.
--
groets
bertb

--
groets
bert boerland

killes@www.drop.org’s picture

Such data is actually easier available then most people think. Look here for example:
http://earth-info.nga.mil/gns/html/cntry_files.html
Conversion to Unicode (don't know which flavour):
http://earth-info.nga.mil/gns/html/gns_faq/gns2unicode/gns2unicode.html

There are no licensing requirements or restrictions in place for the use of the GNS data. However, we recommend using the following citation to identify the GNS as a source:[...]

--
If you have troubles with a particular contrib project, please consider filing a support request. Thanks. And, by the way, Drupal 4.6 will support PHP 5.

akhan’s picture

Hello there,

I am considering using your ‘Names Files of Selected Countries’ in one of our web development projects.

I do however have some questions in understanding ‘GNS-to-UNICODE Character Mapping Tables. It is my understanding that the field ‘ADM1’ in the country file refers to ‘State Codes’ in that country but no definition/description of states are defined in the country file nor in the character mapping table. I may have completely missed it. May I ask you to help me find this information? I would be grateful for your help.

Thanks.

killes@www.drop.org’s picture

Those aren't 'my' files in any way. All I know about those files (and it isn't much) is on their website.
--
Drupal services
My Drupal services

robertDouglass’s picture

I wrote MaxMinds and asked:

In particular, is it suitable for use in an open source project under the GPL (Gnu public license)?

They responded affirmatively, as long as they are cited as the source of the data.

- Robert Douglass

-----
www.robshouse.net
www.webs4.com

killes@www.drop.org’s picture

As long as

is not a valid modification of the GPL.
They can of course put such a notice into the bundled files themselves as indicated in the licence.
--
If you have troubles with a particular contrib project, please consider filing a support request. Thanks. And, by the way, Drupal 4.6 will support PHP 5.

robertDouglass’s picture

sorry Gerhard, I didn't catch what your point was. Could you clarify?

- Robert Douglass

-----
www.robshouse.net
www.webs4.com

sandip-1’s picture

I guess he means that once an entity releases something under GPL, it cant impose additional conditions - in other words you can't modify GPL and still call it GPL.

In this case, the additional condition being compulsory attribution(advertising clause?) as you just mentioned above.

- Sandip

-----------
Sandip Bhattacharya
http://www.sandipb.net

robertDouglass’s picture

- Robert Douglass

-----
www.robshouse.net
www.webs4.com

StewardManscat’s picture

Just stumbled into this. Exciting. Been looking for something like this, (and astounded at the price tags). Now to get my hands dirty. Uh-oh. Missing piece of puzzle. Hope this is the place to begin...

The database has a region code which I cannot make sense of.

For instance, the city database conatins these entries for 'Abbotsford':

country city city_accented region latitude longitude
AS abbotsford Abbotsford 02 -33.8500000 151.1333333
AU abbotsford Abbotsford 02 -33.8500000 151.1333333
CA abbotsford Abbotsford 02 49.0500000 -122.3000000
CA abbotsford Abbotsford 10 45.4333333 -72.8833333
GB abbotsford Abbotsford 00 55.6000000 -2.7833333
NZ abbotsford Abbotsford 00 -45.8833333 170.4166667
US abbotsford Abbotsford WI 44.9463889 -90.3158333

The Abbostford I want is in Canada. Using the awesome little tool at multimap, I find the answer (langitude and lattitude) are given by country=CA and region=02

I can ask a user to select a country. But where does this region code come from? I thought I was happy when I discovered geoipregionvars.php, but this gives me a region of 'BC' under CA for the area of Abbotsford.

Am I making sense? Can anyone toss me a clue?

My application: simple dating stuff. Everybody has to enter a location, which presumably I can translate into lat/long.

At search time, choose a country, then a city or region, and poof, we should be able to return all users within X furlongs (yeah, that's another issue) of the given point.

What is the region code, and how can I make use of it to bridge the gao between countries and cities?

Many thanks in advance.

Notice, in the first post here, the region code has not even been incorporated into the table...

robertDouglass’s picture

When I first posted I was under the apparently false assumption that the region code wouldn't be needed to uniquely identify a city. Unfortunately it isn't readily know what the region code is and how to map it to meaningful strings. In the first post I was aiming more at making this data match the format of the zipcodes database from Civicspace, but that now doesn't seem like a very interesting goal.

Make sure to check out the location module, it has functionality that returns points within a certain radius of a point (rather a box around the point).

- Robert Douglass

-----
www.robshouse.net
www.webs4.com

allyak’s picture

After some digging about on the MaxMind website, I've found a table that translates region codes in the cities database to state / region names.

USA & Canada: http://www.maxmind.com/app/iso3166_2
Rest of the World: http://www.maxmind.com/app/fips10_4

Hope that helps everyone.

Ally

robertDouglass’s picture

Since we already have all of these regions for most countries anyway, this will be very helpful in efforts to make the individual country databases.

- Robert Douglass

-----
http://www.hornroller.com/French_Horn_News
www.robshouse.net

shyam2053’s picture

Have there been any successes with country databases?

I am trying but no luck

paul_cool_fl’s picture

Does it look like the one displayed here - http://www.salpure.com/weblocations-countries-database-cities-database-d...
I want the data in country->province->city format .. Any idea how can I convert this database into that weblocation one?

RichieB’s picture

This database is great, especially since it is free, but it is rather conteminated with many duplicates. About 10% of the 3 million cities are in fact duplicates (same country, city and region), but with (slightly) different longitude and latitude data. About 4000 cities are exact duplicates (same longitude and latitude as well).

You can see one in the small sample at the original article at the top of this page:

INSERT INTO cities VALUES ('ad', 'aixirivali', 'Aixirivali', '42.4666667', '1.5');
INSERT INTO cities VALUES ('ad', 'aixirivall', 'Aixirivall', '42.4666667', '1.5');

It is not too difficult too weed out the duplicates, but how does one decide which of the duplicates to keep?

I'd like to santize the file and repost it. But where? Am I allowed to do this?

I'm afraid I was made happy with a dead sparrow (as we Dutch like to say).

robertDouglass’s picture

I would normally say in the sandbox but I'm still not clear on how the latest sandbox rules (guidelines) are shaping up.

One thing is clear; as long as you give MaxMind.com credit for their work on the original, they have granted permission to use the data. You might send them a copy of your sanitized version.

As for choosing the right lat/long for duplicates where they are otherwise identical, I suggest you make a list of the occurrances and perhaps there is a mapping service (Google Earth?) that can help us confirm them. Just an idea.

- Robert Douglass

-----
Rate the value of this post: http://rate.affero.net/robertDouglass/
I recommend CivicSpace: www.civicspacelabs.org
My sites: www.hornroller.com, www.robshouse.net

allyak’s picture

ESRI, a US-based outfit that are leaders in mapping software, offer a web-based mapping service. Even better for us, they offer a free, public set of SOAP-based web services for non-commercial use.

This set of tools includes:

  1. an address finder, for converting address strings (e.g. "23 Lala St, Austin, TX) into lat-lon coordinates;
  2. a place finder for converting city names into lat-lon coordinates (this is what is being asked for in the above post).
  3. a route finder between two locations, and
  4. a 2-D web mapping service

Although it'd be bad karma to use their services to copy their location database, I can't see any problems with using it to verify / clean up the MaxMind database, especially when trying to pick between two duplicates.

There is some excellent documentation for these web services at http://www.esri.com/software/awspublicservices/index.html but given my limited experience with SOAP, and my limited time, I can't really investigate these services much further. But if anyone else wants to jump on this and take it further, got for it!

Ally

Anatole’s picture

Richie, did you manage to sanitize the file for duplicates? If so, did you make it available anywhere? If so, much thanks, and if not, I'll take a crack at it...

Physician’s picture

Hi

I really hope that you worked it out up to now. If not, I'm joining the team effort in order to clean up the database code.

But first, did anyone have any succes so far?

I just modified the settings for my file upload website at www.publicupload.com in order to accept uploads of 35 MB. So, please, if you want, upload the db here.

Cheers!

ressa’s picture

Hi All!

I could really use a sanitized version without the duplicates, so if any of you guys managed to do it...

Thanks in advance 8o)

jcshay56’s picture

I have finished sanitizing, most of the programming took about 30 minutes, minor tweaks here and there took another 90 minutes. Upon request, I'll send a zip file that includes the sanitized file, the STATA code, and the log file. (Please credit my name as listed in the STATA code, if you use the code or any derivative for your work, commercial or non-commercial.)

If there were multiple rows with the same country, region, and accentcity, I deleted all duplicates. I kept the most southwest coordinates.

If there were more than 5 rows indicating the same latitude and longitude, I deleted all but 5. I kept the rows where the accentcity's were the 5 strings of shortest length (the reason being that the longer names seemed to be associated with neighborhoods, like trailer parks).

Some interesting statistics:
Roughly .07% of the rows are dropped as duplicates of the first type.
Roughly 1.5% of the rows are dropped as duplicates of the second type.
There is one country/region/accentcity that is repeated 13 times in the original file (the greatest frequency).
There is one location (of precise latitude/longitude) that is associated with 61 different accentcity's (the greatest frequency).

Please contact me at cmhsieh54_at_yahoo_dot_com and mention somewhere in the subject line "Request for Drupal World Cities Database Sanitized File."

-CM

-------------------------------------------

. insheet using c:\temp2\cities.txt ;
(6 vars, 2673764 obs)

. su ;

Variable | Obs Mean Std. Dev. Min Max

-------------+--------------------------------------------------------

country | 0

city | 0

accentcity | 0

region | 0

latitude | 2673764 28.64162 21.84569 -54.93333 150.7833

-------------+--------------------------------------------------------

longitude | 2673545 28.39819 62.65318 -179.9833 180

. sort country accentcity region latitude longitude ;

. by country accentcity region: gen countryaccentregionsame = _n ;

. gen lengthaccent=0 ;

. replace lengthaccent = length(accentcity) ;
(2673764 real changes made)

. sort latitude longitude lengthaccent ;

. by latitude longitude: gen latlongsame = _n ;

. tab countryaccentregionsame ;

countryacce |

ntregionsam |

e | Freq. Percent Cum.

------------+-----------------------------------

1 | 2,671,916 99.93 99.93

2 | 1,596 0.06 99.99

3 | 147 0.01 100.00

4 | 47 0.00 100.00

5 | 21 0.00 100.00

6 | 15 0.00 100.00

7 | 8 0.00 100.00

8 | 4 0.00 100.00

9 | 3 0.00 100.00

10 | 2 0.00 100.00

11 | 2 0.00 100.00

12 | 2 0.00 100.00

13 | 1 0.00 100.00

------------+-----------------------------------

Total | 2,673,764 100.00

. tab latlongsame ;

latlongsame | Freq. Percent Cum.

------------+-----------------------------------

1 | 1,817,627 67.98 67.98

2 | 534,298 19.98 87.96

3 | 181,286 6.78 94.74

4 | 71,725 2.68 97.43

5 | 31,715 1.19 98.61

6 | 15,489 0.58 99.19

7 | 8,105 0.30 99.49

8 | 4,573 0.17 99.67

9 | 2,653 0.10 99.76

10 | 1,706 0.06 99.83

11 | 1,116 0.04 99.87

12 | 791 0.03 99.90

13 | 570 0.02 99.92

14 | 417 0.02 99.94

15 | 315 0.01 99.95

16 | 247 0.01 99.96

17 | 184 0.01 99.96

18 | 144 0.01 99.97

19 | 123 0.00 99.97

20 | 99 0.00 99.98

21 | 82 0.00 99.98

22 | 72 0.00 99.98

23 | 64 0.00 99.99

24 | 51 0.00 99.99

25 | 46 0.00 99.99

26 | 38 0.00 99.99

27 | 33 0.00 99.99

28 | 27 0.00 99.99

29 | 24 0.00 99.99

30 | 19 0.00 100.00

31 | 16 0.00 100.00

32 | 12 0.00 100.00

33 | 12 0.00 100.00

34 | 8 0.00 100.00

35 | 5 0.00 100.00

36 | 5 0.00 100.00

37 | 5 0.00 100.00

38 | 4 0.00 100.00

39 | 4 0.00 100.00

40 | 4 0.00 100.00

41 | 4 0.00 100.00

42 | 4 0.00 100.00

43 | 4 0.00 100.00

44 | 4 0.00 100.00

45 | 3 0.00 100.00

46 | 3 0.00 100.00

47 | 3 0.00 100.00

48 | 3 0.00 100.00

49 | 2 0.00 100.00

50 | 2 0.00 100.00

51 | 2 0.00 100.00

52 | 2 0.00 100.00

53 | 2 0.00 100.00

54 | 2 0.00 100.00

55 | 2 0.00 100.00

56 | 2 0.00 100.00

57 | 2 0.00 100.00

58 | 1 0.00 100.00

59 | 1 0.00 100.00

60 | 1 0.00 100.00

61 | 1 0.00 100.00

------------+-----------------------------------

Total | 2,673,764 100.00

. drop if countryaccentregionsame>1 ;
(1848 observations deleted)

. drop if latlongsame>5 ;
(37097 observations deleted)

. keep country city accentcity region latitude longitude ;

. su ;

Variable | Obs Mean Std. Dev. Min Max

-------------+--------------------------------------------------------

country | 0

city | 0

accentcity | 0

region | 0

latitude | 2634819 28.58837 21.92139 -54.93333 150.7833

-------------+--------------------------------------------------------

longitude | 2634620 28.02092 62.73728 -179.9833 180

tgotchi’s picture

Have anyone download the Free City Database from GeoDataSource.com? I prefer the database format because we do not need additional look up database for the region/state and county.

http://www.geodatasource.com

web2’s picture

My website http://www.nashikit.com is webportal Nashik city of India. I am confused about how I can use this data in http://nashikit.com or in http://www.indiancity.org

http://www.indiancity.org
http://thethrees.net

Summit’s picture

Hi,

I really would like to know how to get the cities of the world in my taxonomy (I use drupal 4.7.6 + category module).
I have succesfully imported the countries xml file (with xml_taxonomy module), but I would really like to have the cities in theses countries also.
I am building a development aid website www.ontwikkelingswerk.net in which I want to let the east meet the west.
I am SEO-expert so the website will be found in time in the google index. My strategy is to get people to state where they come from country/region/city to get in contact with each other. The west can help the east.
Pilot project will be Nepal. Content of the website will be focussed on Nepal at first.
Please help this project also by making a link to it: www.ontwikkelingswerk.net/link-suggestie (it's a dutch development aid site)
And please help me to fill taxonomy with all the cities of the world.
Any other suggestions also welcome off course.

Greetings,
Martijn
www.ontwikkelingswerk.net

paul_cool_fl’s picture

Hi .. I am yet looking for data in countries>states>cities format and want to do it in PHP. Does anyone
have done that using this free database? example of the same is http://www.worldcitiesdatabase.com/world-cities.aspx Appreciate if anyone could share readymade database export of this free maxmind database

soop1’s picture

There's a ready to go database available at http://www.placedatabase.com for a small price. I like the search app that they have to check the data. All the other sites just have drop down demos.. too slow to check!

sanjith’s picture

hi

I have downloaded the list of cites from http://www.maxmind.com/download/worldcities/
but it was coma separated shall I get sql this list as SQL file

pitxels’s picture

Serbia and Montenegro are missing :(

--
Drupal Theming at
www.pitxels.com

worldcitiesdata’s picture

We have Database for sale of all the Countries, broken up to States/Regions, Counties and all the Cities with their Suburbs/Neighbourhoods (in Excel and SQL format).

Example: United Kingdom has over 44,000 Cities with Suburbs or Neighbourhoods

United Kingdom > England > Yorkshire & Humberside > West Yorkshire > Leeds > Beeston

We charge £49 per Country. For extra information please contact us by worldcitiesdata@gmail.com

citydatabase’s picture

I was comparing worldcitiesdatabae.info with www.worldcitiesdatabase.com , since they both are low cost compared to others, and it appears the latter one to be more resourceful.
However, maxmind has a free database so then why buy a cities database. Just get the free one and there is lot of help available on internet about maxmind db. If you put in extra efforts then you can save those few bucks :)