at frontend united 2012 amsterdam i talked to nod_ about my plans for writing a thesis on mapping in drupal and asked if he had any ideas for cool topics that i should work on. he said that an implementation of "server-side clustering" would be very cool. i agreed and decided, ok let's do it :)

so this issue is basically about getting the people knowledgeable about mapping & drupal 7 on board to discuss what would actually be a good way to implement such a feature in coordination with the tools that we already have.

for storing and retrieving data on the server-side i believe that geofield + views_geojson would be a good basis to start.
we might need additional frontend support for the interaction with openlayers, but i would like to keep this generic in order to support leaflet or the mapping module some day.

right now this is really just a basic idea. i haven't done much research yet but will definitely do so. all of this should go into the thesis that i'm writing for my master's degree in Software Engineering & Internet Computing at Technical University Vienna. related: i have already done some work with openlayers in drupal 7, see AustroFeedr - Mapping Open Data with OpenLayers in Drupal 7.

Questions

  • what server-side geo clustering techniques are out there?
  • what would be the requirements for a clean & basic implementation of a server-side clustered geo-search in drupal 7?
  • which drupal modules & techniques should be used to implement those?
  • can this live in a single module and if so, how should it be named?
  • any fancy ideas about nice to haves?

looking forward to collaborating in an open way and reading your ideas, regards dasjo

Comments

dasjo’s picture

added a sprint proposal for drupal dev days in barcelona, i will be there from june 13-18
http://barcelona2012.drupaldays.org/improve-mapping-drupal-7

dasjo’s picture

Issue summary: View changes

Updated issue summary.

dasjo’s picture

In addition to using built in functionality in OpenLayers, EveryBlock has been able to extend OpenLayers for displaying large quantities of data, using custom server side tools to create clusters of data. Using OpenLayers vector styling, they are then able to create automatically resizing features – using in-browser vector drawing support – to represent various items – news stories, photos, and more. By using the functionality in OpenLayers, EveryBlock is able to create compelling maps that tell stories, within the limited framework of the browser

http://docs.openlayers.org/casestudies/everyblock.html

batje’s picture

after the quake in Haiti, the Ushahidi project needed to cluster points server-side. I never looked at the implementation, but I know they coded it. http://www.ushahidi.com/

dasjo’s picture

thanks for the link batje.

more details on the progress posted on g.d.o:
Mapping sprint at Drupal Developer Days Barcelona 2012
http://groups.drupal.org/node/234168

jeffschuler’s picture

I'm thinking out loud here:

What if Views GeoJSON accepted clustering args (like zoom extent and cluster radius) and did the clustering, then just let Views cache the results?

Then when OpenLayers (or the Mapping module, etc.) requested GeoJSON feeds, they'd just send those clustering args, similar to how we're doing bounding-box filtering (see: #1333324: Bounding Box filtering and #1493344: Support BBOX strategy in the GeoJSON layer type).
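
To make the idea above concrete, here is a minimal sketch in Python (not Drupal/PHP code, and not the Views GeoJSON API). It assumes the request carries a bounding box, a zoom level and a cluster radius in pixels, projects the points to Web Mercator pixel coordinates, snaps them to a grid of radius-sized cells and returns one GeoJSON feature per cell. The function names, the `cluster_count` property and the 256px tile size are assumptions made for illustration only.

```python
import math
from collections import defaultdict

TILE_SIZE = 256  # assumption: standard Web Mercator tile size in pixels


def to_pixels(lon, lat, zoom):
    """Project lon/lat in degrees to Web Mercator pixel coordinates at a zoom level."""
    scale = TILE_SIZE * 2 ** zoom
    x = (lon + 180.0) / 360.0 * scale
    siny = math.sin(math.radians(lat))
    y = (0.5 - math.log((1 + siny) / (1 - siny)) / (4 * math.pi)) * scale
    return x, y


def cluster_geojson(points, bbox, zoom, radius_px):
    """Grid-cluster (lon, lat) points inside bbox = (west, south, east, north)."""
    west, south, east, north = bbox
    cells = defaultdict(list)
    for lon, lat in points:
        # Bounding-box filter: skip everything outside the requested extent.
        if not (west <= lon <= east and south <= lat <= north):
            continue
        x, y = to_pixels(lon, lat, zoom)
        # Points falling into the same grid cell end up in the same cluster.
        cells[(int(x // radius_px), int(y // radius_px))].append((lon, lat))

    features = []
    for members in cells.values():
        n = len(members)
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # Cluster position: plain average of the member coordinates.
                "coordinates": [sum(p[0] for p in members) / n,
                                sum(p[1] for p in members) / n],
            },
            "properties": {"cluster_count": n},  # hypothetical property name
        })
    return {"type": "FeatureCollection", "features": features}
```

Because the output depends only on the bounding box, zoom and radius, a response like this could be cached per argument combination, which is the point of the caching suggestion above.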

dasjo’s picture

some notes i have collected so far:

Clustering motivation
summarize data by clustering when zoomed out (viewed from high up, i.e. low zoom levels)
allow exploration of individual points when zoomed in (high zoom levels)
http://blog.davebouwman.com/2012/03/24/server-side-clustering-why-you-ne...

Clustering
http://en.wikipedia.org/wiki/Carrot2
http://136.159.122.181:8080/geoclustering/help.php
http://136.159.122.181:8080/geoclustering/

Solr
Solr SpatialSearch
http://wiki.apache.org/solr/SpatialSearch

Google MarkerClustering
http://code.google.com/p/google-maps-utility-library-v3/wiki/Libraries#M...
http://googlegeodevelopers.blogspot.com/2009/04/markerclusterer-solution...
http://google-maps-utility-library-v3.googlecode.com/svn/trunk/markerclu...

K-Means
http://en.wikipedia.org/wiki/K-means_clustering
Polymaps k-means
http://polymaps.org/ex/cluster.html
no interactivity, static
PostGIS
http://pgxn.org/dist/kmeans/doc/kmeans.html
how to use
http://gis.stackexchange.com/questions/11567/spatial-clustering-with-pos...
MySQL
SQLDM – implementing k-means clustering using SQL
http://www.abibasystems.com/white_paper/sqldm.pdf
Java impl
http://www.javaworld.com/javaworld/jw-11-2006/jw-1121-thread.html?page=1
Drupal ideas
http://groups.drupal.org/node/104014
paper
http://ilpubs.stanford.edu:8090/778/
k-means++: The Advantages of Careful Seeding

Google maps Perl impl + discussion
http://flylib.com/books/en/2.367.1.102/1/

Google maps php example
Google maps haversine php example
https://github.com/tuupola/php_google_maps/tree
http://www.appelsiini.net/projects/php_google_maps/cluster.html?center=1...
http://www.appelsiini.net/2008/11/introduction-to-marker-clustering-with...

PHP example
http://web.archive.org/web/20071011143643/http://forum.sydphp.org/?a=top...
request
http://uwmike.com/maps/dams/index.php.source
http://uwmike.com/maps/dams/map_functions.js.source
http://uwmike.com/maps/dams/map_data.php.source
http://uwmike.com/maps/dams/style.css.source
prep
http://uwmike.com/maps/dams/data/dams_au.txt.source
http://uwmike.com/maps/dams/data/create_inserts.php.source
http://uwmike.com/maps/dams/data/create_clusters.php.source

Quadtrees + Hilbert curve blog post
http://blog.notdot.net/2009/11/Damn-Cool-Algorithms-Spatial-indexing-wit...

Region quadtree
http://en.wikipedia.org/wiki/Quadtree#The_region_quadtree
http://gis.stackexchange.com/questions/5394/incremental-spatial-clusteri...

Vizmo
Hierarchical Clustering by Meaningful Units
www.globalimpactstudy.org/wp-content/uploads/.../vizmo-poster.pdf
http://www.globalimpactstudy.org/2011/12/open-source-presentation/

SnapToGrid
http://postgis.refractions.net/docs/ST_SnapToGrid.html

Solr clustering
http://stackoverflow.com/questions/8399152/how-to-best-do-server-side-ge...
Outdated localsolr
https://issues.apache.org/jira/browse/SOLR-773
Implementation (article in German)
http://blog.sybit.de/2010/11/geografische-suche-mit-solr/
Good discussion
http://www.mail-archive.com/solr-user@lucene.apache.org/msg40651.html

More
http://postgis.refractions.net/pipermail/postgis-users/2006-March/011431...

Pre-cluster / store clusters in db

Examples
http://www.crunchpanorama.com/
http://gmaps-utility-library.googlecode.com/svn/trunk/markerclusterer/1....

Libraries / Vendors
http://www.maptimize.com/

Articles
Google maps with lots of data, comparison of libs
http://www.svennerberg.com/2009/01/handling-large-amounts-of-markers-in-...

Thesis
Spatial clustering of structured objects Antonio Varlaro
www.di.uniba.it/~varlaro/Varlaro_PhDThesis.pdf
CORSO

Openlayers implementation
http://dev.openlayers.org/releases/OpenLayers-2.11/lib/OpenLayers/Strate...

Drupal 6 github
https://github.com/ahtih/Geoclustering

dasjo’s picture

i have just committed a first draft to the sandbox.

the views field handler geocluster_handler_field_geofield does the clustering in post_execute. it has to replicate some logic from views_handler_field_field::post_execute which could lead to a views patch if we decide to go this route.

the cluster handler provides an option for setting the cluster_distance. the geocluster_distance function is a quick and dirty implementation that we should consider improving and moving to geoPHP.

more @todos from the handler
- make a real cluster, currently it still links to the first result
- calculate and set cluster center
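
For illustration, a minimal sketch of distance-based clustering with a running cluster center, written in Python rather than PHP. This is not the code from the sandbox: the haversine formula is swapped in as one common way to measure distance between two coordinates, and `haversine_m`, `greedy_cluster` and the metre-based threshold are hypothetical names and choices.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(a, b):
    """Great-circle distance in metres between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def greedy_cluster(points, cluster_distance_m):
    """Assign each point to the first cluster whose center is close enough."""
    clusters = []  # each entry: {"center": (lon, lat), "count": n}
    for p in points:
        for c in clusters:
            if haversine_m(c["center"], p) <= cluster_distance_m:
                n = c["count"]
                # Update the center as the running mean of all members.
                c["center"] = ((c["center"][0] * n + p[0]) / (n + 1),
                               (c["center"][1] * n + p[1]) / (n + 1))
                c["count"] = n + 1
                break
        else:
            # No cluster within range: this point starts a new one.
            clusters.append({"center": p, "count": 1})
    return clusters
```

Averaging longitude and latitude directly is only an approximation and breaks near the antimeridian. The greedy pass also depends on the order of the input points, so this is a starting point rather than a finished algorithm.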

scroogie’s picture

Version: » 7.x-1.x-dev

Have you seen this on github: https://github.com/ahtih/Geoclustering

And perhaps there is some good information in OSGeo projects like GeoServer or similar.

dasjo’s picture

hi scroogie, thank you for the link. i actually posted it above at the end of #6 :)

while making some progress on doing more research and getting more ideas on how to tackle geoclustering, i have added separate issues for documentation & discussion purposes:

#1662172: Motivation
#1641854: Clustering example websites
#1662432: Geohashes for clustering
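
As a rough illustration of the geohash idea, here is a minimal Python sketch. It uses the standard geohash encoding (interleaved longitude/latitude bits, base32 alphabet) and groups points by a shared hash prefix, so a shorter prefix gives coarser clusters. The function names and the prefix-per-zoom idea are assumptions for illustration, not the approach settled on in the linked issue.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash(lon, lat, precision=6):
    """Encode a lon/lat pair (degrees) as a geohash string of the given length."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    chars, bits, ch, even = [], 0, 0, True
    while len(chars) < precision:
        # Bits alternate between longitude (first) and latitude.
        rng, val = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        ch <<= 1
        if val >= mid:
            ch |= 1
            rng[0] = mid
        else:
            rng[1] = mid
        even = not even
        bits += 1
        if bits == 5:  # every 5 bits form one base32 character
            chars.append(BASE32[ch])
            bits, ch = 0, 0
    return "".join(chars)


def cluster_by_geohash(points, prefix_len):
    """Count (lon, lat) points per geohash cell of the given prefix length."""
    return dict(Counter(geohash(lon, lat, prefix_len) for lon, lat in points))
```

If the hash is stored alongside each point, grouping by prefix can be done in the database (for example with a GROUP BY on a substring), which is what makes geohashes attractive for server-side clustering. Points that are close but sit on opposite sides of a cell border still land in different cells, so neighbouring cells may need merging afterwards.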

dasjo’s picture

i'm still trying to figure out the right way to integrate with views for clustering the points.

maybe somebody has ideas on that issue: #1791796: Allow to inject a custom aggregation implementation

ahtih’s picture

Hi, I am the author of the D6 Geoclustering module (mentioned above). (Sorry @dasjo, I noticed your June tweet just now; I am not really using Twitter.) Glad to see other people working on similar ideas. I am no longer developing it further myself. My code is used live at http://www.letsdoitworld.org/wastemap

dasjo’s picture

so i have made some progress here. a first demo is working now:

clustering with min. distance of 40px between markers
geocluster_leaflet_preview1.png

clustering with min. distance of 15px between markers
geocluster_leaflet_preview2.png
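
A pixel threshold like the ones above corresponds to a different ground distance at every zoom level. A minimal Python sketch of that conversion, assuming Web Mercator with 256px tiles (the function name and the tile size are assumptions, not taken from the module):

```python
import math

EQUATOR_M = 2 * math.pi * 6378137.0  # Web Mercator equatorial circumference in metres
TILE_SIZE = 256                      # assumption: 256px tiles


def pixels_to_meters(pixels, zoom, lat):
    """Approximate ground distance covered by `pixels` at a zoom level and latitude."""
    # Metres per pixel shrink by half per zoom level and by cos(lat) away from the equator.
    resolution = EQUATOR_M * math.cos(math.radians(lat)) / (TILE_SIZE * 2 ** zoom)
    return pixels * resolution
```

With this, a fixed pixel distance can be turned into a distance threshold for the current zoom level before the points are clustered on the server.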

the code is in the sandbox, but far from being production ready.

the issues i'm trying to figure out right now are:

thanks for your feedback :)

pvhee’s picture

Hi dasjo, did you also consider client-side clustering if your data set is not huge? The relatively new Leaflet library Leaflet.markercluster does an amazing job at it, and is performant enough for datasets of up to 50,000 points (check out the demos). And it could still be optimized since it uses a greedy algorithm to cluster the points. For a simple Drupal integration you have the module Leaflet Markercluster.

dasjo’s picture

hi pvhee,

yes, client-side clustering is a good approach for not-too-large data sets. for my project i want to focus on server-side clustering, as client-side already has tons of implementations which are easy to drop-in.

i'm curious, though, how performance will compare in the end. server-side clustering requires more requests to the server to recalculate clusters when zooming/panning. on the other hand, it reduces bandwidth requirements by only transferring clustered data, and it reduces processing time on the client.

if you know about good resources for measuring performance in that regard, i'd be happy to know.

regards dasjo

dasjo’s picture

i have extended the prototype to work with views_geojson. what i especially like about this approach is that we can use the bounding box strategy and update the points when zooming/panning.

it requires #1799870: Add hook views_geojson_feature_alter

dasjo’s picture

you can now see a live demo of the current state of geocluster at
http://dev.geocluster.gotpantheon.com/maptable

the base layer was created using mapbox

dasjo’s picture

i have made some progress with implementing the solr plugin, but it is still in a rough phase.

in the meantime, find some high-level architecture diagrams here: #1807358: Drupal Mapping Diagrams

dasjo’s picture

i have formalized my planning for geocluster as the exposé of my diploma thesis, see the attachment

dasjo’s picture

for those who are interested in checking out the current state of geocluster, i have created a first, functional but non-stable, non-feature complete alpha release:
http://drupal.org/node/1820964

dasjo’s picture

again, i did a batch of updates and hope to roll an alpha2 soon.

in the meantime i have created a generic module for handling leaflet + views_geojson integration:
http://drupal.org/project/leaflet_geojson

dasjo’s picture

just released alpha2 with views aggregation based clustering: http://drupal.org/node/1854228

dasjo’s picture

Slides from today's Geocluster update presentation at the drupal austria meetup:
http://bit.ly/RpTJg5

dasjo’s picture

hi, here are some updates on geocluster:

solr integration has been implemented. see the geocluster_solr and geocluster_solr_demo features of the current dev release.
see #1800850: Use a solr plugin with search_api_solr for clustering

i'm currently writing my thesis on geocluster.
clustering and web mapping foundations + the drupal & clustering related state of the art have already been documented. upcoming: conception, implementation & evaluation of the geocluster module. if you are interested, see the latex source files on github.
see https://github.com/dasjo/Geocluster-thesis

chriscalip has contributed some cool ideas on enhancing geocluster with a mix of server- + client-side clustering and even recorded a screencast
see #1914704: Progressively enhance server-side with client-side clustering

dasjo’s picture

Status: Active » Fixed

time to close this i guess :)

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.
