Taxonomy - some guidelines for effective design of taxonomies

(I wrote this as a comment and was asked to turn it into a page, so I have.)

In most small sites taxonomy is obvious, but in larger sites, especially those where the expertise of the readership may not match that of the authors, taxonomy design can make the difference between a good site and a bad one.

The VIEWS module, with its powerful taxonomy-based filters, makes taxonomy design even more important than before.

General Rules

1) Unless a Vocab is well known to all anticipated users and alphabetical (e.g. Countries of the World), try to keep it below 30-40 Terms.

2) If your Vocab has Parent-Child structures, think about dividing it up, because it is likely to get too big and is probably badly designed.

Example (roughly from an online arts shop): Traditional Arts, Single Select

Europe
-Lapp
-Sami
-Celtic
Australia
-Aboriginal

This represented the way the shop classified their items.

But structured this way, it prevents some clients from saying "show me European Art".

Multiple Select is an option, but it is much better to have two Vocabs, one for Region and one for Culture. Make them both multi-select and a collector can ask "What have you got which is Lapp, or Celtic, or Chinese?"
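To make the two-Vocab idea concrete, here is a minimal sketch in Python. It assumes a made-up in-memory model (the `Item` class, its `regions`/`cultures` fields and the `any_of` helper are all hypothetical, not Drupal or VIEWS API) and made-up sample items. Each item carries independent multi-select Region and Culture terms, so both the "European Art" question and the collector's OR question become simple filters.

```python
from dataclasses import dataclass, field


# Hypothetical in-memory model of the arts-shop example; this is not Drupal's API.
@dataclass
class Item:
    title: str
    regions: set[str] = field(default_factory=set)   # "Region" vocab, multi-select
    cultures: set[str] = field(default_factory=set)  # "Culture" vocab, multi-select


def any_of(items, vocab, wanted):
    """OR filter: keep items tagged with at least one wanted term in the named vocab."""
    wanted = set(wanted)
    return [item for item in items if getattr(item, vocab) & wanted]


# Made-up sample catalogue.
catalogue = [
    Item("Drum", regions={"Europe"}, cultures={"Sami"}),
    Item("Knotwork brooch", regions={"Europe"}, cultures={"Celtic"}),
    Item("Bark painting", regions={"Australia"}, cultures={"Aboriginal"}),
    Item("Reindeer-hide bag", regions={"Europe"}, cultures={"Lapp"}),
]

# "Show me European Art": filter on the Region vocab only.
european = [item.title for item in any_of(catalogue, "regions", ["Europe"])]

# "What have you got which is Lapp, or Celtic, or Chinese?": filter on Culture only.
collector = [item.title for item in any_of(catalogue, "cultures", ["Lapp", "Celtic", "Chinese"])]
```

Because the two Vocabs are independent, neither question depends on how the shop happened to nest cultures under regions.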

3) Too Many Terms

"Perfect" taxonomies are always too complex and you need to fight to make them more manageable. (Especially if you have just cut a few Parent-Child vocabs into several smaller ones each.)

The advantage of VIEWS is that multiple taxonomy terms start to build context, and views can capture that context.

Few sites ever need to show users more than 3-4 filters, even if there are 5-6 more hidden ones.

Building the views into the site structure adds even more granularity.

E.g. a site for ALL the small towns of America, where people are interested in their own little town's stuff, using Book for the basic structure:

Top level - States - 50 items
Off each State - Counties
Off each County - settlements

Click the named settlement and get a VIEWS screen with filters for News, Culture and Announcements.

(Hidden filters in the Views filter by State/County/Settlement)

Highly non-scary: any user can use that to get what they need.
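The split between hidden and exposed filters in the small-towns example can be sketched in Python. This is a minimal sketch under assumptions: the `Post` record, the `settlement_view` function and the sample posts are hypothetical illustrations, not how VIEWS stores or configures anything. The location filters are fixed by the page the user clicked, and only the content kind is left for the user to choose.

```python
from dataclasses import dataclass


# Hypothetical post record; field names are assumptions for illustration.
@dataclass(frozen=True)
class Post:
    title: str
    state: str
    county: str
    settlement: str
    kind: str  # the exposed filter: "News", "Culture" or "Announcements"


EXPOSED_KINDS = ("News", "Culture", "Announcements")


def settlement_view(posts, state, county, settlement, kind=None):
    """State/County/Settlement act as hidden filters taken from the clicked page;
    `kind` is the only filter the user sees (None means show every kind)."""
    if kind is not None and kind not in EXPOSED_KINDS:
        raise ValueError(f"unknown filter value: {kind}")
    place = (state, county, settlement)
    return [
        p for p in posts
        if (p.state, p.county, p.settlement) == place
        and (kind is None or p.kind == kind)
    ]


# Made-up sample posts.
posts = [
    Post("Fair dates set", "Vermont", "Orange", "Chelsea", "Announcements"),
    Post("Road closure", "Vermont", "Orange", "Chelsea", "News"),
    Post("Road closure", "Vermont", "Orange", "Tunbridge", "News"),
]

chelsea_all = settlement_view(posts, "Vermont", "Orange", "Chelsea")
chelsea_news = settlement_view(posts, "Vermont", "Orange", "Chelsea", kind="News")
```

The user never sees the three location filters, which is what keeps the screen simple.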

4) Clients are not experts on taxonomy, not even their own

Taxonomy is a communications issue, and if there is a budget for the site it's always worth running it past an outsider. Note, though, that they will need to understand the purpose of the site and, to an extent, the jargon of the subject.

This normally requires at least one decent face-to-face meeting to force the client to decide what is important and what can be cut out. (Trust me, these decisions will have to be made, and if they are not, then people are probably not thinking carefully enough.)

5) Taxonomy creates Legacy issues - SO GET IT RIGHT

Once you have a load of tagged data, it's hard to make changes to taxonomy structures (apart from adding terms) without rendering existing nodes much harder to find. Trust me, NO ONE will go back and edit existing data, not in real life, unless there is massive funding for that purpose.

6) Taxonomy is trial and error.

It should be the first thing you do on a site, but by adding test data you'll find flaws, refine it, and eventually go live with something that works. In the meantime you can sort out the CSS and templating.

I once spent 3 days testing different ways to classify cars for one site. The make/model complexity of the past 20 years is a nightmare, and there are several ways to do it; all of them work, but some are more user-friendly than others!

Hope this helps.

Ian Dickson - community specialist with a sideline in taxonomy, because it's the building block for EFFECTIVE social software.

You say... 5) Taxonomy

daouverson2 - November 7, 2006 - 15:35

You say...

5) Taxonomy creates Legacy issues - SO GET IT RIGHT

Once you have a load of tagged data, it's hard to make changes to taxonomy structures (apart from adding terms) without rendering existing nodes much harder to find. Trust me, NO ONE will go back and edit existing data, not in real life, unless there is massive funding for that purpose.

What is your process (I'm sure it's iterative) for developing categories, vocabularies and terms?

You presented several good "how tos" and "don't dos", but after reading this page (nice job!) I'm still wondering how to go about developing vocabularies, and how to prevent legacy problems.

Maybe I should pick up a book on IA. I'd prefer to read something from the Drupal community.

Thanks.

===
Doug Ouverson
hear | see | say | do | teach

Legacy issues are dealt with

iandickson - November 10, 2006 - 11:41

Legacy issues are dealt with by trying to get the initial design right, which means understanding the use case and its reasonable expectation of change, based on "wider adoption by more people, but where the people are broadly the same as today". Major unexpected use case changes in the future can't be planned for effectively and are a problem for the organisation at that time. For example, a "cancer info site for doctors in this region" could easily grow into a "cancer info site for medically informed people around the world" (doctors, nurses and expert patients) within its initial taxonomy setup, but it probably could not easily make the transition to a "cancer site for the non-technical public".

Generally, start from the end point (what will people be looking for?) and work back. This helps ensure maximum flexibility, and also allows you to define the largest reasonable number of displayed filters.

Then also consider what information will be reliably entered, and test it. Organisations tend to think their staff love entering data, when in fact they hate it, and there is no point asking people to enter data unless THEY perceive the value. If you force an unimportant vocab, people will make errors, and errors are much more annoying than absences (provided that more general searches will still find the items where the detail is absent). If resources allow, this is something that can be tested during development. If not, try to use common sense.

E.g. "Second Hand Cars for Sale". You would allow people to define engine size, number of doors, colour etc., but not REQUIRE it. Dealers would provide full info; casual sellers would probably just give make, model, location and price. From a buyer's POV you'd have several windows: "price/location" for those working to a budget, "make, price/location" for those looking for, say, "Audi, nearby", through to "full tech" for those who want the exact match, if available.
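The cars example can be sketched in Python to show why optional fields and layered search windows fit together. This is a minimal sketch under assumptions: the `Listing` fields, the `search` helper, the `max_price` key and the sample stock are all hypothetical. Make, model, location and price are required; engine size, doors and colour are optional, and a listing that leaves them blank still turns up in the broader windows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Listing:
    # Required from every seller (hypothetical field names).
    make: str
    model: str
    location: str
    price: int
    # Optional: dealers tend to fill these in, casual sellers often don't.
    engine_cc: Optional[int] = None
    doors: Optional[int] = None
    colour: Optional[str] = None


def search(listings, **criteria):
    """Keep listings matching every given criterion. `max_price` is an upper bound;
    any other key must equal the listing's attribute. A blank optional field only
    excludes a listing when the buyer actually filters on that field."""
    max_price = criteria.pop("max_price", None)
    results = []
    for car in listings:
        if max_price is not None and car.price > max_price:
            continue
        if all(getattr(car, key) == value for key, value in criteria.items()):
            results.append(car)
    return results


# Made-up sample stock: one dealer listing, two casual sellers.
stock = [
    Listing("Audi", "A4", "Leeds", 4500, engine_cc=1800, doors=4, colour="Silver"),
    Listing("Audi", "A3", "Leeds", 3900),
    Listing("Ford", "Focus", "York", 2500),
]

budget = search(stock, location="Leeds", max_price=5000)      # "price/location"
audi_nearby = search(stock, make="Audi", location="Leeds")    # "make, price/location"
full_tech = search(stock, make="Audi", engine_cc=1800, doors=4)  # "full tech"
```

The casual seller's A3 appears in the first two windows and only drops out when the buyer asks for detail the seller never gave, which is the "absence, not error" behaviour described above.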

Audience is important. A health site for doctors would probably require a taxonomy written in Latinate language (no heart attacks, just myocardial infarctions), but one for the public would use common language.

To an extent, if the audience is non-technical and you are not technical in the field, you can be the reality check: if it doesn't work for you, it doesn't work. If you do have technical skills in the area, try to run it by some laypeople.

If it's a technical audience and you are not technical, then you should probably do some reading and, again, try to get another technical person to review it, preferably from outside the organisation, but certainly from outside the organisation's development team.

Iteration - I have found that the content that actually gets put up is often different from (typically narrower than) the content hoped for. Watching the initial adoption and content path can help refine the taxonomy, typically by pointing to the areas needing fine-tuning.

Tip - allow multiple selections. If certain terms keep getting paired, they might, for practical purposes, be synonyms and could be reduced to one.

Tip - in development and testing, allow freetagging. Ideally, with an initial content set, ask people to freetag. With luck there won't be too many surprises, but it's worth doing mainly because it helps indicate the level of granularity that people like. E.g. given a bunch of docs about cancer, do people tag with specific cancer names, more general terms, or both?

Tip - when you have done the above, test the results by asking people which taxonomy approach they feel makes it easier to find "documents about" a topic. They might tag a document as "small cell lymphoma" but actually like to see docs about "lymphoma". This makes sense because if you cannot be sure that what you need exists, "results around" can be of great value. For example, there might be a treatment described for another type of lymphoma that could be applied to the small cell case.
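One way to spot terms that "keep getting paired" is to count co-occurrences across tagged nodes. The Python below is a minimal sketch, not a Drupal feature: the `synonym_candidates` function, its `min_ratio`/`min_count` thresholds and the sample taggings are assumptions to tune or replace. A pair is flagged when it appears together often enough, and when that covers most uses of the rarer term.

```python
from collections import Counter
from itertools import combinations


def synonym_candidates(taggings, min_ratio=0.8, min_count=3):
    """Flag term pairs that almost always appear together.

    taggings: one set of terms per node.
    A pair (a, b) is flagged when it co-occurs at least `min_count` times and
    those co-occurrences cover at least `min_ratio` of the rarer term's uses.
    Both thresholds are arbitrary starting points, not recommendations.
    """
    term_counts = Counter()
    pair_counts = Counter()
    for terms in taggings:
        term_counts.update(terms)
        pair_counts.update(combinations(sorted(terms), 2))  # each unordered pair once
    flagged = []
    for (a, b), together in pair_counts.items():
        rarer = min(term_counts[a], term_counts[b])
        if together >= min_count and together / rarer >= min_ratio:
            flagged.append((a, b, together))
    return sorted(flagged)


# Made-up taggings from a multi-select vocab.
taggings = [
    {"heart attack", "myocardial infarction", "cardiology"},
    {"heart attack", "myocardial infarction"},
    {"heart attack", "myocardial infarction", "stroke"},
    {"cardiology", "stroke"},
    {"stroke"},
]

candidates = synonym_candidates(taggings)
```

A flagged pair is only a candidate; whether to merge the terms is still an editorial decision.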
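To read the freetagging results for granularity, you could map each freetag onto a draft hierarchy and count how each document was tagged. This Python is a minimal sketch under assumptions: the `GENERAL_OF` mapping, the `granularity_profile` function and the sample tags are hypothetical, and the mapping would come from your own draft taxonomy.

```python
from collections import Counter

# Hypothetical draft hierarchy: specific freetag -> the general term it sits under.
GENERAL_OF = {
    "small cell lymphoma": "lymphoma",
    "follicular lymphoma": "lymphoma",
    "melanoma": "skin cancer",
}
GENERAL_TERMS = set(GENERAL_OF.values())


def granularity_profile(freetags_per_doc):
    """Classify each document's freetags as 'specific', 'general', 'both' or 'unmapped'."""
    profile = Counter()
    for tags in freetags_per_doc:
        tags = {t.lower() for t in tags}  # freetags arrive in mixed case
        has_specific = any(t in GENERAL_OF for t in tags)
        has_general = any(t in GENERAL_TERMS for t in tags)
        if has_specific and has_general:
            profile["both"] += 1
        elif has_specific:
            profile["specific"] += 1
        elif has_general:
            profile["general"] += 1
        else:
            profile["unmapped"] += 1  # tags the draft hierarchy doesn't cover yet
    return profile


# Made-up freetags from a test session.
profile = granularity_profile([
    {"Small cell lymphoma"},
    {"lymphoma", "follicular lymphoma"},
    {"Skin cancer"},
    {"Melanoma"},
    {"chemo side effects"},
])
```

A high "unmapped" count is itself useful: it points at topics the draft taxonomy is missing.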

This is why VIEWS is so good: you can set up a Lymphoma View that automatically brings in all the types, and allows the user to search on other aspects of Lymphoma in general, or to choose to be type-specific.
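The "Lymphoma View" behaviour amounts to expanding a term into itself plus everything beneath it before filtering. Here is a minimal sketch in Python, assuming a hypothetical parent-to-children mapping (`CHILDREN`), hypothetical `with_descendants` and `topic_view` helpers, and made-up documents; it is not how VIEWS implements term depth, just the idea.

```python
# Hypothetical parent -> children term hierarchy; not Drupal's schema.
CHILDREN = {
    "Lymphoma": ["Hodgkin lymphoma", "Non-Hodgkin lymphoma"],
    "Non-Hodgkin lymphoma": ["Small cell lymphoma", "Follicular lymphoma"],
}


def with_descendants(term, children=CHILDREN):
    """Return the term plus every term below it in the hierarchy."""
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        found.add(current)
        stack.extend(children.get(current, []))
    return found


def topic_view(docs, term, specific=None):
    """General view: docs tagged with `term` or any term below it.
    Passing `specific` narrows the view to that type (and anything below it)."""
    wanted = with_descendants(specific or term)
    return [title for title, tags in docs if tags & wanted]


# Made-up documents: (title, set of terms).
docs = [
    ("Trial results", {"Follicular lymphoma"}),
    ("Patient guide", {"Small cell lymphoma"}),
    ("Staging overview", {"Lymphoma"}),
    ("Screening advice", {"Breast cancer"}),
]

general = topic_view(docs, "Lymphoma")
narrow = topic_view(docs, "Lymphoma", specific="Small cell lymphoma")
```

A document tagged only "Small cell lymphoma" still shows up in the general Lymphoma view, which is the "results around" effect described in the previous tip.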

Summary - designing taxonomy is all about understanding the ACTUAL USE CASE and, as an expert, trying to spot where that might differ from what your clients think they need, especially if they are a small group within a larger organisation. As to methods: if you have the resources, you can test; if you don't, use common sense and try to monitor initial use to guide finalisation.

Ian Dickson - community specialist.
www.emint.org - Association of Online Community Professionals
