The documentation doesn't really explain how similarity is calculated.

Through experimentation and reading the queries, I've found that similarity doesn't work the way I intuitively expected: it doesn't take into account terms on the other node that the given node doesn't have.

For example, suppose I have the following nodes, in the form "node: term, term, term":

A: X, Y
B: X, Y
C: X, Y, Z
D: X

Node A is our subject node.

I would expect node B to be 100% similar and node D to be 50% similar. Both work as expected.
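To make the numbers concrete, here is a small Python model of the score as I understand it from the queries: the number of shared terms divided by the subject node's term count. This is an inferred approximation, not the module's actual code, and `NODES` and `current_similarity` are hypothetical names.

```python
# Hypothetical model of the current score (inferred from reading the queries,
# not the module's code): shared terms / number of terms on the subject node.
NODES = {
    "A": {"X", "Y"},
    "B": {"X", "Y"},
    "C": {"X", "Y", "Z"},
    "D": {"X"},
}


def current_similarity(subject: set[str], other: set[str]) -> float:
    """Fraction of the subject's terms that the other node also has."""
    if not subject:
        return 0.0
    shared = len(subject & other)
    return shared / len(subject)


if __name__ == "__main__":
    for name in ("B", "C", "D"):
        score = current_similarity(NODES["A"], NODES[name])
        print(f"{name}: {score:.0%}")
```

With A as the subject this gives B 100% and D 50%, but it also rates C at 100%, because C's extra term Z never enters the calculation.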

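Node C is a bit of a mystery. It has both of A's terms, so by the current count it scores 100%, the same as B. But if node C were the subject, A would only be about 67% similar (2 of C's 3 terms). If we expect similarity to be a symmetric relationship, we should get about 67% here too, though that would be very tricky to produce in a query.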
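A sketch of the asymmetry, plus one symmetric alternative: the Jaccard index, i.e. shared terms divided by the number of distinct terms on either node. The Jaccard index is only my suggestion, not something the module does, and the function names here are hypothetical.

```python
# Hypothetical comparison: a one-directional score versus a symmetric one.
A = {"X", "Y"}
B = {"X", "Y"}
C = {"X", "Y", "Z"}
D = {"X"}


def directional(subject: set[str], other: set[str]) -> float:
    """Share of the subject's terms found on the other node (not symmetric)."""
    if not subject:
        return 0.0
    return len(subject & other) / len(subject)


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared terms over distinct terms on either node (symmetric)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # The same pair scores differently depending on which node is the subject.
    print(f"A->C {directional(A, C):.0%}, C->A {directional(C, A):.0%}")
    for name, other in (("B", B), ("C", C), ("D", D)):
        print(f"A vs {name}: {jaccard(A, other):.0%}")
```

The Jaccard version rates B at 100%, C at about 67% and D at 50%, whichever node is the subject.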

Intuitively, C is not as similar as B, since it doesn't match node A exactly. If our taxonomy terms were topics, with A tagged 'monkeys, apples' and C tagged 'monkeys, apples, Jupiter', then C is clearly less similar to A than B is.

As I said above, I don't see how a query could easily produce this result: it would have to look at every node in the result, get *its* total count of terms, and then divide the node_count we currently use by that. It's probably doable with subqueries, but messy.
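Here is a rough sketch of the subquery approach, run against SQLite from Python. It assumes a hypothetical, simplified `term_node(nid, tid)` table with one row per node/term pair; the module's real schema and query may well differ. One caveat: dividing node_count by the other node's term count alone would rate D at 100% (1 of its 1 term), so this sketch divides by the number of distinct terms across both nodes instead, which is the Jaccard index.

```python
import sqlite3

# Hypothetical, simplified schema: one row per (node, term) pair.
SCHEMA = "CREATE TABLE term_node (nid TEXT NOT NULL, tid TEXT NOT NULL)"

ROWS = [
    ("A", "X"), ("A", "Y"),
    ("B", "X"), ("B", "Y"),
    ("C", "X"), ("C", "Y"), ("C", "Z"),
    ("D", "X"),
]

# score = shared / (subject terms + other node's terms - shared)
# The derived table "totals" supplies each candidate node's own term count.
QUERY = """
SELECT t.nid,
       CAST(COUNT(*) AS REAL)
         / ((SELECT COUNT(*) FROM term_node WHERE nid = :subject)
            + totals.term_count - COUNT(*)) AS score
FROM term_node AS t
JOIN (SELECT nid, COUNT(*) AS term_count
        FROM term_node GROUP BY nid) AS totals
  ON totals.nid = t.nid
WHERE t.tid IN (SELECT tid FROM term_node WHERE nid = :subject)
  AND t.nid <> :subject
GROUP BY t.nid, totals.term_count
ORDER BY score DESC, t.nid
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database holding the example nodes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO term_node VALUES (?, ?)", ROWS)
    return conn


def similar_nodes(conn: sqlite3.Connection, subject: str) -> list:
    """Return (nid, score) pairs for nodes sharing a term with the subject."""
    return conn.execute(QUERY, {"subject": subject}).fetchall()


if __name__ == "__main__":
    for nid, score in similar_nodes(build_db(), "A"):
        print(f"{nid}: {score:.0%}")
```

The extra join is the "get *its* total count of terms" step; it works, but it does make the query noticeably heavier than a plain count of matching terms.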

At the very least, a short overview in the documentation of how this module calculates similarity would be helpful.