Yesterday, I installed the Alchemy module with the Autotagging module recommended here.
Loving it already: the tags returned are at least as relevant as (if not more relevant than) those hand-picked by my editors, and obviously generated in a fraction of the time.
One issue I have, and I'm not sure if this is an issue with Alchemy or Autotagging, but some of the returned tags contain HTML entities.
I have the threshold set to 60% relevancy and I'm getting back tags such as "target=", "keyword", "Keyword ", etc.
Wouldn't it make sense to have a checkbox that can be ticked to strip HTML tags and entities from all text before sending it off to the Alchemy servers?
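A rough sketch of what such an option might do, assuming the text is cleaned with PHP's built-in `strip_tags()` and `html_entity_decode()` before it is submitted. The function name `example_prepare_text()` and the `$strip` flag (standing in for the checkbox) are hypothetical and not part of either module:

```php
<?php
// Hypothetical helper: neither Alchemy nor Autotagging is known to define this.
function example_prepare_text($text, $strip = TRUE) {
  if (!$strip) {
    // Checkbox unticked: send the text unchanged.
    return $text;
  }
  // Add a space before each tag so words in adjacent elements do not run together.
  $text = strip_tags(str_replace('<', ' <', $text));
  // Turn entities such as &amp; and &nbsp; into the characters they stand for.
  $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
  // A decoded &nbsp; is a UTF-8 non-breaking space; make it a plain space.
  $text = str_replace("\xC2\xA0", ' ', $text);
  // Collapse runs of whitespace left behind by the removed markup.
  return trim(preg_replace('/\s+/', ' ', $text));
}

$html = '<p>Read <a href="/x" target="_blank">this&nbsp;post</a> &amp; more</p>';
echo example_prepare_text($html), "\n";
// Read this post & more
```

With the markup gone, attribute fragments like `target=` never reach the tagging service in the first place.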
Comments
Comment #1
thedavidmeister commented
oh whoops, the html that i was trying to quote came through :P
and nbsp have caused me troubles so far.
Comment #3
thedavidmeister commented*
Comment #4
thedavidmeister commented
simple fix:
line 77, just under
add this:
Comment #5
thedavidmeister commented
could someone review #4 so we can get it committed?
i find it hard to see how such a simple change could break anything, but someone else might know something i don't about that PHP function.
Comment #6
TomDude48 commented
I added it, let me know if it is doing what you needed.
Comment #7
TomDude48 commented
Comment #8
thedavidmeister commented
it is mostly doing what i need, but characters like slashes and html entities are still being sent.
i expanded it to this:
and that got most things that were still annoying me, but when i tried to use a regex to strip everything except "word characters" and white space alchemy sent an error.
now that i think about it, that could have had something to do with me sending a 1000 word "sentence" to alchemy after i stripped out all the punctuation.
anyway, i ran out of time to work on this as it is working fine for 90% of our articles with those three lines i posted above, but if you want to take it further you definitely could.
btw, we've noticed that the quality and relevance of the tags returned by Alchemy in general are vastly improved when you send it plain text with simple punctuation rather than html.
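For anyone who wants to take it further, here is a rough sketch of a middle ground between sending raw text and stripping everything except word characters: it drops leftover entities and stray symbols such as slashes but keeps basic sentence punctuation, so the text does not collapse into one giant "sentence". This is not the code posted earlier in the thread; the function name `example_keep_simple_punctuation()` and the exact set of characters kept are assumptions:

```php
<?php
// Hypothetical helper, not the lines from the comments above.
function example_keep_simple_punctuation($text) {
  // Drop entities that were left undecoded, e.g. "&nbsp;" or "&#160;".
  $text = preg_replace('/&#?[a-z0-9]+;/i', ' ', $text);
  // Keep letters, digits, whitespace and . , ! ? ' - ; replace everything
  // else (slashes, double quotes, equals signs, ...) with a space.
  $text = preg_replace('/[^\p{L}\p{N}\s.,!?\'-]/u', ' ', $text);
  // Collapse the whitespace runs created by the replacements.
  return trim(preg_replace('/\s+/', ' ', $text));
}

echo example_keep_simple_punctuation('Tags/keywords: "target=" &nbsp; is bad. Really?'), "\n";
// Tags keywords target is bad. Really?
```

Note that `preg_replace()` with the `/u` modifier returns NULL if the input is not valid UTF-8, so real code would want to check for that before sending anything on.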
Comment #9
TomDude48 commented
fixed in latest commit (in dev branch)
Comment #10
technologywon commented
Drupal 6 is no longer supported