Problem/Motivation

The existing usage of "Categories" on Drupal.org is weak. Approximately half of the projects have no category at all, some categories are no longer useful, and some are duplicates.

Proposed resolution

  • Access Control - Grant or restrict access to content, assets, or site functionality, or extend the authentication/login process.
  • Accessibility - Enhance the site to provide a great user experience to the broadest range of people or help to audit for compliance with accessibility standards like the Web Content Accessibility Guidelines (WCAG).
  • Administration Tools - Empower site builders and administrators with no-code tools to set up, enhance, configure, or maintain the site.
  • Automation - Enable the site to initiate automated actions from conditions, events, or defined schedules.
  • Content Display - Configure the layout and format of content and data presented to site visitors.
  • Content Editing Experience - Enhance the editorial interface and improve the processes and workflows around creating, editing or removing content.
  • Developer Tools - Empower developers with tools that assist with developing and debugging the frontend or backend of the site.
  • E-Commerce - Assist with aspects of running an online store, such as product management and display, shopping carts, inventory management, fulfillment, payments, taxes, and shipping.
  • Import and Export - Help transfer content and data into or out of the site, by migration, backup, or exposing data to external, headless, or decoupled systems.
  • Integrations - Use a third-party CSS or JS Framework, a self-hosted service like a CRM, or a third-party service with the site.
  • Legal Compliance - Help protect users' privacy by anonymizing or encrypting data, or ensuring compliance with local laws and regulations, such as GDPR or Terms & Conditions.
  • Media - Enhance functionality related to media, or expand media resource types, such as images, videos, audio files, or documents.
  • Multilingual - Provide tools for translation and display of text in multiple languages and support for regionalization/localization for dates, numbers, currency, measurement, or other local contexts.
  • Performance - Improve the real or perceived speed of the site, or monitor performance metrics.
  • Search Engine Optimization - Manage or improve the site's search engine ranking by running audits, assessing metrics, or making the site's content and data more digestible by search engines.
  • Security - Help protect the website from attackers or bad actors, by identifying, preventing, or mitigating security vulnerabilities.
  • Site Search - Enhance functionality relating to the search of content and data on the site.
  • Site Structure - Extend the structure of the site by way of content models, data storage, field types, and navigation, so it is more understandable to users.
  • User Engagement - Enhance the site so that visitors can directly interact with it or among each other, enabling things like user-generated content, comments, voting, chat, or forms for data collection and interaction.

Remaining tasks

  • ✅ Meet as needed to come up with a plan for each existing term, and propose new terms
  • Implement new taxonomy strategy on Drupal.org; include a migration strategy for old terms to new
  • Write descriptions and help pages for maintainers to understand which categories to use
  • File issues with projects to update their categories appropriately

User interface changes

Likely will make the category field required and limited on Drupal.org for maintainers.

Data model changes

(Same as above.)

Comments

chrisfromredfin created an issue. See original summary.

chrisfromredfin’s picture

Current work-in-progress spreadsheet for categorization exam:
https://docs.google.com/spreadsheets/d/1igNmyQLybkRhK8x40Gjo1lK8DRaDoGcg...

Grienauer’s picture

Regarding the category/tag discussions we also had recently in a meeting, I have another proposal:

We could do it with machine learning…
Here are two options and a suggestion.
Ad 1) There are some modern keyword extractor models that can be used out of the box (e.g. YAKE and KeyBERT). Disadvantage: good tags rarely appear literally in the description. The best approach I found is described here: https://jaketae.github.io/study/keyword-extraction/
You can play around with YAKE, for example, here: http://yake.inesctec.pt/demo/user (they also have an API).
Ad 2) If you develop a clever list of tags, you can get quite far with zero-shot classification (meaning you don’t have to label the data individually beforehand). With some labelled training data, however, the results get better and you can do good old supervised learning.
Disadvantage: you can’t handle new labels.
How it could work:
1. We also need all the descriptions in that sheet.
2. We have to agree on a list of categories (or at least tags to expand the search possibilities; there could be lots of tags).
3. Some people categorise/tag a bunch of elements correctly.
Then we could just use good old supervised learning on it :)
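
To make the three steps above concrete, here is a minimal sketch of supervised learning on hand-labelled descriptions. It is only an illustration: the thread does not name a specific algorithm, so the naive Bayes classifier is an assumption, and the sample descriptions and the names `train`, `predict`, and `LABELLED` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a description and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labelled):
    """Collect per-category word counts from (description, category) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for description, category in labelled:
        doc_counts[category] += 1
        word_counts[category].update(tokenize(description))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary


def predict(model, description):
    """Return the category with the highest smoothed log-probability."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        total_words = sum(word_counts[category].values())
        score = math.log(doc_counts[category] / total_docs)
        for token in tokenize(description):
            if token not in vocabulary:
                continue  # ignore words never seen during training
            count = word_counts[category][token] + 1  # add-one smoothing
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical hand-labelled rows (step 3); real data would come from the sheet.
LABELLED = [
    ("Integrates a payment gateway with the checkout", "E-Commerce"),
    ("Adds shipping rates to product orders", "E-Commerce"),
    ("Indexes content in Solr for faster search", "Site Search"),
    ("Adds autocomplete to the search form", "Site Search"),
]

model = train(LABELLED)
print(predict(model, "Solr search autocomplete"))
```

As noted under option 2, a classifier like this can only return categories that appear in the labelled data, so new labels would need new training rows.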

There have also been some comments on my Slack post already:
https://drupal.slack.com/archives/C01UHB4QG12/p1633677153459400
TL;DR: others suggested looking at a module's code to derive the categories from it.

mandclu’s picture

As someone who maintains a number of modules (a few of which I adopted), I'd be happy to participate in the discussions around changing the categories. In a different thread, I also suggested that we might consider more than one vocabulary moving forward, which would allow for more options. For example, if we had separate vocabularies for the functionality provided and the type of module, modules could be allowed a maximum of two each.

We could develop a mapping for existing categories into the new ones. In terms of how to handle the situations where a module might currently have more terms applied than would be allowed moving forward, I would suggest two courses of action:

1. Communicate the change to module owners and encourage them to self-assign to the new vocabularies
2. For modules that don't manually self-assign into the new categories, develop a "ranked" list of the new categories, based on the popularity of the equivalent terms in the existing structure. As a "good faith" approach to an automated migration, migrate to the highest popularity terms first, up to the maximum allowed under the new structure
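
The "ranked" fallback in option 2 could be sketched roughly as follows. This is a sketch under stated assumptions, not the agreed migration: the old-to-new mapping, the project data, and the function names are all hypothetical, and the cap of two terms simply reuses the example maximum mentioned above.

```python
from collections import Counter

# Hypothetical mapping of existing terms to new categories (illustration only).
OLD_TO_NEW = {
    "Fields": "Site Structure",
    "CCK": "Site Structure",
    "Media": "Media",
    "Search": "Site Search",
}

# Hypothetical projects and the old terms currently applied to them.
PROJECTS = {
    "project_a": ["Fields", "CCK", "Media", "Search"],
    "project_b": ["Fields", "Media"],
    "project_c": ["Search"],
    "project_d": ["Media"],
}


def rank_new_categories(projects, old_to_new):
    """Count how many projects use a term equivalent to each new category."""
    popularity = Counter()
    for old_terms in projects.values():
        # A set so a project counts once per new category, even if several
        # of its old terms map to the same one.
        popularity.update({old_to_new[t] for t in old_terms if t in old_to_new})
    return popularity


def migrate(old_terms, old_to_new, popularity, max_terms=2):
    """Pick the most popular equivalent categories, up to max_terms."""
    candidates = {old_to_new[t] for t in old_terms if t in old_to_new}
    # Sort by popularity (descending), then by name so ties are deterministic.
    ranked = sorted(candidates, key=lambda c: (-popularity[c], c))
    return ranked[:max_terms]


popularity = rank_new_categories(PROJECTS, OLD_TO_NEW)
print(migrate(PROJECTS["project_a"], OLD_TO_NEW, popularity))
```

Old terms with no equivalent in the mapping are dropped silently here; a real migration would probably want to report them for manual review.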

chrisfromredfin’s picture

Issue tags: +Novice

chrisfromredfin’s picture

Issue tags: -Novice

j/k this issue itself isn't Novice; but we need Novice help with these decisions. :)

tim.plunkett’s picture

One note: while we can't (or shouldn't) recategorize modules from within the code, we can opt to not show existing categories like CCK in the meantime.
A small issue to hide certain categories + #3274577: Not all Categories are listed on admin/modules/browse page should be enough.

chrisfromredfin’s picture

Cross-posting this with the BoF where more of this discussion is happening: a friend of mine who does data science and machine learning helped me set up a Bag of Words keyword extractor to cluster the descriptions of about 7300 Drupal modules compatible with Drupal 9.5+. This led me to the following clusters of keywords. I set it up to target 20 categories, but it can easily be tweaked. We might use this to cross-check the lists we came up with:

Cluster 0: user login users password email account site role page admin
Cluster 1: field formatter type fields widget text link values display new
Cluster 2: time date field range content events picker datetime set site
Cluster 3: view mode content modes display entity user permissions select admin
Cluster 4: site use content api modules configuration users integration simple using
Cluster 5: taxonomy terms term vocabulary reference content nodes vocabularies import hierarchy
Cluster 6: group groups permissions content members requirements extends based associate create
Cluster 7: entity reference field entities content type fields form create types
Cluster 8: search api solr index backend engine autocomplete functionality indexing views
Cluster 9: language switcher links languages content set different ip negotiation settings
Cluster 10: commerce payment gateway integrates checkout payments integration product order shipping
Cluster 11: views filter plugin style view display filters adds fields page
Cluster 12: consider ukraine fight freedom safety supporting ukrainian europe maintained developers
Cluster 13: data api json database source export external site content storage
Cluster 14: node content nodes type edit page add entity adds pages
Cluster 15: cache tags events time blocks clear purge settings new tag
Cluster 16: media entity file entities library core image files source embed
Cluster 17: block blocks layout display add content page site builder using
Cluster 18: image images style field formatter styles responsive file background optimize
Cluster 19: video embed videos field url youtube handler add support pasting
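
For readers curious how keyword clusters like these can be produced, here is a minimal sketch of bag-of-words clustering. The comment above does not say which clustering algorithm was used, so k-means is an assumption here, and the four descriptions, the stop list, and all function names are hypothetical.

```python
import re
from collections import Counter

STOPWORDS = {"a", "for", "the", "with"}  # tiny illustrative stop list


def bag_of_words(text):
    """Turn a description into word counts, ignoring stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def distance(bag, centroid):
    """Squared Euclidean distance between a bag and a centroid."""
    terms = set(bag) | set(centroid)
    return sum((bag.get(t, 0) - centroid.get(t, 0)) ** 2 for t in terms)


def mean(bags):
    """Average the word counts of several bags into one centroid."""
    total = Counter()
    for bag in bags:
        total.update(bag)
    return {term: count / len(bags) for term, count in total.items()}


def kmeans(bags, k, iterations=10):
    """Plain k-means, seeded with the first k documents for repeatability."""
    centroids = [dict(bag) for bag in bags[:k]]
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda i: distance(bag, centroids[i]))
            for bag in bags
        ]
        for i in range(k):
            members = [b for b, label in zip(bags, labels) if label == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = mean(members)
    return labels, centroids


def top_terms(centroid, n=3):
    """Highest-weighted terms in a centroid; ties are broken alphabetically."""
    return sorted(centroid, key=lambda t: (-centroid[t], t))[:n]


# Hypothetical module descriptions, standing in for the real ~7300.
DESCRIPTIONS = [
    "Provides a payment gateway for checkout",
    "Adds a Solr search index backend",
    "Integrates payment checkout with a shipping gateway",
    "Search autocomplete for the Solr index",
]

bags = [bag_of_words(d) for d in DESCRIPTIONS]
labels, centroids = kmeans(bags, k=2)
for i, centroid in enumerate(centroids):
    print(f"Cluster {i}:", " ".join(top_terms(centroid)))
```

Changing `k` is the equivalent of tweaking the target of 20 categories mentioned above.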

bsnodgrass’s picture

Interesting approach... While taking a pass through the resulting list on https://www.drupal.org/project/project_browser/issues/3311475, my purely subjective opinion is that it feels pretty complete, while certainly being shorter than the current number of categories.

rkoller’s picture

I've added #3314350: [meta] Usability improvements for Project Browser only as a related issue, since it was already linked to the Google Data Studio parent issue.

bsnodgrass’s picture

Finally finished my pass at this. https://docs.google.com/document/d/1PO24Bkd34Hd0yZB75ZN2DrpA-sBOOWTzQOZc...

Sorry, I think I was supposed to add this to the child issue.

benjifisher’s picture

@rkoller and I discussed the current categories during #3326232: Drupal Usability Meeting 2022-12-16. That issue has a link to the recording of our discussion. We plan to return to the question in an upcoming Usability meeting, when the list is closer to its final form.

Since there were only two of us, these suggestions should not be considered recommendations of the usability group, just ideas to consider:

  • Two of the most commonly used modules are Chaos Tools Suite (ctools) and Libraries API. It is not clear what category these should have. Maybe it is a good thing to make them harder to find, since they will be installed automatically as dependencies of the modules that turn up in the search.
  • Consider combining Privacy and Security, especially if each of these categories represents a small number of the top 100 modules.

chrisfromredfin’s picture

Issue summary: View changes

We still have some user testing to do at DrupalCon Pittsburgh around these, but the current draft of 19 categories that we've come up with is now listed in the issue summary under "Proposed resolution" above.

chrisfromredfin’s picture

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.