Problem/Motivation

Drupal.org will have recipe browsing and a recipe browsing API at some point. For the results to be good, we need to know what recipes are popular.

Right now, we have download numbers for recipes, like https://packagist.org/packages/drupal/events_recurring/stats. Those numbers aren’t necessarily great, since they measure when Composer is run with the recipe as a dependency. And the key action for a recipe is its application, not its download.

It would be better to have an API, similar to update status, so that we have metrics for when recipes are applied.

Proposed resolution

The server side is ready: https://updates.drupal.org/recipe-applied. Our CDN immediately returns a static synthetic response. Like update status, the data will be sent in query parameters, and we then do log analysis to get useful data.

Conditions for sending data:

  • Respect opt in/out options
  • When the recipe is applied
  • Only if the recipe is from Drupal.org, in the drupal/ namespace on Packagist.org, like https://packagist.org/packages/drupal/events_recurring
  • If there is a way to know whether GitLab CI is being used, not sending data from CI would be ideal
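A minimal sketch of the CI condition, assuming an environment-variable heuristic is acceptable: GitLab CI sets the predefined variables GITLAB_CI and CI to "true" in every job. The function name is hypothetical, and also treating a generic CI=true as CI is an assumption that would skip other CI systems too.

```php
<?php

/**
 * Hypothetical helper: reports whether this process appears to run in CI.
 *
 * GITLAB_CI is set by GitLab CI; CI=true is set by GitLab CI and many other
 * CI systems. Treating either one as "CI" is an assumption.
 */
function recipe_telemetry_is_ci(): bool {
  foreach (['GITLAB_CI', 'CI'] as $variable) {
    $value = getenv($variable);
    if ($value !== FALSE && strtolower($value) === 'true') {
      return TRUE;
    }
  }
  return FALSE;
}
```

A telemetry sender could simply skip queuing or sending when this returns TRUE.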

Data to send:

  • name: recipe name, like events_recurring
  • version: recipe version, like 1.0.0-beta1
  • site_key: the same arbitrary site key used by the update status module
  • anything else we need?

The final request will look like https://updates.drupal.org/recipe-applied?name=events_recurring&version=...
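A sketch of building that request URL from the three parameters listed above. The endpoint and parameter names come from this summary; the function name and the example site key are hypothetical. PHP's http_build_query() handles the escaping.

```php
<?php

/**
 * Hypothetical builder for the recipe-applied ping URL.
 */
function build_recipe_applied_url(string $name, string $version, string $site_key): string {
  // RFC 3986 encoding leaves characters such as "-" and "." untouched.
  $query = http_build_query([
    'name' => $name,
    'version' => $version,
    'site_key' => $site_key,
  ], '', '&', PHP_QUERY_RFC3986);
  return 'https://updates.drupal.org/recipe-applied?' . $query;
}

// Prints https://updates.drupal.org/recipe-applied?name=events_recurring&version=1.0.0-beta1&site_key=abc123
echo build_recipe_applied_url('events_recurring', '1.0.0-beta1', 'abc123'), PHP_EOL;
```

Any extra metadata agreed on later would just be another key in the array.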

Drupal does not need to wait for a response.

Remaining tasks

Decide on any opt in/out options. This is the same privacy policy as update status data. We don’t collect the site URL, don’t share logs, only aggregate, anonymous summaries.

Determine whether to include inherited as well as direct recipe application. Maybe track both, with a parameter for direct vs. inherited. See comments #24 and #25. If so, consider opening a follow-up issue for extensions--tracking those that were installed directly rather than inherited as dependencies.

Finalize any other data collected.

Once the final query parameters are set and in core, we can start on the server-side log analysis. The math will be a bit different, since a recipe application is reported once rather than on every update check, as installed projects are.

User interface changes

Possibly, if there is a new opt in/out UI.

Introduced terminology

None

API changes

Not for Drupal as a client.

Data model changes

n/a

Release notes snippet

To be determined

Issue fork drupal-3489066


Comments

drumm created an issue. See original summary.

drumm’s picture

Issue summary: View changes

Add version data to send

chrisfromredfin’s picture

I'm not sure how the telemetry will work for the reporting, but I can say that core dispatches an event when a recipe is applied; that is where we can know in code that it has happened and report it. Project Browser listens to that event here:

https://git.drupalcode.org/project/project_browser/-/blob/2.0.x/src/Reci...

catch’s picture

Tagging as a Drupal CMS release target. If we were to implement this in the update module, would that save an additional opt-in? It's almost exactly the same data that we already send from the update module, so it might be fine. The update module would then need to do something with the event. I would suggest adding a queue item and having the queue runner actually send the data, so that applying a recipe stays as lightweight as possible.

phenaproxima’s picture

I would suggest that we simply add this to the Update Status module as an event subscriber, respecting its opt-in/out.

joshuami’s picture

+1 to recipes calling home for update status.

I was thinking about this a bit when the site templates concept was described. D.o gets directionally accurate stats about module and theme installs based on when sites call home to check for available updates. That's a little more accurate than Composer downloads, and it would align in accuracy with our other metrics.

We kinda need this sort of telemetry for Drupal CMS install metrics as well. If recipes called home about updates, we could use drupal_cms_starter as an indicator that at least part of the site was based on Drupal CMS. As it stands now, we can only guess at Drupal CMS installs based on one of the dependent feature modules with drupal_cms_ in the name.

One drawback to recipes calling home is that it kinda assumes that recipes will be added to a site and not removed. That might not be a best practice as many recipes will not continue to apply as a site develops over time.

jannakha’s picture

catch’s picture

One drawback to recipes calling home is that it kinda assumes that recipes will be added to a site and not removed. That might not be a best practice as many recipes will not continue to apply as a site develops over time.

I think it would only ever be sent to d.o once, unlike modules, which are reported on every update status check. So it would be possible to track the cumulative number of times a recipe has been applied, and month-by-month comparisons, but these would differ from the current project usage stats.
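A sketch of the cumulative and month-by-month counting described above, assuming each log line has already been parsed into a month ("YYYY-MM") and a recipe name. The function name and entry shape are hypothetical.

```php
<?php

/**
 * Hypothetical log-analysis helper: applications of one recipe per month,
 * plus the running (cumulative) total.
 */
function monthly_application_counts(array $entries, string $recipe): array {
  $per_month = [];
  foreach ($entries as $entry) {
    if ($entry['name'] === $recipe) {
      $per_month[$entry['month']] = ($per_month[$entry['month']] ?? 0) + 1;
    }
  }
  // "YYYY-MM" keys sort chronologically as plain strings.
  ksort($per_month, SORT_STRING);

  $result = [];
  $running_total = 0;
  foreach ($per_month as $month => $count) {
    $running_total += $count;
    $result[$month] = ['applied' => $count, 'cumulative' => $running_total];
  }
  return $result;
}
```

Because each application is reported once, the cumulative column keeps growing, unlike project usage stats, which are a snapshot of currently reporting sites.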

zaporylie’s picture

I wonder about a scenario in which a recipe is applied multiple times because it might be a dependency for many other recipes. That would be a common case for starter/base recipes. Said recipes can be deduplicated, which is an approach the recipe installer kit is promoting, or simply applied multiple times. The main issue I see here is inconsistency, which results in slightly off telemetry data.

Re #5: while I agree that the proposed approach is clean, I wonder how the opt-in/out could be respected if the recipe is applied via the installer (no option to opt in/out) or the site is installed from the recipe (drush si ../recipes/drupal_cms_starter).

Re #7: Recipe Tracker (thanks for mentioning it here) is meant to track the application of recipes locally, within your Drupal instance, and never sends any data outside the Drupal instance context.

drumm’s picture

The issue summary mentions including the update status site key, which is used to de-dupe update status data per-site.

We will have to decide on a new algorithm for translating the recipe application numbers into popularity for ranking. Something along the lines of including the last N weeks, potentially with some decay function so the last week contributes more to the rank, and each earlier week contributes less. Until we have some real data, it isn’t worth speculating much about the specific implementation. The earlier we have data, the better.
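One illustrative shape for such a decay function, not a decided algorithm: with weekly counts ordered most recent first, week i contributes count[i] × decay^i, limited to the last N weeks. The decay factor 0.5 and N = 4 are arbitrary assumptions, as is the function name.

```php
<?php

/**
 * Hypothetical decayed popularity score over the last $weeks weeks.
 * $weekly_counts is a list ordered from the most recent week backwards.
 */
function decayed_score(array $weekly_counts, float $decay = 0.5, int $weeks = 4): float {
  $score = 0.0;
  foreach (array_slice($weekly_counts, 0, $weeks) as $age => $count) {
    // The most recent week ($age 0) gets full weight; older weeks shrink.
    $score += $count * ($decay ** $age);
  }
  return $score;
}

// 10 applications last week, 20 the week before, 40 the week before that:
// 10 * 1 + 20 * 0.5 + 40 * 0.25 = 30.
echo decayed_score([10, 20, 40]), PHP_EOL;
```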

Additional metadata that might help is welcome, as long as it doesn’t send identifying information about the site or delay the initial implementation in core. Adding the method of installation to the query string of the request would be useful: installed via the installer, Project Browser, Drush, as a dependency, etc.

catch’s picture

@zaporylie it might be possible to add recipes to a queue in the installer, and then process the queue on the first cron run, depending on whether or not Update Status is installed.

I don't think we want recipe application itself to trigger an http request to Drupal.org, so also going via a queue might be a good idea anyway. We can make it clear in the documentation for the hook/event that it's triggered an indeterminate amount of time after the recipe is applied.
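A self-contained sketch of the queue split discussed above: the event handler only records an item, and a later consumer (for example on cron) decides whether to send or drop it. The in-memory queue stands in for a real persistent queue, and every name here, including the injected $send callable, is hypothetical.

```php
<?php

/**
 * Stand-in for a persistent queue; items are plain arrays.
 */
final class InMemoryQueue {
  private array $items = [];

  public function createItem(array $data): void {
    $this->items[] = $data;
  }

  public function claimItem(): ?array {
    // array_shift() returns NULL once the queue is empty.
    return array_shift($this->items);
  }
}

/**
 * Producer: called when a recipe is applied; never makes an HTTP request.
 */
function on_recipe_applied(InMemoryQueue $queue, string $name, string $version): void {
  $queue->createItem(['name' => $name, 'version' => $version]);
}

/**
 * Consumer: drains the queue. When reporting is not allowed (for example,
 * Update Status is not installed), items are discarded instead of sent.
 * Returns the number of items processed.
 */
function process_recipe_queue(InMemoryQueue $queue, bool $reporting_allowed, callable $send): int {
  $processed = 0;
  while (($item = $queue->claimItem()) !== NULL) {
    if ($reporting_allowed) {
      $send($item);
    }
    $processed++;
  }
  return $processed;
}
```

Injecting $send keeps the fire-and-forget HTTP call out of the recipe application path and makes the consumer easy to stub.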

zaporylie’s picture

I appreciate the feedback in #10 and #11. Routing recipe application d.org update requests to the queue, with the update module acting as the queue consumer, sounds like a clean approach that respects global tracking settings.

I’m curious about one more thing — the issue summary mentioned the name as one of the properties featured in the request. I’d like to clarify two points:
- The recipe directory name—which is essentially what the recipe name comes down to—has only a loose connection with the project on d.org. I believe we should only send an application update if the recipe includes a composer.json file, so we can verify that it belongs to the drupal/ namespace as outlined in one of the conditions in the issue summary. This would filter out all custom recipes added at the project level (i.e., not available on d.org even if the recipe name matches a general project on d.org), as well as recipes under other vendor namespaces.
- This also means that all core recipes would never be tracked. Are we okay with that, or should we support core recipes too? If so, we’d need to collect version information from the drupal/core package.

drumm’s picture

- The recipe directory name—which is essentially what the recipe name comes down to—has only a loose connection with the project on d.org. I believe we should only send an application update if the recipe includes a composer.json file, so we can verify that it belongs to the drupal/ namespace as outlined in one of the conditions in the issue summary. This would filter out all custom recipes added at the project level (i.e., not available on d.org even if the recipe name matches a general project on d.org), as well as recipes under other vendor namespaces.

That sounds correct to me.

- This also means that all core recipes would never be tracked. Are we okay with that, or should we support core recipes too? If so, we’d need to collect version information from the drupal/core package.

Yes, we do want core recipe information, so that will need to be a special case.
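A sketch of the filtering agreed on above, under stated assumptions: a contributed recipe is reportable only if its composer.json names a drupal/ package, and core recipes are a special case reported by directory name. How the caller decides a recipe is from core, and where the version comes from (drupal/core for core, installed package data for contrib), is left open; the function name is hypothetical.

```php
<?php

/**
 * Hypothetical helper: returns ['name' => ..., 'version' => ...] for a
 * reportable recipe, or NULL if it should not be reported.
 */
function recipe_telemetry_info(string $recipe_dir, bool $is_core, string $version): ?array {
  if ($is_core) {
    // Core recipes have no composer.json of their own; report them by
    // directory name with the drupal/core version supplied by the caller.
    return ['name' => basename($recipe_dir), 'version' => $version];
  }
  $composer_file = $recipe_dir . '/composer.json';
  if (!is_file($composer_file)) {
    // Most likely a custom, project-level recipe.
    return NULL;
  }
  $composer = json_decode((string) file_get_contents($composer_file), TRUE);
  $package = is_array($composer) ? ($composer['name'] ?? NULL) : NULL;
  if (!is_string($package) || !str_starts_with($package, 'drupal/')) {
    // Other vendor namespaces are not reported.
    return NULL;
  }
  return ['name' => substr($package, strlen('drupal/')), 'version' => $version];
}
```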

chrisfromredfin’s picture

All of this sounds good to me. Only thing that I take a quick note of is that we already have the "dependency popularity" problem with modules... Chaos Tools being one of the top-installed modules... that users almost never care about.

Our consideration for this was to maybe allow projects to opt into a checkbox like "library-only" and then hide those from PB's default filters (allow them to be shown if the user wants that).

Not sure that applies to recipes, as most base recipes ARE "useful unto themselves," so maybe it's different. Anyway, just a little something to consider/think about.

drumm’s picture

We’ll handle any browsing requirements at #3447063: [Meta] Plan for recipe findability on Drupal.org and Project Browser. For now, we just need the data as soon as is practically possible.

(Off-topic: I took a quick survey of Drupal.org’s modern sites, and the only reason we install ctools is pathauto, https://www.drupal.org/project/pathauto/releases/8.x-1.11. Its general popularity should decrease, but that will take years.)

dww’s picture

Note, this would be moving in the opposite direction of #3448401: Move project telemetry reporting from the update module into core. I'm not totally opposed to adding this, but wanted to point out that some folks already don't like how the telemetry aspect is bundled into Update Status...

dww’s picture

Status: Active » Needs work

Preliminary draft MR now up to get this started. Not sure if this should stay under 'recipe system' or move to 'update.module', but leaving it in recipe system for now.

Currently it only creates / deletes a queue in install/post_update/uninstall, and adds an event subscriber to begin to populate the queue as recipes are applied.

Needs lots of help to get the details right in there, and something to process the queue and actually phone home.

dww’s picture

Probably too late to get this into 11.2.0 at this point, but tagging for now. Maybe a miracle will happen. 😂 If not, we can move it to an 11.3.0 priority...

p.s. Also crediting @drumm and @phenaproxima for Slack support

dww’s picture

@drumm: Thoughts on what URL (structure) we should be pinging for these requests?

@all: Anyone more familiar with recipe internals want to run with this? I don't have time to track down all the details and make this work. Please assign yourself and push commits to the MR to remove @todo and make this really work. Thanks!

dww’s picture

Note: I’m in the middle of an extremely busy time right now, and will be totally offline for a few weeks at the end of June and early July. There’s only a tiny chance I’ll be able to do anything with this before mid July. So if anyone else wants to pick this up and run with it before then, please do!

mxr576’s picture

I believe this issue is a blocker for reliable usage-data collection. Currently, Recipe Installer reports a recipe “applied” event every time a parent or higher-level recipe references it; so shared or base recipes can be counted repeatedly. This inflates the telemetry and distorts our understanding of real recipe adoption.

Until we dedupe or canonicalize recipe application ("only count once per site" or similar), any usage metrics collected via this API will likely over-report popular or foundational recipes.

drumm’s picture

No, that issue is not a blocker. The issue summary is calling for using the site_key, which enables per-site deduping.
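A sketch of the per-site de-duplication that site_key enables during log analysis: each (site_key, recipe name) pair counts once, no matter how many times a base recipe was applied on that site. The entry shape and function name are assumptions.

```php
<?php

/**
 * Hypothetical helper: number of distinct sites per recipe name.
 * Each entry is ['name' => recipe name, 'site_key' => site key].
 */
function unique_sites_per_recipe(array $entries): array {
  $seen = [];
  foreach ($entries as $entry) {
    // Repeated applications on the same site overwrite the same slot.
    $seen[$entry['name']][$entry['site_key']] = TRUE;
  }
  return array_map('count', $seen);
}
```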

nedjo’s picture

[Edit: the following comment is left for context, but note that the information contained is incorrect. Contrary to what's written here, RecipeAppliedEvent is triggered for all applied recipes, not only for those that are applied directly.]

TBD: do we want to track here (a) only recipes that are applied directly or (b) all recipes that are applied?

Background: Unlike for example hook_modules_installed(), which receives data for all modules that are installed, RecipeAppliedEvent is triggered only for recipes that are directly applied--not for those that are applied by virtue of being listed as recipes in another recipe's recipe.yml file.

If we want to track all recipes that are applied, RecipeAppliedSubscriber::onRecipeApplied() would need to recurse through included recipes. This might be as simple as moving the body of ::onRecipeApplied() to a new method with a recursive call:

  /**
   * Notices when a specific recipe is applied and adds it to a queue for processing.
   */
  public function onRecipeApplied(RecipeAppliedEvent $event): void {
    $this->addTelemetryQueueItem($event->recipe);
  }

  /**
   * Adds a recipe, and recursively its included recipes, to a queue for processing.
   */
  protected function addTelemetryQueueItem(Recipe $recipe): void {
    $recipe_info = $this->getRecipeInfo($recipe);
    if ($this->needsTelemetry($recipe_info)) {
      $recipe_queue = $this->queueFactory->get('update_recipe_telemetry', TRUE);
      $recipe_queue->createItem($recipe_info);
    }
    // Recurse even when this recipe itself is not reported, so that
    // reportable recipes it includes are still queued.
    foreach ($recipe->recipes->recipes as $inherited_recipe) {
      $this->addTelemetryQueueItem($inherited_recipe);
    }
  }

drumm’s picture

Maybe track both, with a parameter for direct vs. inherited.

If we tracked everything the same, we might have utility recipes you wouldn’t use independently rising to the top, like ctools did for module usage.

If we didn’t track inherited recipes at all, we would have a much more complicated task if we later wanted to find the important building blocks.
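A sketch of how a direct vs. inherited label could be derived, assuming we know which recipe the user applied and each recipe's listed sub-recipes (as in recipe.yml). A recipe reached through several parents is labelled once. The function name and the label values, which might travel as an extra query parameter, are hypothetical.

```php
<?php

/**
 * Hypothetical sketch: labels every recipe applied as a result of applying
 * $root as 'direct' (the root) or 'inherited' (everything it pulls in).
 * $children maps a recipe name to the names of the recipes it lists.
 */
function label_applied_recipes(string $root, array $children): array {
  $labels = [$root => 'direct'];
  $stack = $children[$root] ?? [];
  while ($stack) {
    $name = array_pop($stack);
    if (isset($labels[$name])) {
      // Already labelled via another parent.
      continue;
    }
    $labels[$name] = 'inherited';
    foreach ($children[$name] ?? [] as $child) {
      $stack[] = $child;
    }
  }
  return $labels;
}
```

Counting only 'direct' labels would keep utility recipes from dominating rankings, while the 'inherited' labels still reveal the common building blocks.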

Version: 11.x-dev » main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Read more in the announcement.

nedjo’s picture

Issue summary: View changes

Maybe track both, with a parameter for direct vs. inherited.

That does sound useful. If we choose this direction, we might want to open a follow-up issue for extensions--tracking those that were installed directly rather than inherited as dependencies. Updating the issue summary accordingly.

I wondered briefly whether there was some relation here to work around tracking recipe application locally on the site. I concluded that, no, these are distinct use cases and our current approach in this issue - dispatching data to a drupal.org endpoint rather than storing locally - is correct. For the sake of completeness, I'll jot down some notes here.

Currently, we're planning to store data on a site's applied recipes remotely on Drupal servers but not locally on the site itself. There are use cases for local storage of such data, albeit for a superset of what we're wanting here--all recipes, not just those that meet our ::needsTelemetry() filter. Two implementations (both of which store data on direct but not inherited applied recipes) are:

Should we anticipate that use case here?

In short, we don't need to. Even if core also ends up tracking locally which recipes have been applied, that's not the point here. For this issue's use case, we don't want to periodically consult the site's version of what's ever been applied. Instead, we want to capture the ephemeral event of recipe application. At best, in the theoretical case that this code is added to the update module and subsequently moved into a core library, this issue may serve as a base for a solution that both saves data locally and dispatches selected data to a drupal.org endpoint.

nedjo’s picture

Issue summary: View changes

I would suggest that we simply add this to the Update Status module as an event subscriber, respecting its opt-in/out.

The problem is, when it comes to telemetry data, we don't really have an opt-in/out in Update Status. You can either install Update Status or not. If you install it, your data is sent to drupal.org. If you don't want that, you can uninstall. All or nothing, take it or leave it.

Which, with updates, at least makes a certain amount of sense, because you can't fetch information on available updates without sending information about what extensions you have installed locally.

When @merlinofchaos, @dww, and I first sketched out what became the update API, we mocked up a data model with a two-way client-server data exchange representing a parallel set of requirements. Aside from the site key, the data sent by the client was only what was needed for its own requirements. Client update needs matched server telemetry needs. That parallelism has held ever since.

Until now. The problem is, this issue is all about telemetry and not at all about updates.

My intuition is:

  • Given we're looking at a one-sided data transaction--from client to server, not the two-way exchange we've had in Update Status up till now--it feels like we're crossing a threshold where a higher level of explicit consent is appropriate.
  • The Update Status module is not an obvious fit. If we're going to put this piece here, even on an interim basis, we could look at what extra steps might help with transparency about attendant questions and issues.

In the installer, if a site admin runs it, we do offer a notice:

When checking for updates, your site automatically sends anonymous information to Drupal.org. See the Update Status module documentation for details.

That notice, at least, and the similar help text we give at /admin/help/update, would need to be changed, as anonymous information would no longer be limited to checking for updates.

catch’s picture

The problem is, this issue is all about telemetry and not at all about updates.

This is true in terms of how the issue is written currently, but I'm not sure it necessarily should be. This came up in the recipes channel on slack recently https://drupal.slack.com/archives/C2THUBAVA/p1770743860959319

There is a theoretical situation where a recipe ships with insecure config. Let's say - a view accessible to anonymous users that lists every user's e-mail address.

It's impossible to 'update' that recipe to fix the security problem because recipes can't have upgrade paths, but it would be possible to release a security advisory for it. That security advisory could then have mitigation steps for sites that have applied the recipe (manually delete the view, manually add the 'administer users' permission etc.). Sometimes we have manual mitigation steps for other SAs - like uninstall a module when it's made unsupported due to a security issue not being fixed.

In the context of update status, if we had the recipe name + version that was applied, we could then compare that with security releases, show them as 'available' and link to the security release.
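A sketch of that comparison, assuming the applied version is recorded and the versions of the recipe's security releases are known. It uses PHP's version_compare(); real Drupal.org version strings may need more careful normalization, and the function name is hypothetical.

```php
<?php

/**
 * Hypothetical check: security releases newer than the applied version.
 */
function newer_security_releases(string $applied_version, array $security_releases): array {
  return array_values(array_filter(
    $security_releases,
    fn (string $release): bool => version_compare($release, $applied_version, '>'),
  ));
}
```

A non-empty result is what update status could surface as an available security release for the applied recipe.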

It would require work in update status to look for new security releases of recipes that are applied, and would likely need interface changes too (not sure we'd ever want to show non-security 'updates' for recipes). However, if we think this is a valid use case and something to cover in update status, then update status would be sending the recipe installed information as part of checking for updates after all.

I also think this is something to figure out before proceeding here, because if we need to send the recipe installed information every time we check for updates, this is very different to sending it once.

zaporylie’s picture

I was looking into the recipe_tracker module and its compatibility with the recipe unpacking plugin, and I noticed a parallel issue to what we’ll run into here.
When a recipe is unpacked, we still have the composer.json file, so we can pull repository information for reporting. However, I think we irreversibly lose the ability to determine which version of the recipe was available (or installed) at the time the recipe is applied.

drumm’s picture

The same update status API is available for general projects, including recipes: https://updates.drupal.org/release-history/ai_chatbot_recipe/current so that will be updated as usual.

If the recipe were to stay in composer.json, composer audit checks the API like https://packages.drupal.org/8/security-advisories?packages[]=drupal/ai_c..., which would have the advisory information when/if there is one.

nedjo’s picture

In the context of update status, if we had the recipe name + version that was applied, we could then compare that with security releases, show them as 'available' and link to the security release.

Makes sense. Given we're not actually (yet) talking about updates for recipes, we might want to do a bit of wordsmithing in the UI, like change "When checking for updates" to "When checking for updates and security notifications".

In the long term, it would be great if this was in fact a step towards actual updatability of recipes. "[P]roviding updates to sites that are already using your distribution" was one of the goals of the original Distributions and Recipes initiative overview and roadmap, and early in the recipes initiative we spent a lot of time and effort working through update scenarios, see #3283900: Define recipe runtime configuration update requirements and child issues. See also #3475693: Consider how to provide users a way to reapply a recipe.

All that said, recasting this issue as adding update support for recipes looks like a fairly big lift.

To mention just one detail, if we want recipe status updates, we're no longer talking about a single environment.

Recipes are usually applied in a dev environment. Other environments don't need to know or care about them. Project Browser, for example, stores data on applied recipes in state. If you're browsing projects, only the current environment needs to display the status of a given recipe (applied/not applied).

But for update purposes, we're presumably into the territory of staging data on applied recipes between environments, like we do for installed extensions with the core.extension simple configuration. Doable, but more complex.

If we do end up storing data on recipe application locally, this issue will potentially interact with the proposed RecipeDiscovery iterator we're working up in #3446354: Create a simple class to allow discovering recipes in the file system and/or with the Recipe object. For example, the applied state could be an attribute of the Recipe object, or there might be a relevant constructor property on RecipeDiscovery to filter by activation status. For reference, it's worth noting that Project Browser models both Extension installation and Recipe activation states or statuses using an ActivationStatus enum.

Aside from technical questions, we've raised some data ethics ones about how well this new telemetry feature request fits with what we have currently in Update Status. Are there people in the project we could reach out to to see if they might give this issue some focused thought and consideration from an ethics and data collection perspective?

nedjo’s picture

Are there people in the project we could reach out to to see if they might give this issue some focused thought and consideration from an ethics and data collection perspective?

I'm no expert, but in case it's helpful, I'll sketch in some questions.

First, let's take the technically simpler approach that's in the current patch and issue summary, focused on telemetry requirements and not on security notifications. Under this approach, on all sites running Update Status, whenever a recipe is applied, data is sent to drupal.org servers logging the event. Here are some possible opt-in/out cases.

  • Status quo. We continue to use the current approach. Installing the module is effectively opting into data sharing. Site administrators may opt out of data sharing by uninstalling Update Status.
  • We introduce a new opt-in/out setting specific to Recipe activation data. TBD: is this setting enabled by default?
  • We introduce a new setting that includes recipe activation data and other potential data that might be collected and sent to drupal.org servers to enhance the project browsing experience but excluding the data on installed extensions that is already collected. TBD: is this setting enabled by default?
  • We introduce a new setting that includes recipe activation data and the data on installed extensions that is already collected. To maintain the current state, this setting is enabled by default on existing and new sites.
  • We introduce a new setting that includes recipe activation data and other potential data that might be collected and sent to drupal.org servers to enhance the project browsing experience including the data on installed extensions that is already collected. To maintain the current state, this setting is enabled by default on existing and new sites.

For each of the cases, under the current patch's approach, how well does it satisfy generally accepted ethics and norms for data collection?

I also find myself wondering: from a legal perspective, if site admins are opting in, who are they making an agreement with?

Second, let's take the technically more complex approach of integrating recipes into the update workflow such that recipe application is tracked locally and sites receive notifications of security announcements for recipes that have been installed locally.

Does the integration of recipes into the update workflow--specifically, the fact that data on installed recipes is sent to drupal.org with every update request, rather than being sent once only at recipe apply time--make a difference in terms of what data agreement model is appropriate? Or, from a data ethics perspective, are the two transactions (dispatch each recipe activation event as it occurs vs. periodically post data on all past recipe activations as part of fetching data on available updates) functionally equivalent?

nedjo’s picture

In an attempt to move the opt-in/out conversation and work forward, I've pulled it into a child issue, #3576103: Add framework for Drupal telemetry consent management.

I'm only sketching in one possible approach. There may be valid arguments that it's best to keep this issue more narrowly focused on the task at hand, and those will be important to hear.

That said, I'll expand a bit on why this more general approach might be a good idea. So far it looks like we've been approaching recipe activation as pretty much analogous to extension installation. Which, fair. But another possible framing is that it's something qualitatively different. In this alternate view, what extensions you have installed is a question about site architecture--and, aside from versioning, a fairly static one. Recipe activation, in contrast, is an instance of a distinct category: site events. Even if we limit ourselves to the very narrow question, what kinds of site events would be useful to help rank recipes, recipe activation, per se, is only one of many events we might track, and not even necessarily the most valuable event-based metric.

Which may put us into the territory mapped out years ago in #2940737: Add more telemetry to Drupal core. If that's the case, we may be best off addressing the important questions of telemetry consent management not indirectly but head on.

nedjo’s picture

Unlike for example hook_modules_installed(), which receives data for all modules that are installed, RecipeAppliedEvent is triggered only for recipes that are directly applied--not for those that are applied by virtue of being listed as recipes in another recipe's recipe.yml file.

Given this probably isn't the expected behaviour, I opened #3582059: In RecipeAppliedEvent, distinguish between indirectly vs. directly applied recipes.

nedjo’s picture

For the record, turns out what I was seeing in #24 (and repeated in #35) was totally an error on my end and RecipeAppliedEvent is already doing just what we'd expect it to--triggering for both direct and indirect recipe application. So I've refocused #3582059 just on the feature request of distinguishing between direct and indirect recipe application as suggested here in #25.