This is a bold proposal, which is known to be unpopular and/or even unwanted. Read the summary first, and only vote for "won't fix" afterwards.

Problem

  • Portfolio wants to optimize for a defined use-case, but has no way to figure out how much optimization is possible/acceptable.

Goal

  • Allow Portfolio sites to "call home" to provide basic information about environment parameters and enabled modules/features.

Details

  • Unlike Drupal core, Portfolio wants and needs to resolve a concrete problem of its users to be successful.
  • As a product with a specific purpose, Portfolio has to improve and solve its users' needs over time.
  • Portfolio is not able to say: "There are multiple solutions to your question. Go away, search and use contrib." like Drupal core — Portfolio's entire purpose and reason to exist is to prevent and avoid that question in the first place.
  • As clarified in #1488250: Portfolio manifesto // design parameters, Portfolio won't lock you into anything. However, if 80% of all users are going to use, enhance, or configure it via X, then the product design team needs to know.
  • Every solution always depends on the environment parameters (constraints) as well as facilities and possibilities (assumptions) at hand.
  • If you don't know the constraints and if you're not able to make assumptions, then it is impossible to optimize.

Proposed solution

  1. Develop a contrib module that is able to retrieve and send the following data to an install-profile-specific destination:
    • Web server software + version
    • PHP version
    • Databases + versions (available vs. actively used)
    • Enabled modules
    • Enabled features (configuration), including detailed configuration options, but excluding all that aren't machine-readable or of no use for statistical analysis
    • Further statistical data (such as # of registered users, # of configured user roles)

    In short, this is much like what Update module in Drupal core does today, but with more, and much more granular, statistical information, with the purpose of actually answering questions (instead of a dumb popularity contest). Still 100% anonymous.

    @sun hinted at that in http://drupal.org/project/debug years ago already (but never had time nor passion nor reason to implement it until today).

  2. Include this module in Portfolio, integrate it into the Drupal installer (like Update module in core), and enable it by default.

    Note: You can still uncheck the checkbox and your Portfolio site does not participate at all.

  3. Fully disclose what the checkbox means (in the installer as well as the runtime site configuration option later on).

    (Note that @sun participated in most discussions around similar proposals and knows ~90% of the privacy and security concerns as well as technical networking issues that have been raised so far.)
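
As a rough illustration of the data listed in step 1, the report could be assembled roughly as follows. This is only a sketch: the function name `portfolio_report_build()` and all array keys are hypothetical and are not part of Drupal, Update module, or Portfolio. The enabled modules, database info, and counters are passed in by the caller, and no site URL, IP address, or site key is included, in keeping with the "100% anonymous" goal.

```php
<?php
// Hypothetical sketch: function and key names are assumptions for illustration.

/**
 * Builds an anonymous report from environment facts and site statistics.
 *
 * @param array $server    Server variables, e.g. $_SERVER.
 * @param array $databases Database versions keyed by driver, e.g. ['mysql' => '5.5'].
 * @param array $modules   Names of enabled modules.
 * @param array $counts    Statistical counters, e.g. ['users' => 120, 'roles' => 4].
 */
function portfolio_report_build(array $server, array $databases, array $modules, array $counts) {
  // Sort so that identical module sets always produce identical reports.
  sort($modules);
  return [
    'web_server' => isset($server['SERVER_SOFTWARE']) ? $server['SERVER_SOFTWARE'] : 'unknown',
    'php_version' => PHP_VERSION,
    'databases' => $databases,
    'modules' => $modules,
    'counts' => $counts,
  ];
}

// Example input; a real module would read these values from the running site.
$report = portfolio_report_build(
  ['SERVER_SOFTWARE' => 'Apache/2.2.22'],
  ['mysql' => '5.5'],
  ['views', 'node', 'field'],
  ['users' => 120, 'roles' => 4]
);
echo json_encode($report), "\n";
```

Keeping the builder free of direct site lookups makes it easy to show the user exactly what would be sent before anything leaves the site.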

Comments

sun

I'm filing this feature request (too) early, because we're about to finish and sign off on #1399188: Analyze common functionality and content requirements of use-cases. Thus, if this is fundamentally acceptable for the community, then it ought to be part of Portfolio's minimum viable product (MVP), so it can be improved and optimized way faster from there.

dww

In principle, I'm not opposed to trying to gather more hard data about how a given site is actually configured to make informed decisions about what functionality to expand and what should be purged.

However, there's a very similar proposal for Drupal core and d.o over here:
#1439316: Provide means for module maintainers to collect heuristics on certain settings of their modules.
I wrote there why I think it's unfeasible in core, and also why I think the effort required to get it working (both client and server side) outweighs the benefit we'd actually get in statistics like this. Maybe I underestimate how many useful projections/predictions we could accurately make with more of this data.

But, maybe there's a reasonable way to do it in contrib. It'd be Better(tm) if the phone "home" was to somewhere on d.o itself (maybe not updates.d.o, but something like it). But, I'm also worried about the server strain if every site is sending a ton of arbitrary data back to d.o. That's a big part of why I'm not thrilled about doing it in core, but an opt-in contrib might mean so few sites are actually reporting things that it wouldn't be a big load on our infra.

So, +0. ;)

Cheers,
-Derek

DamienMcKenna

+1, if you give a good disclaimer, maybe even have a separate page of the installer just to describe it.

Dave Reid

If it's fully-disclosed and opt-in that seems like it would be reasonable.

sun

@Dave Reid: Given a full disclosure, would an opt-in still be required? Or would an opt-out work, too?

That said, I've clarified the disclosure (+reasoning) by adding a new point 3) to the proposed solution. (No other changes.)

Grayside

"Configuration options" needs more elaboration. Sounds like a higher commitment level for opt-in.

Wim Leers

I wouldn't object, especially not when it'd be opt-in.

marcvangend

I wouldn't mind as long as it's anonymous, fully disclosed and optional. If the configuration option is 'hidden' on a settings page, I'd say it should be opt-in. IMO it can be opt-out if the choice is given explicitly during the install process.

But here's another question: Can statistics about environment parameters tell us whether Portfolio succeeds in resolving the concrete problems of its users? I doubt it. IMHO the data that you propose to gather cannot measure whether users manage to complete their tasks and whether they are satisfied with the result. That is what usability testing is for. (Maybe you can ship Portfolio with Bojhan and allow users to opt-out of that :-))

sun

Alrighty. At least no one voted for "won't fix"... :)

So it looks like no one would object to the functionality, provided it is properly, transparently, and carefully presented, recorded, and disclosed, and remains optional. Since that module doesn't exist yet, further details should be hashed out in its architectural design process. ;)

Figuring out that architectural design is going to be a major task on its own, since this basically hints at statistical values being collected for certain fixed identifiers/keys, which potentially have to be namespaced (e.g., by module name), while none of the namespaces and identifiers/keys are known or defined upfront. I'm relatively sure that there are data stores that have been designed for exactly this purpose, but that needs research.
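
To make the namespacing idea concrete, here is a minimal sketch of how a receiving end might tally values for keys it has never seen before. Everything in it is an assumption for illustration: the `namespace:key` format, the function names, and the in-memory array standing in for whatever data store would actually be chosen.

```php
<?php
// Hypothetical sketch: the "namespace:key" format and function names are assumptions.

/** Builds a namespaced identifier such as "system:php_version". */
function stats_key($namespace, $key) {
  return $namespace . ':' . $key;
}

/** Adds one site's report to the running tallies; no key has to be declared upfront. */
function stats_tally(array $tallies, array $report) {
  foreach ($report as $key => $value) {
    $value = (string) $value;
    if (!isset($tallies[$key][$value])) {
      // First time this key/value pair is seen: start counting at zero.
      $tallies[$key][$value] = 0;
    }
    $tallies[$key][$value]++;
  }
  return $tallies;
}

// Three example reports from three different sites.
$tallies = [];
$tallies = stats_tally($tallies, [stats_key('system', 'php_version') => '5.3']);
$tallies = stats_tally($tallies, [stats_key('system', 'php_version') => '5.3']);
$tallies = stats_tally($tallies, [stats_key('system', 'php_version') => '5.4']);
print_r($tallies);
```

Because the tallies are keyed purely by strings, a new module can start reporting under its own namespace without any schema change on the collecting side.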

Meanwhile, instead of blatantly collecting lots of data, I'm considering making each data collector its own plugin, with the underlying idea of defining a proper question to be answered for each one. That ensures that all data being collected serves a clear purpose and is self-documenting over time.

For example, pseudo-code:

class SystemPhpVersionDataCollector extends DataCollector {

  public static $title = 'Which PHP version is used?';
  public static $description = 'Determines whether modern code and modules can be used.';
}

At the same time, splitting collected data into separate plugins also makes it possible to present the exact information being gathered when the checkbox is shown. On top of that, it would even be possible to manually toggle the collection of individual data on and off (though I'm not sure whether that makes sense; but having the possibility sounds good).
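
A runnable variant of the pseudo-code above might look like the following. It is a sketch under assumptions: the `DataCollector` base class, its `collect()` method, and the helper functions `collectors_disclose()` and `collectors_run()` are hypothetical names, and a plain array of on/off flags stands in for the site configuration.

```php
<?php
// Hypothetical sketch: base class, method, and helper names are assumptions.

abstract class DataCollector {
  public static $title = '';
  public static $description = '';

  /** Returns the single value this collector reports. */
  abstract public function collect();
}

class SystemPhpVersionDataCollector extends DataCollector {
  public static $title = 'Which PHP version is used?';
  public static $description = 'Determines whether modern code and modules can be used.';

  public function collect() {
    // Report only major.minor to keep the value coarse.
    return PHP_MAJOR_VERSION . '.' . PHP_MINOR_VERSION;
  }
}

/** Lists question and purpose per collector, e.g. for display next to the checkbox. */
function collectors_disclose(array $classes) {
  $lines = [];
  foreach ($classes as $class) {
    $lines[] = $class::$title . ' ' . $class::$description;
  }
  return $lines;
}

/** Runs only the collectors that are switched on; $enabled maps class name to bool. */
function collectors_run(array $enabled) {
  $data = [];
  foreach ($enabled as $class => $is_enabled) {
    if ($is_enabled) {
      $collector = new $class();
      $data[$class] = $collector->collect();
    }
  }
  return $data;
}

print_r(collectors_disclose(['SystemPhpVersionDataCollector']));
print_r(collectors_run(['SystemPhpVersionDataCollector' => TRUE]));
```

Since the disclosure text is read from the same classes that do the collecting, the list shown to the user cannot drift from what is actually gathered.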

dww

To restate: although I'm not saying "won't fix", I am saying: "This could be a huge amount of work for very limited value (since it's hard to know how many decisions we could actually make based on the kind of data we'd be collecting), so be careful before wasting too much time on this."

dww

Updated issue summary.