Problem/Motivation

Drupal used to have a reputation for having enormous arrays. In D8, that's changed, and yet it's stayed the same... we no longer have info hooks, but we still have big arrays for render and Form API, and in addition we now have data structures in annotations and YML files: these are basically the same as arrays.

We need to properly document how these data structures work: what can developers create with them and what properties and values they accept.

But what makes this challenging is that they are *extensible*: one component invents the structure and uses it, but other components or modules add other properties or values that extend the structure. So we can’t document the whole thing in one place, because the complete picture is scattered over various parts of core. But for developers wanting to read about a particular structure, we want all the relevant documentation to be presented in one place.

As an example, consider routing.

The base routing system is defined in core/lib/Drupal/Core/Routing. That would be a good place to put documentation for MODULE.routing.yml files.

But other components and modules go on to define their own properties of routing files:

- core/lib/Drupal/Core/Controller invents the _title property and others
- core/lib/Drupal/Core/Entity invents _entity_form and others
- user module invents _permission
- other modules might invent other things!

Logically, none of those should be defined in the basic documentation for MODULE.routing.yml files, as core/lib/Drupal/Core/Routing doesn't depend on them, and shouldn't need to know about them.

But a developer who wants to read about routing should be able to read a single document that covers everything -- they shouldn't have to piece it together from lots of different documents.

Proposed resolution

I think we need a generic system that lets us tie together the documentation for the base structure, and extra elements that extend it.

So this is made up of two things:

1. Mark the documentation for the base structure with an ID. (For the time being, leave the matter of how to document a YML file. Let’s suppose we have a way of doing that.)
2. Mark the documentation for the extending element with:
- the ID of the base structure it extends
- the position in the structure it can occupy as either a key or a value

API module can then stitch all these things together by matching the data structure IDs.
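As a rough illustration of the stitching step, here is a minimal PHP sketch. It assumes the tags have already been extracted into flat records; the record keys (structure, position, name, doc, source), the sample 'mymodule' source, and the function name stitch_by_structure() are all invented for this example and are not part of API module.

```php
<?php
// Hypothetical records, as a parser might collect them from docblocks
// scattered across core and modules. The array shape is an assumption.
$records = [
  ['structure' => 'routing', 'position' => '*/defaults', 'name' => '_title',
   'doc' => 'Defines a fixed title string for a route.',
   'source' => 'core/lib/Drupal/Core/Controller'],
  ['structure' => 'services', 'position' => 'services/*/tags', 'name' => 'mytag',
   'doc' => 'Documentation about the tag here.',
   'source' => 'mymodule'],
  ['structure' => 'routing', 'position' => '*/requirements', 'name' => '_permission',
   'doc' => 'One or more permissions that the user must have.',
   'source' => 'user module'],
];

// Group the scattered records by the ID of the structure they extend.
function stitch_by_structure(array $records): array {
  $stitched = [];
  foreach ($records as $record) {
    $stitched[$record['structure']][] = $record;
  }
  return $stitched;
}

// Print one combined listing per data structure ID.
$stitched = stitch_by_structure($records);
foreach ($stitched as $id => $items) {
  print "$id\n";
  foreach ($items as $item) {
    print "  {$item['position']}/{$item['name']}: {$item['doc']} (from {$item['source']})\n";
  }
}
```

The only thing the grouping relies on is the shared ID, so the component that documents the base structure never needs to know which modules extend it.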

Example 1: routing

There’s no set way yet of documenting yml files, but for now let’s just assume the core structure is documented somewhere in core/lib/Drupal/Core/Routing.

This documentation (whatever it may be, a class docblock, an api.php file, some sort of new api.yml file) gets a doxygen tag:

@datastructure routing

Now let’s document the various _title properties of routing files. These are invented in core/lib/Drupal/Core/Controller/TitleResolver.php. Suppose we document these in the docblock for that class:

@datastructurekey routing */defaults/ _title
  Defines a fixed title string for a route.

That says that:
- we’re defining an extra key for a data structure
- it belongs to the ‘routing’ structure
- this key goes in the structure tree at */defaults.
- the name of the key is _title
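To make the four parts listed above concrete, the following PHP sketch splits such a tag line with a regular expression. The function name parse_datastructure_tag() and the returned array keys are hypothetical, and the exact tag grammar is only what the examples in this summary suggest.

```php
<?php
// Split "@datastructurekey <structure> <position> <name>" (or the
// @datastructurevalue variant) into its parts. Returns NULL for any line
// that does not have this shape.
function parse_datastructure_tag(string $line): ?array {
  $pattern = '/^@datastructure(key|value)\s+(\S+)\s+(\S+)\s+(\S+)$/';
  if (!preg_match($pattern, trim($line), $m)) {
    return NULL;
  }
  return [
    'kind' => $m[1],
    'structure' => $m[2],
    // Drop a trailing slash so "*/defaults/" and "*/defaults" compare equal.
    'position' => rtrim($m[3], '/'),
    'name' => $m[4],
  ];
}

$parsed = parse_datastructure_tag('@datastructurekey routing */defaults/ _title');
print_r($parsed);
```

The sketch takes the tag as a plain string. One thing a real implementation would have to deal with: inside a PHP /** ... */ docblock, the character sequence */ ends the comment, so a position that starts with a * wildcard followed by / would need escaping or a different wildcard character.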

Example 2: services file

The main thing that extends the services YML structure is new tags. Same thing as above: the base documentation for a services.yml file gets a tag:

@datastructure services

The code that invents a particular tag documents the tag as being part of the services data structure:

@datastructurevalue services services/*/tags mytag
  Documentation about the tag here.

That says that:
- we’re defining an extra value for a data structure
- it belongs to the ‘services’ structure
- this value goes in the structure tree at services/*/tags.
- the value is ‘mytag’

(There’s a proposal for those at https://www.drupal.org/node/2745947, but the proposal is to invent a new docblock tag just for service tags. I don’t think it’s maintainable to invent a new docblock tag for each new data structure element we need to document: we’ll end up with hundreds of them and API module will need to be taught about each one.)

Remaining tasks

- Discuss the proposal
- Devise a way to document base structures when these are YML files -- should be in another issue.

Comments

joachim created an issue. See original summary.

jhodgdon’s picture

Could you perhaps put more detail in the issue summary or in a comment? I'm not understanding:
- Where the @datastructure tags go
- Where the @datastructurekey tags go
- Where these would be collected and how they'd be displayed (presumably by the API module)

jhodgdon’s picture

What I'm trying to say is that this is too abstract for me to wrap my brain around, and a concrete example of where these tags would go in actual Core files would really help.

joachim’s picture

Part of the problem is that we don't currently have a YML equivalent of an api.php file. We'd need one for this, and for any example involving structures that are created in YML.

So what I might do instead is an example using FormAPI arrays, since those are all in PHP, and we have ways of documenting PHP.

jhodgdon’s picture

Um... I'm not sure that will make it clearer. We can already write @defgroup topics in PHP files, sprinkled with bullet lists, to document PHP things, along with adding prose/lists to class/function documentation headers.

But you're saying we need a generic way to document ... things ... I would like to see an example that we cannot currently document, and have the example worked out completely enough so that we can see clearly (a) that we need this and (b) that the proposed structure will fill the need and (c) that our existing documentation efforts/structures are lacking.

Crell’s picture

jhodgdon: The point of joachim's proposal is, I believe, that we can document "things" but not "extensible things". E.g., we have no master list of service tags that mean something. We have no master list of tags for database alterable queries that mean something. We have no master list of routing "default" keys that mean something. The proposal is for a mechanism to provide data that some tool (presumably api.module) can dovetail together into a master list.

To extend the routing example a bit more, I think the intent is something like this:

/**
 * // ...
 *
 * @datastructure route
 *   A route defines a mapping from an incoming request to the code that will handle it.
 *
 * @datastructurekey route */defaults/ _controller
 *   The callable that will be invoked for this route.
 *
 */
class Router extends UrlMatcher implements RequestMatcherInterface, RouterInterface {
  // ...
}

/**
 * // ...
 *
 * @datastructurekey route */requirements/ _permission
 *   One or more permissions that the user must have in order to access this route.
 */
class PermissionAccessCheck implements AccessInterface {
  // ...
}

/**
 * // ...
 *
 * @datastructurekey route */defaults/ _title
 *   A user-visible title string for this page.
 */
abstract class ControllerBase implements ContainerInjectionInterface {
  // ...
}

/**
 * // ...
 *
 * @datastructurekey route */defaults/ _entity_revision
 *   Wait, what does this do again?
 */
class EntityRevisionRouteEnhancer implements RouteEnhancerInterface {
  // ...
}

And when api.module runs it will analyze all of those annotations and spit out a list that looks like this:

# route
A route defines a mapping from an incoming request to the code that will handle it.

## defaults

* /defaults/ _controller
   The callable that will be invoked for this route.
* /defaults/ _title
   A user-visible title string for this page.
* /defaults/ _entity_revision
   Wait, what does this do again?

## requirements
* /requirements/ _permission
  One or more permissions that the user must have in order to access this route.

(Joachim, if I got that right feel free to steal the above for issue summary.)

It sounds like a possibly promising direction to me. I have two concerns:

1) It's still using an unstructured format to define unstructured formats. I fear the inception problem, and the problem of documenting the documentation.
2) Some keys are non-global, or contextual. The requirements block of a route, for instance, can have a regex for any parameter. The defaults block can have a default for any parameter if it's the last parameter or all parameters after it also have a default set. Some keys only make sense in the presence of other keys. Etc. This could also become an issue with FAPI, where we have keys that are on 1/3 of all form elements, other keys that are on a different, overlapping third, etc.

jhodgdon’s picture

OK, that's more illuminating, thanks!

I don't quite understand the proposed structure for the @datastructurekey line, though. A lot of / and * and stuff. What does it mean?

joachim’s picture

@Crell, yup, that's pretty much the gist of it. Thank you for expanding on what I wrote and making the examples :)

There are a few things where I was thinking along very slightly different lines, so I'll expand on those first here before changing the summary straight away.

First, I was thinking that the tag that defines the base structure could also include the parts of the structure that are defined by the original inventor.

So with the Router example, the Router class recognizes some basic keys in the route structure. So we might have:

/**
 * // ...
 *
 * @datastructure route
 *   A route defines a mapping from an incoming request to the code that will
 *   handle it.
 * - *: The name of the route. An arbitrary string, which must be unique across
 *   the site. It's a good idea to prefix this with the name of your module.
 *   - defaults: Properties of the route that yada yada (I have no idea what
 *     the difference between defaults and options is, as it happens ;)
 *     - _controller: The callable that will be invoked for this route. This
 *       may be one of (and this is where we hit a problem, because I want to
 *       make a bullet list here, but it's NOT part of the structure!):
 *       a) a function name, b) a static method, in the form
 *       '\Namespace\Class::method', c) a method on a service, in the form
 *       'service.name:method'.
 */
class Router extends UrlMatcher implements RequestMatcherInterface, RouterInterface {
  // ...
}

> I don't quite understand the proposed structure for the @datastructurekey line, though. A lot of / and * and stuff. What does it mean?

With the core structure as a single tree as above this is hopefully a bit clearer:

/*
 * @datastructurekey route */defaults/ _title: The title yada yada.
 */

This says:
- An extra item is being defined, which should be spliced into the tree for the 'route' structure.
- This item should be inserted as a child of the */defaults item, where the / is a path separator in the tree structure.
- The key of the item is _title.
- The description is 'The title yada yada.'.
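The splicing described in the list above could look roughly like the following PHP sketch. The nested-array representation of the tree (with 'doc' and 'children' keys), the placeholder descriptions, and the function name splice_key() are assumptions made for illustration only.

```php
<?php
// Base tree for the 'route' structure, as documented by its inventor.
$tree = [
  '*' => [
    'doc' => 'The name of the route.',
    'children' => [
      'defaults' => [
        'doc' => 'Properties of the route.',
        'children' => [
          '_controller' => [
            'doc' => 'The callable that will be invoked for this route.',
            'children' => [],
          ],
        ],
      ],
    ],
  ],
];

// Insert a new item as a child of the node found at $parent_path, where '/'
// separates the levels of the tree. Returns FALSE if the parent is missing.
function splice_key(array &$tree, string $parent_path, string $key, string $doc): bool {
  $node = &$tree;
  foreach (explode('/', trim($parent_path, '/')) as $segment) {
    if (!isset($node[$segment])) {
      // The parent does not exist in the base structure: nothing is spliced.
      return FALSE;
    }
    $node = &$node[$segment]['children'];
  }
  $node[$key] = ['doc' => $doc, 'children' => []];
  return TRUE;
}

splice_key($tree, '*/defaults', '_title', 'The title yada yada.');
// Lists the keys now documented under the defaults item.
print implode(', ', array_keys($tree['*']['children']['defaults']['children'])) . "\n";
```

After the call, _title sits next to _controller in the same tree, even though the two were documented in different places.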

Basically, to the developer reading the documentation page rendered by API module, it would be as if the _title property's definition had been written alongside the definition of _controller in the documentation for the base structure, rather than added by another component or module elsewhere in the codebase.

The other thing that #8 doesn't cover is that I envisioned that the definition of the base structure could be in some sort of YAML equivalent of api.php files, analogous to the way we document hooks. But I'm happy to drop that for now, as it's an extra complicated thing to figure out. If anything, documenting the base structure in PHP keeps it closer to the code that invents it, which is a good thing.

And one thing I didn't cover in my summary is that sometimes we need to define possible values, such as service tags.

Now for Crell's concerns, which I agree are things that need to be figured out, and I'm not yet sure exactly how.

> 1) It's still using an unstructured format to define unstructured formats. I fear the inception problem, and documentation problem of the documentation.

I'd be happy for this to have a strictly-defined format, and indeed it probably should, since it's to be machine-parsed. Is that what you mean?

> 2) Some keys are non-global, or contextual. The requirements block of a route, for instance, can have a regex for any parameter.

It occurred to me as I was trying to refine my explanation of this (and thank you for beating me to it, I really appreciate it!) that this is maybe slightly related to our config schema system:

core.entity_form_mode.*.*:
  type: config_entity
  label: 'Entity form mode settings'
  mapping:

That's saying 'This is the definition of all configurations whose keys are of this form, with wildcards in these positions'.

One complication I'm thinking about is that there's probably a difference between saying '*/foo' where '*' represents the top-level keys which are arbitrary, such as route names or form API element names, and something where the '*' represents any key from the valid ones (I can't think of an example right now...)
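To show what the simpler of those two readings would mean in practice, here is a small PHP sketch in which '*' stands for exactly one arbitrary segment. The function name path_matches() is hypothetical, and the concrete names used in the calls (user.login, core.entity_form_mode.user.register) are just illustrative inputs.

```php
<?php
// Return TRUE if a concrete path matches a pattern in which '*' stands for
// exactly one arbitrary segment. Every '*' is treated as "any key at all",
// which is only one of the two meanings of '*' raised in the discussion.
function path_matches(string $pattern, string $path, string $separator = '/'): bool {
  $pattern_parts = explode($separator, $pattern);
  $path_parts = explode($separator, $path);
  if (count($pattern_parts) !== count($path_parts)) {
    return FALSE;
  }
  foreach ($pattern_parts as $i => $part) {
    if ($part !== '*' && $part !== $path_parts[$i]) {
      return FALSE;
    }
  }
  return TRUE;
}

// A route-style position, then a config-schema-style name split on dots.
var_dump(path_matches('*/defaults', 'user.login/defaults'));
var_dump(path_matches('core.entity_form_mode.*.*', 'core.entity_form_mode.user.register', '.'));
var_dump(path_matches('*/defaults', 'user.login/requirements'));
```

Supporting the second reading, where '*' means "any key from the valid ones", would need the matcher to know the list of valid keys at that level, which this sketch deliberately leaves out.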

> This could also become an issue with FAPI, where we have keys that are on 1/3 of all form elements

Indeed, here we sometimes want to say that #options is a valid key for elements with a certain value of #type... not sure how we handle that yet.

jhodgdon’s picture

OK. With some details and refinements, it can probably work.

I think it is not a good idea to mix the @datastructure declaration with some of its @datastructurekeys. That is just too complicated to parse out. I also don't like the idea of parsing bullet lists to figure out what the structure is.

So I think that after @datastructurekey, you would list (a) the name of the structure it's in, and (b) *one* string for the key, using / to separate levels, and no : would be needed... Much easier to parse if it is always 2 strings only... so it would be something like:

@datastructure route
  Docs for the route structure as a whole

@datastructurekey route *
  Docs for the * element/key

@datastructurekey route */defaults
  Docs for the defaults key/element

@datastructurekey route */defaults/_title
  Docs for the title key/element

Crell’s picture

Joachim: Re point 1, it's more that it's not lintable. Using for-reals PHP classes and types to define something is nice, because about a third of all possible errors will be caught for you by your IDE or the compiler, and another third will be caught with a fatal the instant you try to run it. That leaves only a third of possible things you could mess up that require actual thought. (The ratios here are for demonstration purposes only, but you get the idea.) And there's a really really really good parser that knows how to give you useful error messages. And there's a definition of the data structure you can easily look up to know how it works. (Class or interface definition.)

Yes, we're talking about how to document the parts of Drupal that are not that, and possibly can't be that. However, we should be careful not to define an undocumentable, unlintable, informal data format in order to document our undocumentable, unlintable, informal data formats. :-) Swallowing that dog to catch the cat to catch the mouse to catch the fly doesn't seem like it leads anywhere useful. At some point we need to just accept that undocumentable, unlintable, informal data formats are a problem, not a solution, and adjust accordingly. (Drupal 8 did a great deal of that, certainly, and then also regressed in some areas, such as the annotations or YAML files you mention.)

Side note: Annotations are actually documented already as they have a corresponding class that defines them. They're still hard to lint-early, but at least a definition exists.

joachim’s picture

> I also don't like the idea of parsing bullet lists to figure out what the structure is.

That doesn't seem to me to require handling that's much different to what API module already does for something like hook_entity_info() on D7.

But let me say now that I will be happy to roll up my sleeves and help with the work needed on API module. (I keep meaning to get involved in API module in general...)

> Side note: Annotations are actually documented already as they have a corresponding class that defines them.

True, but I feel that documentation for annotations has very definitely been a step backwards from the documentation we had for info hook structures on D7 -- see #2092757: specialize output for Annotation classes for my reasoning (with pictures!).

Also, annotations don't handle structure extensibility at all. For instance, here in \Drupal\Core\Entity\EntityType, we find:

  /**
   * The route name used by field UI to attach its management pages.
   *
   * @var string
   */
  protected $field_ui_base_route;

Field UI, a lowly non-required core module, has got its mitts into a core component!

> Swallowing that dog to catch the cat to catch the mouse to catch the fly doesn't seem like it leads anywhere useful. At some point we need to just accept that undocumentable, unlintable, informal data formats are a problem, not a solution, and adjust accordingly.

Well it's inventing one new format to document several undocumented formats. So that's an improvement. And the new format would be much simpler than the formats it's documenting. And it wouldn't be extensible in the same way; it would have all of its own documentation in one place. So I'd say it's more like a single spider that's catching lots of flies ;)

Docblock seems a good idea to me because it allows the documentation to be directly alongside the code that is related to it, in the class or method docblock.

But I am not wedded to my suggestion of inventing new docblock tags. The real meat of what I'm suggesting here is the dovetailing idea, so that we document pieces of structures in the module or component that invents each piece.

Another possibility would be using YAML, since it works nicely for documenting schema structures, but it sounds like @Crell thinks that's not ideal either.

> Using for-reals PHP classes and types to define something is nice

I can see that it has advantages. I'm not sure what we can use to represent the structure pieces that are to be dovetailed together though. Have an interface represent the structure a piece is a part of? How do we specify the path into the structure? I'll have a ponder and see if I can think of anything, but I emphasise I'm going to be plucking things out of thin air...

Finally, might there be existing systems for this sort of thing that we could use? I have no idea what to search for though.

jhodgdon’s picture

No, there's a difference between parsing PHP code (which we do via a parser class), and parsing a bullet list in documentation to find information that is embedded in the structure of the bullets, breaking it up, and storing it in a way that we can collect it into a documentation page (as opposed to just printing it out as a UL list). People are really bad about following documentation guidelines carefully, and a fidgety structure that requires certain things to be in a bullet list in a documentation block is ... not likely to be good.

Parsing something like "@something followed by one string that gives the name of the structure, and another slash-separated string that gives the position in the array" is quite a bit better than trying to figure out the depth of bullet lists in docs based on indentation.
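For comparison, the two-string form could be handled with very little code. The PHP sketch below is only an illustration of that idea: the function name parse_two_string_tag() and the returned array keys are hypothetical, and it assumes the description text lives on the following lines rather than on the tag line itself.

```php
<?php
// Parse "@datastructurekey <structure> <slash/separated/path>".
// Returns NULL for any line that is not exactly a tag plus two strings.
function parse_two_string_tag(string $line): ?array {
  $parts = preg_split('/\s+/', trim($line));
  if (count($parts) !== 3 || $parts[0] !== '@datastructurekey') {
    return NULL;
  }
  return ['structure' => $parts[1], 'path' => explode('/', $parts[2])];
}

$lines = [
  '@datastructurekey route *',
  '@datastructurekey route */defaults',
  '@datastructurekey route */defaults/_title',
];

// The depth of each item comes from the path itself, not from indentation.
foreach ($lines as $line) {
  $tag = parse_two_string_tag($line);
  $path = $tag['path'];
  $depth = count($path) - 1;
  print str_repeat('  ', $depth) . end($path) . "\n";
}
```

Because the nesting level is carried by the path string, a documentation author who gets the indentation wrong does not change what the parser sees.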

Don't get me started about documenting annotation structures. I am not going to say anything...