Decide how to negotiate between the 2 JSON-LD serializations [#1797210]

Problem/Motivation

We want to support two different data models in our JSON-LD.

Literal as string

This is how most consumers expect literals to be handled.

site:node/1 content:encoded "This is the body text" .

Which looks something like this in JSON-LD:

{
  "@id": "site:node/1",
  "content:encoded": [
    "This is the body text" 
  ]
}

Literal as blank node

To preserve information about the literals, such as the format of a text field, we need a separate JSON-LD serialization. This would be used for the deployment use case.

site:node/1 content:encoded _:blanknodeID .
_:blanknodeID site:value "This is the body text" .
_:blanknodeID site:summary "This is the summary, used as teaser" .
_:blanknodeID site:format "full_html" .

Which looks more like this:

{
  "@id": "site:node/1",
  "content:encoded": [
    {
      "site:value": "This is the body text",
      "site:summary": "This is the summary, used as teaser",
      "site:format": "full_html"
    }
  ]
}
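The relationship between the two models can be sketched in a few lines of Python. This is only an illustration, not the module's code: the function name flatten_literals is made up, and it assumes the plain-string value always lives under site:value, as in the example above.

```python
import json


def flatten_literals(node, value_key="site:value"):
    """Collapse each blank-node literal to the plain string under value_key."""
    flat = {}
    for prop, values in node.items():
        if prop.startswith("@") or not isinstance(values, list):
            # JSON-LD keywords such as "@id" are copied through unchanged.
            flat[prop] = values
            continue
        flat[prop] = [
            v[value_key] if isinstance(v, dict) and value_key in v else v
            for v in values
        ]
    return flat


# "Literal as blank node" structure from the issue summary.
rich = {
    "@id": "site:node/1",
    "content:encoded": [
        {
            "site:value": "This is the body text",
            "site:summary": "This is the summary, used as teaser",
            "site:format": "full_html",
        }
    ],
}

simple = flatten_literals(rich)
print(json.dumps(simple, indent=2))
```

Going from the blank node form to the string form is lossy (summary and format are dropped), which is why the deployment use case cannot rely on the simple form alone.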

Proposed resolution

Support two serializations in JSON-LD.

To allow content negotiation between the two, there are a few options, which have been listed below.

The current proposal is to use a custom media type for the Drupal-specific JSON-LD, application/vnd.drupal.ld+json.

Remaining tasks

Decide.

Comments

Comment #1

fago commented 29 September 2012 at 00:35

Well, according to the plans from http://groups.drupal.org/node/237443, we'd have the structure "entity - multiple field items - single item - item value" anyway, just as in the new entity field API. So wouldn't we always have that possible range problem then?

It sounds to me like a representation that is useful for us wouldn't match RDF use cases very well, so maybe we should accept that and do another mapping for pure RDF use cases?

Comment #2

Anonymous (not verified) commented 30 September 2012 at 17:44

Title:

Decide how to handle literal values in JSON-LD

» Decide how to handle literal values (e.g. text, numbers) in JSON-LD

This means we would use different JSON-LD representations for the CreateJS use case and for the deployment use case. AFAIK, CreateJS depends on the RDFa and JSON-LD model being the same. The RDFa model has to follow standard practice, since people will almost certainly use it to get Google Rich Snippets using Schema.org, and Schema.org consumers expect literals to be handled in the standard way.

I believe providing two JSON-LD serializations which are modeled in two different ways creates complications. First, I don't know how we would negotiate between the two versions... what accept header would we use to specify which of the two should be returned? Second, I think that it might increase the complexity for consumers, who would then have to decide which model they want to use.

Comment #3

scor commented 5 October 2012 at 00:46

I've hit similar problems while working on the JSON-LD module in D7 and Entity RDF, see in particular the nested field property values issue in the entity RDF post.

If you're going to support both the deployment use case and exposing your data to third party sites that neither know nor care about your internal site architecture, you're inevitably going to have to offer two variants. I've suggested exposing different serializations depending on the consumer before, and according to Crell (see response to that comment) that was actually what they discussed in Paris as well. Quoting crell:

I think both [serializations] should be in JSON-LD for simplicity but we do need to handle both use cases separately

To answer the specific questions in #2:

I don't know how we would negotiate between the two versions... what accept header would we use to specify which of the two should be returned?

By default you'd get simple JSON-LD structure usable by third parties (Literal as string). In the case of the deployment scenario, you would have to trigger the advanced format so that entity property values can be saved properly across the board. Let's see some of the options we have to support both formats. I know some of these aren't going to make some people happy but we should review them anyways, as unholy as they can be :)

HTTP header field: Accept

The consumer could send an HTTP Accept header field indicating a preference for the raw/unprocessed version of the entity with a quality parameter, for example:

Accept: application/x-drupal-raw-ld-json; q=1.0, application/ld+json; q=0.8, application/json; q=0.7, */*; q=0.1

Using our own custom MIME type in HTTP headers isn't totally new, it was suggested in #1533366-12: Simplify and optimize Drupal.ajax() instantiation and implementation for XHR requests.

Ideally we would use a value that indicates a subtype of JSON-LD using an optional parameter. With some imagination, it could look like this:

Accept: application/ld+json;raw=true, application/json, */*;q=0.1

but I'm not sure this is allowed by the JSON-LD MIME type definition. The only parameter listed at present is form, and it cannot be used for this purpose.
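To make the q-value ordering concrete, here is a rough Python sketch of how a server could pick a serialization from such an Accept header. The helper names and the SUPPORTED list are assumptions for illustration only. It splits naively on commas and semicolons, so it does not handle quoted parameter values that contain those characters.

```python
def parse_accept(header):
    """Return (media_type, params, q) tuples sorted by descending q."""
    entries = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        params = {}
        q = 1.0  # a missing q parameter means full preference
        for piece in pieces[1:]:
            name, _, value = piece.partition("=")
            name = name.strip().lower()
            value = value.strip().strip('"')
            if name == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
            elif name:
                params[name] = value
        entries.append((media_type, params, q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(entries, key=lambda entry: entry[2], reverse=True)


# Hypothetical list of types the server can produce.
SUPPORTED = (
    "application/x-drupal-raw-ld-json",
    "application/ld+json",
    "application/json",
)


def choose(header, default="application/ld+json"):
    """Pick the most preferred supported type, or None if nothing matches."""
    for media_type, _params, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type in SUPPORTED:
            return media_type
        if media_type == "*/*":
            return default
    return None


header = (
    "application/x-drupal-raw-ld-json; q=1.0, application/ld+json; q=0.8, "
    "application/json; q=0.7, */*; q=0.1"
)
print(choose(header))
```

A consumer that sends only the wildcard (or nothing Drupal-specific) falls through to the default, simple structure.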

HTTP header field: Pragma

The Pragma general-header field is an implementation-specific header. Example:

Pragma: drupal-serialize-raw

Non-standard request header

Non-standard request headers are conventionally prefixed with x-. Example:

X-Drupal-Serialization-Format: raw
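For comparison, detecting either of these two header options on the server side could look roughly like the Python sketch below. The function name wants_raw is made up, and the header names and values are simply the ones from the examples above, not an agreed convention.

```python
def wants_raw(headers):
    """Return True if the request asks for the raw serialization.

    `headers` is a plain dict of request header names to values.
    Header names are case-insensitive, so they are lowercased first.
    """
    normalized = {name.lower(): value.strip().lower() for name, value in headers.items()}

    # Option 1: the non-standard request header from the example.
    if normalized.get("x-drupal-serialization-format") == "raw":
        return True

    # Option 2: a Pragma directive, possibly next to others such as no-cache.
    directives = [d.strip() for d in normalized.get("pragma", "").split(",")]
    return "drupal-serialize-raw" in directives


print(wants_raw({"X-Drupal-Serialization-Format": "raw"}))
print(wants_raw({"Pragma": "no-cache, drupal-serialize-raw"}))
print(wants_raw({"Accept": "application/json"}))
```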

URI parameter

When requesting an entity, instead of adding a parameter in the HTTP request headers, the right parameter is appended to the URI of the entity. Not very RESTful, but there seem to be some conventions out there that people are using, for example the expand parameter for saving bandwidth on mobile applications. In this case, you can specify what fields you want expanded directly in the initial response, to avoid having to request every single resource after receiving the initial response. Here is an example taken from JIRA's "Expansion in the REST APIs" documentation, where the field names to be expanded are specified:

https://jira.atlassian.com/rest/api/latest/issue/JRA-9?expand=names,renderedFields

We're trying to achieve something similar, in saying that we want the field value to be raw/unprocessed as opposed to the default case where they are processed.
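A quick Python sketch of reading such parameters from the URI follows. The expand handling mirrors the JIRA URL above. The serialization=raw parameter and the example.com URL are purely hypothetical, since no parameter name has been proposed here.

```python
from urllib.parse import parse_qs, urlsplit


def expanded_fields(url):
    """Return the field names listed in the comma-separated expand parameter."""
    query = parse_qs(urlsplit(url).query)
    fields = []
    for value in query.get("expand", []):
        fields.extend(name for name in value.split(",") if name)
    return fields


def wants_raw_from_url(url, param="serialization"):
    """Hypothetical analog: ?serialization=raw selects the unprocessed values."""
    query = parse_qs(urlsplit(url).query)
    return "raw" in query.get(param, [])


jira_url = "https://jira.atlassian.com/rest/api/latest/issue/JRA-9?expand=names,renderedFields"
print(expanded_fields(jira_url))
print(wants_raw_from_url("http://example.com/node/1?serialization=raw"))
```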

Second, I think that it might increase the complexity for consumers, who would then have to decide which model they want to use.

No doubt the simple serialization format should be served by default. If they are not doing deployment or if they don't have a Drupal site on the other end, chances are they should stick to the default format. I don't know of a practical use case yet for using full-blown entity serialization outside the deployment scenario, and if someone were to come up with something, I bet they'd be smart enough to figure out which format to choose. I don't think we need to cover such unknown ground yet.

Comment #4

Anonymous (not verified) commented 5 October 2012 at 13:45

Thanks for outlining some possibilities. I've been considering this more since I posted, and it probably does make sense to separate them. So let's look at the concerns with these options.

HTTP header field: Accept: I've read that custom media types can break consumers. If we do go this way despite that, I think we should use vnd. rather than x, but I would defer to someone who's worked with them more.
HTTP header field: Pragma: Hadn't thought of this. Sounds reasonable, but I'd be interested in opinions from others.
non-standard request header: The X prefix has been deprecated. We would just go with Drupal-xxx-yyy.
URI parameter: Agreed, not very RESTful. I believe we want to avoid this if possible based on the emphasis the initiative has had on REST.

We will still have an issue with data modeling of literals for the basic case, though. I'm going to draft an explanation, hopefully will post today.

Comment #5

Anonymous (not verified) commented 14 October 2012 at 18:21

Status:

Active

» Needs review

I propose that we go ahead with a vnd prefixed custom media type. The only consumers we expect to have requesting that are Drupal instances, so we don't have to worry about the fact that consumers won't recognize it. The JSON-LD module can have a Denormalizer and Decoder which are aware of that custom media type. And using a custom media type will make it easier for the Serializer to use the appropriate Normalizer and Encoder, which outputs the corresponding data structure.

I propose vnd.drupal.ld+json. Other JSON-LD requests will just use ld+json (or possibly just json).
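To illustrate why a distinct media type makes the dispatch easy, here is a small Python sketch of a registry keyed by media type. It is not the Symfony Serializer API. The function names, the NORMALIZERS table, and the shape of the entity dict are all assumptions made for the example.

```python
import json


def normalize_simple(entity):
    """Literal as string: the default structure for third parties."""
    return {"@id": entity["id"], "content:encoded": [entity["body"]["value"]]}


def normalize_drupal(entity):
    """Literal as blank node: keeps summary and format for deployment."""
    body = entity["body"]
    return {
        "@id": entity["id"],
        "content:encoded": [
            {
                "site:value": body["value"],
                "site:summary": body["summary"],
                "site:format": body["format"],
            }
        ],
    }


# One normalizer per media type; lookup replaces any per-request guessing.
NORMALIZERS = {
    "application/ld+json": normalize_simple,
    "application/vnd.drupal.ld+json": normalize_drupal,
}


def serialize(entity, media_type):
    """Normalize with the function registered for media_type, then encode."""
    normalizer = NORMALIZERS.get(media_type)
    if normalizer is None:
        raise ValueError(f"No normalizer registered for {media_type}")
    return json.dumps(normalizer(entity))


# Hypothetical in-memory entity, standing in for a loaded node.
entity = {
    "id": "site:node/1",
    "body": {
        "value": "This is the body text",
        "summary": "This is the summary, used as teaser",
        "format": "full_html",
    },
}

print(serialize(entity, "application/ld+json"))
print(serialize(entity, "application/vnd.drupal.ld+json"))
```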

I'm marking this as needs review so that we can make the decision. Any work that needs to happen can be in other issues.

Comment #5.0

Anonymous (not verified) commented 14 October 2012 at 18:21

Issue summary:

Added JSON-LD equivalents.

Comment #6

Anonymous (not verified) commented 15 October 2012 at 13:30

Title:

Decide how to handle literal values (e.g. text, numbers) in JSON-LD

» Decide how to negotiate between the 2 JSON-LD serializations

I updated the issue summary to reflect the consensus in this issue, and to post the proposal from the last comment.

Comment #7

Anonymous (not verified) commented 15 October 2012 at 13:50

Issue tags:

+WSCCI

Comment #8

Crell commented 15 October 2012 at 18:52

vnd. media types are what I figured we'd use, so I'm fine with that. I'm a bit unclear though on the difference between the two formats. One is the "internal" (Drupal-proprietary data) format, application/vnd.drupal.ld+json, and the other is the "sanitized for public consumption" format, application/ld+json? (Is the latter the standard mime type? I don't know.)

Comment #9

scor commented 15 October 2012 at 19:42

the other is the "sanitized for public consumption" format, application/ld+json? (Is the latter the standard mime type? I don't know.)

@crell: This is the MIME type currently defined in the JSON-LD Syntax 1.0, it is not standardized yet and still under review, but the hope is that it will be standardized by W3C/IETF in the future.

Comment #10

Anonymous (not verified) commented 15 October 2012 at 20:04

I'm a bit unclear though on the difference between the two formats.

The difference would be how they structure values for certain field types; specifically, things that correspond to primitives like text and numbers. Many consumers expect primitives to be handled as direct properties of the entity, not as properties of an intermediary object/array. To accommodate those external consumers, we would use the common media type. To provide the richer data that Drupal knows how to consume, we would use the vendor-specific media type.

For a body field, application/ld+json would return the following structure, where the body value and the summary are direct properties of the entity.

{
  "@id": "site:node/1",
  "schema:articleBody": [
    "This is body text, very long" 
  ],
  "schema:description": [
    "This is summary text"
  ]
}

For the same body field, application/vnd.drupal.ld+json would return the following structure.

{
  "@id": "site:node/1",
  "site:field_body": [
    {
      "site:value": "This is body text, very long",
      "site:summary": "This is summary text",
      "site:format": "full_html"
    }
  ]
}
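As a rough Python sketch, both structures can be produced from the same field item. The PUBLIC_MAPPING table simply mirrors the schema.org properties used in the example above, and the function names and the field item dict are assumptions for illustration.

```python
import json

# Assumed mapping from body field columns to public RDF properties.
# The format column has no public property, so it is left out.
PUBLIC_MAPPING = {
    "value": "schema:articleBody",
    "summary": "schema:description",
}


def public_structure(node_id, item):
    """application/ld+json: mapped columns become direct properties."""
    doc = {"@id": node_id}
    for column, prop in PUBLIC_MAPPING.items():
        if item.get(column):
            doc[prop] = [item[column]]
    return doc


def drupal_structure(node_id, field_name, item):
    """application/vnd.drupal.ld+json: every column is kept under the field."""
    return {
        "@id": node_id,
        f"site:{field_name}": [
            {f"site:{column}": value for column, value in item.items()}
        ],
    }


item = {
    "value": "This is body text, very long",
    "summary": "This is summary text",
    "format": "full_html",
}

print(json.dumps(public_structure("site:node/1", item), indent=2))
print(json.dumps(drupal_structure("site:node/1", "field_body", item), indent=2))
```

The text format only survives in the second structure, which is the information the deployment use case needs.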

Comment #11

scor commented 15 October 2012 at 20:44

I looked a bit more into the header options of #3 and #4:
- RFC 6648 deprecates and discourages the X- and x. forms, and that applies to both the "HTTP header field: Accept" and the "non-standard request header"
- I'm not worried about the vnd.* MIME type breaking consumers, since this MIME type won't be served by default, and it is safe to assume that any consumer requesting application/vnd.drupal.ld+json knows how to support it

So I'm on board with application/vnd.drupal.ld+json. Leaving this issue open some more in case fago has anything to add re. the Entity property API aspect of this maybe?

Comment #12

lanthaler commented 16 October 2012 at 14:51

The cleanest solution is probably to use a profile media type parameter to distinguish between two valid JSON-LD serializations. We just decided in today's telecon to add a "profile" parameter for just this: https://github.com/json-ld/json-ld.org/issues/164

Hope this helps

Comment #13

Anonymous (not verified) commented 16 October 2012 at 15:22

Thank you for initiating that conversation on the call, Markus.

I think that this is a solid approach. On the call, they discussed that the usage of profile would be in line with a current proposal to the IETF. I believe they may have been referencing this proposal. So I believe that our Accept header could look something like the following:

application/ld+json; profile="http://drupal.org/project/deploy/content-staging"

We can defer the decision as to what URI to use for the profile.

We currently wouldn't be able to take advantage of this profile easily, since our stopgap ContentNegotiation class can only negotiate based on media type. However, we will hopefully be able to take advantage of that when fabpot adds full conneg to Symfony.

@scor, I know you are following the Symfony conneg work, could you make sure those folks know about this?

I propose that we use the custom media type until we switch over to Symfony's improved conneg, at which point we hope to use the profile media type parameter.
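A sketch of what that two-stage plan could look like in Python: accept the custom media type now, and also recognize the profile parameter once negotiation can see media type parameters. The function names are made up, the profile URI is just the placeholder from the example above (the real URI is undecided), and the parsing is naive about semicolons inside quoted values.

```python
def parse_media_type(value):
    """Split 'type/subtype; name="value"' into (media_type, params)."""
    media_type, *raw_params = [piece.strip() for piece in value.split(";")]
    params = {}
    for raw in raw_params:
        name, sep, val = raw.partition("=")
        if sep:
            params[name.strip().lower()] = val.strip().strip('"')
    return media_type.lower(), params


# Placeholder profile URI taken from the example; not a decided value.
STAGING_PROFILE = "http://drupal.org/project/deploy/content-staging"


def wants_drupal_structure(accept_value):
    """True for the interim vnd type or for ld+json with the staging profile."""
    media_type, params = parse_media_type(accept_value)
    if media_type == "application/vnd.drupal.ld+json":
        return True  # interim approach: custom media type
    return (
        media_type == "application/ld+json"
        and params.get("profile") == STAGING_PROFILE
    )


print(wants_drupal_structure("application/vnd.drupal.ld+json"))
print(wants_drupal_structure(
    'application/ld+json; profile="http://drupal.org/project/deploy/content-staging"'
))
print(wants_drupal_structure("application/ld+json"))
```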

Comment #14

Crell commented 16 October 2012 at 16:50

Oh dearie dear. The current Symfony conneg work would not handle profile, I believe. It's still rather primitive. I'm not sure I want to bank on that level of elaborate use of Accept headers being supported, even if it is the "pure true" approach. vnd.* mime types are much easier to support.

With Lin's clarification I am +1 on moving forward with the two mimetypes for now. We can revisit using profile later, even after feature freeze. Or, in concept, if we have a proper negotiation library, there's nothing stopping contrib from supporting the profile version either.

Comment #15

scor commented 16 October 2012 at 17:25

@lanthaler great idea re the profile parameter. I was toying with a similar approach in the second example of the HTTP header field: Accept, but could not come up with a name for a parameter that would be generic enough and non-Drupal specific, but profile with our own URI seems like a sound idea.

@linclark right, left a comment to that effect. afaict, it is not supported at the moment, and I'm not even sure it will be in the first version of this Symfony conneg lib. What about using the vnd MIME type for now (since it is basically supported) in order to not hold progress on this issue, and then revisit the profile idea (which I like better) later after Dec 1st?

Comment #16

Anonymous (not verified) commented 16 October 2012 at 17:28

@scor I believe your question means you agree with my proposal, correct me if I'm wrong.

I propose that we use the custom media type until we switch over to Symfony's improved conneg, at which point we hope to use the profile media type parameter.

Comment #17

scor commented 16 October 2012 at 17:29

@linclark #16 absolutely! +1

Comment #18

effulgentsia commented 16 October 2012 at 18:10

Status:

Needs review

» Reviewed & tested by the community

+1 to the 2 mime types until such time as Symfony's content negotiation library supports Profile in the Accept header. If 8.0 ships with the 2 mime types, I see no problem with that.

Looks like there's consensus here, so setting to RTBC. @linclark: is there anything else you need before setting it to fixed?