Problem/Motivation
When an entity has multiple entity reference items in the same field item list which reference the same entity, the response fails JSON API schema validation with the message, "There are no duplicates allowed in the array", in reference to the resource identifiers under the relationship field in question.
Proposed resolution
Because the JSON API spec presumes that different relationships will be represented differently, we need to add information to the meta member of our relationship resource identifiers to differentiate distinct entity reference items.
As was proposed on in the JSON API issue queue, we should add an arity to the meta member when serializing EntityReferenceItems as resource identifier objects.
Remaining tasks
User interface changes
API changes
Data model changes
Comments
Comment #2
e0ipsoI believe this is a bug, and a serious one. I suspect that the schema validation is complaining about adding duplicate item multiple times, which should not be allowed by the spec.
@Grimreaper can you share the entry in your logs to see the exact error this is throwing?
Comment #3
grimreaperIn dblog I obtain 3 entries when requesting jsonapi/node/article, I only have one article that references 2 times a tag and 1 time another tag:
1
severity: debug
type: jsonapi
2
severity: debug
type: jsonapi
3
severity: error
type: php
@e0ipso: Do you need more info?
Comment #4
grimreaperHello,
I confirm that the problem occurs when the same entity is referenced twice in the same field.
If it is referenced on a different field, for example, two tags fields on the article content type, each referencing the same tag, there is no problem in this case.
I will try to make a patch that add a test that should currently fail.
Comment #5
grimreaperThis patch add a test that should fail.
Comment #7
grimreaperIn schema.json file, line 255
If I remove the
"uniqueItems": trueit is ok. I will check the JSONAPI spec.Comment #8
grimreaperHello,
Here is a patch with the test and that remove the line in schema.json.
I searched why there was this restriction in the JSON API spec but I didn't find why.
I found the schema.json used in jsonapi module on the URL http://jsonapi.org/schema
But on the page http://jsonapi.org/format, I didn't see any restriction on relationships. I saw http://jsonapi.org/format/#document-compound-documents
But that apply to the main document not the relationships, right?
I will go on another issue while waiting for feedback on this one.
Comment #9
hampercm commentedIn JSON API, a relationship is a logical linkage between two different resources: "Resource 'A' is related to Resource 'B' by way of the 'author' relationship," for instance. From that perspective, it doesn't make sense to have Resource 'A' related to Resource 'B' twice; a relationship either exists, or doesn't exist between two resources.
I'm not sure this is a bug, per se. It seems more like a limitation of the standard as it is defined.
Modifying the
schema.jsonfile is not a good idea, as that comes from an upstream source: https://github.com/json-api/json-api/blob/gh-pages/schemaIt might make sense to submit a PR for the JSON API standard itself, or that schema file at the above link and see what comes of that. Otherwise, the fix may be to disable validation on sites that need to have duplicates in a relationship by removing the validation library. It is meant to be a dev-only requirement, anyway.
Comment #10
grimreaperHello,
@hampercm: Thanks for your reply.
I agree that if JSONAPI module schema.json diverges from the official one is not the best solution or even "a solution". My patch was just to highlight the problem.
And I agree that having the same relation twice is a very rare case but it is a possibility in Drupal.
I will try to open a discussion on github.
But I don't understand your sentence "It is meant to be a dev-only requirement, anyway."
Comment #11
hampercm commented@Grimreaper the response validation should only be running if:
1. You've
composer required the JSON API module with the--devoption, to install development-only dependencies.2. You have assertions enabled in PHP.
In other words, validation should only be running in a development environment, not in production, and only if you really want it to be running.
Comment #12
grimreaper@hampercm : Thanks for the reply. Ok I have forgotten that the library "justinrainbow/json-schema" is for dev only.
Comment #13
grimreaperAfter seeing the contribution guidelines https://github.com/json-api/json-api/blob/gh-pages/CONTRIBUTING.md, I opened a discussion on http://discuss.jsonapi.org/t/impossible-to-have-a-relationship-on-anothe...
Comment #14
e0ipsoThanks for your big help with this @Grimreaper and @hampercm.
This is blocked until we can get a response from the JSON API maintainers. Maybe it's worth adding a PR against the schema as well so it is clearer what the issue is about?
Comment #15
e0ipsoRemoving the RC blocker.
Comment #16
grimreaperHello,
@e0ipso: Thanks for your reply.
Pull request created: https://github.com/json-api/json-api/pull/1156
Comment #17
wim leersI updated the PR: https://github.com/json-api/json-api/pull/1156#issuecomment-351113732
Comment #18
wim leersComment #19
wim leersJust mentioning that nearly 5 months later, there still is no response from the JSON API spec maintainers: https://github.com/json-api/json-api/pull/1156.
Clarifying issue title, updating issue metadata. This is not a bug in the JSON API module, it's a limitation of the JSON API spec. And it currently only affects DX for "dev" sites, "prod" sites do not (should not!) have JSON API response schema validation enabled, and hence are unhampered by this, which makes this a DX-only issue. (Although perhaps certain JSON API clients do also make this assumption!)
Comment #20
gabesulliceHeh, I've never noticed this issue before. I'm sorry to throttle the the issue metadata, but after digging in, I think I've had some revelations which will give us a path forward.
I disagree. This is another case of us trying to "serialize the entity reference field" rather than represent relationships between resources. Resource identifiers are not meant to tell you about field values. They exist to tell an API consumer about linkage between resources.
FWIW, a contributor to the spec did respond. The (distilled) answer was that multiple distinct relationships should have differentiating data under the
metamember of the resource identifier.I agree with that assessment.
Two notes from our own documentation:
Therefore, marking this as a bug and updating the title/issue summary (again). Also getting rid of the DX tag, because this really is a spec issue (albeit a very subtle one). Keeping the "minor" priority because it only significantly impacts dev sites, as @Wim Leers pointed out.
Comment #21
gabesulliceComment #22
wim leersThanks for challenging my assessments. ❤️ That's what helps build better software!
Adding the issue comment you linked to as a related issue.
Your argument makes sense. I'm fine with the approach you propose (i.e. do what a JSON API spec contributor said). I think that makes this a 2.x issue, because it could be considered a BC break for some.
More importantly, I think this actually points to a weakness or at least ambiguity in Drupal's data modeling in general, and entity reference fields in particular. Is this really something Drupal is explicitly designed to do? I searched, and found the following relevant issues:
entityreference_view_widget) #2301131: Allow the same entity to be referenced twice in the same field — this lists an explicit use caseentityreference) #1572938: Can Multi Value Entity reference filed be unique?\Drupal\Core\Entity\Plugin\Validation\Constraint\ValidReferenceConstraintdoes not have a message about "duplicate reference", confirming that this is not something that core's entity reference field type cares about.Note: #1801304: Add Entity reference field added the contrib
entityreferencemodule to core, in February 2013.Conclusion: we'll need to expose this through
meta, because we will definitely not be able to make core prevent duplicate references without breaking lots of sites.Comment #23
wim leersComment #24
gabesulliceCan you elaborate on that? I don't see it.
Comment #25
wim leersThis would introduce a
metakey in a location that today has nometakey. I suspect quite a few applications will have been written without taking this into account.Of course, per our documentation, we don't consider this a BC break. So you're right.
Comment #26
wim leersAdding unit test coverage.
Comment #28
wim leersGreat, that failed!
And here's the fix, which adds
[meta][arity].Comment #29
wim leersAlmost forgot: we can now revert the JSON API schema change being made by earlier iterations of this patch!
Comment #30
gabesullice@Wim Leers, this is great work, but I think it went in a slightly different direction than I was thinking it would. BUT, you were following the suggestion exactly, I just misread it to begin with. I'm hopeful that this can work, but I'm concerned about one thing.
I had thought we would end up with:
I think "arity" is the wrong word for this example. I should have realized that before.
IOW, that we would represent every relationship, not collapse them.
With this approach, is it possible that multiple entity reference items with a shared target_id but different values for their extra properties will lose data?
The use case that I'm thinking of is an image field referencing the same image twice, but with different alt and title text.
Comment #31
e0ipsoThat's a very good point @gabesullice.
Comment #32
wim leersWoah, that's even MORE edge-casey! I'd never even thought of that.
This honestly seems fairly contrived.
Regardless of what I or we think, here's the thing:
… this still violates the JSON API spec/the
uniqueItemsconstraint I just restored in #29… 😢Here's the other thing:
As long as the pairs of metadata remain intact, we can retain what I'm doing in #28/#29 and hence comply with the JSON API spec:
Let me try that.
Comment #33
wim leersExpanded test coverage. This should fail.
This is what I set out to achieve, for a very complex example:
Comment #34
wim leersAnd logic.
Back to both of ya. 🏓
Comment #36
gabesulliceI'm not yet convinced this is true. Looking at
justinrainbow/json-schema, it seems likeuniqueItemsis doing a deep equality test.Looking at the spec itself, there is no preclusion against multiple resource identifiers under a relationship field. In fact, the spec says elsewhere WRT to "full linkage" that "Compound documents require “full linkage”, meaning that every included resource MUST be identified by at least one resource identifier object in the same document." (emphasis mine).
Are we really sure that the schema is invalid when the type and ID are duplicated while the meta information contains unique data? If the answer to this is "yes", #34 is seems like an elegant compromise.
Comment #37
wim leersBut … isn't that exactly what was reported originally in this issue? Isn't that exactly what we've been working towards? It definitely contradicts @hampercm's excellent explanation in #9 and I think it also contradicts your own comment at #20?
The bits you're quoting about full linkage are specifically for compound documents. This is about resource linkage.
No, but how can we be? The spec is not specific enough about this. The closest information we have is this bit under http://jsonapi.org/format/#document-resource-object-linkage:
I think your proposal in #30 is literally contradicting your argument in #20, where you wrote:
#30 says
arityis the wrong word, which likely means the preferable word isdelta. If that's a correct interpretation, then that would mean we'd be literally "serializing the entity reference field".Really curious to see your reply :D
P.S.: there's a lot of "you" in what I wrote, but this is of course not personal/not about attacking you! It's merely disagreeing with the proposal :) Sorry if that sounded harsh! It's late and I'm tired :)
Comment #38
gabesulliceIn #20 I said: "The (distilled) answer was that multiple distinct relationships should have differentiating data under the meta member of the resource identifier."
That's consistent with not believing that "[Multiple resource identifiers with unique metadata] still violates the JSON API spec/the uniqueItems constraint." Because it's the "differentiating data under the meta member" that makes the resource identifier unique. My idea was that adding an "arity" to each duplicated resource identifier would provide that "differentiating data". I hope my examples below will make that more clear.
Later, I said: "I'm thinking of is an image field referencing the same image twice, but with different alt and title text." You thought was MORE edge-casey.
The point I was trying to make is that it's plausible to have duplicate type + IDs representing different relationships. Perhaps my example was a bit contrived, but I do think it's plausible. Think of a blog post that's getting written and prepared for publishing, the author might know that they want to add a few images, but wants to find them later or have illustrations made. The author just references a standard placeholder kitten image with the alt and title text for the image they want to be there. Later, the author may plan to get the rights to some stock photography or have a designer make some illustrations. That could easily happen in a real Drupal site (I've done it this way :P).
Take another example that may illustrate the point:he case I'm trying to make clear is that it is possible to have different relationships between the same two entities that changes over time. Like this:
It's important that they are not collapsed for that reason. Otherwise, we'll have to really bend over backwards to accommodate POST/PATCH to relationships.
These are two sides of the same coin.
Let's be clear, there are TWO pieces to this argument. Violating the specification and passing schema validation.
The original issue is about the latter, passing validation. We aren't technically violating the spec here. The
json.schemahas auniqueItemsconstraint that is derived from the spirit of the spec.The constraint that we're violating is simply that the resource identifier objects must be unique.
When I asked "Are we really sure that the schema is invalid when the type and ID are duplicated while the meta information contains unique data?" I meant "does an example like the one above with "friend" and "colleague" pass or fail schema validation?"
I'm pretty sure that the examples above would pass (because of the deep equality check I referenced). While the example below without differentiating metadata would fail...
The point of adding some unique data under the meta key was simply to pass validation on the edge case where there is not already unique information under the meta property.
I intentionally chose
arityto avoid a naming collision with Drupal's use ofdelta. In my head, I was thinking that the result would look something like this:Notice that "arity" is not the same as the Drupal's delta. The arity doesn't have to do with field item delta because
0and1both appear twice.It's a simple trick to make the first two items pass schema validation while only adding a very little bit of irrelevant data to the second two items that don't actually need it.
I prefer this trick to "collapsing" the items because it avoids adding a new structural convention under the meta property à la partial-success. IOW, a client that needs to know that there is both a "friend" and "colleague" connection would not have to write special code just for our implementation of JSON API to "uncollapse" the
meta.metasproperty it would just work as expected.Comment #39
wim leers😂 👏
What an EXCELLENT example! 👌 I wish I'd thought of that! That's indeed far less contrived, more realistic!
And this is an EXCELLENT remark too! That would indeed be an important consequence!
Thanks for disambiguating this! That's indeed very important to keep in mind while discussing this :)
Right, I realize that now. But I did not realize this before!
Right. And I interpreted
arityascount. Which is wrong, it's the position or order. Consequence of a non-native speaker interpreting terms he hasn't heard before based on the context, and reaching the wrong conclusion. Sorry about that! 😊 At least, if I'm understanding https://en.wikipedia.org/wiki/Arity correctly.Also: you didn't choose arity,
@beaubydid at https://github.com/json-api/json-api/pull/1156#issuecomment-325377995, so don't apologize!+1
+1
+1
Works for me!
Comment #40
wim leersFirst, modifying the test coverage to implement #38. (@gabesullice, please review this carefully to ensure this is indeed what you intended!)
The good news is that thanks to the previous test coverage iterations, I only have to modify the existing test coverage to test AFAICT all edge cases :)
Comment #41
wim leersAnd logic changes. This should be green!
Comment #43
gabesulliceYay for shared understanding! This looks really good and the test coverage is awesome. I didn't think about not assigning
arityto single occurrences.Let's be really verbose in the comments.
Nit: Missing newline.
Let's also link to this issue.
s/Linkage/IdentifierObjects/I think this can be simplified. What about something like:
Alluded to this above, but even though we know these are UUIDs, let's key this by type + ID. Since the spec reads:
Comment #44
wim leersSelf-review:
Should be
ctimg1andctimg2. Fixed.Comment #45
wim leersAnd fixing remaining CS violations.
Comment #46
gabesulliceComment #49
gabesulliceComment #50
wim leers🎉
One more edge case that works as expected!
Comment #52
rpayanmMoving to Drupal core's issue queue.
I'm working on https://www.drupal.org/project/drupal/issues/3122113
Comment #53
yang_yi_cn commentedI believe the fix is not ported to core, as I'm experiencing this issue on local dev (with composer require --dev drupal/core-dev), as some comments said, PRD sites don't have dev package can run ok.
Can we reopen this issue?