Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Problem/Motivation
When a JSON API request uses the page[size]=10
query parameter, the response returned will have fewer entities than the requested page size if any of the entities in that "page" are inaccessible by the user.
Proposed resolution
Create documentation pages that explain what is going on and how to fix it. See #31.
Original report by [username]
I'm sorting by a date field, and applying a page[size]=10 filter to the call, and the results are returned in reverse - the first page consisting of ~6 items, the rest the full 10. Seems like it's the last page being displayed first.
Comment | File | Size | Author |
---|
Comments
Comment #1
Anonymous (not verified) CreditAttribution: Anonymous at Chapter Three commenteddrewbolles created an issue. See original summary.
Comment #2
e0ipsoAs you can see in the tests (which are great examples) you should use
page[limit]
now instead ofpage[size]
.Comment #3
Anonymous (not verified) CreditAttribution: Anonymous at Chapter Three commentedMy tests still try to use page size - im on alpha3, should I be using the dev version?
Comment #4
Anonymous (not verified) CreditAttribution: Anonymous at Chapter Three commentedComment #5
hampercm CreditAttribution: hampercm at Acquia commentedI haven't been able to reproduce this bug. These are the steps I tried:
1. Added a Date field to the Article content type.
2. Created 5 Articles with ascending dates, titled: One, Two, Three, Four, and Five
3. CURLed
http://drupal-8x.dd:8080/jsonapi/node/article?_format=api_json&sort[sort-date][path]=field_date&sort[sort-date][direction]=DESC&page[limit]=3&page[offset]=0
and got: Five, Four, Three4. CURLed
http://drupal-8x.dd:8080/jsonapi/node/article?_format=api_json&sort[sort-date][path]=field_date&sort[sort-date][direction]=DESC&page[limit]=3&page[offset]=3
and got: Two, OneRepeated with direction ASC and again got expected results:
One, Two, Three on offset=0,
Four, Five on offset=3
Is this similar to what you tried, @drewbolles ?
Comment #6
Anonymous (not verified) CreditAttribution: Anonymous at Chapter Three commentedSo I had unpublished entities that were included in the page limit - and then the data was in a meta property I had missed before, so that's my bad.
I had assumed default behavior was to only return published nodes, and when I added a status filter to the call I received the nodes in the order I expected.
So we can indeed close this out! Thank you so much for looking into this, and for the great module.
Comment #7
dawehnerThis is a good point. I think we should filter out unpublished entities by default indeed. People could still opt out, if needed, but we need some proper test coverage for it.
Maybe some similar logic to
\Drupal\node\Plugin\views\filter\Status
should be applied on top.Comment #8
e0ipsoThe problem is not with published nodes, the problem is that the current user doesn't have access. There are many ways to affect access in Drupal (most notably node grants / entity grants). The feature request you need is in the Entity Query API, to be able to only return entities that have access granted for the current user. Any workaround circumventing that is going to be hacky (and should probably be custom code).
Comment #9
hampercm CreditAttribution: hampercm at Acquia commentedI think filtering out unpublished nodes by default is a bad idea; it goes against the intent of JSON API as a general-purpose API. Many use cases will need unpublished nodes included in responses. The current behavior is also consistent with how the Core REST API works.
Unpublished nodes won't be listed unless the user has access to them, so use of Drupal's permissions system can accomplish the filtering-out of unpublished nodes.
Comment #10
Anonymous (not verified) CreditAttribution: Anonymous at Chapter Three commentedI think the only reason I assumed unpublished would be filtered is maybe just past default views config from core, was certainly not the best assumption.
I know the docs aren't quite complete, but just mentioning that unpublished nodes will be returned if proper user access checks aren't set up (my front-end doesn't know anything about Drupal users or permissions) you'll need to make sure to add filter[status][value]=1 to ensure only published returns might be beneficial.
Comment #11
hampercm CreditAttribution: hampercm at Acquia commentedDrupal's user and permission checks ARE performed when JSON API is formulating a response. If those are set up properly, a front-end won't need to worry about filtering.Re-read #10. Yes, adding a bit on that to the docs seems like a good idea.
Comment #12
Anonymous (not verified) CreditAttribution: Anonymous at Chapter Three commentedI'm totally speaking from my rear end about how the module / Drupal is behaving, but the permissions & access are correct on my Drupal site, but my front-end is just pinging a URL is all I meant by "doesn't know anything" about the perms. I could have a small configuration error on my API side that let's unpublished nodes out w/o the filter, though.
Comment #13
e0ipsoThis should be a documentation issue. Please feel free to close this when the docs are added.
Comment #14
hampercm CreditAttribution: hampercm at Acquia commentedI hadn't realized that the inaccessible items were "taking up space" in the paged collection. This seems like more than a documentation issue; it feels very counterintuitive and not a good DX.
Comment #15
e0ipso@hampercm I agree, but unless Entity Query can return only accessible entities for the current user, the fix for this is not reasonable in terms of complexity.
Comment #16
dawehnerEntity access is tricky. There is a difference between access to a single entity and access to a list of them.
For a single entity using
$entity->access()
is absolutely the right API. For listing of entities the query needs to already return the right entities, otherwise a pager doesn't work.In views we are solving the problem by automatically adding the right query access tags, which for example node picks up and alters the query as needed. Node query access though by default doesn't take into account node status, as its not enabled by default. That is the reason why views adds the node.status filter by default. This is the underlying reason for my previous reasoning.
Given that we IMHO should do the following:
Add the status field by default, unless the user has access of (view all unpublished
$entity_type
) (feels hacky). Especially when we deal with anonymous users, node query access is often not applied, but we need to filter by node status.Comment #17
e0ipso@dawehner I thought that (http://cgit.drupalcode.org/jsonapi/tree/src/Query/QueryBuilder.php#n83) adding
->accessCheck(TRUE)
was enough to get all the access tags to the query. Should we do anything else to accomplish:On the other idea, I think that adding a hidden default filter to the listing will impact DX badly.
Comment #18
dawehnerOh yeah those 2 things are synonmys. Views just doesn't use entity query, so we need the underlying API.
But as said, when we drop the $entity->access() call, we need to at the node status field by default, unless opted out.
Comment #19
Wim Leers@dawehner, can you write an initial patch? With your Views expertise, you know much more about this than we do!
Comment #20
dawehnerSo this is kinda how this would look like. It just sadly requires node specific logic.
Comment #22
dawehnerHere is an alternative which iterates over items until we have until. It is not necessarily the nicest solution though.
Comment #23
e0ipsoI think that the feature we are trying to solve here is more confusing than the current state of things. I don't mind having the access errors shown in the collection, if they are a real burden then consumers can add the
published
(or alternative) filter to avoid it.Dramatization
Consumer: Hmmm I'm getting many unpublished nodes in my collection that lead to 403. It's nice that the partial error message tells me that I don't have access to view those nodes. I suppose I'll have to add the filter for
published: true
.Comment #24
hampercm CreditAttribution: hampercm at Acquia commentedI'm not sure there is a good code solution for this, as much as I'd prefer that. The approach in #20 is specific to the published state of nodes. Any other factor that excludes access to an entity would need to be handled individually, which is impractical. The approach in #22 is more general purpose, but it won't scale well to sites with a large number of nodes.
That leaves only a few possibilities I can come up with:
1) Fail the request with a 403 (or 400?) if any entities are inaccessible.
2) Use the iteration approach, and come up with a caching scheme to keep track of what DB offset to start from for a given JSON API query, paging parameters, and user. This seems like it would be complex, and have unpredictable performance.
3) Stay with the existing behavior and document it thoroughly.
Comment #25
dawehnerThere indeed seems to be no good solution and the two proposals posted above aren't an ideal solution either,
This would be though at least in line what entity reference is doing for their selection plugins:
\Drupal\node\Plugin\EntityReferenceSelection\NodeSelection::buildEntityQuery
,\Drupal\comment\Plugin\EntityReferenceSelection\CommentSelection::buildEntityQuery
and\Drupal\Core\Entity\Plugin\EntityReferenceSelection\DefaultSelection::buildEntityQuery
Well, at least for views we certainly tell people: You cannot rely on
$entity->access()
for your access checking, but rather you need to implement something on the query level. For simple cases this means that people HAVE TO add node.status as filter. For complex cases this means that people should better use the node grant system. What you call impractical is the reality of Drupal since a long time :)Comment #26
e0ipsoI advocate strongly for option 3) in #24.
As you could have guessed by my comment in #13
As pointer out by @dawehner in #25 this may be documented elsewhere already since this happens in many other places.
It also seems that Views is doing what I was proposing on #23. Again quoting @dawehner
Thoughts?
Comment #27
hampercm CreditAttribution: hampercm at Acquia commentedI agree with keeping the existing functionality and documenting.
Comment #28
dawehnerWhat about asking the security team what they think should be done here?
Comment #29
hampercm CreditAttribution: hampercm at Acquia commentedThere isn't really a security concern regarding this, as far as I can see, as a user without access to certain entities will get 403 errors for those, even as part of a collection.
Comment #30
dawehnerWell, maybe we should change the way how core deals with listings.
Comment #31
Wim LeersI think the main problem raised in this issue is:
This is extremely common. It's also not reasonable to ask that every client implement additional logic to check what the actual number of returned items is: if you got page 1 of 5, then you expect to have received 10 resources.
However…
This is also totally fair. And it solves the problem 99% of the time. Requiring the user to be explicit is sensible.
That then only leaves very complex set-ups. Those will have to implement the necessary things on the query level.
Conclusion: I agree that this should be fixed through documentation. We need to document:
published: true
solves the problem 95% of the timepublished
link, so that that is even more easily discoverable, and because that's the 95% case.True, but we need JSON API to move forward in the mean time anyway. Changing how core deals with listings is a massive undertaking.
Comment #32
e0ipsoIt seems that the agreement is moving towards fixing this in documentation. I'll rephrase the title and IS to reflect this.
This will require a code change. I'll add a ticket for it.
Comment #33
e0ipsoComment #34
e0ipsoComment #35
e0ipsoComment #36
e0ipsoComment #37
Wim LeersComment #38
gabesulliceI think this is resolved. See: https://www.drupal.org/docs/8/modules/json-api/pagination
If there are still unresolved issues, please open a new support request with the specific documentation needs.
Comment #39
Wim Leers🎉
Comment #40
Wim LeersNext step: #2857730: Add a link to get only published entities in collections.