Problem/Motivation

When a JSON API request uses the page[size]=10 query parameter, the response returned will have fewer entities than the requested page size if any of the entities in that "page" are inaccessible by the user.

Proposed resolution

Create documentation pages that explain what is going on and how to fix it. See #31.

Original report by [username]

I'm sorting by a date field, and applying a page[size]=10 filter to the call, and the results are returned in reverse - the first page consisting of ~6 items, the rest the full 10. Seems like it's the last page being displayed first.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Anonymous’s picture

drewbolles created an issue. See original summary.

e0ipso’s picture

Status: Active » Closed (works as designed)

As you can see in the tests (which are great examples) you should use page[limit] now instead of page[size].

Anonymous’s picture

My tests still try to use page size - im on alpha3, should I be using the dev version?

Anonymous’s picture

Title: Sorting by a field, and leveraging page[size] seems to return some results in reverse » Sorting by a field, and leveraging page[limit] seems to return some results in reverse
Issue summary: View changes
Status: Closed (works as designed) » Active
hampercm’s picture

I haven't been able to reproduce this bug. These are the steps I tried:

1. Added a Date field to the Article content type.
2. Created 5 Articles with ascending dates, titled: One, Two, Three, Four, and Five
3. CURLed http://drupal-8x.dd:8080/jsonapi/node/article?_format=api_json&sort[sort-date][path]=field_date&sort[sort-date][direction]=DESC&page[limit]=3&page[offset]=0 and got: Five, Four, Three
4. CURLed http://drupal-8x.dd:8080/jsonapi/node/article?_format=api_json&sort[sort-date][path]=field_date&sort[sort-date][direction]=DESC&page[limit]=3&page[offset]=3 and got: Two, One

Repeated with direction ASC and again got expected results:
One, Two, Three on offset=0,
Four, Five on offset=3

Is this similar to what you tried, @drewbolles ?

Anonymous’s picture

So I had unpublished entities that were included in the page limit - and then the data was in a meta property I had missed before, so that's my bad.

I had assumed default behavior was to only return published nodes, and when I added a status filter to the call I received the nodes in the order I expected.

So we can indeed close this out! Thank you so much for looking into this, and for the great module.

dawehner’s picture

This is a good point. I think we should filter out unpublished entities by default indeed. People could still opt out, if needed, but we need some proper test coverage for it.
Maybe some similar logic to \Drupal\node\Plugin\views\filter\Status should be applied on top.

e0ipso’s picture

I had assumed default behavior was to only return published nodes, and when I added a status filter to the call I received the nodes in the order I expected.

The problem is not with published nodes, the problem is that the current user doesn't have access. There are many ways to affect access in Drupal (most notably node grants / entity grants). The feature request you need is in the Entity Query API, to be able to only return entities that have access granted for the current user. Any workaround circumventing that is going to be hacky (and should probably be custom code).

hampercm’s picture

I think filtering out unpublished nodes by default is a bad idea; it goes against the intent of JSON API as a general-purpose API. Many use cases will need unpublished nodes included in responses. The current behavior is also consistent with how the Core REST API works.

Unpublished nodes won't be listed unless the user has access to them, so use of Drupal's permissions system can accomplish the filtering-out of unpublished nodes.

Anonymous’s picture

I think the only reason I assumed unpublished would be filtered is maybe just past default views config from core, was certainly not the best assumption.

I know the docs aren't quite complete, but just mentioning that unpublished nodes will be returned if proper user access checks aren't set up (my front-end doesn't know anything about Drupal users or permissions) you'll need to make sure to add filter[status][value]=1 to ensure only published returns might be beneficial.

hampercm’s picture

Drupal's user and permission checks ARE performed when JSON API is formulating a response. If those are set up properly, a front-end won't need to worry about filtering.

Re-read #10. Yes, adding a bit on that to the docs seems like a good idea.

Anonymous’s picture

I'm totally speaking from my rear end about how the module / Drupal is behaving, but the permissions & access are correct on my Drupal site, but my front-end is just pinging a URL is all I meant by "doesn't know anything" about the perms. I could have a small configuration error on my API side that let's unpublished nodes out w/o the filter, though.

e0ipso’s picture

Title: Sorting by a field, and leveraging page[limit] seems to return some results in reverse » Page limit is not reached if there are inaccessible items in the collection
Issue tags: +Needs documentation

This should be a documentation issue. Please feel free to close this when the docs are added.

hampercm’s picture

I hadn't realized that the inaccessible items were "taking up space" in the paged collection. This seems like more than a documentation issue; it feels very counterintuitive and not a good DX.

e0ipso’s picture

@hampercm I agree, but unless Entity Query can return only accessible entities for the current user, the fix for this is not reasonable in terms of complexity.

dawehner’s picture

Entity access is tricky. There is a difference between access to a single entity and access to a list of them.
For a single entity using $entity->access() is absolutely the right API. For listing of entities the query needs to already return the right entities, otherwise a pager doesn't work.
In views we are solving the problem by automatically adding the right query access tags, which for example node picks up and alters the query as needed. Node query access though by default doesn't take into account node status, as its not enabled by default. That is the reason why views adds the node.status filter by default. This is the underlying reason for my previous reasoning.

Given that we IMHO should do the following:
Add the status field by default, unless the user has access of (view all unpublished $entity_type) (feels hacky). Especially when we deal with anonymous users, node query access is often not applied, but we need to filter by node status.

e0ipso’s picture

@dawehner I thought that (http://cgit.drupalcode.org/jsonapi/tree/src/Query/QueryBuilder.php#n83) adding ->accessCheck(TRUE) was enough to get all the access tags to the query. Should we do anything else to accomplish:

In views we are solving the problem by automatically adding the right query access tags

On the other idea, I think that adding a hidden default filter to the listing will impact DX badly.

dawehner’s picture

@dawehner I thought that (http://cgit.drupalcode.org/jsonapi/tree/src/Query/QueryBuilder.php#n83) adding ->accessCheck(TRUE) was enough to get all the access tags to the query. Should we do anything else to accomplish:

In views we are solving the problem by automatically adding the right query access tags

Oh yeah those 2 things are synonmys. Views just doesn't use entity query, so we need the underlying API.

But as said, when we drop the $entity->access() call, we need to at the node status field by default, unless opted out.

Wim Leers’s picture

@dawehner, can you write an initial patch? With your Views expertise, you know much more about this than we do!

dawehner’s picture

Status: Active » Needs review
FileSize
8.48 KB

So this is kinda how this would look like. It just sadly requires node specific logic.

Status: Needs review » Needs work

The last submitted patch, 20: 2844375-20.patch, failed testing.

dawehner’s picture

FileSize
2.06 KB

Here is an alternative which iterates over items until we have until. It is not necessarily the nicest solution though.

e0ipso’s picture

I think that the feature we are trying to solve here is more confusing than the current state of things. I don't mind having the access errors shown in the collection, if they are a real burden then consumers can add the published (or alternative) filter to avoid it.


Dramatization

Consumer: Hmmm I'm getting many unpublished nodes in my collection that lead to 403. It's nice that the partial error message tells me that I don't have access to view those nodes. I suppose I'll have to add the filter for published: true.

hampercm’s picture

Issue summary: View changes

I'm not sure there is a good code solution for this, as much as I'd prefer that. The approach in #20 is specific to the published state of nodes. Any other factor that excludes access to an entity would need to be handled individually, which is impractical. The approach in #22 is more general purpose, but it won't scale well to sites with a large number of nodes.

That leaves only a few possibilities I can come up with:

1) Fail the request with a 403 (or 400?) if any entities are inaccessible.

2) Use the iteration approach, and come up with a caching scheme to keep track of what DB offset to start from for a given JSON API query, paging parameters, and user. This seems like it would be complex, and have unpredictable performance.

3) Stay with the existing behavior and document it thoroughly.

dawehner’s picture

I'm not sure there is a good code solution for this, as much as I'd prefer that. The approach in #20 is specific to the published state of nodes. Any other factor that excludes access to an entity would need to be handled individually, which is impractical. The approach in #22 is more general purpose, but it won't scale well to sites with a large number of nodes.

There indeed seems to be no good solution and the two proposals posted above aren't an ideal solution either,

This would be though at least in line what entity reference is doing for their selection plugins:
\Drupal\node\Plugin\EntityReferenceSelection\NodeSelection::buildEntityQuery, \Drupal\comment\Plugin\EntityReferenceSelection\CommentSelection::buildEntityQuery and \Drupal\Core\Entity\Plugin\EntityReferenceSelection\DefaultSelection::buildEntityQuery

Any other factor that excludes access to an entity would need to be handled individually, which is impractical.

Well, at least for views we certainly tell people: You cannot rely on $entity->access() for your access checking, but rather you need to implement something on the query level. For simple cases this means that people HAVE TO add node.status as filter. For complex cases this means that people should better use the node grant system. What you call impractical is the reality of Drupal since a long time :)

e0ipso’s picture

I advocate strongly for option 3) in #24.

3) Stay with the existing behavior and document it thoroughly.

As you could have guessed by my comment in #13

This should be a documentation issue. Please feel free to close this when the docs are added.


As pointer out by @dawehner in #25 this may be documented elsewhere already since this happens in many other places.

What you call impractical is the reality of Drupal since a long time :)

It also seems that Views is doing what I was proposing on #23. Again quoting @dawehner

For simple cases this means that people HAVE TO add node.status as filter. For complex cases this means that people should better use the node grant system.

Thoughts?

hampercm’s picture

I agree with keeping the existing functionality and documenting.

dawehner’s picture

What about asking the security team what they think should be done here?

hampercm’s picture

There isn't really a security concern regarding this, as far as I can see, as a user without access to certain entities will get 403 errors for those, even as part of a collection.

dawehner’s picture

Well, maybe we should change the way how core deals with listings.

Wim Leers’s picture

I think the main problem raised in this issue is:

if I request 10 resources per page, then I want 10 resources, otherwise my app's listings will be messed up

This is extremely common. It's also not reasonable to ask that every client implement additional logic to check what the actual number of returned items is: if you got page 1 of 5, then you expect to have received 10 resources.

However…

I suppose I'll have to add the filter for published: true.

This is also totally fair. And it solves the problem 99% of the time. Requiring the user to be explicit is sensible.

That then only leaves very complex set-ups. Those will have to implement the necessary things on the query level.

Conclusion: I agree that this should be fixed through documentation. We need to document:

  1. that JSON API does not automatically exclude unpublished entities from its collections
  2. that filtering published: true solves the problem 95% of the time
  3. that if you have more complex access control logic, that you'll need to implement that on the query level — we should also include an example of this
  4. perhaps… we can include a published link, so that that is even more easily discoverable, and because that's the 95% case.

Well, maybe we should change the way how core deals with listings.

True, but we need JSON API to move forward in the mean time anyway. Changing how core deals with listings is a massive undertaking.

e0ipso’s picture

It seems that the agreement is moving towards fixing this in documentation. I'll rephrase the title and IS to reflect this.


This will require a code change. I'll add a ticket for it.

perhaps… we can include a published link, so that that is even more easily discoverable, and because that's the 95% case.

e0ipso’s picture

Title: Page limit is not reached if there are inaccessible items in the collection » Document excluded inaccessible entities for DX reasons
Issue summary: View changes
Status: Needs work » Active
e0ipso’s picture

e0ipso’s picture

e0ipso’s picture

Category: Bug report » Task
Wim Leers’s picture

Component: Miscellaneous » Documentation
gabesullice’s picture

Status: Active » Fixed

I think this is resolved. See: https://www.drupal.org/docs/8/modules/json-api/pagination

If there are still unresolved issues, please open a new support request with the specific documentation needs.

Wim Leers’s picture

Issue tags: -Needs documentation

🎉

Wim Leers’s picture

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.