Problem/Motivation
The search_api module allows setting up a search index over file entities.
File entities don't have a published/unpublished state, so they all get indexed regardless of whether they are private or public files.
For example, a client has a webform. The search does not index the webform submissions themselves, but it does index the File entities that are created when a file is uploaded to a webform submission (private path).
Proposed resolution
I wrote a data alteration plugin to address this issue.
Please review and test it so I can improve it, thanks.
Comment | File | Size | Author |
---|---|---|---|
#7 | 3122167-6--exclude_private_files.patch | 1.98 KB | drunken monkey |
Comments
Comment #2
pandaski
Comment #3
pandaski
Comment #4
drunken monkey
Thanks a lot for posting, looks great!
Just a few formatting changes, plus this:
This should only match at the beginning of the URI, right? Then I think `substr($file->uri, 0, 10) === 'private://'` will be both faster and more accurate. (Or, otherwise, `strpos($file->uri, 'private://') === 0`.)
Otherwise, I think this is RTBC. Please test/review the attached patch to make sure it still works correctly!
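As a minimal sketch of the two prefix checks suggested above: the function names here are hypothetical and are not taken from the patch, and the file objects are plain stand-ins that only carry a `uri` property.

```php
<?php

/**
 * Hypothetical helper (not part of the patch): checks the scheme with strpos().
 */
function example_uri_is_private_strpos($uri) {
  // strpos() returns 0 only if 'private://' sits at the very start of the URI.
  // The strict comparison matters, since FALSE (not found) == 0 in loose mode.
  return strpos($uri, 'private://') === 0;
}

/**
 * Hypothetical helper (not part of the patch): checks the scheme with substr().
 */
function example_uri_is_private_substr($uri) {
  // 'private://' is 10 characters long, so compare the first 10 characters.
  return substr($uri, 0, 10) === 'private://';
}

/**
 * Hypothetical filter: drops every item whose URI uses the private scheme.
 */
function example_exclude_private_files(array $files) {
  foreach ($files as $id => $file) {
    if (example_uri_is_private_strpos($file->uri)) {
      unset($files[$id]);
    }
  }
  return $files;
}

// Usage: only the public file remains after filtering.
$files = array(
  1 => (object) array('uri' => 'public://report.pdf'),
  2 => (object) array('uri' => 'private://webform/upload.pdf'),
);
$remaining = example_exclude_private_files($files);
```

Both checks only match when the scheme is at the start of the URI, so a path such as `public://private://x` is not treated as private.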
Comment #5
pandaski
Love this way :)
@drunken monkey thanks for your kind review. I think it is ready now, and we have tested it for our distribution.
Comment #7
drunken monkey
Great, thanks a lot for the feedback!
Committed. Thanks again!
For D8, the attached would probably be the most sensible way to implement this. (Or would this be too disruptive for people with search indexes containing files? Should we maybe start introducing config for that processor, to easily exclude it for a specific type, now that it covers a wider and wider range of types?)
Anyway, I don’t have private files or a file index set up, so it would be great if someone with a real use for this could give it a try.
It will also need automated test coverage in any case.