Problem/Motivation
There's currently a big push among life sciences research institutions to adopt Bioschemas, a project that seeks to adapt and contribute to the usage
of schema.org in describing life sciences resources. It does so by:
- Defining new types and properties missing in schema.org that allow for the description of life science resources
- Defining new usage profiles for existing schema.org types to add relevant properties when defining a resource.
There is currently no viable way of implementing Bioschemas in Drupal (or any other CMS, for that matter), but the Schema.org Blueprints module seems to get very close and may allow for full Bioschemas compliance with just some minor changes.
Steps to reproduce
Try to create a content type that is defined in another schema, such as Bioschemas, but not in schema.org.
Proposed resolution
Ideally, the plugin could pull directly from the Bioschemas specifications, just as it does for schema.org, and the user could choose whether to follow the Bioschemas specification on top of schema.org. I realise that may be a substantial amount of work, so here are some elements that would allow for full Bioschemas compliance for users willing to tinker with the configuration:
- Allow for custom term definition, and for its introduction inside the schema.org hierarchy
- Allow for custom field definition (I realise that has just been patched)
- Allow for defining the `dct:conformsTo` property. (Technically, that can already be done using custom JSON-LD, but it doesn't appear to let me change the position, and a dedicated field would be more user-friendly.)
- Ability to connect to ontologies of controlled vocabulary like EDAM (just a nice to have)
In short, I think a very useful addition to the plugin would be the ability to define custom non-schema.org schemas. Bioschemas is just one example of such a schema, one that is gaining particular traction among institutions that handle biological data.
Remaining tasks
User interface changes
API changes
Data model changes
| Comment | File | Size | Author |
|---|---|---|---|
| #22 | 3373632-22.patch | 9.47 KB | jrockowitz |
Issue fork schemadotorg-3373632
Comments
Comment #2
jrockowitz commented
The first and simplest question is: where are the Bioschemas definitions stored?
Theoretically, someone can override the schemadotorg.installer service (\Drupal\schemadotorg\SchemaDotOrgInstaller) and import any custom types and properties.
Comment #3
gpo commented
The Bioschemas spec is here: https://github.com/BioSchemas/specifications
I'm not the best PHPer, but from looking at `SchemaDotOrgInstaller.php`, would I be correct to assume that if I can find a way to add Bioschemas' properties and types to `schemaorg-current-https-properties.csv`, I'd be all set?
Comment #4
jrockowitz commented
In theory, you should be able to append the below files...
https://github.com/BioSchemas/schemaorg/blob/master/data/releases/3.9/ex...
https://github.com/BioSchemas/schemaorg/blob/master/data/releases/3.9/ex...
...to...
https://git.drupalcode.org/project/schemadotorg/-/blob/1.0.x/data/22.0/s...
https://git.drupalcode.org/project/schemadotorg/-/blob/1.0.x/data/22.0/s...
...and the Bioschemas types and properties will become available via the UI after running `drush schemadotorg:update-schema`
For now, manually merging the data should work. If it does work as expected, we can explore adding a hook or a programmatic approach to importing other schemas.
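A minimal sketch of the manual merge described above, assuming both files share an `id` column holding the full term URI (as the schema.org CSV exports do). The function name `merge_csv` is hypothetical and is not part of the module. Rows from the extra file that share an `id` with a base row replace it; all other rows are appended.

```python
import csv
import io


def merge_csv(base_csv: str, extra_csv: str, key: str = "id") -> str:
    """Merge extra_csv into base_csv; rows sharing a key replace the base row."""
    base_reader = csv.DictReader(io.StringIO(base_csv))
    fieldnames = list(base_reader.fieldnames or [])
    # Dicts preserve insertion order, so base rows keep their position.
    merged = {row[key]: row for row in base_reader}

    for row in csv.DictReader(io.StringIO(extra_csv)):
        # Keep only the base file's columns; absent values become empty.
        merged[row[key]] = {name: row.get(name) or "" for name in fieldnames}

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(merged.values())
    return out.getvalue()
```

The merged text would then be written over the module's data file before running `drush schemadotorg:update-schema`.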
Comment #5
jrockowitz commented
It looks like I am wrong about the CSV file hosted via https://github.com/BioSchemas/schemaorg/blob/master/data/releases/.
We need to find or generate a CSV that contains the bioschemas.
Comment #6
jrockowitz commented
I'm having a hard time figuring out what needs to be done. Keep in mind that I am not an expert on Schema.org.
I see that the http://bioschemas.org specification is being partially incorporated into https://schema.org as a pending spec.
@see Introduce BioSchemas terms into pending #2863
@see https://bioschemas.org/types/BioChemEntity/0.7-RELEASE-2019_06_19
@see https://schema.org/BioChemEntity
Comment #7
gpo commented
Yeah, Bioschemas can be a bit all over the place. Yes, some terms were incorporated into the main spec, but others (the ones that I'm actually more interested in, like `ComputationalTool` and `TrainingMaterial`) were not, and as such they don't show up in the CSV in the repo that they forked from the main schema.org repository.
From what I gather, it seems to be just a matter of compiling the terms that haven't been added to the schema.org spec and appending them to the CSV, which I haven't gotten around to trying yet.
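Compiling the terms that are still missing from schema.org could look roughly like the sketch below. It assumes terms are compared by the last path segment of their URI, and the helper names `local_name` and `missing_terms` are made up for illustration.

```python
def local_name(uri: str) -> str:
    """Return the last path segment of a term URI, e.g. 'ComputationalTool'."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


def missing_terms(bioschemas_ids, schemaorg_ids):
    """List Bioschemas term URIs whose local name is absent from schema.org."""
    known = {local_name(uri) for uri in schemaorg_ids}
    return [uri for uri in bioschemas_ids if local_name(uri) not in known]
```

With `BioChemEntity` present in both lists, only terms such as `ComputationalTool` and `TrainingMaterial` would be returned and need appending.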
Comment #8
gpo commented
Just gave it a go using some CSVs I generated and everything seems to work!
Having some sort of programmatic way of adding additional schemas seems perfect!
Comment #9
jrockowitz commented
Maybe the simplest solution is to make the source of the CSV configurable. It would default to the local files, but we could use the below URLs:
https://raw.githubusercontent.com/GilOliveira/schemadotorg/bioschemas/da...
https://raw.githubusercontent.com/GilOliveira/schemadotorg/bioschemas/da...
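The idea of a configurable source that defaults to the local files can be sketched as follows. This is only an illustration in Python, not the module's PHP implementation; `is_remote` and `read_csv_source` are hypothetical names, and an empty setting is assumed to mean "use the bundled file".

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def is_remote(source: str) -> bool:
    """True when the configured source is an http(s) URL."""
    return urlparse(source).scheme in ("http", "https")


def read_csv_source(source: str, default_path: str) -> str:
    """Read CSV text from the configured source, or default_path if it is empty."""
    target = source or default_path
    if is_remote(target):
        # Remote source: download the CSV over HTTP(S).
        with urlopen(target, timeout=30) as response:
            return response.read().decode("utf-8")
    # Local source: read the file from disk.
    return Path(target).read_text(encoding="utf-8")
```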
Comment #10
gpo commented
Definitely! Especially as my version overrides some original schema.org definitions.
Comment #11
gpo commented
Okay... found another hiccup: the new properties I've added are not showing up on the "Manage Schema.org fields" form of the respective content type, although their respective entries appear in the property reports.
For example, I rewrote Organization like so
Notice how the second to last property is `https://bioschemas.org/socialMedia`, which I then defined in the properties CSV:
Yet it doesn't appear when adding a new Organization content type. Any chance I'm missing something?
Comment #12
jrockowitz commented
Yeah, I'm going to have to update the code to strip out https://bioschemas.org/. I think the module is going to have to assume that all types and properties are unique, independent of their domain.
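To illustrate what stripping the domain implies, here is a small sketch in Python (not the module's code). The names `strip_namespace` and `index_by_name` are hypothetical. Because terms are keyed by name alone, two vocabularies defining the same name would collide, so the sketch raises an error in that case.

```python
from urllib.parse import urlparse


def strip_namespace(uri: str) -> str:
    """Drop the scheme and domain, keeping only the term name."""
    return urlparse(uri).path.strip("/")


def index_by_name(uris):
    """Map term names to URIs, rejecting names defined by two different URIs."""
    index = {}
    for uri in uris:
        name = strip_namespace(uri)
        if name in index and index[name] != uri:
            raise ValueError(f"{name!r} is defined by both {index[name]} and {uri}")
        index[name] = uri
    return index
```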
Comment #13
gpo commented
That would be awesome! Thank you so much!
If it doesn’t work I can maybe try to work around it using something like
schema.org/bioschemas-xxxxx
But the added versatility may help people using other controlled vocabularies…
Comment #14
gpo commented
I just gave it a try, replacing `bioschemas.org` with `schema.org`, and the property I added still doesn't show up in the form, although it appears in the reports.
Disregard, a clean reinstall fixed this.
Comment #15
jrockowitz commented
@gpo Can you please post your Bioschemas CSV for types and properties somewhere on GitHub? If I have an absolute URL to the data, I can test importing it.
The MR allows you to enter an external URL.
@see /admin/config/search/schemadotorg/settings/general
I did not implement any code to strip https://bioschemas.org/.
Comment #17
jrockowitz commented
Comment #18
gpo commented
Here they are:
https://github.com/BioData-PT/bioschemas-data/tree/schemadotorg
Comment #19
jrockowitz commented
I am getting a 404 via https://github.com/BioData-PT/bioschemas-data/tree/schemadotorg. Maybe you need to adjust the access to the file.
Comment #20
gpo commented
My bad! Should be working now.
Comment #21
jrockowitz commented
Steps to review
- Set Schema.org data file/URL to `https://raw.githubusercontent.com/BioData-PT/bioschemas-data/schemadotorg/schemaorg-current-https-[TABLE].csv`
- Confirm Bioschemas' ComputationalWorkflow (/admin/reports/schemadotorg/ComputationalWorkflow) works as expected
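The `[TABLE]` placeholder in the URL above presumably stands for the table being imported. A rough sketch of that expansion, assuming the two tables are named `types` and `properties` (inferred from the file names in this thread) and using a hypothetical helper `expand_source`:

```python
TABLES = ("types", "properties")  # assumed table names


def expand_source(template: str, tables=TABLES):
    """Return one concrete file/URL per table by filling in [TABLE]."""
    if "[TABLE]" not in template:
        raise ValueError("template must contain the [TABLE] placeholder")
    return {table: template.replace("[TABLE]", table) for table in tables}
```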
Comment #22
jrockowitz commented
Attached is a diff of the latest cleaned-up MR to make this easier to review.
Comment #23
jrockowitz commented
@gpo I updated the MR. Are you using the MR? If yes, I am OK with merging the code.
Comment #25
jrockowitz commented
I think I like having this option available to allow people to import their own schemas. I am going to merge it and see if people use it.
Comment #27
mpp commented
I can second that; there are a lot more vocabularies than schema.org, e.g. https://lov.linkeddata.es/dataset/lov/
@gpo, did you get the module to work on other vocabularies than schema.org?
Also, I don't fully understand why we import the vocabularies and application profiles as CSV data (vs linked data/triples)?
Comment #28
jrockowitz commented
It was simpler and more performant. Keep in mind that the Schema.org data is all managed via the 'schemadotorg.installer' and 'schemadotorg.schema_type_manager' services, which could be reworked to not use CSV data.
The [Schema.org configuration tool (RDF UI)](https://www.drupal.org/project/rdfui) does use linked data/triples.
@see https://git.drupalcode.org/project/rdfui/-/blob/8.x-1.x/src/EasyRdfConve...