Problem/Motivation

There's currently a big push among life sciences research institutions to adopt Bioschemas, a project that seeks to adapt and contribute to the usage
of schema.org in describing life sciences resources. It does so by:

  • Defining new types and properties missing in schema.org that allow for the description of life science resources.
  • Defining new usage profiles for existing schema.org types to add relevant properties when defining a resource.

There is currently no viable way of implementing Bioschemas in Drupal (or any other CMS, for that matter), but the Schema.org Blueprints module seems to get very close, and may allow for full Bioschemas compliance with just some minor changes.

Steps to reproduce

Try to create a content type that's defined in another schema, like Bioschemas, but not in schema.org.

Proposed resolution

Ideally, the module could pull directly from the specifications, just as it does for schema.org, and the user could choose whether to follow the Bioschemas specification on top of schema.org. I realise that may be a substantial amount of work, so here are some elements that would allow for full Bioschemas compliance for users willing to tinker with the configuration:

  • Allow for custom term definition, and for its introduction inside the schema.org hierarchy
  • Allow for custom field definition (I realise that has just been patched)
  • Allow for defining the dct:conformsTo property. (Technically, that can already be done using custom JSON-LD, but it doesn't appear to let me change the position, and a dedicated field would be more user-friendly.)
  • Ability to connect to ontologies of controlled vocabulary like EDAM (just a nice to have)

In short, I think a very useful addition to the module would be the ability to define custom non-schema.org schemas. Bioschemas is just one example of such a schema, one that is gaining particular traction among institutions that handle biological data.

Remaining tasks

User interface changes

API changes

Data model changes

Comment  File              Size     Author
#22      3373632-22.patch  9.47 KB  jrockowitz

Comments

gpo created an issue. See original summary.

jrockowitz’s picture

The first and simplest question is where the Bioschemas definitions are stored.

Theoretically, someone can override the schemadotorg.installer service (\Drupal\schemadotorg\SchemaDotOrgInstaller) and import any custom types and properties.

gpo’s picture

The Bioschemas spec is here https://github.com/BioSchemas/specifications

I'm not the best PHPer, but from looking at SchemaDotOrgInstaller.php, would I be correct to assume that if I can find a way to add Bioschemas' properties and types to schemaorg-current-https-properties.csv, I'd be all set?

jrockowitz’s picture

In theory, you should be able to append the below files...

https://github.com/BioSchemas/schemaorg/blob/master/data/releases/3.9/ex...
https://github.com/BioSchemas/schemaorg/blob/master/data/releases/3.9/ex...

...to...

https://git.drupalcode.org/project/schemadotorg/-/blob/1.0.x/data/22.0/s...
https://git.drupalcode.org/project/schemadotorg/-/blob/1.0.x/data/22.0/s...

...and the Bioschemas types and properties will become available via the UI after running `drush schemadotorg:update-schema`

For now, manually merging the data should work. If it does work as expected, we can explore adding a hook or a programmatic approach to importing other schemas.
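
The manual merge described above could be scripted roughly as follows. This is a minimal sketch, not the module's own import code: the function name `append_rows` is hypothetical, and it assumes both files share the same header row and that the first column holds the unique identifier, as in the rows quoted later in this thread.

```python
import csv
import io


def append_rows(base_csv: str, extra_csv: str) -> str:
    """Append the data rows of extra_csv to base_csv.

    Assumptions (not confirmed by the module): both inputs start with the
    same header row, and the first column is the unique identifier.
    Rows whose identifier already exists in base_csv are skipped.
    """
    base_rows = [r for r in csv.reader(io.StringIO(base_csv)) if r]
    extra_rows = [r for r in csv.reader(io.StringIO(extra_csv)) if r]
    header, merged = base_rows[0], base_rows[1:]
    if extra_rows[0] != header:
        raise ValueError("CSV headers differ; columns would be misaligned")

    seen = {row[0] for row in merged}
    for row in extra_rows[1:]:
        if row[0] not in seen:
            merged.append(row)
            seen.add(row[0])

    # Quote every field, matching the style of the rows shown in this issue.
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(merged)
    return out.getvalue()
```

Skipping duplicates keeps the original schema.org row when both files define the same identifier. Someone who wants the extra file to win would replace the existing row instead.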

jrockowitz’s picture

It looks like I am wrong about the CSV file hosted via https://github.com/BioSchemas/schemaorg/blob/master/data/releases/.

We need to find or generate a CSV that contains the Bioschemas types and properties.

jrockowitz’s picture

I'm having a hard time figuring out what needs to be done. Keep in mind that I am not an expert on Schema.org.

I see that the http://bioschemas.org specification is being partially incorporated into https://schema.org as a pending spec.

@see Introduce BioSchemas terms into pending #2863
@see https://bioschemas.org/types/BioChemEntity/0.7-RELEASE-2019_06_19
@see https://schema.org/BioChemEntity

gpo’s picture

Yeah, Bioschemas can be a bit all over the place. Yes, some terms were incorporated into the main spec, but others (the ones that I'm actually more interested in, like ComputationalTool and TrainingMaterial) were not, and as such they don't show up in the CSV in the repo that they forked from the main schema.org repository.
From what I'm getting, it seems to be just a matter of compiling the terms which haven't been added to the schema.org spec and appending them to the CSV, which I haven't gotten around to trying yet.

gpo’s picture

“it seems to be just a matter of compiling the terms which haven't been added to the schema.org spec and appending them to the CSV, which I haven't gotten around to trying yet”

Just gave it a go using some CSVs I generated and everything seems to work!

“For now, manually merging the data should work. If it does work as expected, we can explore adding a hook or a programmatic approach to importing other schemas.”

Having some sort of programmatic way of adding additional schemas seems perfect!

jrockowitz’s picture

Maybe the simplest solution is to make the source of the CSV configurable. It would default to the local files, but we could use the below URLs:

https://raw.githubusercontent.com/GilOliveira/schemadotorg/bioschemas/da...
https://raw.githubusercontent.com/GilOliveira/schemadotorg/bioschemas/da...
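
A configurable source of this kind can be sketched as below. This only illustrates the idea and is not how the module implements it: `DEFAULT_SOURCE`, `resolve_source`, and `read_csv_text` are hypothetical names, and the default path is a placeholder.

```python
import urllib.request
from pathlib import Path
from typing import Optional

# Hypothetical local default; the module's real file layout may differ.
DEFAULT_SOURCE = "data/schemaorg-current-https-types.csv"


def resolve_source(configured: Optional[str], default: str = DEFAULT_SOURCE) -> str:
    """Return the configured file/URL, or the local default when unset."""
    value = (configured or "").strip()
    return value or default


def read_csv_text(source: str) -> str:
    """Read CSV text from an http(s) URL or from a local file path."""
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as response:
            return response.read().decode("utf-8")
    return Path(source).read_text(encoding="utf-8")
```

Falling back to the local file when the setting is empty means existing sites would keep working without any configuration change.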

gpo’s picture

“Maybe the simplest solution is to make the source the CSV configurable. It would default to the local files, but you could use the below URLs”

Definitely, since my version overrides some of the original schema.org definitions.

gpo’s picture

Okay, found another hiccup: the new properties I've added are not showing up on the "Manage Schema.org fields" form of the respective content type, although their entries appear in the property reports.

For example, I rewrote Organization like so:

"https://schema.org/Organization","Organization","Bioschemas specification for describing a Organization in the life-science. Provides a way to describe bioscience organizations on the World Wide Web. It defines metadata terms that can be used in the code of web pages and applications, and builds on top of existing technologies and standards. The goal of the specification is to make it easier to discover, exchange and integrate life science organization profiles across the Internet.","https://schema.org/Thing","","","https://schema.org/description, https://schema.org/legalName, https://schema.org/name, https://schema.org/rdf:type, https://schema.org/sameAs, https://schema.org/topic, https://schema.org/alternateName, https://schema.org/contactPoint, https://bioschemas.org/fundingModel, https://schema.org/keywords, https://schema.org/location, https://schema.org/logo, https://schema.org/member, https://schema.org/memberOf, https://bioschemas.org/membershipCategory, https://schema.org/sameAs, https://bioschemas.org/status, https://schema.org/url, https://bioschemas.org/attachment, https://schema.org/budget, https://bioschemas.org/dateModified, https://schema.org/department, https://schema.org/dissolutionDate, https://bioschemas.org/founderMember, https://schema.org/foundingDate, https://schema.org/owns, https://schema.org/parentOrganization, https://bioschemas.org/socialMedia, https://schema.org/subOrganization","","","","https://bioschemas.org"

Notice how the second to last property is https://bioschemas.org/socialMedia, which I then defined in the properties CSV:

"https://bioschemas.org/socialMedia","socialMedia","Link to social media websites like twitter or facebook.","","","","https://schema.org/Organization","https://schema.org/URL","","","","https://bioschemas.org"

Yet it doesn't appear when adding a new Organization content type. Any chance I'm missing something?
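
One way to rule out a data problem in a case like this is to check that every property listed on the type row is also defined in the properties CSV. The sketch below assumes the header names `id` and `properties` used by the schema.org CSV releases; the function name `missing_properties` is hypothetical. It checks only the CSV data, not the module's behaviour.

```python
import csv
import io


def missing_properties(types_csv: str, properties_csv: str, type_id: str) -> list[str]:
    """List property ids named on a type row but absent from the properties CSV.

    Assumes an 'id' column in both files and a comma-separated 'properties'
    column in the types file (assumed header names).
    """
    defined = {row["id"] for row in csv.DictReader(io.StringIO(properties_csv))}
    for row in csv.DictReader(io.StringIO(types_csv)):
        if row["id"] == type_id:
            listed = [p.strip() for p in row["properties"].split(",") if p.strip()]
            return [p for p in listed if p not in defined]
    raise KeyError(type_id)
```

An empty result means the two files agree, so the cause lies elsewhere, for example in cached data from an earlier install.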

jrockowitz’s picture

Yeah, I'm going to have to update the code to strip out https://bioschemas.org/. I think the module is going to have to assume that all types and properties are unique, independent of their domain.
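
Treating names as unique independent of their domain only works if no two vocabularies reuse the same local name. A rough sketch of stripping the domain and flagging collisions follows; `local_name` and `find_collisions` are hypothetical helpers, not functions from the module.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def local_name(iri: str) -> str:
    """Return the last path segment of an IRI, dropping the domain."""
    return urlsplit(iri).path.rstrip("/").rsplit("/", 1)[-1]


def find_collisions(iris):
    """Map each local name to the IRIs sharing it, keeping only real clashes."""
    by_name = defaultdict(set)
    for iri in iris:
        by_name[local_name(iri)].add(iri)
    return {name: sorted(group) for name, group in by_name.items() if len(group) > 1}


collisions = find_collisions([
    "https://bioschemas.org/socialMedia",
    "https://bioschemas.org/dateModified",
    "https://schema.org/dateModified",
])
print(collisions)
# {'dateModified': ['https://bioschemas.org/dateModified', 'https://schema.org/dateModified']}
```

The Organization row above lists https://bioschemas.org/dateModified, and schema.org already defines dateModified, so a check like this shows where one definition would shadow the other once domains are ignored.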

gpo’s picture

That would be awesome! Thank you so much!

If it doesn’t work I can maybe try to work around it using something like
schema.org/bioschemas-xxxxx

But the added versatility may help people using other controlled vocabularies…

gpo’s picture

“Yeah, I'm going to have to update the code to strip out https://bioschemas.org/. I think the module is going to have to assume that all types and properties are unique, independent of their domain.”

I just gave it a try, replacing bioschemas.org with schema.org and the property I added still doesn't show up in the form, although it appears on the reports.

Disregard, clean reinstall fixed this.

jrockowitz’s picture

@gpo Can you please post your bioschemas CSV for types and properties somewhere on GitHub? If I have an absolute URL to the data, I can test importing it.

The MR allows you to enter an external URL.
@see /admin/config/search/schemadotorg/settings/general

I did not implement any code to strip https://bioschemas.org/

jrockowitz’s picture

Status: Active » Needs review

gpo’s picture

“@gpo Can you please post your bioschemas CSV for types and properties somewhere on GitHub? If I have an absolute URL to the data, I can test importing it.”

Here they are:
https://github.com/BioData-PT/bioschemas-data/tree/schemadotorg

jrockowitz’s picture

I am getting a 404 via https://github.com/BioData-PT/bioschemas-data/tree/schemadotorg. Maybe you need to adjust the access to the file.

gpo’s picture

My bad! Should be working now.

jrockowitz’s picture

Steps to review

  • Switch to branch
  • Reinstall module
  • Login as admin
  • Go to /admin/config/search/schemadotorg/settings/general
  • Set Schema.org data file/URL to https://raw.githubusercontent.com/BioData-PT/bioschemas-data/schemadotorg/schemaorg-current-https-[TABLE].csv
  • Click 'Save configuration'
  • Confirm that the Bioschemas ComputationalWorkflow type (/admin/reports/schemadotorg/ComputationalWorkflow) works as expected
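
The [TABLE] placeholder in the URL from the steps above presumably stands for the name of each data file. The sketch below shows how such a template could expand into one URL per file. The substitution values `types` and `properties` are assumptions based on the schemaorg-current-https-properties.csv file name mentioned earlier, and `expand_table_urls` is a hypothetical helper.

```python
TEMPLATE = (
    "https://raw.githubusercontent.com/BioData-PT/bioschemas-data/"
    "schemadotorg/schemaorg-current-https-[TABLE].csv"
)


def expand_table_urls(template: str, tables=("types", "properties")) -> dict:
    """Replace the [TABLE] placeholder with each table name (assumed names)."""
    return {table: template.replace("[TABLE]", table) for table in tables}


for table, url in expand_table_urls(TEMPLATE).items():
    print(table, url)
```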

jrockowitz’s picture

Status: new, Size: 9.47 KB

Attached is a diff of the latest cleaned-up MR to make this easier to review.

jrockowitz’s picture

@gpo I updated the MR. Are you using the MR? If yes, I am OK with merging the code.

  • jrockowitz committed 992192db on 1.0.x
    Issue #3373632: Bioschemas or other schemas support
    
jrockowitz’s picture

Status: Needs review » Fixed

I think I like having this option available to allow people to import their own schemas. I am going to merge it and see if people use it.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

mpp’s picture

“But the added versatility may help people using other controlled vocabularies…”

I can second that; there are many more vocabularies than schema.org, e.g. https://lov.linkeddata.es/dataset/lov/

@gpo, did you get the module to work with vocabularies other than schema.org?

Also, I don't fully understand why we import the vocabularies and application profiles as CSV data (vs linked data/triples)?

jrockowitz’s picture

“Also, I don't fully understand why we import the vocabularies and application profiles as CSV data (vs linked data/triples)?”

It was simpler and more performant. Keep in mind that the Schema.org data is all managed via the 'schemadotorg.installer' and 'schemadotorg.schema_type_manager' services, which could be reworked to not use CSV data.

The [Schema.org configuration tool (RDF UI)](https://www.drupal.org/project/rdfui) does use linked data/triples.
@see https://git.drupalcode.org/project/rdfui/-/blob/8.x-1.x/src/EasyRdfConve...