There are only a few months left before the code freeze on September 1st. Now that the Fields API has settled in core, it's time to extend it with some RDF semantics. DERI Galway is hosting an RDF in Drupal code sprint from May 11th to May 14th.

This sprint builds on the ideas Dries expressed in his recent posts "Drupal, the semantic web and search" and "RDFa and Drupal". With RDF in Drupal core and RDFa output by default, tens of thousands of websites will suddenly start publishing their data as RDF.

So far 8 people have signed up. How about you?

Some others are willing to come but cannot afford the trip until some funding is secured. To help us fund the sprint and bring more Drupal rockstars on board, please consider making a donation using the ChipIn widget on this page. The money will be used to cover flight, food and hotel costs for the sprinters. All sprinters are generously donating their time to make this happen. It would also be great to fly in a few additional people with extensive testing and Fields experience. Any excess money will be used to add more people, or will be donated to the Drupal Association.

Goals of the code sprint

The RDF code sprint will focus on Drupal core and aims to integrate RDF semantics into it.

  1. Extend Fields API to integrate RDF mappings for each field instance. The semantics of a field can differ from one bundle to another. The mappings can be stored either in the existing settings property or by adding an rdf_mappings property to the Field Instance objects.
  2. Modify the Fields UI (contrib) to allow RDF mappings editing.
  3. Define the appropriate mappings for the core modules, based on the RDF core mapping proposal.
  4. Patch core modules with the mappings defined above.
  5. Export these mappings in RDFa via the theme layer and keep it as generic as possible in order to ease the work of the themers.
  6. Write tests for RDF in core.
  7. Identify other non-fieldable entities in core which could benefit from being RDF-ized, and see how to annotate them. Comment is one example. Terms also, though they might become fieldable.
  8. RSS 1 (RDF) in core. Arto volunteered to get started with that.

See the list of currently open issues in "RDF issues in core".
See also the RDF code sprint wiki page, where we will keep an up-to-date list of goals.
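
To make goals 1 and 5 a little more concrete, here is a minimal sketch in Python rather than Drupal's PHP. It is not the sprint's design: the `RDF_MAPPINGS` table, the function names and the example predicates are all assumptions made up for illustration. It only shows the idea that the same field can map to different RDF predicates depending on the bundle, and that generic code can then emit RDFa without the themer having to think about it.

```python
from html import escape

# Hypothetical mapping store: (bundle, field name) -> RDF predicates (CURIEs).
# The same field name can carry different semantics in different bundles.
RDF_MAPPINGS = {
    ("article", "title"): ["dc:title"],
    ("person", "title"): ["foaf:name"],
    ("article", "body"): ["content:encoded"],
}


def rdfa_attributes(bundle, field):
    """Return the RDFa attribute string for a field instance, or '' if unmapped."""
    predicates = RDF_MAPPINGS.get((bundle, field), [])
    if not predicates:
        return ""
    return ' property="%s"' % escape(" ".join(predicates), quote=True)


def render_field(bundle, field, value):
    """Wrap a field value in a <span> carrying its RDFa annotation, if any."""
    return "<span%s>%s</span>" % (rdfa_attributes(bundle, field), escape(value))
```

With this sketch, `render_field("person", "title", "Frank Jones")` carries `foaf:name`, while the title of an article carries `dc:title`; an unmapped field is rendered without any annotation.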

Comments

AlanT’s picture

I'm probably just showing how little I know, but what is RDF? And what does having it in core mean to me as a user?

- Alan Tutt

Exceptional Personal Development for Exceptional People
http://www.PowerKeysPub.com

scor’s picture

RDF is a W3C standard for adding semantics to the data of your site and enabling interoperability on the Web. Think of it as RSS on steroids. Watch this great video: Intro to the Semantic Web.

webchick’s picture

1. Better SEO; RDF allows Google and other search engines to have context about your site's content. They'll understand that "Frank Jones" is the name of a person, not just some random text. They'll understand that a random node on your site is a review for a book with a rating of 2/5 stars. Think search engines on steroids.

2. Better opportunities for interoperability. Data on your site can be "mashed up" with data from other people's sites in all sorts of interesting ways.

3. Once you explain what the content of your pages is, it becomes really easy to pull in related content from elsewhere on your site (or elsewhere on the web), which helps your visitors find what they're looking for quickly and easily.

(scor, feel free to correct me if I'm wrong in any of this; this is just what I learned from researching OpenCalais the other week.)

scor’s picture

you're perfectly right webchick!

AlanT’s picture

Thank you, Webchick.

I have to say that this sounds like an interesting theory, and hopefully it will turn out to have practical uses as well. Has anyone split-tested this to see if it really does produce better SEO? Are there any examples of live sites using it to improve site usability?

- Alan Tutt

webchick’s picture

It's not like this is science-fiction stuff that "could" someday appear. :) Google, for example, is parsing this stuff as we speak, and directing priority traffic to sites that implement RDF and Microformats.

For example, try searching Google for "name of movie movie" and you'll see something like this:

[Screenshot: Ratings]

That aggregated rating is parsed from sites that implement microformats to explain that the "5" that the search engine finds in that page is actually a "5 stars out of 5" rating on a movie review. If you click into that link, you'll see a variety of sites. One of them that usually comes up is http://www.commonsensemedia.org/ which also happens to be a Drupal site that implements the hreview microformat.

Drupal makes a particularly interesting/powerful platform to put RDF into because there is literally no limit to the type of content Drupal can manage, so we have a real opportunity to be leaders in this area, and move this power into the hands of people who are not comfortable hand-editing HTML.

Elijah Lynn’s picture

That was really great Angie! The previous explanation along with this one in the screenshot just saved me a couple hours of reading.

Cheers,

Elijah Lynn

-----------------------------------------------
The Future is Open!

scor’s picture

AlanT, make sure you also watch the video about SearchMonkey. It features enhanced results that you can already see in Yahoo! search results, for example when searching for "art of pizza chicago".

pschopf’s picture

It looks like if you have to ask, you don't belong. Continuing with the strong Drupal tradition of writing new code without backward compatibility, we can now release Drupal 7 years ahead of documentation for Drupal 6. In fact, you can forget about 6 documentation entirely - read the code, that should be enough.

webchick’s picture

I'm always happy to meet someone passionate about seeing Drupal's documentation improved. :)

It's important to note that anyone can click the "edit" tab on any handbook page and fix it if they notice something inaccurate. Or, if you come across something that's not documented yet, write down as much as you've managed to figure out, and then file an issue in the queue, either against a particular module if it's for that, or against the "Documentation" project if it's for something more general such as a page in the handbook. The documentation team is a really great bunch of volunteers who love to help those who want to help Drupal, and would be more than happy to proof-read your work, collaborate with you on something, or direct you to the proper channels. http://drupal.org/contribute/documentation has more information on getting involved.

Looking forward to your contributions! :D

dman’s picture

I was bored, so did some quick calculations.

A quick and very unscientific grep of the drupal core modules says that from:
24699 lines (just the core /modules directory, not API, excluding the html templates)

22477 are /not/ blank.
4721 /look/ like inline function docs (with *) - the core phpdoc documentation as seen on api.drupal.org
1586 are inline explanatory docs ( with //) - available on api.d.o and useful to any developer.

Taking a look at the code,
2601 lines are calls to t() - which contain more text than code and are just ui messages, it's not like there are per-line docs needed there.
2931 lines contain nothing but "}" on its own - not exactly confusing to anyone reading docs.
(2601 + 2931) = 5532 non-documentable lines

sooo .. the way I look at it, there are
(4721+1586)= 6307 lines of doc to (22477 -6307 -5532)= 10638 lines of code.

That's a little over 1 line of documentation per 1.7 lines of actual code that may need explaining, +/- 5%.

So that's (a little) like the developers spending 22 minutes of every hour explaining what they are doing in the remaining 38 minutes.

Line-count-based metrics are an extremely flawed way of measuring code quality, BUT I still don't understand how these results (roughly 3 lines of docs for every 5 lines of code) could be held up to call Drupal 6 'undocumented'.
Do we need the "talking about things" to outweigh the "actually doing things" portion of the code before it can be called "adequately documented"?
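
For anyone who wants to recheck the arithmetic, here is a small Python sketch. The variable names are mine; the figures are exactly the ones quoted above, and nothing here re-runs the grep.

```python
# Figures quoted above (grep counts over the Drupal 6 core /modules directory).
non_blank = 22477
phpdoc = 4721
inline_doc = 1586
t_calls = 2601
lone_braces = 2931

doc_lines = phpdoc + inline_doc
non_documentable = t_calls + lone_braces
code_lines = non_blank - doc_lines - non_documentable

# Lines of code per line of documentation, and the "minutes per hour" analogy.
code_per_doc = code_lines / doc_lines
doc_minutes_per_hour = 60 * doc_lines / (doc_lines + code_lines)

print(doc_lines, non_documentable, code_lines)
print(round(code_per_doc, 2), round(doc_minutes_per_hour))
```

The ratio works out to roughly 1.7 lines of code per line of documentation, which is where the 22-minutes-in-every-hour comparison comes from.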

FTR, to expose how bad my maths/cli skills are:

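# Concatenate every Drupal 6 core module file into one text file.
cat /var/www/drupal6/modules/*/*.module > drupal-cat.txt
# Total lines; reading from stdin keeps the file name out of wc's output.
export total_lines=`wc -l < drupal-cat.txt`
# Non-blank lines (\s is a GNU grep extension).
export line_count=`grep -cve '^\s*$' drupal-cat.txt`
# Lines containing "*" (phpdoc) or "//" (inline comments).
export phpdoc_count=`grep -cF '*' drupal-cat.txt`
export inlinedoc_count=`grep -cF '//' drupal-cat.txt`
# Lines calling t(), then lines holding only a closing brace.
export translate_func_count=`grep -cE '(^|[^A-Za-z0-9_])t\(' drupal-cat.txt`
export lone_brace_count=`grep -ce '^ *}\s*$' drupal-cat.txt`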

.. there are many tweaks that could be made to this algorithm, have fun.

Of course, I may have totally missed the point as I'm only talking about docs intended for people who read documentation. I'm not sure what the wordcount on the Drupal handbooks vs contrib code would come in at.

jrabeemer’s picture

I'm afraid RDF is an abstract and hard to understand topic. Even amongst seasoned web developers, you'll be hard pressed to find much excitement about RDF. I think that enthusiasm is reflected in the ChipIn widget. Can you provide more information about how that would translate into real world applications and use cases?

Steve Dondley’s picture

If it's worth anything to you, Dries has given a presentation about RDF and explained his reasoning for supporting it. Tim Berners-Lee supports it too. Here's what he said:

I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize.

graybeal’s picture

I totally understand this reaction. As technologies go, RDF is dry as dust. It's one step removed from all the cool stuff that it enables, so people don't get excited about it.

As someone who is trying to marry science and semantics -- that is, change current scientific methods so that automated processes can _understand_ them, in a computational sense at least -- I am wildly enthusiastic about this work. I am convinced it will be the most important contribution of Drupal to its users, the users of Drupal sites, and to information technology in general, in this decade at least.

Many of the other links provide the additional information that you are asking for, and I probably can't improve upon them in a post. But I will give a use case from our community (since I wasn't sure where to leave my use case anyway). Those of you who like text more than video may find it helpful. This use case relates the more semantic-oriented technologies of this change to the practical Drupal web site technologies.

Right now we[1] collect references to other documents on the web about science data management. We are starting to categorize and rate them, using custom fields we created for each kind of reference. Anyone who wants to find and use these ratings has to go to the Drupal site, look up each page, and grab that data. The ratings themselves are terms that come from vocabularies we maintain on another 'vocabulary repository' system [2]. We will probably add the ratings data to a custom-built table, but that will cost development time and still require people to navigate to the table and view it, or copy/paste it, to use it for their own purposes. They can only use what we can provide by developing custom software. They can't automate it because if we change our format, or the name we used for the title of a category, their automated scraping of our page will break. They can't tell what the rating words mean unless we also specifically add taxonomies to Drupal that match our rating taxonomies on our own vocabulary system (or build additional tools to do that automatically, which we might have to do). They can't easily relate the ratings on our site to the ratings on another site, or know that our John Doe that rated this Content Standard is the same John Doe that rated it differently on another site.

In the brave new world of RDF in Drupal, here's how I hope it will work:

  1. We create the page for rating, say, Content Standards. Our page will define for every category (Drupal field): a title; the vocabulary (from our vocabulary repository, but ideally this information is automatically aligned with Drupal taxonomies, ahem :-)) to use in filling out the field; and the RDF-encoded concept that corresponds to the title. In practice, this last means that we use terms from a vocabulary of concepts, like 'overall rating' and 'version reviewed' that we and Google and everyone else understand. Creating this page was easy and straightforward because it's all integrated in Drupal core.
  2. Members of our team make a page by filling out every field of information for a given Content Standard. They are prompted for the allowed terms to fill out each field. If they don't know what the field title means, they click on a help link that references the corresponding RDF-encoded concept ("version reviewed: the version string assigned by the information provider to the particular release of the information", and possibly much more). Note: Much of this is possible with Drupal taxonomies today, if you use them, but they are embedded inside your Drupal server, not widely accessible and exchangeable.
  3. In each presented Content Standard page, Drupal automagically provides the RDF metadata associated with that information.
  4. Anyone who wants to write an application that makes use of our data -- say, tracks the change in overall rating against the different versions reviewed -- can use the RDF information embedded in our web page to do so. Even if we change the format of the web page, and the title of the fields that the user sees, their application will still work. *And*, it will be able to explain to the user exactly what all these ratings mean that we're using, because we have defined those terms, and Drupal and the application understand how to use RDF to find the definitions. (Note to semantic web folks: Glossing over some URI dereferencing issues there.)
  5. Someone who isn't on our development team may now be inspired to write an application that lets a scientist find the perfect content standard for their needs, by using our ratings and information to automatically select content standards based on the scientist's input. This multiplies the value of our work, potentially a lot, without requiring any additional labor or agreement on our part. (Because the interface to the data is exposed automatically through RDF.)
  6. All the big players (Google, Yahoo, countless semantic tool developers) that write applications that crawl the web looking for information they understand ("look! there's an 'overall rating' for a 'web resource' that goes by the 'resource name' of -FGDC Content Standard- and it has an 'update date' that's less than a month old!") will now understand much, much more about what's on our web site, and can represent that information in their own contexts.
  7. If we ever decide to add a concept to one of our vocabularies -- say we add 'paradigm-changing' to our overall ratings vocabulary -- this 5-minute change automatically ripples through ALL these applications. Everyone can use it on our Drupal site to rate a Content Standard, every application will understand it is a term in the 'overall rating' scheme defined by RatingsRUs, and everyone who sees 'paradigm-changing' as a rating for a Content Standard, even if they have no idea where our site is or who created that Content Standard page, can immediately find out what that term means.
  8. Because ALL these terms and vocabularies are controlled and connected in well defined ways (thanks to RDF), we can understand that our 'paradigm-changing' rating actually means exactly the same thing as Google's '*****' and Consumer Reports' '10'. Similarly for names (thanks to an RDF vocabulary called Friend-of-a-Friend, or FOAF).
  9. Someday, automated systems can use all this knowledge to perform human-like reasoning automatically across all the data, concluding for example that the MMI rating system gives web content lower but more consistent ratings, while the Consumer Reports system gives higher ratings with more variation.

    I will donate some of my own money to make this happen, although the benefit will go to my work life. If someone from the project finds this use case interesting they are welcome to contact me about it.

    [1] Marine Metadata Interoperability project, http://marinemetadata.org
    [2] MMI Ontology Registry and Repository, http://mmisw.org/or
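
Step 4 of the use case above (an application that keeps working when the page layout changes) can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the `ex:` property names and both HTML snippets are invented, and the parser is a toy that only understands elements whose `property` attribute wraps plain text, not a full RDFa processor.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect (property, text) pairs from elements with an RDFa `property` attribute.

    Simplifying assumption: annotated elements contain plain text only.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current = None  # property of the element being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop and self._current is None:
            self._current = prop
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._current is not None:
            self.pairs.append((self._current, "".join(self._text).strip()))
            self._current = None


def extract(html):
    """Return a dict of property -> value found in the given markup."""
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return dict(parser.pairs)


# Two hypothetical layouts of the same rating page, before and after a redesign.
old_layout = (
    '<table><tr><td>Overall rating</td><td property="ex:overallRating">good</td></tr>'
    '<tr><td>Version</td><td property="ex:versionReviewed">2.0</td></tr></table>'
)
new_layout = (
    '<div><h3>Our verdict</h3><span property="ex:overallRating">good</span> '
    '(release <em property="ex:versionReviewed">2.0</em>)</div>'
)
```

Both `extract(old_layout)` and `extract(new_layout)` yield the same rating and version, even though the visible titles and the markup around them differ. That is the point of the use case: the consumer depends on the annotated concepts, not on the page design.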

yelvington’s picture

While you're thinking about how RDF might empower what EPIC called "fact-stripping robots," give some attention to YQL Execute as well. The potential interaction of the two is mind-boggling.

PhillG’s picture

I think some seasoned web developers might not be excited about RDF because they may not all come from a data or information architecture background.

Have a look at http://www.london-gazette.co.uk/

All the Corporate Insolvency notices (and many others) contain large numbers of RDF triples encoded as RDFa. The documents are self-describing: in combination with the ontologies pointed to by the CURIEs, a machine can infer all sorts of information, such as company name, number, nature of business, directors, the court hearing date and place, which administrators were appointed, which company they worked for and at which office, and so on.

Phil

Cloud’s picture

I will take part as much as possible (a few meetings on Monday but will be around for most of the week).

jaharmi’s picture

I hope that those developing this can take into account memory footprint. I don’t believe I’m alone, based on the RDF module’s issue queue, in running into memory problems with Drupal 6 + RDF on a shared hosting account. Enabling that module seems to take up another ~4 MB.

If this is going into core, I hope that it won’t have that kind of impact on every single Drupal site upgraded to v7.

I’m not complaining or railing against anyone’s hard work on this effort — heck, I want to be able to run the RDF module now — but I think that memory usage is an important consideration.

scor’s picture

The RDF API module is not going into core and there won't be similar memory issues in core. We are working to make the RDF in core as lightweight as possible.

pharma’s picture

What I understand (as a non-technical guy) is that it will help the older search engines (Google and Yahoo) display results like those of the new search engine from ex-Googlers, http://www.cuil.com (pronounced "cool").

If I am not asking too much, is it possible to display the standard search results of websites using Drupal like Cuil's search engine results?

If you search for "Drupal" in Google and in Cuil, you'll see what I mean.

giorgio79’s picture

If you have implemented RDF for Drupal 6, would you share your SEO results, such as the percentage increase in organic search traffic? I am really curious how much I can benefit from its implementation. Some case studies would be great.

I just tried the Calais analyzer, but in some of my posts with 300 words, it could only identify maybe 2 words as the name of the person...

rupl’s picture

Right now I wouldn't expect RDF integration to have a very large impact on traffic to your site. However, just two days ago Google announced support for RDF and Microformats in regular search. Yahoo already features this capability in SearchMonkey.

http://googlewebmastercentral.blogspot.com/2009/05/introducing-rich-snip...
http://developer.yahoo.com/searchmonkey/

So expect this to be a hot topic in the coming months/years. It's awesome that we're identifying this tremendous opportunity and making progress toward supporting it! (I donated, you should too)

webmastersamtha’s picture

I have been using Drupal since a long time for my site and am keen on looking out for latest updates on Drupal. This new RDF code from Drupal is very interesting and effective, specially its interoperability feature. I strongly recommend it to the webmasters for their sites.

[Edit: spam links removed]

dman’s picture

Dear "SEO" guy.
Trying to spam the system by adding "rel='follow'" links to your signatures does not work. Perhaps you should consult with a web professional who actually understands SEO, because you are demonstrably useless at it.

gvelez17’s picture

Hi guys

It isn't clear to me yet whether it is going to be possible or easy to make arbitrary assertions about a node in Drupal 7. I'm thinking of something like the Relations API, but with the ability to relate a node to any URL, not only to another Drupal node.

Scor's work on RDFCCK is great, I love that we can use vocabularies, but I want more than that - I want to be able to make assertions about existing nodes that I didn't plan on when I created the CCK model, and in fact that don't apply to most of the nodes, just some of them, so I don't want to add it as a property. I'm thinking of general predicates like
dc:created_by
or
abra:inspired_by

that could apply to many documents and media, but I don't want to add as CCK properties as there are a large number of different predicates I might want to apply.
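
To illustrate the kind of thing being asked for here, a minimal Python sketch follows. It is an assumption-laden toy, not anything that exists in Drupal or RDFCCK: `AssertionStore` and the example URLs are made up. The point is only that free-form (subject, predicate, object) statements need no per-predicate field on the content type.

```python
from collections import defaultdict


class AssertionStore:
    """Hypothetical store for free-form (subject, predicate, object) statements."""

    def __init__(self):
        self._by_subject = defaultdict(set)

    def assert_triple(self, subject, predicate, obj):
        """Record one statement; any predicate and any URL are accepted."""
        self._by_subject[subject].add((predicate, obj))

    def about(self, subject):
        """Return every (predicate, object) pair stated about a subject, sorted."""
        return sorted(self._by_subject.get(subject, ()))


store = AssertionStore()
node = "http://example.org/node/42"  # made-up node URL
store.assert_triple(node, "abra:inspired_by", "http://example.com/some-essay")
store.assert_triple(node, "dc:created_by", "http://example.org/user/7")
```

Only the nodes that actually have such statements carry them; a node nobody has described simply returns an empty list from `store.about(...)`.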

I don't always read this list, if this makes sense to anyone could you possibly copy a reply to gvelez17 && gmail.com ?

thanks!

--Golda
http://iwhome.com