Leveraging annotated data with query on the Semantic Web

Alan Ruttenberg
Plenary Talk, Afternoon Session, 2 September (11th MGED Meeting, 1-4 September, 2008)

Semantic Web and why we should be interested

The Semantic Web (SW) in a nutshell: it adds to existing web standards and practices, and encourages unambiguous names for things. Its aim is to enable computationally assisted exploitation of information that can be seamlessly integrated from multiple sources. HTML was successful because it was easy to write, and you could always view the source. The same sort of idea guides the development of the SW: systems will continue to communicate via HTTP, but some more simple but powerful languages (RDF, OWL, Rules) will be added, everything will stay text-based, and things will have a shared namespace.

An example of the use of the SW is the Allen Brain Atlas, and copy/paste in the Semantic Web. Mouse brains were sectioned and stained for the presence of gene expression: 20,000 genes and 400,000 hi-res images. The data were initially only available via HTML. They scraped about 80,000 web pages to extract the information, and then built a demo. The really good bit is that it allows a large number of on-the-fly queries and, more importantly, answers.

These common questions all involve some level of data integration, which is a lot of work. The Neurocommons already has a method of doing this via OWL files and a triple store database.
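
A minimal sketch of why shared names make integration cheap, assuming a toy in-memory triple store rather than the real Neurocommons database. Every URI, predicate and fact here is made up for illustration, and the helper names (match, genes_in_region_with_process) are hypothetical.

```python
# Toy triple store: a set of (subject, predicate, object) tuples.
# All identifiers below are invented for illustration only.

def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# Source A: hypothetical expression data (e.g. scraped from an atlas).
atlas = {
    ("ex:gene/G1", "ex:expressedIn", "ex:region/hippocampus"),
    ("ex:gene/G2", "ex:expressedIn", "ex:region/cerebellum"),
}

# Source B: hypothetical functional annotations.
annotations = {
    ("ex:gene/G1", "ex:involvedIn", "ex:process/synaptic_transmission"),
    ("ex:gene/G2", "ex:involvedIn", "ex:process/cell_cycle"),
}

# Because both sources use the same names for genes, merging is a set union.
store = atlas | annotations

def genes_in_region_with_process(store, region, process):
    """Answer a question that needs facts from both sources."""
    in_region = {s for s, _, _ in match(store, p="ex:expressedIn", o=region)}
    with_process = {s for s, _, _ in match(store, p="ex:involvedIn", o=process)}
    return sorted(in_region & with_process)

result = genes_in_region_with_process(
    store, "ex:region/hippocampus", "ex:process/synaptic_transmission"
)
print(result)
```

A real triple store would be queried with a language such as SPARQL instead of Python set operations, but the pattern is the same: match triples by subject, predicate and object, then join on the shared names.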

What's the payoff? Work is visible and can be durable on the web, and with good design choices we can build to last. It becomes an extended web tuned for research use. Of course, there's no free lunch: we still have to work out licensing, naming, and what the common representations should be.

Leveraging OWL 101

Consider a graphical view of the Gene Ontology (GO), in all its glory. GO has a mixture of transitive is_a relationships and other relationships such as part_of. We need techniques to manage these relationships, and that's what OWL is best for. In one case, a comparison of two E. coli databases (EcoCyc and one other; I missed which one it was) using very simple OWL relationships found 5% of the entries (120 errors) to be incorrect.
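
A rough sketch of what managing mixed is_a and part_of relationships involves, written in plain Python rather than OWL. It assumes the commonly used composition rules: is_a followed by is_a gives is_a, and any chain involving part_of gives part_of. The terms and edges are illustrative, not taken from GO itself, and the function name infer is hypothetical.

```python
def infer(edges):
    """Close a set of (child, relation, parent) edges under the assumed rules:
       is_a . is_a -> is_a; any other chain of is_a/part_of -> part_of."""
    closed = set(edges)
    changed = True
    while changed:
        changed = False
        snapshot = list(closed)
        for a, r1, b in snapshot:
            for b2, r2, c in snapshot:
                if b != b2:
                    continue
                rel = "is_a" if r1 == "is_a" and r2 == "is_a" else "part_of"
                new_edge = (a, rel, c)
                if new_edge not in closed:
                    closed.add(new_edge)
                    changed = True
    return closed

# Illustrative asserted edges (not real GO entries).
asserted = {
    ("mitochondrial_membrane", "part_of", "mitochondrion"),
    ("mitochondrion", "is_a", "organelle"),
    ("organelle", "part_of", "cell"),
}

inferred = infer(asserted)
for edge in sorted(inferred - asserted):
    print(edge)
```

An OWL reasoner does this kind of inference from declared property characteristics (for example, transitivity) instead of hand-written loops, which is where the payoff lies for an ontology the size of GO.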

Trying to get from English to OWL can be difficult, as the words we often use are ambiguous. Reasoners can discover inconsistencies in the way we define words, even if we're trying not to be ambiguous. There are three ways of representing scientific knowledge: record, statement and domain level. We must have ontological commitment to ensure that we all say exactly what we mean to say.
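
To give a feel for how a reasoner can surface an inconsistency, here is a minimal sketch of a single check: two classes are declared disjoint, and any entry that ends up in both after merging two sources is flagged. This is a stand-in for an OWL disjointness axiom, not a real reasoner, and the class names, entries and the function find_conflicts are all hypothetical.

```python
def find_conflicts(types, disjoint_pairs):
    """Return (entity, class_a, class_b) for every entity typed with two
       classes that were declared disjoint."""
    conflicts = []
    for entity in sorted(types):
        classes = types[entity]
        for a, b in disjoint_pairs:
            if a in classes and b in classes:
                conflicts.append((entity, a, b))
    return conflicts

# Hypothetical axiom: nothing is both a Protein and a SmallMolecule.
disjoint_pairs = [("Protein", "SmallMolecule")]

# Two hypothetical sources that classify the same entries.
source_a = {"entry1": {"Protein"}, "entry2": {"SmallMolecule"}}
source_b = {"entry1": {"Protein"}, "entry2": {"Protein"}}

# Merge: each entry gets the union of the classes asserted by either source.
merged = {
    entity: source_a.get(entity, set()) | source_b.get(entity, set())
    for entity in set(source_a) | set(source_b)
}

conflicts = find_conflicts(merged, disjoint_pairs)
print(conflicts)
```

Neither source is inconsistent on its own; the problem only shows up once the two are combined under a shared definition, which is the situation a reasoner is good at catching.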

Science Commons is a project of Creative Commons (CC). CC started as a reaction to onerous copyright laws in the US, and has developed better open licenses. Science Commons specializes CC to science. It's built on open resources: material in the public domain, open databases and open literature. How can data be released in such a way that it successfully deals with the laws of multiple countries? Neurocommons' goal is to demonstrate that such a framework is possible.

These are just my notes and are not guaranteed to be correct.
Please feel free to let me know about any errors, which are all my
fault and not the fault of the speaker. 🙂
