LA-SiGMA Workshop: Connecting data through semantics and ontologies: a pathway to open data in science | Lectures
We are currently witnessing the confluence of a number of trends which, when combined, will deeply change the practice of science:
- The web has established itself as the central, scalable platform for the production, exchange and dissemination of information and knowledge units
- We are witnessing the early development of a data commons
- There is increasing political pressure and will to make open access data a reality (e.g. Hargreaves report (UK), White House Memorandum on Open Access (US)).
The combination of these trends is already beginning to provide unique opportunities to integrate, query and derive new knowledge from disparate data sources. Linked data and ontologies will play a crucial role in this: data integration and linking can be achieved using the Resource Description Framework (RDF), while ontologies are needed to construct formal, computable digital representations of objects (knowledge objects, data objects, physical objects).
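To make the idea of linking through RDF concrete, here is a minimal sketch in plain Python. It is not a real RDF library: tuples of (subject, predicate, object) stand in for RDF triples, and every URI, property name and helper function below is a hypothetical example chosen for illustration. The point it tries to show is that two independent sources become linked simply because they use the same identifier for the same thing.

```python
# Toy illustration only: plain tuples stand in for RDF triples.
# All URIs and property names are hypothetical, not taken from a real dataset.
EX = "http://example.org/"

# Source A: a compound registry.
source_a = {
    (EX + "compound/42", EX + "hasName", "aspirin"),
    (EX + "compound/42", EX + "hasFormula", "C9H8O4"),
}

# Source B: a separate assay database that reuses the same compound URI.
source_b = {
    (EX + "assay/7", EX + "testedCompound", EX + "compound/42"),
    (EX + "assay/7", EX + "hasResult", "active"),
}

# Because both sources share identifiers, integration is a set union.
merged = source_a | source_b


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def names_of_tested_compounds(graph):
    """Follow the testedCompound link, then look up each compound's name."""
    names = []
    for _, _, compound in match(graph, p=EX + "testedCompound"):
        for _, _, name in match(graph, s=compound, p=EX + "hasName"):
            names.append(name)
    return names


print(names_of_tested_compounds(merged))
```

Neither source alone can answer the question "what are the names of the compounds that were tested?"; the merged graph can. A real triple store would express the same two-step lookup as a single query rather than nested loops.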
This set of lectures will provide an introduction to RDF and the Web Ontology Language (OWL) and will discuss their relationship to other semantic web technologies. Furthermore, we will explore some practical tools, such as triple stores for the storage and querying of linked data, APIs for the programmatic manipulation of linked data and ontologies, and ontology editors for knowledge engineering and ontology construction.
Following on from the formal lecture-based introduction to Linked Data, RDF and Ontologies, the practical session on Saturday afternoon will provide a starting point for attendees willing to use some of these technologies in their own work.
During the first part of the afternoon, we will jointly construct a small ontology using an example from synthetic organic chemistry, thus getting to know some of the features of the Web Ontology Language (OWL). Furthermore, we will explore how to reason with ontologies and how to use them to mark up data. The second part of the afternoon is an opportunity for attendees to bring along their own data and to start, hackathon style, constructing ontologies to describe it.
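As a rough illustration of what "reasoning with an ontology" can mean, the sketch below infers implied class memberships from a small subclass hierarchy. It is plain Python, not OWL, and it covers only one simple kind of inference (transitive subclass relations); an OWL reasoner does considerably more. The class names and function names are hypothetical and are not the ontology that will be built in the session.

```python
# Hypothetical class hierarchy, loosely themed on organic reactions.
# Each class maps to the set of its directly asserted superclasses.
SUBCLASS_OF = {
    "GrignardReaction": {"CarbonCarbonBondFormation"},
    "CarbonCarbonBondFormation": {"Reaction"},
    "Esterification": {"Reaction"},
}


def superclasses(cls, hierarchy):
    """Return all asserted and inferred superclasses of cls."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in hierarchy.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def inferred_types(asserted_type, hierarchy):
    """Types an individual belongs to, given the one type asserted for it."""
    return {asserted_type} | superclasses(asserted_type, hierarchy)


# Data marked up only as a GrignardReaction is also found by a search
# for any Reaction, although that was never stated directly.
print(sorted(inferred_types("GrignardReaction", SUBCLASS_OF)))
# prints ['CarbonCarbonBondFormation', 'GrignardReaction', 'Reaction']
```

This is why marking up data with ontology terms pays off: a record annotated with a specific class can be retrieved by queries phrased in terms of any of its more general classes.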
If you wish to participate in the practical session, please bring a laptop with Java 1.5 or higher installed. Please also download and install version 4.3 of the Protégé Ontology Editor (available as open source software from http://protege.stanford.edu/download/registered.html#p4.3). Any other software or resources which might be required will either be distributed at the beginning of the practical session or be made available for download at that time.
If there is sufficient interest, we may also attempt some simple Java programming to illustrate the use of several relevant APIs (Manchester OWL API, Jena). If you do not feel comfortable programming, feel free to just follow along or to team up with another participant.
Over the course of the past five years, the Texas Advanced Computing Center (TACC) has deployed progressively larger resources for research data management. The most recent system, "Corral", includes 5 petabytes of geographically replicated storage, and systems expected later this year will provide tens of petabytes of capacity. In addition to large-scale storage capabilities, TACC has developed a suite of cyberinfrastructure services including consulting expertise, web and database service capabilities, metadata extraction, and data management planning. Chris Jordan, group leader for Data Management and Collections at TACC, will provide an overview of resources and services at TACC and will highlight a few important data collections hosted with significant support from TACC.