Research output in the life sciences has traditionally been communicated and disseminated orally, via conferences, symposia, etc., and in writing, via scholarly journals. With the rise of digital computers and later the internet, our list of choices has grown, and databases have become a central aspect of knowledge dissemination and, in fact, of doing research.
The PRA3006 course aims to teach the student how life sciences databases can be accessed. Using a web browser is essential but is not part of what the course teaches. Instead, the course teaches programmatic access to these databases, which is essential for using them in professional research. Therefore, the PRA3006 practical combines hands-on experience with some theory on programming and knowledge representation to challenge the student to answer biological questions.
Nowadays, many databases offer one or more application programming interfaces (APIs) that provide programmatic access to their content. There is a fascinating history behind this, but PRA3006 is not a “History of Bioinformatics” course. Instead, it focuses on one type of API: the SPARQL endpoint. The obvious alternative is the REST API [1], but SPARQL endpoints offer the student more learning opportunities. A later chapter provides a list of SPARQL endpoints around the life sciences.
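To make "programmatic access" concrete, the following is a minimal Python sketch of how a SPARQL endpoint can be queried over HTTP, following the SPARQL 1.1 Protocol (a GET request with a `query` parameter and a JSON results format). It is an illustration, not a method prescribed by the course: the endpoint URL is a placeholder, and the function names `build_request`, `rows_from_results`, and `run_query` are hypothetical names chosen for this example.

```python
import json
import urllib.parse
import urllib.request

# Placeholder URL (an assumption for this sketch); replace it with the
# address of a real SPARQL endpoint before calling run_query().
ENDPOINT = "https://example.org/sparql"

# A generic query that asks for any three triples in the database.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 3"


def build_request(endpoint, query):
    """Build an HTTP GET request that carries the SPARQL query."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask the endpoint for the standard SPARQL JSON results format.
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def rows_from_results(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    variables = results["head"]["vars"]
    return [
        # Unbound variables are absent from a binding, so skip them.
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in results["results"]["bindings"]
    ]


def run_query(endpoint, query):
    """Send the query to the endpoint and return the result rows."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=30) as response:
        return rows_from_results(json.load(response))
```

Calling `run_query(ENDPOINT, QUERY)` against a real endpoint would return one dictionary per result row. Only the Python standard library is used here to keep the sketch self-contained; dedicated SPARQL client libraries exist and may be more convenient in practice.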
Each chapter in this book describes the SPARQL endpoint of one database. It covers the basics of the RDF data model, the ontologies used, and a few example queries.
This book does not provide an introduction to SPARQL. For that, the reader is referred to D. Slenter’s “SPARQLing Biology: a beginners course” and to “Learning SPARQL” [2].