PRA3006-SPARQL

Wikidata

Website https://www.wikidata.org/
License CCZero

Wikidata is not a life sciences database but a general-purpose database closely tied to Wikipedia [1]. Even so, various research groups have started using it for the life sciences [2,3]. For example, CAS registry numbers in Wikidata and Wikipedia have been validated against the Common Chemistry database [4], and the LOTUS project uses Wikidata to make available which chemicals are found in which taxa [5].

Entities

The RDF contains all pathways, their datanodes (genes, proteins, metabolites, etc.), author information, molecular descriptors, and more. The main classes are:

Data model

Example queries

Proteins

We can list proteins with the following query:

SPARQL sparql/wikidataProteins.rq

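PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
  ?o wdt:P31 wd:Q8054.       # ?o is an instance of (P31) protein (Q8054)
  ?o rdfs:label ?l.
  FILTER(LANG(?l)='en')      # keep only the English labels
} LIMIT 10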

which gives:

o l
http://www.wikidata.org/entity/Q24190 Neurotrophin 3
http://www.wikidata.org/entity/Q25902 chymosin
http://www.wikidata.org/entity/Q30530 Histidine ammonia-lyase
http://www.wikidata.org/entity/Q58321 protein kinase
http://www.wikidata.org/entity/Q63398 Chromogranin B
http://www.wikidata.org/entity/Q74314 titin
http://www.wikidata.org/entity/Q418781 Catechol-O-methyltransferase
http://www.wikidata.org/entity/Q418896 proopiomelanocortin
http://www.wikidata.org/entity/Q418934 TNF superfamily member 11
http://www.wikidata.org/entity/Q419004 Cannabinoid receptor 1
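
The same query can also be run from a script instead of the web interface. The following is a minimal Python sketch, not part of the course material. It assumes the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql and the standard SPARQL JSON results format. The function names and the User-Agent string are hypothetical and chosen only for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Wikidata Query Service SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

PROTEIN_QUERY = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
  ?o wdt:P31 wd:Q8054.
  ?o rdfs:label ?l.
  FILTER(LANG(?l)='en')
} LIMIT 10
"""


def build_request(query, endpoint=ENDPOINT):
    """Prepare a GET request that asks for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        # Hypothetical identifier; replace it with your own contact details.
        "User-Agent": "sparql-course-example/0.1",
    }
    return urllib.request.Request(url, headers=headers)


def rows_from_results(payload):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    variables = payload["head"]["vars"]
    rows = []
    for binding in payload["results"]["bindings"]:
        # A variable may be unbound in a row, so skip missing keys.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


def run_query(query, endpoint=ENDPOINT):
    """Send the query over HTTP and return the flattened rows."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=60) as response:
        return rows_from_results(json.load(response))
```

Calling `run_query(PROTEIN_QUERY)` needs network access and returns one dict per result row, with the keys `o` and `l` matching the columns of the table above. The rows returned can differ from the table as Wikidata changes.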

Chemicals

We can also list chemicals with this query:

SPARQL sparql/wikidataChemicals.rq

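PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
  ?o wdt:P31 wd:Q113145171 .   # ?o is an instance of (P31) the class used here for chemicals
  ?o rdfs:label ?l.
  FILTER(LANG(?l)='en')        # keep only the English labels
} LIMIT 50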

which gives:

o l
http://www.wikidata.org/entity/Q150808 tetradecane
http://www.wikidata.org/entity/Q150831 pentadecane
http://www.wikidata.org/entity/Q150843 hexadecane
http://www.wikidata.org/entity/Q116587 diisononyl adipate
http://www.wikidata.org/entity/Q116907 glutathione
http://www.wikidata.org/entity/Q117422 glycol salicylate
http://www.wikidata.org/entity/Q118033 cycloundecane
http://www.wikidata.org/entity/Q118040 cyclododecane
This table is truncated. See the full table at sparql/wikidataChemicals.rq
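
The protein and chemical queries differ only in the class identifier and the limit, so both can be generated from one template. The sketch below illustrates that idea in Python. The function name `instances_query` and its parameters are hypothetical, and the input checks are a simple precaution, not a complete validation of SPARQL syntax.

```python
import re

# Template shared by both example queries; doubled braces are literal braces.
QUERY_TEMPLATE = """PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {{
  ?o wdt:P31 wd:{qid} .
  ?o rdfs:label ?l .
  FILTER(LANG(?l)='{lang}')
}} LIMIT {limit}
"""


def instances_query(qid, limit=10, lang="en"):
    """Build a query listing labelled instances of the Wikidata class `qid`."""
    # Wikidata item identifiers are the letter Q followed by a positive integer.
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata item identifier: {qid!r}")
    # Accept only simple language tags such as 'en' or 'pt-br'.
    if not re.fullmatch(r"[a-z]{2,3}(-[a-z0-9]+)*", lang):
        raise ValueError(f"unexpected language tag: {lang!r}")
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        raise ValueError(f"limit must be a positive integer: {limit!r}")
    return QUERY_TEMPLATE.format(qid=qid, lang=lang, limit=limit)


# The two queries from this page, expressed through the template.
protein_query = instances_query("Q8054", limit=10)
chemical_query = instances_query("Q113145171", limit=50)
```

Checking the identifier before formatting keeps arbitrary text out of the query, which matters if the class comes from user input.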

References

  1. Vrandečić D, Pintscher L, Krötzsch M. Wikidata: The Making Of. In: WWW ’23 Companion: Companion Proceedings of the ACM Web Conference 2023 [Internet]. 2023. Available from: https://dl.acm.org/doi/10.1145/3543873.3585579 doi:10.1145/3543873.3585579
  2. Waagmeester A, Stupp G, Burgstaller-Muehlbacher S, Good BM, Griffith M, Griffith O, et al. Wikidata as a knowledge graph for the life sciences. eLife [Internet]. 2020 Mar 17;9. Available from: https://elifesciences.org/articles/52614 doi:10.7554/ELIFE.52614
  3. Waagmeester A, Willighagen EL, Su AI, Kutmon M, Gayo JEL, Fernández-Álvarez D, et al. A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses. BMC Biol [Internet]. 2021 Jan 22;19(1):12. Available from: https://bmcbiol.biomedcentral.com/track/pdf/10.1186/s12915-020-00940-y.pdf doi:10.1186/S12915-020-00940-Y
  4. Jacobs A, Williams D, Hickey K, Patrick N, Williams AJ, Chalk S, et al. CAS Common Chemistry in 2021: Expanding Access to Trusted Chemical Information for the Scientific Community. JCIM. 2022 May 13; doi:10.1021/ACS.JCIM.2C00268
  5. Rutz A, Sorokina M, Galgonek J, Mietchen D, Willighagen E, Gaudry A, et al. The LOTUS initiative for open knowledge management in natural products research. eLife [Internet]. 2022 May 26;11. Available from: https://doi.org/10.7554/elife.70780 doi:10.7554/ELIFE.70780