Over the last year or so, my main priority has been the FlyWeb Project. Unfortunately, FlyWeb was supported by short-term funding (18 months) and is coming to an end soon. Here are a few belated notes on what we did and why we did it…
The main goal of FlyWeb was to minimize the time required for a researcher in the domain of Drosophila (fruit fly) functional genomics, with no informatics training, to find and compare gene expression data for a large number of genes across different databases. With this in mind, we developed openflydata.org, which hosts the following cross-database gene expression data search applications:
- openflydata.org/search/gene-expression – search for a single gene of interest, and then retrieve and display expression data for that gene, including tissue-specific mRNA levels from FlyAtlas, embryo in situ hybridization images and ontology annotations from BDGP, and testis in situ hybridization images from FlyTED. Also retrieved are literature references relevant to the selected gene, provided by FlyBase.
- openflydata.org/search/gene-batch-expression – search for a batch of genes, then retrieve and compare expression data from FlyAtlas, BDGP and FlyTED, for all matching genes.
- openflydata.org/search/by-expression-profile – search for genes matching a given tissue-specific mRNA expression profile, based on data from FlyAtlas, and then retrieve further expression data for each gene found.
The applications are all pure JavaScript, built using a custom library called FlyUI. They fetch data AJAX-style directly from four SPARQL endpoints, one for each of the four sources of genomic data. On the server side, we use Jena TDB as the underlying RDF storage and query engine, and SPARQLite as the SPARQL protocol server. The whole thing runs on a small EC2 instance.
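To make the client-to-endpoint exchange a little more concrete, here is a minimal sketch of the two steps involved in talking to a SPARQL endpoint over HTTP: encoding a query into a GET URL (the standard SPARQL protocol `query` parameter) and flattening a response in the standard SPARQL JSON results format. It is written in Python for brevity rather than in JavaScript, and it is not FlyUI code. The endpoint URL, the `ex:` vocabulary, the gene identifier and the sample response are all made up for illustration, and I am assuming JSON results here.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and vocabulary, for illustration only;
# these are NOT the real openflydata.org URLs or schema.
ENDPOINT = "http://example.org/sparql/flyatlas"

QUERY = """
PREFIX ex: <http://example.org/schema#>
SELECT ?tissue ?level WHERE {
  ?obs ex:gene "CG1234" ;
       ex:tissue ?tissue .
  OPTIONAL { ?obs ex:mrnaLevel ?level }
}
"""


def build_query_url(endpoint, query):
    """Encode a SPARQL query as a GET request URL (SPARQL protocol 'query' parameter)."""
    return endpoint + "?" + urlencode({"query": query})


def parse_bindings(body):
    """Flatten a SPARQL JSON results document into a list of {variable: value} dicts.

    Variables left unbound in a solution (e.g. by OPTIONAL) are simply omitted.
    """
    doc = json.loads(body)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


# Made-up response in the SPARQL JSON results format, standing in for
# what an endpoint might return; the values are not real data.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["tissue", "level"]},
    "results": {"bindings": [
        {"tissue": {"type": "literal", "value": "testis"},
         "level": {"type": "literal", "value": "12.5"}},
        {"tissue": {"type": "literal", "value": "brain"}},
    ]},
})

if __name__ == "__main__":
    print(build_query_url(ENDPOINT, QUERY))
    for row in parse_bindings(SAMPLE_RESPONSE):
        print(row)
```

A browser client would do the same thing with an asynchronous HTTP request and then render the flattened rows; running one such query per endpoint and merging the rows by gene is roughly what a cross-database search amounts to.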
Further details on our work to convert the four data sources to RDF, in addition to bulk RDF downloads, SPARQL endpoints and more, can be found at the links below: