
Main.HomePage History


June 28, 2013, at 01:56 PM by 138.100.11.51 -
Changed line 142 from:

// this query retrieves the gene with uid 3992 (FADS1), and for that gene, related publications that refer to that gene in PubMed.

to:

// this query retrieves the gene with uid 3992 (FADS1), and for that gene, related publications that refer to that gene in PubMed. \\

June 28, 2013, at 01:55 PM by 138.100.11.51 -
Changed lines 142-143 from:

// this query retrieves publications from pubmed in which title can be found "wilms tumor". For these publications, related genes from the gene database are extracted, and finally, related publications for each of these genes are retrieved again from pubmed
// A limit of 50 results is specified \\

to:

// this query retrieves the gene with uid 3992 (FADS1), and for that gene, related publications that refer to that gene in PubMed.

Changed line 144 from:

"SELECT ?pubmed_uid ?pubmed_title ?gene_uid ?pubmed2_uid ?pubmed2_title \\?pubmed2_journal\n" + \\

to:

"SELECT ?gene_uid ?pubmed2_title\n" + \\

Changed lines 146-154 from:

" ?pubmed a eurdf:pubmed.\n" +
" ?gene a eurdf:gene.\n" +
" ?pubmed2 a eurdf:pubmed.\n" +
" ?pubmed eurdf:pubmed_UID ?pubmed_uid.\n" +
" ?pubmed eurdf:pubmed_TITL ?pubmed_title.\n" +
" ?pubmed eurdf:pubmed_gene ?gene.\n" +
" ?gene eurdf:gene_UID ?gene_uid.\n" +
" ?gene eurdf:gene_pubmed ?pubmed2.\n" +
" ?pubmed2 eurdf:pubmed_UID ?pubmed2_uid.\n" + \\

to:

" ?gene a eurdf:gene.\n" +
" ?pubmed2 a eurdf:pubmed.\n" +
" ?gene eurdf:gene_UID ?gene_uid.\n" +
" ?gene eurdf:gene_pubmed ?pubmed2.\n" + \\

Changed lines 151-156 from:

" ?pubmed2 eurdf:pubmed_JOUR ?pubmed2_journal.\n" +
"\n" +
" FILTER (?pubmed_title = \"\\\"wilms tumor\\\"\").\n" +
"}\n" +
"LIMIT 50";
\\

to:

" FILTER (?gene_uid = \"3992\").\n" +
"}";
\\

June 26, 2013, at 10:36 AM by 138.100.11.51 -
Changed line 92 from:

// this query asks for publications in PubMed with the general search term "dietary probiotics", and extracts the UID, title and journal from the retrieved publications. No limit is specified, so a maximum of 100 results are retrieved \\

to:

// this query asks for publications in PubMed with the general search term "dietary probiotics" (note that, as in Java, SPARQL escapes special characters with a '\' character, so to include the '"' character in the query it must be escaped), and extracts the UID, title and journal from the retrieved publications. No limit is specified, so a maximum of 100 results are retrieved \\
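The double layer of escaping (Java string escapes on top of SPARQL string escapes) can be seen in a minimal, self-contained snippet; the class and variable names here are illustrative, not part of the NCBI2RDF API:

```java
public class EscapeDemo {
    public static void main(String[] args) {
        // In the Java source, '\"' yields a quote character and '\\' yields a
        // backslash, so '\\\"' puts a literal \" sequence into the SPARQL text,
        // which SPARQL in turn reads as an embedded quote character.
        String filter = "FILTER (?pubmed_title = \"\\\"dietary probiotics\\\"\").";
        System.out.println(filter);
        // prints: FILTER (?pubmed_title = "\"dietary probiotics\"").
    }
}
```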

June 26, 2013, at 10:34 AM by 138.100.11.51 -
Deleted line 151:

" ?pubmed eurdf:pubmed_AUTH ?pubmed_auth.\n" + \\

Changed line 159 from:

" FILTER (?pubmed_auth = \"russ altman\").\n" + \\

to:

" FILTER (?pubmed_title = \"\\\"wilms tumor\\\"\").\n" + \\

May 09, 2013, at 05:57 AM by 138.100.11.51 -
Changed lines 40-47 from:

- README.txt: this file - NCBI2RDF.jar: the Java library containing all the tool code (including third-party libraries) - JavaDoc.rar: the Javadoc documentation of the API - examples.rar: a set of three examples in Java - RDFSchema.rdf: the RDF schema that NCBI2RDF generates and that represents the available data in NCBI - ConfigFiles.rar: this archive file contains a set of XML configuration files which NCBI2RDF needs in order to correctly work

to:
 - README.txt: this file
 - NCBI2RDF.jar: the Java library containing all the tool code (including third-party libraries)
 - JavaDoc.rar: the Javadoc documentation of the API
 - examples.rar: a set of three examples in Java
 - RDFSchema.rdf: the RDF schema that NCBI2RDF generates and that represents the available data in NCBI
 - ConfigFiles.rar: this archive file contains a set of XML configuration files which NCBI2RDF needs in order to correctly work

May 09, 2013, at 05:55 AM by 138.100.11.51 -
Changed lines 15-16 from:

The source code of NCBI2RDF is available in the following ftp server:

to:

The tool is freely available in the following ftp server:

May 09, 2013, at 05:54 AM by 138.100.11.51 -
Deleted lines 172-250:

String query = "PREFIX base: <http://aewrapper#>\n" +
"SELECT ?id ?name ?desc_text\n" +
"WHERE {\n" +
"?exp base:identifier_string ?id .\n" +
"?exp base:name_string ?name .\n" +
"?exp a base:Experiment.Experiment .\n" +
"?exp base:descriptions ?desc .\n" +
"?desc base:text_string ?desc_text .\n" +
"}";
String experimentId = "E-GEOD-1509";
// resultFile will contain the path to the file containing the results in SPARQL format
String resultFile = QueryProcessor.processQuery(query, experimentId);

In this code, the API is invoked with a SPARQL query (String) and a single experiment id (String). RDFbuilder translates the data of the specified experiment into RDF and performs the given SPARQL query, producing a file with SPARQL results format. There are some more options available when performing queries, such as the possibility of specifying a set of keywords instead of a single experiment. Please refer to the JavaDoc to get further details.

The API uses the disk of the machine where it executes to cache data from ArrayExpress. The cache directory is configurable through an XML configuration file. This file must be named aewrapperConfig.xml and placed inside a directory named /AEWRAPPER_CONFIG under the base execution directory. For example, if the base execution directory is C:/executionDir/, then the configuration file should be at C:/executionDir/AEWRAPPER_CONFIG/aewrapperConfig.xml. The config file root tag is <aewrapper-config>.

Inside this tag there is one mandatory tag named <base-dir>, and two optional tags named <limit-experiment-count> and <limit-vector-ranges>.

The value in the base-dir tag indicates the directory where the cache will be placed. This must be a valid directory and it is necessary for the proper functioning of the library. Inside this directory we must also place the "mage-rdf-model-empty.obm" file that comes bundled with this library.

Example of config file (for a Windows system with the cache base dir in C:\ArrayExpressWrapper):

    <?xml version="1.0" encoding="UTF-8"?>
    <aewrapper-config>
        <base-dir>C:\ArrayExpressWrapper</base-dir>
    </aewrapper-config>

In this example, the file mage-rdf-model-empty.obm should be placed inside C:\ArrayExpressWrapper\

Another example of config file, including the two optional tags:

    <?xml version="1.0" encoding="UTF-8"?>
    <aewrapper-config>
        <base-dir>C:\ArrayExpressWrapper</base-dir>
        <limit-experiment-count>5</limit-experiment-count>
        <limit-vector-ranges>100</limit-vector-ranges>
    </aewrapper-config>

In this example, with the optional tags we add two restrictions:

- We limit the number of experiments that are retrieved from the ArrayExpress database (5 in the example above). This prevents retrieving too much data when solving keyword queries. For example, if a query is submitted with the keyword "organism", more than 23000 related experiments are found, and downloading that amount of data could take several days. This value limits the number of experiments downloaded to answer a single query

- We limit the number of instances that are loaded from each MAGE-ML model (discarding the rest). This is useful when running the software on machines with fairly limited RAM. With a value of 10000, the data should fit on a machine with 4GB of RAM. NOTE: adjust your Java heap configuration to accept this amount of memory; see for example http://www.caucho.com/resin-3.0/performance/jvm-tuning.xtp

Once the configuration file is properly set and placed, we can invoke the Java methods contained in the API. The software will create a directory called localExps inside the cache directory for storing downloaded data. This directory can be erased at any moment, thus clearing the cache. In addition, for each query submitted, the API will create a session directory inside the cache dir, looking something like query_session__2011-06-02--12-36-53__0. These directories store files created to answer submitted queries, and the result files for such queries. They can be erased after the results have been acquired.

May 09, 2013, at 05:53 AM by 138.100.11.51 -
Added line 104:
 \\
Added line 107:
 \\
Added line 111:

\\
Deleted line 112:
 \\
Added line 131:
 \\
Added line 134:
 \\
Added line 138:

\\
Deleted line 139:
 \\
May 09, 2013, at 05:52 AM by 138.100.11.51 -
Added line 110:
 \\
Added lines 133-164:
 


EXAMPLE 3:
// the query to launch
// this query retrieves publications from PubMed whose title contains "wilms tumor". For these publications, related genes from the gene database are extracted, and finally, related publications for each of these genes are retrieved again from PubMed
// A limit of 50 results is specified
String query = "PREFIX eurdf: <http://RDFEutilsWrapper#>\n" +
"SELECT ?pubmed_uid ?pubmed_title ?gene_uid ?pubmed2_uid ?pubmed2_title ?pubmed2_journal\n" +
"WHERE {\n" +
" ?pubmed a eurdf:pubmed.\n" +
" ?gene a eurdf:gene.\n" +
" ?pubmed2 a eurdf:pubmed.\n" +
" ?pubmed eurdf:pubmed_UID ?pubmed_uid.\n" +
" ?pubmed eurdf:pubmed_TITL ?pubmed_title.\n" +
" ?pubmed eurdf:pubmed_AUTH ?pubmed_auth.\n" +
" ?pubmed eurdf:pubmed_gene ?gene.\n" +
" ?gene eurdf:gene_UID ?gene_uid.\n" +
" ?gene eurdf:gene_pubmed ?pubmed2.\n" +
" ?pubmed2 eurdf:pubmed_UID ?pubmed2_uid.\n" +
" ?pubmed2 eurdf:pubmed_TITL ?pubmed2_title.\n" +
" ?pubmed2 eurdf:pubmed_JOUR ?pubmed2_journal.\n" +
"\n" +
" FILTER (?pubmed_auth = \"russ altman\").\n" +
"}\n" +
"LIMIT 50";

// NCBI2RDF is invoked
String resultPath = Controller.launchQueryGetPath(query);

// The results are generated in a file located in .\EutilsWrapper\Results\results_"currentdate".xml
System.out.println("Results are in " + resultPath); \\
May 09, 2013, at 05:48 AM by 138.100.11.51 -
Changed lines 107-119 from:

System.out.println("Results are in " + resultPath); ||

to:

System.out.println("Results are in " + resultPath);


EXAMPLE 2:
// the query to launch
// this query retrieves publications in which "russ altman" is one of the authors and which have related entries in the gene database. In each case, the publication uid and title, and the gene uid, are retrieved
// the limit of retrieved results is set to 20
String query = "PREFIX eurdf: <http://RDFEutilsWrapper#>\n" +
"SELECT ?pubmed_uid ?pubmed_title ?gene_uid\n" +
"WHERE {\n" +
" ?pubmed a eurdf:pubmed.\n" +
" ?gene a eurdf:gene.\n" +
" ?pubmed eurdf:pubmed_UID ?pubmed_uid.\n" +
" ?pubmed eurdf:pubmed_TITL ?pubmed_title.\n" +
" ?pubmed eurdf:pubmed_AUTH ?pubmed_auth.\n" +
" ?pubmed eurdf:pubmed_gene ?gene.\n" +
" ?gene eurdf:gene_UID ?gene_uid.\n" +
"\n" +
" FILTER (?pubmed_auth = \"russ altman\").\n" +
"}\n" +
"LIMIT 20";
// NCBI2RDF is invoked
String resultPath = Controller.launchQueryGetPath(query);
// The results are generated in a file located in .\EutilsWrapper\Results\results_"currentdate".xml
System.out.println("Results are in " + resultPath);
||

May 09, 2013, at 05:46 AM by 138.100.11.51 -
Changed lines 107-119 from:

System.out.println("Results are in " + resultPath);

to:

System.out.println("Results are in " + resultPath); ||

May 09, 2013, at 05:45 AM by 138.100.11.51 -
Deleted lines 88-98:

Added lines 90-120:

May 09, 2013, at 05:42 AM by 138.100.11.51 -
Changed lines 61-63 from:
      query: a SPARQL query 
returns the path to the generated SPARQL Results file. This file will contain as many results as indicated in the LIMIT element of the SPARQL
query, or 100 if no limit was indicated in the query \\
to:

- query: a SPARQL query
- returns the path to the generated SPARQL Results file. This file will contain as many results as indicated in the LIMIT element of the SPARQL query, or 100 if no limit was indicated in the query \\

Changed lines 65-67 from:
      Performs a query and retrieves the results as a Results object which allows retrieving the results as an iterator 
query: a SPARQL query
returns a Results object for reading the query results \\
to:

- Performs a query and retrieves the results as a Results object which allows retrieving the results as an iterator
- query: a SPARQL query
- returns a Results object for reading the query results \\

Changed lines 70-72 from:
      Performs a query and retrieves the results as a SPARQL Results file 
query: a ConceptsQuery object containing the query to perform
returns the path to the generated SPARQL Results file. This file will contain 100 results \\
to:

- Performs a query and retrieves the results as a SPARQL Results file
- query: a ConceptsQuery object containing the query to perform
- returns the path to the generated SPARQL Results file. This file will contain 100 results \\

Changed lines 75-79 from:
      Performs a query and retrieves the results as a Results object which allows retrieving the results as an iterator 
query: a ConceptsQuery object containing the query to perform
returns a Results object for reading the query results ||

to:

- Performs a query and retrieves the results as a Results object which allows retrieving the results as an iterator
- query: a ConceptsQuery object containing the query to perform
- returns a Results object for reading the query results ||

May 09, 2013, at 05:41 AM by 138.100.11.51 -
Changed line 60 from:
      Performs a query and retrieves the results as a SPARQL Results file \\
to:

- Performs a query and retrieves the results as a SPARQL Results file \\

May 09, 2013, at 05:41 AM by 138.100.11.51 -
Changed line 65 from:
  - public static Results launchQueryGetResults(String query); \\
to:

public static Results launchQueryGetResults(String query); \\

Changed line 70 from:
  - public static String launchQueryGetPath(ConceptsQuery query); \\
to:

public static String launchQueryGetPath(ConceptsQuery query); \\

Changed line 75 from:
  - public static Results launchQueryGetResults(ConceptsQuery query); \\
to:

public static Results launchQueryGetResults(ConceptsQuery query); \\

May 09, 2013, at 05:40 AM by 138.100.11.51 -
Changed line 59 from:
to:
May 09, 2013, at 05:38 AM by 138.100.11.51 -
Added lines 1-2:

Introduction

Changed lines 10-173 from:

For any questions, contact Alberto Anguita at aanguita@infomed.dia.fi.upm.es

to:

For any questions, contact Alberto Anguita at aanguita@infomed.dia.fi.upm.es

Downloads

The source code of NCBI2RDF is available in the following ftp server:

http://www.bioinformatics.org/ftp/pub/ncbi2rdf/

The FTP server contains a README.TXT file explaining how to use the library, plus several examples, precompiled jars, documentation, and the configuration files needed for installation.

Please read the README.TXT file contained in this ftp to learn the purpose of the available files, or read the next subsection.

Library instructions

Introduction

The NCBI2RDF tool is a Java-based API for enabling RDF-compliant access to the NCBI databases. It offers a programmatic interface for posing queries in SPARQL and receiving the results in SPARQL Results format. The API is quite straightforward to use, and its functionality can be easily understood by looking at the provided examples.

Tool installation

The API can be used in a standalone Java application. All its functionality is bundled in the JAR that can be downloaded at the following web page: http://www.bioinformatics.org/ftp/pub/ncbi2rdf/.

The tool installation includes the following files:

 - README.txt: this file
 - NCBI2RDF.jar: the Java library containing all the tool code (including third-party libraries)
 - JavaDoc.rar: the Javadoc documentation of the API
 - examples.rar: a set of three examples in Java
 - RDFSchema.rdf: the RDF schema that NCBI2RDF generates and that represents the available data in NCBI
 - ConfigFiles.rar: this archive file contains a set of XML configuration files which NCBI2RDF needs in order to correctly work

To use the API in a Java project:

i) Download and decompress ConfigFiles.rar in the root directory of your Java project. This will create a directory called EutilsWrapper, with three more directories containing the XML configuration files inside it. These files must be placed there whenever the NCBI2RDF API is invoked.

ii) Download and import the NCBI2RDF jar library and use the public class es.upm.gib.eutilsrdfwrapper.Controller. This class offers a series of static methods for performing RDF-compliant queries over the NCBI databases, described below.

- public static String launchQueryGetPath(String query);
Performs a query and retrieves the results as a SPARQL Results file
query: a SPARQL query
returns the path to the generated SPARQL Results file. This file will contain as many results as indicated in the LIMIT element of the SPARQL
query, or 100 if no limit was indicated in the query

- public static Results launchQueryGetResults(String query);
Performs a query and retrieves the results as a Results object which allows retrieving the results as an iterator
query: a SPARQL query
returns a Results object for reading the query results

- public static String launchQueryGetPath(ConceptsQuery query);
Performs a query and retrieves the results as a SPARQL Results file
query: a ConceptsQuery object containing the query to perform
returns the path to the generated SPARQL Results file. This file will contain 100 results

- public static Results launchQueryGetResults(ConceptsQuery query);
Performs a query and retrieves the results as a Results object which allows retrieving the results as an iterator
query: a ConceptsQuery object containing the query to perform
returns a Results object for reading the query results

As can be seen, the first method in the list admits a String parameter which must be a SPARQL-compliant query. This query should conform to the provided RDF schema in order to generate results. This method generates a file in SPARQL Results format and returns its path.

The other methods offer different formats for specifying the query or obtaining the results. The ConceptsQuery class offers a programmatic way of defining queries to the system. The Results class offers a programmatic way to retrieve the results of a posed query (it offers the methods hasNext and nextRow to iterate through the query results).

It is recommended to check the attached examples to see how the API is invoked with some sample queries.
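Since the Results class ships inside NCBI2RDF.jar, a self-contained illustration of the hasNext/nextRow iteration pattern needs a stand-in; the MockResults class and its sample rows below are purely illustrative, and in real code the object would come from Controller.launchQueryGetResults(query):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-in that only mimics the hasNext()/nextRow() iteration contract
// described above; the row contents are made-up sample data.
class MockResults {
    private final Iterator<List<String>> rows = Arrays.asList(
            Arrays.asList("3992", "FADS1"),
            Arrays.asList("7490", "WT1")
    ).iterator();

    boolean hasNext() { return rows.hasNext(); }
    List<String> nextRow() { return rows.next(); }
}

public class ResultsIterationDemo {
    public static void main(String[] args) {
        // Real code: Results results = Controller.launchQueryGetResults(query);
        MockResults results = new MockResults();
        while (results.hasNext()) {
            System.out.println(results.nextRow());
        }
    }
}
```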

String query = "PREFIX base: <http://aewrapper#>\n" +
"SELECT ?id ?name ?desc_text\n" +
"WHERE {\n" +
"?exp base:identifier_string ?id .\n" +
"?exp base:name_string ?name .\n" +
"?exp a base:Experiment.Experiment .\n" +
"?exp base:descriptions ?desc .\n" +
"?desc base:text_string ?desc_text .\n" +
"}";
String experimentId = "E-GEOD-1509";
// resultFile will contain the path to the file containing the results in SPARQL format
String resultFile = QueryProcessor.processQuery(query, experimentId);

In this code, the API is invoked with a SPARQL query (String) and a single experiment id (String). RDFbuilder translates the data of the specified experiment into RDF and performs the given SPARQL query, producing a file with SPARQL results format. There are some more options available when performing queries, such as the possibility of specifying a set of keywords instead of a single experiment. Please refer to the JavaDoc to get further details.

The API uses the disk of the machine where it executes to cache data from ArrayExpress. The cache directory is configurable through an XML configuration file. This file must be named aewrapperConfig.xml and placed inside a directory named /AEWRAPPER_CONFIG under the base execution directory. For example, if the base execution directory is C:/executionDir/, then the configuration file should be at C:/executionDir/AEWRAPPER_CONFIG/aewrapperConfig.xml. The config file root tag is <aewrapper-config>.

Inside this tag there is one mandatory tag named <base-dir>, and two optional tags named <limit-experiment-count> and <limit-vector-ranges>.

The value in the base-dir tag indicates the directory where the cache will be placed. This must be a valid directory and it is necessary for the proper functioning of the library. Inside this directory we must also place the "mage-rdf-model-empty.obm" file that comes bundled with this library.

Example of config file (for a Windows system with the cache base dir in C:\ArrayExpressWrapper):

    <?xml version="1.0" encoding="UTF-8"?>
    <aewrapper-config>
        <base-dir>C:\ArrayExpressWrapper</base-dir>
    </aewrapper-config>

In this example, the file mage-rdf-model-empty.obm should be placed inside C:\ArrayExpressWrapper\

Another example of config file, including the two optional tags:

    <?xml version="1.0" encoding="UTF-8"?>
    <aewrapper-config>
        <base-dir>C:\ArrayExpressWrapper</base-dir>
        <limit-experiment-count>5</limit-experiment-count>
        <limit-vector-ranges>100</limit-vector-ranges>
    </aewrapper-config>

In this example, with the optional tags we add two restrictions:

- We limit the number of experiments that are retrieved from the ArrayExpress database (5 in the example above). This prevents retrieving too much data when solving keyword queries. For example, if a query is submitted with the keyword "organism", more than 23000 related experiments are found, and downloading that amount of data could take several days. This value limits the number of experiments downloaded to answer a single query

- We limit the number of instances that are loaded from each MAGE-ML model (discarding the rest). This is useful when running the software on machines with fairly limited RAM. With a value of 10000, the data should fit on a machine with 4GB of RAM. NOTE: adjust your Java heap configuration to accept this amount of memory; see for example http://www.caucho.com/resin-3.0/performance/jvm-tuning.xtp

Once the configuration file is properly set and placed, we can invoke the Java methods contained in the API. The software will create a directory called localExps inside the cache directory for storing downloaded data. This directory can be erased at any moment, thus clearing the cache. In addition, for each query submitted, the API will create a session directory inside the cache dir, looking something like query_session__2011-06-02--12-36-53__0. These directories store files created to answer submitted queries, and the result files for such queries. They can be erased after the results have been acquired.

Contact

For any comments, questions or suggestions, please write an email to aanguita@infomed.dia.fi.upm.es

May 03, 2013, at 10:35 AM by 138.100.11.51 -
Added lines 4-8:

The tool is free to download from http://www.bioinformatics.org/ftp/pub/ncbi2rdf/

For any questions, contact Alberto Anguita at aanguita@infomed.dia.fi.upm.es

May 03, 2013, at 10:32 AM by 138.100.11.51 -
Added lines 1-3:

This is the main project page for NCBI2RDF. The NCBI2RDF tool is a Java-based API for enabling RDF-compliant access to the NCBI databases. It offers a programmatic interface for posing queries in SPARQL and receiving the results in SPARQL Results format. The API is quite straightforward to use, and its functionality can be easily understood by looking at the provided examples.