
Main.HomePage History


June 28, 2013, at 01:56 PM by 138.100.11.51 -
Changed line 142 from:
// this query retrieves the gene with uid 3992 (FADS1), and for that gene, related publications that refer to that gene in PubMed.
to:
// this query retrieves the gene with uid 3992 (FADS1), and for that gene, related publications that refer to that gene in PubMed. \\
June 28, 2013, at 01:55 PM by 138.100.11.51 -
Changed lines 142-143 from:
// this query retrieves publications from PubMed whose title contains "wilms tumor". For these publications, related genes are extracted from the gene database, and finally, related publications for each of these genes are retrieved again from PubMed \\
// A limit of 50 results is specified \\
to:
// this query retrieves the gene with uid 3992 (FADS1), and for that gene, related publications that refer to that gene in PubMed.
Changed line 144 from:
"SELECT ?pubmed_uid ?pubmed_title ?gene_uid ?pubmed2_uid ?pubmed2_title \\?pubmed2_journal\n" + \\
to:
"SELECT ?gene_uid ?pubmed2_title\n" + \\
Changed lines 146-154 from:
" ?pubmed a eurdf:pubmed.\n" + \\
" ?gene a eurdf:gene.\n" + \\
" ?pubmed2 a eurdf:pubmed.\n" + \\
" ?pubmed eurdf:pubmed_UID ?pubmed_uid.\n" + \\
" ?pubmed eurdf:pubmed_TITL ?pubmed_title.\n" + \\
" ?pubmed eurdf:pubmed_gene ?gene.\n" + \\
" ?gene eurdf:gene_UID ?gene_uid.\n" + \\
" ?gene eurdf:gene_pubmed ?pubmed2.\n" + \\
" ?pubmed2 eurdf:pubmed_UID ?pubmed2_uid.\n" + \\
to:
" ?gene a eurdf:gene.\n" + \\
" ?pubmed2 a eurdf:pubmed.\n" + \\
" ?gene eurdf:gene_UID ?gene_uid.\n" + \\
" ?gene eurdf:gene_pubmed ?pubmed2.\n" + \\
Changed lines 151-156 from:
" ?pubmed2 eurdf:pubmed_JOUR ?pubmed2_journal.\n" + \\
"\n" + \\
" FILTER (?pubmed_title = \"\\\"wilms tumor\\\"\").\n" + \\
"}\n" + \\
"LIMIT 50"; \\
\\
to:
" FILTER (?gene_uid = \"3992\").\n" + \\
"}"; \\
\\
June 26, 2013, at 10:36 AM by 138.100.11.51 -
Changed line 92 from:
// this query asks for publications in PubMed with the general search term "dietary probiotics", and extracts the UID, title and journal from the retrieved publications. No limit is specified, so a maximum of 100 results are retrieved \\
to:
// this query asks for publications in PubMed with the general search term "dietary probiotics" (note that SPARQL escapes characters with a '\' character, just like Java, so in order to include the '"' character in the query it must be escaped), and extracts the UID, title and journal from the retrieved publications. No limit is specified, so a maximum of 100 results are retrieved \\
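The double escaping mentioned in that comment is easy to get wrong, so here is a self-contained sketch (independent of the NCBI2RDF jar) of what the Java compiler and the SPARQL engine each see:

```java
public class EscapeDemo {
    public static void main(String[] args) {
        // In Java source, \" yields one '"' character and \\ yields one '\',
        // so \"\\\" collapses to the two characters "\ at runtime.
        String filter = " FILTER (?p1_all = \"\\\"dietary probiotics\\\"\").\n";

        // The SPARQL engine therefore receives a quoted literal containing
        // escaped inner quotes, i.e. the search value is "dietary probiotics",
        // quotes included.
        System.out.print(filter);
    }
}
```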
June 26, 2013, at 10:34 AM by 138.100.11.51 -
Deleted line 151:
" ?pubmed eurdf:pubmed_AUTH ?pubmed_auth.\n" + \\
Changed line 159 from:
" FILTER (?pubmed_auth = \"russ altman\").\n" + \\
to:
" FILTER (?pubmed_title = \"\\\"wilms tumor\\\"\").\n" + \\
May 09, 2013, at 05:57 AM by 138.100.11.51 -
Changed lines 40-47 from:
- README.txt: this file
- NCBI2RDF.jar: the Java library containing all the tool code (including third-party libraries)
- JavaDoc.rar: the Javadoc documentation of the API
- examples.rar: a set of three examples in Java
- RDFSchema.rdf: the RDF schema that NCBI2RDF generates and that represents the available data in NCBI
- ConfigFiles.rar: this archive file contains a set of XML configuration files which NCBI2RDF needs in order to correctly work

to:
- README.txt: this file
- NCBI2RDF.jar: the Java library containing all the tool code (including third-party libraries)
- JavaDoc.rar: the Javadoc documentation of the API
- examples.rar: a set of three examples in Java
- RDFSchema.rdf: the RDF schema that NCBI2RDF generates and that represents the available data in NCBI
- ConfigFiles.rar: this archive file contains a set of XML configuration files which NCBI2RDF needs in order to correctly work

May 09, 2013, at 05:55 AM by 138.100.11.51 -
Changed lines 15-16 from:
The source code of NCBI2RDF is available in the following ftp server:
to:
The tool is freely available on the following FTP server:
May 09, 2013, at 05:54 AM by 138.100.11.51 -
Deleted lines 172-250:









|| border=1
||String query = "PREFIX base: <http://aewrapper#>\n" + \\
"SELECT ?id ?name ?desc_text\n" + \\
"WHERE {\n" + \\
"?exp base:identifier_string ?id .\n" + \\
"?exp base:name_string ?name .\n" + \\
"?exp a base:Experiment.Experiment .\n" + \\
"?exp base:descriptions ?desc .\n" + \\
"?desc base:text_string ?desc_text .\n" + \\
"}"; \\
String experimentId = "E-GEOD-1509"; \\
// resultFile will contain the path to the file containing the results in SPARQL format \\
String resultFile = QueryProcessor.processQuery(query, experimentId); ||

In this code, the API is invoked with a SPARQL query (String) and a single experiment id (String). RDFbuilder translates
the data of the specified experiment into RDF and performs the given SPARQL query, producing a file with SPARQL results
format. There are some more options available when performing queries, such as the possibility of specifying a set
of keywords instead of a single experiment. Please refer to the JavaDoc to get further details.

The API makes use of the disk drive of the computer where it executes to cache data from ArrayExpress. The directory
for this cache is configurable through an xml configuration file. This configuration file must be named
aewrapperConfig.xml, and must be placed inside a directory named /AEWRAPPER_CONFIG, which must be inside the
base execution directory. For example, if our base execution directory is C:/executionDir/, then the xml configuration
file should be in C:/executionDir/AEWRAPPER_CONFIG/aewrapperConfig.xml. The config file root tag is <aewrapper-config>.

Inside this tag there is one mandatory tag named <base-dir>, and two optional tags named <limit-experiment-count> and <limit-vector-ranges>.

The value in the base-dir tag indicates the directory where the cache will be placed. This must be a valid
directory and it is necessary for the proper functioning of the library. Inside this directory we must also place
the "mage-rdf-model-empty.obm" file that comes bundled with this library.

Example of config file (for a windows system with the cache base dir in C:\ArrayExpressWrapper)

<?xml version="1.0" encoding="UTF-8"?>
<aewrapper-config>
<base-dir>C:\ArrayExpressWrapper</base-dir>
</aewrapper-config>

In this example, the file mage-rdf-model-empty.obm should be placed inside C:\ArrayExpressWrapper\ \\\

Another example of config file, including the two optional tags:

<?xml version="1.0" encoding="UTF-8"?>
<aewrapper-config>
<base-dir>C:\ArrayExpressWrapper</base-dir>
<limit-experiment-count>5</limit-experiment-count>
<limit-vector-ranges>100</limit-vector-ranges>
</aewrapper-config>

In this example, with the optional tags we add two restrictions:

- We limit the number of experiments that are retrieved from the ArrayExpress database to the configured value (5 in this example). This prevents
the retrieval of too many experiments when solving queries with keywords. For example, if a query is submitted
with the keyword "organism", more than 23000 related experiments are found; merely downloading this amount of
data could take several days. This value limits the number of experiments downloaded to answer a single query.

- We limit the number of instances that are loaded from each MAGE-ML model (discarding the rest). This
is useful if we want to execute the software on machines with fairly limited RAM. With a value of 10000,
the data should fit on a machine with 4GB of RAM. NOTE: adjust your Java configuration to accept this amount
of memory; to do this see for example http://www.caucho.com/resin-3.0/performance/jvm-tuning.xtp

Once the configuration file is properly set and placed, we can invoke the Java methods contained in the API. The
software will create a directory called localExps inside the cache directory for storing downloaded data. This
directory can be erased at any moment, thus clearing the cache. In addition, for each query submitted, the API will
create a session directory inside the cache dir, looking something like query_session__2011-06-02--12-36-53__0.
These directories store files created to answer submitted queries, and the result files for such queries.
They can be erased after the results have been acquired.


May 09, 2013, at 05:53 AM by 138.100.11.51 -
Added line 104:
\\
Added line 107:
\\
Added line 111:
--------------------------------------------------- \\
Deleted line 112:
\\
Added line 131:
\\
Added line 134:
\\
Added line 138:
--------------------------------------------------- \\
Deleted line 139:
\\
May 09, 2013, at 05:52 AM by 138.100.11.51 -
Added line 110:
\\
Added lines 133-164:
\\
\\
\\
EXAMPLE 3: \\
// the query to launch \\
// this query retrieves publications from PubMed whose title contains "wilms tumor". For these publications, related genes are extracted from the gene database, and finally, related publications for each of these genes are retrieved again from PubMed \\
// A limit of 50 results is specified \\
String query = "PREFIX eurdf: <http://RDFEutilsWrapper#>\n" + \\
"SELECT ?pubmed_uid ?pubmed_title ?gene_uid ?pubmed2_uid ?pubmed2_title \\?pubmed2_journal\n" + \\
"WHERE {\n" + \\
" ?pubmed a eurdf:pubmed.\n" + \\
" ?gene a eurdf:gene.\n" + \\
" ?pubmed2 a eurdf:pubmed.\n" + \\
" ?pubmed eurdf:pubmed_UID ?pubmed_uid.\n" + \\
" ?pubmed eurdf:pubmed_TITL ?pubmed_title.\n" + \\
" ?pubmed eurdf:pubmed_AUTH ?pubmed_auth.\n" + \\
" ?pubmed eurdf:pubmed_gene ?gene.\n" + \\
" ?gene eurdf:gene_UID ?gene_uid.\n" + \\
" ?gene eurdf:gene_pubmed ?pubmed2.\n" + \\
" ?pubmed2 eurdf:pubmed_UID ?pubmed2_uid.\n" + \\
" ?pubmed2 eurdf:pubmed_TITL ?pubmed2_title.\n" + \\
" ?pubmed2 eurdf:pubmed_JOUR ?pubmed2_journal.\n" + \\
"\n" + \\
" FILTER (?pubmed_auth = \"russ altman\").\n" + \\
"}\n" + \\
"LIMIT 50"; \\
\\
// NCBI2RDF is invoked \\
String resultPath = Controller.launchQueryGetPath(query); \\
\\
// The results are generated in a file located in .\EutilsWrapper\Results\results_"currentdate".xml \\
System.out.println("Results are in " + resultPath); \\
May 09, 2013, at 05:48 AM by 138.100.11.51 -
Changed lines 107-119 from:
System.out.println("Results are in " + resultPath); ||











to:
System.out.println("Results are in " + resultPath); \\
\\
\\
EXAMPLE 2: \\
// the query to launch \\
// this query retrieves publications in which "russ altman" is one of the authors and which have related entries in the gene database. In each case, the publication uid and title, and the gene uid are retrieved \\
// the limit of retrieved results is set to 20 \\
String query = "PREFIX eurdf: <http://RDFEutilsWrapper#>\n" + \\
"SELECT ?pubmed_uid ?pubmed_title ?gene_uid\n" + \\
"WHERE {\n" + \\
" ?pubmed a eurdf:pubmed.\n" + \\
" ?gene a eurdf:gene.\n" + \\
" ?pubmed eurdf:pubmed_UID ?pubmed_uid.\n" + \\
" ?pubmed eurdf:pubmed_TITL ?pubmed_title.\n" + \\
" ?pubmed eurdf:pubmed_AUTH ?pubmed_auth.\n" + \\
" ?pubmed eurdf:pubmed_gene ?gene.\n" + \\
" ?gene eurdf:gene_UID ?gene_uid.\n" + \\
"\n" + \\
" FILTER (?pubmed_auth = \"russ altman\").\n" + \\
"}\n" + \\
"LIMIT 20"; \\
// NCBI2RDF is invoked \\
String resultPath = Controller.launchQueryGetPath(query); \\
// The results are generated in a file located in .\EutilsWrapper\Results\results_"currentdate".xml \\
System.out.println("Results are in " + resultPath); \\
||











May 09, 2013, at 05:46 AM by 138.100.11.51 -
Changed lines 107-119 from:
System.out.println("Results are in " + resultPath); \\











to:
System.out.println("Results are in " + resultPath); ||











May 09, 2013, at 05:45 AM by 138.100.11.51 -
Deleted lines 88-98:










Added lines 90-120:
||EXAMPLE 1: \\
// the query to launch \\
// this query asks for publications in PubMed with the general search term "dietary probiotics", and extracts the UID, title and journal from the retrieved publications. No limit is specified, so a maximum of 100 results are retrieved \\
String query = "PREFIX eurdf: <http://RDFEutilsWrapper#>\n" + \\
"SELECT ?p1_uid ?p1_titl ?p1_jour\n" + \\
"WHERE {\n" + \\
" ?p1 a eurdf:pubmed.\n" + \\
" ?p1 eurdf:pubmed_ALL ?p1_all.\n" + \\
" ?p1 eurdf:pubmed_UID ?p1_uid.\n" + \\
" ?p1 eurdf:pubmed_TITL ?p1_titl.\n" + \\
" ?p1 eurdf:pubmed_JOUR ?p1_jour.\n" + \\
"\n" + \\
" FILTER (?p1_all = \"\\\"dietary probiotics\\\"\").\n" + \\
"}"; \\
// NCBI2RDF is invoked \\
String resultPath = Controller.launchQueryGetPath(query); \\
// The results are generated in a file located in .\EutilsWrapper\Results\results_"currentdate".xml \\
System.out.println("Results are in " + resultPath); \\












|| border=1
May 09, 2013, at 05:42 AM by 138.100.11.51 -
Changed lines 61-63 from:
query: a SPARQL query \\
returns the path to the generated SPARQL Results file. This file will contain as many results as indicated in the LIMIT element of the SPARQL \\
query, or 100 if no limit was indicated in the query \\
to:
- query: a SPARQL query \\
- returns the path to the generated SPARQL Results file. This file will contain as many results as indicated in the LIMIT element of the SPARQL query, or 100 if no limit was indicated in the query \\
Changed lines 65-67 from:
Performs a query and retrieves the results as a Results object which allows retrieving the results as an iterator \\
query: a SPARQL query \\
returns a Results object for reading the query results \\
to:
- Performs a query and retrieves the results as a Results object which allows retrieving the results as an iterator \\
- query: a SPARQL query \\
- returns a Results object for reading the query results \\
Changed lines 70-72 from:
Performs a query and retrieves the results as a SPARQL Results file \\
query: a ConceptsQuery object containing the query to perform \\
returns the path to the generated SPARQL Results file. This file will contain 100 results \\
to:
- Performs a query and retrieves the results as a SPARQL Results file \\
- query: a ConceptsQuery object containing the query to perform \\
- returns the path to the generated SPARQL Results file. This file will contain 100 results \\
Changed lines 75-79 from:
Performs a query and retrieves the results as a Results object which allows retrieving the results as an iterator \\
query: a ConceptsQuery object containing the query to perform \\
returns a Results object for reading the query results ||

to:
- Performs a query and retrieves the results as a Results object which allows retrieving the results as an iterator \\
- query: a ConceptsQuery object containing the query to perform \\
- returns a Results object for reading the query results ||

May 09, 2013, at 05:41 AM by 138.100.11.51 -
Changed line 60 from:
Performs a query and retrieves the results as a SPARQL Results file \\
to:
- Performs a query and retrieves the results as a SPARQL Results file \\
May 09, 2013, at 05:41 AM by 138.100.11.51 -
Changed line 65 from:
- public static Results launchQueryGetResults(String query); \\
to:
public static Results launchQueryGetResults(String query); \\
Changed line 70 from:
- public static String launchQueryGetPath(ConceptsQuery query); \\
to:
public static String launchQueryGetPath(ConceptsQuery query); \\
Changed line 75 from:
- public static Results launchQueryGetResults(ConceptsQuery query); \\
to:
public static Results launchQueryGetResults(ConceptsQuery query); \\
May 09, 2013, at 05:40 AM by 138.100.11.51 -
Changed line 59 from:
|| - public static String launchQueryGetPath(String query); \\
to:
||public static String launchQueryGetPath(String query); \\
May 09, 2013, at 05:38 AM by 138.100.11.51 -
Added lines 1-2:
! Introduction
Changed lines 10-173 from:
For any questions, contact Alberto Anguita at aanguita@infomed.dia.fi.upm.es
to:
For any questions, contact Alberto Anguita at aanguita@infomed.dia.fi.upm.es


! Downloads

The source code of NCBI2RDF is available in the following ftp server:

http://www.bioinformatics.org/ftp/pub/ncbi2rdf/

There is a README.txt file on the FTP site which explains how to use the library, plus several examples, precompiled jars, documentation and the configuration files needed for installation.

Please read the README.txt file on the FTP site to learn the purpose of the available files, or read the next subsection.


! Library instructions

!! Introduction

The NCBI2RDF tool is a Java-based API for enabling RDF-compliant access to the NCBI databases. It offers a programmatic interface for posing
queries in SPARQL and receiving the results in SPARQL Results format. The API is quite straightforward to use, and its functionality can be
easily understood by looking at the provided examples.


!! Tool installation

The API can be used in a standalone Java application. All its functionality is bundled in the JAR that can be downloaded at
the following web page: http://www.bioinformatics.org/ftp/pub/ncbi2rdf/.

The tool installation includes the following files:

- README.txt: this file
- NCBI2RDF.jar: the Java library containing all the tool code (including third-party libraries)
- JavaDoc.rar: the Javadoc documentation of the API
- examples.rar: a set of three examples in Java
- RDFSchema.rdf: the RDF schema that NCBI2RDF generates and that represents the available data in NCBI
- ConfigFiles.rar: this archive file contains a set of XML configuration files which NCBI2RDF needs in order to correctly work


To use the API in a Java project:

i) Download and decompress ConfigFiles.rar in the root directory of your Java project. This will create a directory called
EutilsWrapper, which contains three more directories holding the XML configuration files. These files must be placed there whenever the NCBI2RDF
API is invoked.

ii) Download and import the NCBI2RDF jar library and use the public class es.upm.gib.eutilsrdfwrapper.Controller. This class
offers a series of static methods for performing RDF-compliant queries over the NCBI databases, described below.


|| border=1
|| - public static String launchQueryGetPath(String query); \\
Performs a query and retrieves the results as a SPARQL Results file \\
query: a SPARQL query \\
returns the path to the generated SPARQL Results file. This file will contain as many results as indicated in the LIMIT element of the SPARQL \\
query, or 100 if no limit was indicated in the query \\
\\
- public static Results launchQueryGetResults(String query); \\
Performs a query and retrieves the results as a Results object which allows retrieving the results as an iterator \\
query: a SPARQL query \\
returns a Results object for reading the query results \\
\\
- public static String launchQueryGetPath(ConceptsQuery query); \\
Performs a query and retrieves the results as a SPARQL Results file \\
query: a ConceptsQuery object containing the query to perform \\
returns the path to the generated SPARQL Results file. This file will contain 100 results \\
\\
- public static Results launchQueryGetResults(ConceptsQuery query); \\
Performs a query and retrieves the results as a Results object which allows retrieving the results as an iterator \\
query: a ConceptsQuery object containing the query to perform \\
returns a Results object for reading the query results ||


As can be seen, the first method in the list admits a String parameter which must be a SPARQL-compliant query. This query must conform to the
provided RDF schema in order to generate results. This method generates a file in SPARQL Results format and returns its path.

The other methods offer different formats for specifying the query or obtaining the results. The ConceptsQuery class offers a programmatic way
of defining queries to the system. The Results class offers a programmatic way to retrieve the results of a posed query (it provides the methods
hasNext and nextRow to iterate through the query results).

It is recommended to check the attached examples to see how the API is invoked with some sample queries.
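As a sketch of the typical call pattern: the query string below follows the style of the bundled examples, and the Controller calls are commented out because they require the NCBI2RDF jar and the XML configuration files to be in place (variable names are illustrative):

```java
public class QuerySketch {
    public static void main(String[] args) {
        // A minimal query in the style of the bundled examples: retrieve the
        // UID and title of PubMed entries, capped at 10 results.
        String query = "PREFIX eurdf: <http://RDFEutilsWrapper#>\n"
                + "SELECT ?p_uid ?p_titl\n"
                + "WHERE {\n"
                + "  ?p a eurdf:pubmed.\n"
                + "  ?p eurdf:pubmed_UID ?p_uid.\n"
                + "  ?p eurdf:pubmed_TITL ?p_titl.\n"
                + "}\n"
                + "LIMIT 10";

        // With NCBI2RDF.jar on the classpath and ConfigFiles.rar unpacked:
        // String resultPath = es.upm.gib.eutilsrdfwrapper.Controller.launchQueryGetPath(query);
        //
        // Or, iterating programmatically (the exact row type returned by
        // nextRow is documented in the JavaDoc):
        // Results results = Controller.launchQueryGetResults(query);
        // while (results.hasNext()) { System.out.println(results.nextRow()); }

        System.out.println(query);
    }
}
```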












|| border=1
||String query = "PREFIX base: <http://aewrapper#>\n" + \\
"SELECT ?id ?name ?desc_text\n" + \\
"WHERE {\n" + \\
"?exp base:identifier_string ?id .\n" + \\
"?exp base:name_string ?name .\n" + \\
"?exp a base:Experiment.Experiment .\n" + \\
"?exp base:descriptions ?desc .\n" + \\
"?desc base:text_string ?desc_text .\n" + \\
"}"; \\
String experimentId = "E-GEOD-1509"; \\
// resultFile will contain the path to the file containing the results in SPARQL format \\
String resultFile = QueryProcessor.processQuery(query, experimentId); ||

In this code, the API is invoked with a SPARQL query (String) and a single experiment id (String). RDFbuilder translates
the data of the specified experiment into RDF and performs the given SPARQL query, producing a file with SPARQL results
format. There are some more options available when performing queries, such as the possibility of specifying a set
of keywords instead of a single experiment. Please refer to the JavaDoc to get further details.

The API makes use of the disk drive of the computer where it executes to cache data from ArrayExpress. The directory
for this cache is configurable through an xml configuration file. This configuration file must be named
aewrapperConfig.xml, and must be placed inside a directory named /AEWRAPPER_CONFIG, which must be inside the
base execution directory. For example, if our base execution directory is C:/executionDir/, then the xml configuration
file should be in C:/executionDir/AEWRAPPER_CONFIG/aewrapperConfig.xml. The config file root tag is <aewrapper-config>.

Inside this tag there is one mandatory tag named <base-dir>, and two optional tags named <limit-experiment-count> and <limit-vector-ranges>.

The value in the base-dir tag indicates the directory where the cache will be placed. This must be a valid
directory and it is necessary for the proper functioning of the library. Inside this directory we must also place
the "mage-rdf-model-empty.obm" file that comes bundled with this library.

Example of config file (for a windows system with the cache base dir in C:\ArrayExpressWrapper)

<?xml version="1.0" encoding="UTF-8"?>
<aewrapper-config>
<base-dir>C:\ArrayExpressWrapper</base-dir>
</aewrapper-config>

In this example, the file mage-rdf-model-empty.obm should be placed inside C:\ArrayExpressWrapper\ \\\

Another example of config file, including the two optional tags:

<?xml version="1.0" encoding="UTF-8"?>
<aewrapper-config>
<base-dir>C:\ArrayExpressWrapper</base-dir>
<limit-experiment-count>5</limit-experiment-count>
<limit-vector-ranges>100</limit-vector-ranges>
</aewrapper-config>

In this example, with the optional tags we add two restrictions:

- We limit the number of experiments that are retrieved from the ArrayExpress database to the configured value (5 in this example). This prevents
the retrieval of too many experiments when solving queries with keywords. For example, if a query is submitted
with the keyword "organism", more than 23000 related experiments are found; merely downloading this amount of
data could take several days. This value limits the number of experiments downloaded to answer a single query.

- We limit the number of instances that are loaded from each MAGE-ML model (discarding the rest). This
is useful if we want to execute the software on machines with fairly limited RAM. With a value of 10000,
the data should fit on a machine with 4GB of RAM. NOTE: adjust your Java configuration to accept this amount
of memory; to do this see for example http://www.caucho.com/resin-3.0/performance/jvm-tuning.xtp

Once the configuration file is properly set and placed, we can invoke the Java methods contained in the API. The
software will create a directory called localExps inside the cache directory for storing downloaded data. This
directory can be erased at any moment, thus clearing the cache. In addition, for each query submitted, the API will
create a session directory inside the cache dir, looking something like query_session__2011-06-02--12-36-53__0.
These directories store files created to answer submitted queries, and the result files for such queries.
They can be erased after the results have been acquired.



! Contact

For any comments, questions or suggestions, please write an email to aanguita@infomed.dia.fi.upm.es
May 03, 2013, at 10:35 AM by 138.100.11.51 -
Added lines 4-8:

The tool is free to download from http://www.bioinformatics.org/ftp/pub/ncbi2rdf/


For any questions, contact Alberto Anguita at aanguita@infomed.dia.fi.upm.es
May 03, 2013, at 10:32 AM by 138.100.11.51 -
Added lines 1-3:
This is the main project page for NCBI2RDF. The NCBI2RDF tool is a Java-based API for enabling RDF-compliant access to the NCBI databases. It offers a programmatic interface for posing queries in SPARQL and receiving the results in SPARQL Results format. The API is quite straightforward to use, and its functionality can be easily understood by looking at the provided examples.