Bioinformatics.org
News & Commentary - Message forums
URL: Developing a `Customized Search Engine' for sequence mining the entire Web
Submitted by Natarajan Ganesan; posted on Friday, May 18, 2007
I want to be able to develop an intelligent `data miner' that takes in any kind of input and `mines' the Web. One approach for now is to build a Google Co-op search engine and `condition' the query so that it looks only into annotated domains. I have created one such CSE and would like to invite `collaborators' who can suggest the major domains of databases that are not yet featured. I particularly wish to cover those sites and services that host highly specific and annotated databases, such as `RNA secondary structure' OR `plant mitochondrial'. This tool, called `Webseq', is currently hosted at http://natarajanganesan.googlepages.com/instaseq.
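As a rough illustration of what `conditioning' a query could look like outside of Google Co-op, here is a minimal Python sketch. It is not the actual Webseq configuration: the domain list and the function names are hypothetical, and it simply appends `site:` restrictions to a free-text query before building a search URL.

```python
from urllib.parse import quote_plus

# Hypothetical list of annotated domains; NOT the actual Webseq configuration.
ANNOTATED_DOMAINS = ["ncbi.nlm.nih.gov", "ebi.ac.uk"]


def condition_query(terms, domains=ANNOTATED_DOMAINS):
    """Restrict a free-text query to the given domains using site: operators."""
    site_filter = " OR ".join(f"site:{d}" for d in domains)
    return f"{terms} ({site_filter})"


def search_url(terms, domains=ANNOTATED_DOMAINS):
    """Build a Google search URL for the conditioned query."""
    return "https://www.google.com/search?q=" + quote_plus(
        condition_query(terms, domains)
    )
```

For example, `condition_query('"RNA secondary structure"')` would return the quoted phrase followed by the `site:` filter for each listed domain. Suggestions from collaborators would then amount to additions to the domain list.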
Subsequently, however, I wish to move beyond this to accept raw sequence input in the search box and to organize the results under headings such as `Fasta', `Structure', `literature', etc. Currently my other tool, `InstaSeq', just about manages to do this by parsing the raw sequence before resubmitting it to Google to mine the Web. This tool is hosted at http://bioinformatics.georgetown.edu/InstaSeq.htm. Eventually I hope to combine the two tools.
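To make the idea of parsing a raw sequence before resubmitting it more concrete, here is a minimal Python sketch. It is only an assumption about how such a step could work, not a description of how InstaSeq is implemented; the function names, the crude alphabet check, and the 30-residue window are all hypothetical choices.

```python
import re


def clean_sequence(raw):
    """Drop FASTA header lines, then remove whitespace and digits."""
    lines = [ln for ln in raw.splitlines() if not ln.startswith(">")]
    return re.sub(r"[\s\d]", "", "".join(lines)).upper()


def guess_type(seq):
    """Crude alphabet check; most ambiguity codes are not handled."""
    if not seq:
        return "empty"
    letters = set(seq)
    if letters <= set("ACGTN"):
        return "dna"
    if letters <= set("ACGUN"):
        return "rna"
    if letters <= set("ACDEFGHIKLMNPQRSTVWY"):
        return "protein"
    return "unknown"


def to_query(raw, window=30):
    """Quote the first `window` residues as an exact-phrase search term."""
    seq = clean_sequence(raw)
    return f'"{seq[:window]}"'
```

The guessed type could be used to decide which category of results to show first. Note that a short peptide made up only of A, C, G and T would be misclassified as DNA by this check.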
Collaborators who would like to take part in this venture are requested to mail me with "Webseq - Google Co-op" in the subject header. All collaborators will be duly credited and acknowledged.