BRIE and OAP 2001
Text in Biology: Biological Research with Information Extraction & Open-Access Publications (BRIE & OAP) 2001
A satellite event at the ISMB'01 conference
Date & locale
July 26, 2001
Tivoli Gardens, Copenhagen, Denmark
- With the increasing availability of textual information related to biology, including MedLine abstracts and full-text journal articles, research on information extraction is rapidly becoming an essential component of various bioinformatics applications.
- It is expected that text mining in general and information extraction in particular will provide tools that will facilitate the annotation of vast amounts of molecular information, including gene sequences, transcription profiles and biological pathways.
- Text mining and information extraction have already been successfully used in a number of applications, including the detection of gene and protein interactions, the functional classification of proteins, automatic database-driven sequence annotation and the annotation of transcription profiles from microarray technology.
- As in other areas of data mining, the primary goal of textual information extraction is to detect linguistic patterns already present in the corpus under investigation. Novel discoveries in biology can therefore be expected from mining biological text alone.
- For most publications, copyright on scientific communications (published articles and so forth) belongs to the publishing companies, not to the authors. Scientists wishing to share relevant communications, even their own in some cases, face legal challenges from publishers.
- Publishing companies charge expensive subscription fees for access to scientific communications. Scientists in developing countries and poorly endowed institutions, although intellectually on par with their peers, are severely hindered by this.
- These two problems have prevented scientists from gaining any access, even for simple searches, to the full text of these communications.
- Scientific communications are published in journals segregated by topic. This has resulted in confusion as to the best place to publish, retrieve or extract information (e.g., mathematical biology communications could be published in either a mathematical journal or a biological one).
- Communications are also published in journals differing by publisher. This has caused the segregation of communications by the prestige of the journal (e.g., how difficult it is to be published in the journal and the composition of the readership). This has also allowed room for personal politics in scientific communication.
- These two problems are compounded by the first two: with a limited budget, to which journals should one subscribe? What we are left with is an artificial selection, by publishers, of which communications are best suited to a scientist's field of study.
- This may be the result of a competitive marketplace for readership, but is there an alternative to profit-based publications? Should there be? Can an alternative publication model be profitable for a publisher?
- Additionally, even with the advent of computers, databases, and the World Wide Web, scientific communications are published as they were 100 years ago: as linear, printable text. And they are archived this way. While this makes good reading, it is not the best format for information retrieval or extraction.
- All of these problems restrict information retrieval, extraction, and scientific inquiry. How do we resolve them? As the ultimate solution, should future communications be published in an "open-access, global knowledge-base"? Before or after information extraction techniques are applied?
- The BRIE conference is expected to complement two ongoing initiatives in text mining for biology: (1) the natural language processing sessions at the Pacific Symposium on Biocomputing and (2) the recent collective initiative on Natural Language Processing in Biology.
- The primary aim of this conference is to bring together individuals and groups actively involved in text mining for biology.
- We identify several obstacles to information retrieval and extraction: copyright restrictions, costly subscriptions, artificial segregation of communications, and archival of information in a manner not suited to information retrieval and extraction. We also seek to discuss the concept of "open-access publications" and whether it is a viable solution to these problems.
- OAP also serves as a "Birds of a Feather" (BoF) meeting for Bioinformatics.Org, an organization committed to freedom and openness in the field of bioinformatics.
- We are seeking abstract submissions in the area of biological discovery using text mining techniques. In particular, we would like to put more emphasis on the use of these techniques in the discovery of highly non-trivial, novel information in biology, including relationships at the molecular, biochemical and cellular levels.
- Abstracts in the following areas are particularly welcome:
- Biological discoveries made using text mining and independently supported by other experimental information
- Annotated corpora of biological text for the benchmarking of existing methods
- Approaches for database maintenance and integrity using text mining
- Standardization and evaluation of methods for information extraction, including algorithms (e.g., full parsers) and databases (e.g., portable ontologies)
- We are seeking several speakers who can address how the above problems might be solved. Topics may include author-owned copyrights, free or inexpensive subscriptions, uniform and multiple categories for communications, and archival of information in a manner suited for information retrieval and extraction, for example, knowledge bases.
- Alfonso Valencia, CNB-CSIC Madrid
- Christos Ouzounis, EMBL-EBI Cambridge
- J.W. Bizzaro, Bioinformatics.Org & University of Massachusetts Lowell
- Thomas Sicheritz-Ponten, The Technical University of Denmark
- 09:00-09:05 - OAP Introduction, J.W. Bizzaro, Bioinformatics.Org & University of Massachusetts Lowell
- 09:10-09:30 - "The Public Library of Science", Michael Eisen, Lawrence Berkeley National Lab & University of California Berkeley (abstract)
- 09:35-09:55 - "BioMed Central - Open-Access Publishing in the Real World", Matthew Cockerill, BioMed Central (abstract)
- 10:00-10:15 - Coffee Break
- 10:15-10:35 - "Information Extraction Using Open-Access Publications", Jan Komorowski, Norwegian University of Science and Technology (abstract)
- 10:40-11:00 - OAP Panel Discussion
- 11:00-11:05 - BRIE Introduction, Alfonso Valencia, CNB-CSIC
- 11:10-11:30 - "Current Research on 'Smart Documents' in the Biological Knowledge Laboratory", Robert Futrelle, Northeastern University (abstract)
- 11:35-11:55 - "Learning Information Extractors from Approximately Correct Patterns and Weakly Labeled Training Data", Mark Craven, University of Wisconsin (abstract)
- 13:00-13:20 - "A Generic Statistical Method for Information Extraction in Genomics", Violaine Pillet, INRIA Rhone-Alpes (abstract)
- 13:25-13:45 - "Information Extraction with Hidden Markov Models", Eom Jae-Hong, Seoul National University (abstract)
- 13:50-14:10 - "The Frame-Based Module of the Suiseki Information Extraction System", Christian Blaschke, CNB-CSIC (abstract)
- 14:15-14:30 - Coffee Break
- 14:30-14:50 - "Supporting Discovery in Biomedicine by Association Rule Mining of Bibliographic Databases", Dimitar Hristovski, University of Ljubljana (abstract)
- 14:55-15:15 - "A Suite of Tools to Mine Relations of Nuclear Receptors and Cofactors from Scientific Literature", Dietrich Schuhmann, LION Bioscience AG (abstract)
- 15:20-15:40 - BRIE Panel Discussion