BioLegato - Introduction

From Bioinformatics.Org Wiki

Introduction

bioLegato is a program that executes other programs by providing a menu that lets you set program parameters, launch the program, and view the output. In many cases, the output also goes to a new bioLegato window, which makes pipelining possible: you can usually run additional programs using the output of one program as the input of the next. In contrast, most browser-based applications display output in a human-readable form that does not allow further analysis by other programs.

bioLegato builds on GDE's approach to defining menus. GDE uses a menu description language to define which external programs it can call, and which parameters and data to pass to each one. This language lets users customize their own environment to suit individual needs. Each step in the process is described in a file named .GDEmenus in the user's current or home directory. The language used in this file describes three phases of an external function call. The first phase describes the menu item as it will appear, and the Unix command line that is actually run when it is selected. The second phase describes how to prompt for the parameters the function needs. The third phase describes what data must be passed as input to the external function, and what data (if any) must be read back from its output.
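To make the three phases concrete, here is a minimal Python sketch that reads one GDE-style menu item and builds the command line it would run. The key names (item, itemmethod, arg, argtype, argvalue, in, informat, out, outformat) follow the GDE menu format as we understand it, but only a simplified subset is handled, and the program count_residues with its -w option is hypothetical. It is an illustration of the idea, not GDE's or bioLegato's actual parser.

```python
import re
from dataclasses import dataclass, field

# Hypothetical menu item in a simplified GDE-style "key:value" syntax.
SAMPLE = """\
item:Count residues
itemmethod:count_residues -w $WINDOW < $INPUT > $OUTPUT
arg:WINDOW
argtype:slider
arglabel:Window size
argmin:1
argmax:100
argvalue:10
in:INPUT
informat:flat
out:OUTPUT
outformat:text
"""


@dataclass
class MenuItem:
    label: str = ""                              # phase 1: how the item appears
    method: str = ""                             # phase 1: Unix command template
    args: list = field(default_factory=list)     # phase 2: parameters to prompt for
    inputs: list = field(default_factory=list)   # phase 3: data passed to the program
    outputs: list = field(default_factory=list)  # phase 3: data read back afterwards


def parse_item(text: str) -> MenuItem:
    """Group the key:value lines of one menu item into the three phases."""
    item = MenuItem()
    current = None  # the arg/in/out block that detail lines attach to
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "item":
            item.label = value
        elif key == "itemmethod":
            item.method = value
        elif key in ("arg", "in", "out"):
            current = {"name": value}
            target = {"arg": item.args, "in": item.inputs, "out": item.outputs}[key]
            target.append(current)
        elif current is not None:
            current[key] = value  # e.g. argtype, argvalue, informat, outformat
    return item


def build_command(item: MenuItem, values: dict) -> str:
    """Substitute $NAME placeholders; names without a value are left unchanged."""
    return re.sub(r"\$(\w+)", lambda m: values.get(m.group(1), m.group(0)), item.method)


if __name__ == "__main__":
    menu = parse_item(SAMPLE)
    # Start from each parameter's default value, then add hypothetical temp file names.
    values = {a["name"]: a.get("argvalue", "") for a in menu.args}
    values.update(INPUT="in.tmp", OUTPUT="out.tmp")
    print(build_command(menu, values))  # count_residues -w 10 < in.tmp > out.tmp
```

Naming the files in the in: and out: lines lets the front end, rather than the user, choose the temporary file names. A real implementation would also need to quote parameter values (for example with Python's shlex.quote) before handing the command to a shell.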

There are other applications similar to bioLegato. JEMBOSS is a Java front end for EMBOSS. The Kaptain extensions to EMBOSS use Kaptain, a system that generates graphical interfaces for command-line programs from grammar scripts. Similarly, web-based interfaces to over 200 applications have been generated using Pise, which creates HTML interfaces from XML definitions of program parameters. Finally, the Taverna workbench is a Java application in which complex data workflows can be created by linking together icons that represent web services available at both local and remote sites.

A comparison of these applications can be seen in Table #1.

Application                | Menu files                    | GUI
GDE                        | .GDEmenus                     | X11 (C language)
EMBOSS (Kaptain extension) | ACD (Ajax Command Definition) | Kaptain (Bash)
JEMBOSS                    | ACD (Ajax Command Definition) | Swing (Java)
Pise                       | XML                           | HTML (web)
Taverna                    | BioMoby                       | Swing (Java)
bioLegato                  | .GDEmenus                     | Swing (Java)

These applications share similar goals.

For bioLegato, however, the current menu format does not yet support several features we would like.

A general outline of these ideas can be found at http://www.bioinformatics.org/wiki/BioLegato_internal_menu_format_ideas.

Table #2 summarizes the proposed features.

Feature                   | Versatility | Code      | CPU       | RAM       | Realistic in GDE style | Needs syntax extension | Status
Directory structure       | Good        | Good      | Good      | Good      | Yes                    | No                     | Done
Multiple Buttons          | Good        | Good      | Good      | Good      | Yes                    | No                     | Done
Object/Sequence Variables | Good        | Good      | Good      | Good      | Yes                    | Yes                    | Not yet
Interpreted Language      | Very Good   | Good      | Bad*      | Bad*      | No*                    | Yes                    | Not yet
Conditional Variables     | Bad         | Bad       | Good      | Good      | No                     | Yes                    | Not yet
Direct Jar                | Good        | Okay      | Very Good | Very Good | Yes                    | No                     | Not yet
Tabbed Panes              | Okay        | Good      | Good      | Good      | Yes                    | No                     | In progress
Icons                     | Okay        | Good      | Good      | Good      | Yes                    | No                     | Not yet
Mouse-over help           | Okay        | Good      | Good      | Good      | Yes                    | No                     | Not yet
Interactive options       | Good        | Very bad  | ?         | ?         | No                     | Maybe                  | Not yet
Sanity Checking           | Very Good   | Good      | Good      | Good      | Yes                    | No                     | Not yet
Decimal Numbers           | Good        | Good      | Good      | Good      | Yes                    | No                     | Not yet
API                       | Very good   | Medium    | ?         | ?         | ?                      | Yes                    | Not yet


Bibliographical review: What have been the main trends in published articles on the same topic as BIRCH/bioLegato over the last three years?

Web Applications:

With the rapid progress of biological research, there is great demand for integrative knowledge-sharing systems that efficiently support collaboration among biological researchers from various fields. To fulfill such requirements, we have developed a data-centric knowledge-sharing platform, WebLab, for biologists to fetch, analyze, manipulate and share data through an intuitive web interface. Dedicated space is provided for users to store their input data and analysis results. Users can upload local data or fetch public data from remote databases, and then perform analysis using more than 260 integrated bioinformatics tools. These tools can be further organized into customized analysis workflows to accomplish complex tasks automatically. In addition to conventional biological data, WebLab also provides rich support for scientific literature, such as full-text search of uploaded articles and export of citations into well-known citation managers such as EndNote and BibTeX. To facilitate teamwork among colleagues, WebLab provides a powerful and flexible sharing mechanism, which allows users to share input data, analysis results, scientific literature and customized workflows with specified users or groups under sophisticated privilege settings. WebLab is publicly available at http://weblab.cbi.pku.edu.cn, with all source code released as Free Software.

Motivation: For the biologist, running bioinformatics analyses involves time-consuming management of data and tools. Users need support to organize their work, retrieve parameters and reproduce their analyses. They also need to be able to combine their analytic tools using a safe data flow software mechanism. Finally, given that scientific tools can be difficult to install, it is particularly helpful for biologists to be able to use these tools through a web user interface. However, providing a web interface for a set of tools raises the problem that a single web portal cannot offer all the existing and possible services: it is the user, again, who has to cope with copying data among a number of different services. A framework enabling portal administrators to build a network of cooperating services would therefore clearly be beneficial. Results: We have designed a system, Mobyle, to provide a flexible and usable Web environment for defining and running bioinformatics analyses. It embeds simple yet powerful data management features that allow the user to reproduce analyses and to combine tools using a hierarchical typing system. Mobyle offers invocation of services distributed over remote Mobyle servers, thus enabling a federated network of curated bioinformatics portals without the user having to learn complex concepts or to install sophisticated software. While being focused on the end user, the Mobyle system also addresses the need, for the bioinformatician, to automate the execution of remote services: PlayMOBY is a companion tool that automates the publication of BioMOBY web services, using Mobyle program definitions.

Using a novel data transformation technique to provide the EMBOSS software suite as semantic Web Services. 2007 IEEE International Conference on Bioinformatics and Biomedicine, Proceedings. 2007. pp. 117–124.
The EMBOSS software suite is a well-established package that consists of over 200 command-line programs that can be strung together to perform many common tasks in bioinformatics. To date, its users must either be proficient at the command line for batch data processing, or they must use one of several Web-form wrappers for EMBOSS on a one-off basis. To combine the simplicity of Web-based forms with the power of the command line, we are wrapping the entire EMBOSS suite as semantic Web Services (SWS) using a novel framework of declarative rules. These rules describe how ontology-driven data in the BioMOBY SWS XML format can be converted to the various text formats EMBOSS programs use, and vice versa. The chief benefits of this approach for end users are 1) they can choose between many BioMOBY client GUIs to create and run EMBOSS-based analysis pipelines and 2) they get automatic interoperability between EMBOSS programs and the hundreds of other BioMOBY services that may be useful in answering a research question. The declarative rules approach, using XSLT and a novel "anti-XSLT" language, promotes creating modular, reusable pieces of domain technical knowledge to bridge the Web and the Semantic Web. These rule systems have successfully been applied "in the wild" to both client- and server-side software, facilitating integration of bioinformatics tools and data. The framework is Open Source and is freely available from the BioMOBY code repository.


Bio-Programming Languages:

The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems. Biopython includes modules for reading and writing different sequence file formats and multiple sequence alignments, dealing with 3D macromolecular structures, interacting with common tools such as BLAST, ClustalW and EMBOSS, and accessing key online databases, as well as providing numerical methods for statistical learning.

Background: There is a need for software applications that provide users with a complete and extensible toolkit for chemo- and bioinformatics accessible from a single workbench. Commercial packages are expensive and closed source, hence they do not allow end users to modify algorithms and add custom functionality. Existing open source projects are more focused on providing a framework for integrating existing, separately installed bioinformatics packages, rather than providing user-friendly interfaces. No open source chemoinformatics workbench has previously been published, and no successful attempts have been made to integrate chemo- and bioinformatics into a single framework. Results: Bioclipse is an advanced workbench for resources in chemo- and bioinformatics, such as molecules, proteins, sequences, spectra, and scripts. It provides 2D editing, 3D visualization, file format conversion, calculation of chemical properties, and much more, all fully integrated into a user-friendly desktop application. Editing supports standard functions such as cut and paste, drag and drop, and undo/redo. Bioclipse is written in Java and based on the Eclipse Rich Client Platform with a state-of-the-art plugin architecture. This gives Bioclipse an advantage over other systems, as it can easily be extended with functionality in any desired direction. Conclusion: Bioclipse is a powerful workbench for bio- and chemoinformatics as well as an advanced integration platform. The rich functionality, intuitive user interface, and powerful plugin architecture make Bioclipse the most advanced and user-friendly open source workbench for chemo- and bioinformatics. Bioclipse is released under the Eclipse Public License (EPL), an open source license which sets no constraints on external plugin licensing; it is totally open to both open source and commercial plugins. Bioclipse is freely available at http://www.bioclipse.net.


Local Applications:

Large- and medium-scale computational molecular biology projects require accurate bioinformatics software and numerous heterogeneous biological databanks, which are distributed around the world. BioMAJ provides a flexible, robust, fully automated environment for managing such massive amounts of data. The Java application automates the data update cycle and supervises the locally mirrored data repository. We have developed workflows that handle some of the most commonly used bioinformatics databases. A set of scripts is also available for post-synchronization data treatment, such as indexing or format conversion (for NCBI BLAST, SRS, EMBOSS, GCG, etc.). BioMAJ can easily be extended with custom processing scripts. Source history can be kept via HTML reports containing statements of locally managed databanks.

Background: Molecular biologists need sophisticated analytical tools which often demand extensive computational resources. While finding, installing, and using these tools can be challenging, pipelining data from one program to the next is particularly awkward, especially when using web-based programs. At the same time, system administrators tasked with maintaining these tools do not always appreciate the needs of research biologists. Results: BIRCH (Biological Research Computing Hierarchy) is an organizational framework for delivering bioinformatics resources to a user group, scaling from a single lab to a large institution. The BIRCH core distribution includes many popular bioinformatics programs, unified within the GDE (Genetic Data Environment) graphic interface. Of equal importance, BIRCH provides the system administrator with tools that simplify the job of managing a multiuser bioinformatics system across different platforms and operating systems. These include tools for integrating locally-installed programs and databases into BIRCH, and for customizing the local BIRCH system to meet the needs of the user base. BIRCH can also act as a front end to provide a unified view of already-existing collections of bioinformatics software. Documentation for BIRCH and for locally-added programs is merged in a hierarchical set of web pages. In addition to manual pages for individual programs, BIRCH tutorials employ step-by-step examples, with screenshots and sample files, to illustrate both the important theoretical and practical considerations behind complex analytical tasks. Conclusion: BIRCH provides a versatile organizational framework for managing software and databases, and making these accessible to a user base. Because of its network-centric design, BIRCH makes it possible for any user to do any task from anywhere.

Summarizing

Discussed Questions:

  1. Why not fix the errors in the GDE syntax instead of creating a completely new syntax?
    Answers:
    • not many users work with the GDE menus, and most of those who do are trained in computer programming; hence, introducing a new menu format will not make a significant difference to them
    • for simple menus the GDE syntax is sufficient; however, extending it to support features such as interactive menus would be a nightmare, piling up complexity in the way Intel's x86 architecture has.
    • The GDE syntax allows the user to make many errors that the program may not recognize as errors
    • The GDE syntax makes parsing complex because of the structure of the grammar (i.e. objects and their definitions do not have to be bound together)
  2. Why not use one of the existing syntaxes from similar programs, such as EMBOSS (ACD), Pise (XML), or BioMoby (SOAP XML)?
    Answers:
    • ACD is not very intuitive to understand
    • Not all of the problems are solved with ACD. We have to extend many of the characteristics to be similar to GDE (e.g. file access, because ACD works with files while GDE works with sequences).
    • ACD also has some object dissociation.
    • XML can be complex for users (Pise).
    • SOAP XML is even more complex than regular XML
  3. Why not use a syntax similar to computer programming languages (e.g. Perl, Python, Java, C, etc.)?
    Answers:
    • Scripting languages are more complex to parse (e.g. because they have functions)
    • Parsing a full programming language is much slower than we want our menu parsing to be
  4. Why not use CSV?
    Answers:
    • Impossible to create an effective schema that doesn't contain the same problems as GDE's syntax
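The point in the first question, that GDE-style menus let errors pass unnoticed because detail lines are not bound to the object they describe, can be illustrated with a hypothetical stricter checker. It reports detail lines (such as argtype) that appear before any arg, in or out line, and keys that do not belong to the block they appear in. The allowed-key table is an assumed, simplified subset, and this sketch is not how GDE or bioLegato actually validates menus.

```python
# Detail keys allowed after each kind of opening line (assumed, simplified subset).
ALLOWED = {
    "arg": {"argtype", "arglabel", "argmin", "argmax", "argvalue", "argchoice"},
    "in": {"informat", "insave"},
    "out": {"outformat", "outoverwrite"},
}
TOP_LEVEL = {"menu", "item", "itemmethod", "itemhelp", "arg", "in", "out"}

# Hypothetical item with three mistakes that a permissive reader might accept silently.
BROKEN = """\
item:Count residues
itemmethod:count_residues -w $WINDOW < $INPUT > $OUTPUT
argtype:slider
arg:WINDOW
argvalu:10
in:INPUT
outformat:text
"""


def check_lines(text: str) -> list:
    """Return a list of problems instead of silently accepting them."""
    problems = []
    block = None  # "arg", "in", "out", or None when no such block is open
    for n, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, _ = line.partition(":")
        key = key.strip()
        if not sep:
            problems.append(f"line {n}: missing ':' separator")
        elif key in TOP_LEVEL:
            # Opening lines start a new block; item-level lines close any open block.
            block = key if key in ALLOWED else None
        elif block is None:
            problems.append(f"line {n}: '{key}' is not attached to any arg/in/out line")
        elif key not in ALLOWED[block]:
            problems.append(f"line {n}: '{key}' is not valid inside the '{block}' block")
    return problems


if __name__ == "__main__":
    for problem in check_lines(BROKEN):
        print(problem)
    # line 3: 'argtype' is not attached to any arg/in/out line
    # line 5: 'argvalu' is not valid inside the 'arg' block
    # line 7: 'outformat' is not valid inside the 'in' block
```

A menu syntax that binds each parameter's properties to the parameter itself (for example, as nested fields) would make the first and third of these mistakes impossible to write, rather than something a checker has to catch afterwards.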

Conclusion to our questioning:
