[Bioclusters] Java Vs C++(Qt) for Bioinformatics

Aaron Darling darling at cs.wisc.edu
Thu May 24 12:21:54 EDT 2007


Hello Aijaz,

Mr. Syed Aijaz wrote:
> Hello All,
>
> Just wondering what the bioinformatics community thinks is best to use:
> 1. Java Swings (1.6+)
> 2. C++ Qt (4.0)
>
> My visualization tool requires accessing data on the order of a
> few hundred MBs, and we are expecting this to hit GBs soon. I am planning
> not to hold all the data in memory. However, I will have to hold some data
> (a few hundred thousand (O(100,000)) data entities, each costing
> around 60 bytes). As the tool is supposed to be interactive, which will
> be the better alternative, Java or C++? I am leaning towards Java,
> reason being:
> 1. Comprehensive GUI
> 2. Java not that Slow, as they say!
> 3. Huge API, DBMS, XML, DRMAA, . . . . .
> 4. No deployment pain, although a little application
>   specific deployment may be required example: preference files etc
> 5. Automated Garbage collection, less trouble in maintaining memory.
>   Although it has a little overhead, it can be reduced by efficient
> handling of data???
> 6. efficient multi threading, not system level fork, etc??????
> 7. Java has growing number of Bioinformatics applications
>


Having implemented several C++ programs and a bioinformatics data viz 
tool in Java (Mauve), I would agree with your logic behind favoring 
Java.  In my experience, it's not necessarily Java itself that's slow; 
it's often how Java is used that makes it slow.

Garbage collection and object allocation can be slow, so if you have to 
repeatedly allocate objects for each of your 60-byte data entities then 
performance will likely suffer.  Often that can be avoided by 
pre-allocating all necessary storage up front, so that no allocation 
penalty is paid during interactive use.  The other issue with allocating 
huge numbers of small objects is the memory overhead required to track 
them.  This isn't really a Java-specific problem; I think it exists in 
any high-level language.  In any case, if your collection of 60-byte 
data objects can somehow be flattened into arrays of an integral type 
like int or long, much of the memory overhead can be avoided, and if the 
data is accessed in linear scans, cache performance may improve as 
well.  For huge data sets, memory-mapped I/O may be helpful, depending on 
the I/O pattern.
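
To make the flattening idea concrete, here is a minimal sketch.  It 
assumes each 60-byte entity can be encoded as 15 four-byte ints; the 
class name FlatEntityStore, the field layout and the method names are 
all made up for illustration and are not taken from Mauve or any 
existing library.  All storage is allocated once in the constructor, so 
nothing is allocated per entity afterwards.

```java
/**
 * Hypothetical flat store: every entity occupies 15 consecutive ints
 * (60 bytes) in a single int[] that is allocated once, up front.
 */
final class FlatEntityStore {
    // Assumption: a 60-byte entity fits exactly into 15 four-byte ints.
    static final int INTS_PER_ENTITY = 15;

    private final int[] data;  // all entities stored back to back
    private int size = 0;      // number of entities added so far

    FlatEntityStore(int capacity) {
        // The only allocation: one array instead of one object per entity.
        data = new int[capacity * INTS_PER_ENTITY];
    }

    /** Copies the 15 fields into the next free slot and returns the entity index. */
    int add(int[] fields) {
        if (fields.length != INTS_PER_ENTITY) {
            throw new IllegalArgumentException("expected " + INTS_PER_ENTITY + " ints");
        }
        if ((size + 1) * INTS_PER_ENTITY > data.length) {
            throw new IllegalStateException("store is full");
        }
        System.arraycopy(fields, 0, data, size * INTS_PER_ENTITY, INTS_PER_ENTITY);
        return size++;
    }

    /** Reads one field of one entity without creating any object. */
    int get(int entity, int field) {
        if (entity < 0 || entity >= size || field < 0 || field >= INTS_PER_ENTITY) {
            throw new IndexOutOfBoundsException();
        }
        return data[entity * INTS_PER_ENTITY + field];
    }

    /** Linear scan over one field of every entity, walking the array front to back. */
    long sumField(int field) {
        if (field < 0 || field >= INTS_PER_ENTITY) {
            throw new IndexOutOfBoundsException();
        }
        long sum = 0;
        for (int i = field; i < size * INTS_PER_ENTITY; i += INTS_PER_ENTITY) {
            sum += data[i];
        }
        return sum;
    }

    int size() {
        return size;
    }
}
```

With a capacity of 100,000 entities the backing array is about 6 MB, 
with no per-entity object header or reference to track.  If the data 
lives on disk, the same fixed-stride layout could in principle be read 
through a java.nio.MappedByteBuffer obtained from FileChannel.map, 
though whether that helps depends on the access pattern.
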
When working with Swing GUIs, be sure to test your code on each target 
platform, since the widgets appear slightly different on each one.

-Aaron

