[Bio-linux-devel] Best method to build a regularly updated bioinformatics platform

Tim Booth avarus at fastmail.fm
Mon Oct 31 20:15:05 EDT 2016


Hi Burke, Tony, all,

I'm delighted to see this discussion starting up regarding potentially
re-invigorating Bio-Linux, or something inspired by it. Getting to grips
with bioinformatics continues to be a trying experience for many
biologists. The array of tools and technologies is bewildering; the
learning curves steep. I firmly believe that providing an appealing,
user friendly, pre-configured platform that can be put in front of
complete beginners, yet with the power of a full-fledged Linux
workstation, is a no-brainer in terms of overall return on investment.
One or two dedicated developers, working in concert with the community,
can effectively support hundreds if not thousands of users around the
world. I hope people on this list will work with Tony to help him secure
funding, take up the reins, and build a developer community.

As I worked on the old Bio-Linux project for so many years I've
developed some strong opinions, which I'll share freely, but please
don't imagine I'll be upset if people disagree or think they can see a
better way. Laziness, impatience and hubris are after all the cardinal
virtues of a good developer.

Specific comments follow...

> 1. I appreciate the use of Ubuntu but am not a fan of the Unity desktop.
> I find the Elementary OS (https://elementary.io, based on Ubuntu from
> what I can read) to be more intuitive for new users and users in general.
> Perhaps we can have a discussion around the possibility of setting
> Elementary OS as the default OS.

As Luca commented, this might be a risk. However, I'm a great believer
in "eating your own dogfood" - use the things you make, and give users
what you'd want for yourself, even if it means a little extra effort. So
if you really like Elementary, put it in. Once you understand how the
desktop packages fit into the system, and about things like GSettings
and DBUS sessions and freedesktop/XDG integration, maintaining a custom
desktop is not as onerous as it might seem.
 
Personally, after forcing myself to use Unity for years, I've now got so
used to it I struggle to work on other desktops.

> b. Yet another alternative is to enable a user to pick a flavor of Ubuntu
> and build biolinux on top of it.

This was already possible with Bio-Linux 8. No doubt the installer could
be refined, but this feature actually drops out for you automatically if
you get your packaging right. You can have a standard Bio-Linux
"product", which for me was the ISO/OVA files, but anyone who finds the
packages can use as much or as little of the project as they need, as
everything is modular and the packages adhere to the Debian/Ubuntu
standards.

> c. (As I do not know the history of biolinux I hope I am not treading on
> difficult ground here.)

Tread away!

> 2. The idea of creating USB flash drives for people to use their own
> laptops is an attractive option as well as an option when using another
> groups computers’ for training.

There are a few technical caveats when doing this, but in general it's
pretty easy and Tony is very familiar with the ins and outs. If nothing
else, giving out flash drives was always a nifty way to promote the
project, and I always smile when I come across one of our sticks "in the
wild". One thing I will say - get your custom printed sticks from
Flashbay - vastly more reliable than any other companies we tried!

> 4. My initial (hopefully not too naïve) thoughts are to replace the
> single biolinux distribution with a pipeline that can build custom
> packages of biolinux for people’s needs (see next point)

We basically have that already. Each bit of software is in a DEB
package, and you can select and install sets of packages on your stock
Debian/Ubuntu machine. You could achieve what you suggest with the
creation of a few wrapper packages and a small GUI to help the user
select them. Brad actually did exactly what you are proposing already as
part of the Cloud BioLinux project - categorising the software into
package sets, and going beyond DEB packages to pull in stuff from CRAN,
CPAN, PIP, etc. as well as build-from-source.
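
To make the wrapper package idea concrete, here is a minimal sketch in
Python, not the actual Bio-Linux or Cloud BioLinux tooling. It turns a
category-to-package mapping into the kind of control stanza that a tool
such as equivs can build into a dependency-only package. The mapping,
the "bio-linux-set-" naming scheme and the function name are all
assumptions made up for illustration.

```python
# Hypothetical category -> package mapping; the grouping is illustrative only.
PACKAGE_SETS = {
    "phylogenetics": ["mrbayes", "phyml", "raxml"],
    "ngs": ["bwa", "samtools", "bowtie2"],
}


def control_stanza(category, packages, version="1.0"):
    """Return a control stanza for a wrapper package that only carries Depends."""
    name = "bio-linux-set-" + category  # assumed naming scheme, not a real package
    lines = [
        "Package: " + name,
        "Version: " + version,
        "Architecture: all",
        "Depends: " + ", ".join(sorted(packages)),
        "Description: Wrapper package for the %s tool set" % category,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print one stanza per category so each can be saved to its own file.
    for category, packages in sorted(PACKAGE_SETS.items()):
        print(control_stanza(category, packages))
```

Installing one such wrapper pulls in the whole set through ordinary
dependency resolution, so a GUI selector only has to offer the wrapper
names.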

One thing to be wary of - I was never able to integrate all the Cloud
BioLinux stuff back into "mainline" Bio-Linux because there was no
reliable provision for updating what you installed over time. This makes
perfect sense on cloud - the instances are basically disposable - but if
Bio-Linux is on your desktop or server you need to maintain a system
over several years and not have it break because an upgrade goes awry.
This took a LOT of my time - and Ubuntu core devs spend a lot of time on
it too. My advice would be that you either make a decision not to
support package upgrades or else you take the time and do it properly -
don't try to cut corners.
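
As one small piece of what "doing it properly" might involve, here is a
rough Python sketch of an upgrade check. It assumes you have saved the
package list before and after a test upgrade as name<TAB>version lines,
which is the default layout printed by dpkg-query -W. The function
names are hypothetical, and this is far from a full upgrade test.

```python
def parse_snapshot(text):
    """Parse 'name<TAB>version' lines into a {name: version} dict."""
    packages = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, version = line.partition("\t")
        packages[name.strip()] = version.strip()
    return packages


def upgrade_report(before, after):
    """List packages that vanished or changed version across an upgrade."""
    removed = sorted(set(before) - set(after))
    changed = sorted(
        name for name in before
        if name in after and before[name] != after[name]
    )
    return {"removed": removed, "changed": changed}
```

Anything in the "removed" list after a routine upgrade deserves a look
before the update goes anywhere near a user's desktop.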

> 5. As I think about the areas that we train in I see particular groupings
> of analysis tools that could be offered as a single full biolinux install
> or as separate packages for those offering a specific training. As I see
> the packages, they could include:
> a. Data Science, Biostatistics, Scientific Development
> b. NGS, Microbiome, Sequence Analysis
> c. Clinical Genomics
> d. Phylogenetics
> e. Structural, 3D
> f. Biological Networks, Systems Biology, Downstream Analysis
> g. Workflow (Galaxy, Jupyter Notebook, etc)

I'm not sure how far the ELIXIR tools-and-data-services registry guys
got with categorising the existing packages like this, but you should
check out their stuff before starting from scratch.
Also, if you are gathering tools for a category outside your own
personal field of expertise, you really need to nail down a domain
expert and find out how they really work. Bioinformaticians are horribly
cagey about revealing how they really do things. They'll tell you to use
one tool, and you'll struggle for days trying to get it to work, and
finally they'll admit "well actually I don't use that myself because I
had problems and I use this shortcut and this website and this little
Perl script my masters student wrote, but I'm sure I should be doing it
the proper way like I told you". Please do not drive your users to
despair by falling for this brand of nonsense. Give them the tools they
actually need, not the tools you feel they should need.

> 6. All of this will only be possible (I think) if we automate individual
> steps as much as possible BUT I think the technology exists to enable us
> to do this.

Yes! Virtually all the repetitive stuff is automated if you know where
to look. Debian and Ubuntu have been doing this for YEARS and have
software package updating and continuous integration testing down to a
fine art. And all the facilities of Launchpad.net and Alioth are yours
for the asking. Also, Packer.io is an absolute godsend and allowed me to
build Bio-Linux images for DVD/ISO, USB, VirtualBox, VMware ESX and
OpenStack in one go. I could easily have done AWS to boot. It's all up
on GitHub - if it's not obvious where something is or what does what
just ask.
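
For anyone who has not met Packer, the "in one go" part comes from
listing several builders in a single JSON template that share the same
provisioning steps. The Python sketch below only generates a skeleton
to show that shape. It is not the template from the Bio-Linux
repository, the image name and script name are placeholders, and a real
template needs further per-builder settings (ISO location, credentials
and so on) that are left out here.

```python
import json


def packer_template(image_name, provision_script):
    """Build a skeleton Packer template with several builders and one provisioner."""
    builders = [
        {"type": "virtualbox-iso", "vm_name": image_name},
        {"type": "vmware-iso", "vm_name": image_name},
        {"type": "openstack", "image_name": image_name},
    ]
    # Every builder runs the same provisioning script, so the images stay in step.
    provisioners = [{"type": "shell", "script": provision_script}]
    return {"builders": builders, "provisioners": provisioners}


if __name__ == "__main__":
    # Placeholder names; substitute your own image name and script.
    template = packer_template("bio-linux-example", "provision.sh")
    print(json.dumps(template, indent=2))
```

The point of the design is that adding a new target, AWS for instance,
means adding one more builder entry rather than a second build process.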

Basically, during my time as the main BL developer I spent _much_ more
time on genuine debugging, enhancement and development than routine
updates.

> 7. Finally, I see the following platforms as possible endpoints of this
> effort:
> a. Laptops or desktops (Intel, Not-Apple)
> b. Laptops or desktops (Apple)

Not much difference. Ubuntu runs just dandy on Apple hardware, and is
pretty nippy under Parallels.

> c. Containers

Are you actually using containers? I may be wrong but I generally see
them as things people tell you you should be using, and then when
interrogated more thoroughly it turns out they're not actually using
them now but are "going to soon" (see rant on bioinformaticians and
their little white lies above).

> d. Virtual Machines

Genuinely useful, and pretty easy to support. As noted above, Packer.io
is your friend.

> e. Cloud (Amazon, Open cloud, other)

Lots of potential here and lots of good existing projects to tap into,
but the days when you could spend 30 minutes knocking up something
cloud-related and jump aboard the hype train to publication-land are
over. You'll need a clear plan and an understanding of your use cases.

> I apologize if any of the is no totally coherent, if so it merely
> reflects the current state of it in my head! ☺

Hardly surprising. Software integration is harder than most people
think. I've been at it for years and my default state is still one of
general bafflement punctuated by flashes of clarity.

Best,

TIM

