Bioclipse 2.0 released

On behalf of all Bioclipse developers I am happy to announce the release of Bioclipse 2.0. Bioclipse is a free, open source workbench for the life sciences that provides advanced functionality mainly in cheminformatics (bioinformatics is planned for version 2.1 later this summer). Some major components include a brand new chemical editor for SWT (JChemPaint), interactive 3D visualization of molecules (Jmol), a Molecules Table capable of reading large files, and a powerful backbone in cheminformatics provided by the Chemistry Development Kit (CDK) library.

Figure 1: Screenshot of Bioclipse showing editing of a chemical structure using the new JChemPaint editor.

Bioclipse is a Rich Client for the life sciences that makes it possible to run and integrate algorithms and tools in a disconnected (offline) state, while still taking advantage of remote services when a network connection is available. Built on the well-known Eclipse framework, Bioclipse delivers a state-of-the-art plugin architecture that allows it to be extended in almost any direction.

Figure 2: Screenshot of the interactive 3D visualization of a protein using the integrated component Jmol.

All functionality in Bioclipse 2 is available from the GUI as well as from a new scripting language based on JavaScript. This allows complete control of the workbench and its functionality from scripts, which can be used to automate tasks or to reproduce and validate scientific analyses.

Figure 3: All functionality in Bioclipse is available from an integrated scripting language based on JavaScript.

Bioclipse 2 can be downloaded from Sourceforge; releases are available for all major platforms. There is also an update site from which users can install additional functionality (such as Speclipse) and data collections; it is accessible from the Bioclipse workbench under the menu Help > Software Updates.

A short installation guide is also provided, but the main documentation for Bioclipse is available at help.bioclipse.net; the same information is also available from within Bioclipse via the menu Help > Help Contents. For general questions, there is the bioclipse-users mailing list.

All software contains bugs, and Bioclipse is no exception. However, compared with many commercial and closed-source initiatives, open source projects generally fix bugs faster and release more frequently. If you find bugs in Bioclipse, please report them at bugs.bioclipse.net. There is a list of intractable bugs on the Bioclipse development wiki, as well as a convenience list for tracking known major bugs.

Bioclipse is an open development project that welcomes new developers from varied backgrounds. Developers hang out on IRC on a daily basis (irc.freenode.net, channel #bioclipse), and can also be reached via the bioclipse-devel mailing list.

Thanks to all contributors who made this release possible!


Working with large SDFiles in Bioclipse

I decided to test the performance of Bioclipse 2 (current release 2.0.0.RC5) when working with large structure files (SDFiles). I first loaded the complete ChEBI (Chemical Entities of Biological Interest) dataset, which consisted of 13,486 chemical structures and has a file size of 54 MB on disk. This was very fast: Bioclipse indexed and opened the file in less than 4 seconds, and then continued to parse the properties in the background for another 4 seconds (during this time it is still possible to browse and work with the structures). The MolTable editor was very responsive and scrolled smoothly.
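The post does not describe how Bioclipse builds its index, but the general idea of why indexing is so much cheaper than full parsing can be illustrated with a minimal Python sketch. It assumes a simplified SDFile in which every record ends with a `$$$$` line (as the format requires), records only the byte offset where each record starts, and reads a single record on demand. The names `index_sdf` and `read_record` are hypothetical, and this is not Bioclipse's implementation.

```python
import io
from typing import BinaryIO, List


def index_sdf(stream: BinaryIO) -> List[int]:
    """Return the byte offset at which each SDF record starts.

    Only record terminators (lines consisting of "$$$$") are detected;
    atoms, bonds and data fields are not parsed. Records are assumed to
    be terminated by "$$$$", as the SDF format requires.
    """
    stream.seek(0)
    offsets: List[int] = []
    pos = 0    # byte position just after the current line
    start = 0  # byte position where the current record began
    for line in stream:
        pos += len(line)
        if line.rstrip(b"\r\n") == b"$$$$":
            offsets.append(start)
            start = pos
    return offsets


def read_record(stream: BinaryIO, offsets: List[int], i: int) -> str:
    """Seek to record i and return its text, without the "$$$$" line."""
    stream.seek(offsets[i])
    lines = []
    for line in stream:
        if line.rstrip(b"\r\n") == b"$$$$":
            break
        lines.append(line)
    # Assumes UTF-8 (or plain ASCII) content.
    return b"".join(lines).decode("utf-8")


if __name__ == "__main__":
    # A tiny, simplified in-memory stand-in for a large SDFile; a real
    # file would be opened with open(path, "rb") and its records would
    # contain full molfile blocks.
    sample = io.BytesIO(
        b"mol1\nM  END\n> <ID>\n1\n\n$$$$\n"
        b"mol2\nM  END\n> <ID>\n2\n\n$$$$\n"
    )
    offsets = index_sdf(sample)
    print(len(offsets), "records")                          # 2 records
    print(read_record(sample, offsets, 1).splitlines()[0])  # mol2
```

Because the index holds only one integer per record, memory use stays small even for hundreds of thousands of structures, and any structure can be fetched with a single seek; parsing of properties can then happen lazily or in the background.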

Figure 1: Screenshot from Bioclipse with the entire ChEBI SDF open.

To really push Bioclipse, the first 225,000 compounds in PubChem were concatenated into a test SDFile of 1.1 GB. Bioclipse indexed the file and opened it in 66 seconds, then continued parsing the properties in the background for another 78 seconds. The MolTable editor was still very responsive and scrolled smoothly. Not bad for such a large file!

Calculating InChIs for the >1 GB file while it was open in the MoleculesTable (which keeps all the resulting InChI properties in memory) took 13.20 min. Saving the resulting file took 2 min 49 s for the first 20 MB, which extrapolates to about 2 h 20 min for the whole file (saving forces a complete write of all chemical structures and a lot of swapping in and out from disk). Calculating the same InChIs and saving them to file without first opening the file in the MolTable (thus avoiding keeping all properties in memory) took 20 minutes. What do we learn from this? Browsing large files is fine, but if you want to manipulate them, work on the file directly rather than opening it for visual inspection.
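The lesson above (process the file directly instead of holding every record in memory) can be illustrated with a streaming sketch: each record is read, given a new data field, and written out immediately, so memory use stays roughly constant regardless of file size. The `compute` callback is a hypothetical stand-in for a real InChI calculation (for example from a cheminformatics toolkit), and `iter_records` and `annotate_sdf` are likewise illustrative names; this is an assumption-based sketch, not how Bioclipse performs the calculation.

```python
import io
from typing import Callable, Iterator, TextIO


def iter_records(stream: TextIO) -> Iterator[str]:
    """Yield SDF records one at a time, without their "$$$$" terminator.

    Assumes every record ends with "$$$$"; an unterminated trailing
    fragment is ignored.
    """
    lines = []
    for line in stream:
        if line.rstrip("\r\n") == "$$$$":
            yield "".join(lines)
            lines = []
        else:
            lines.append(line)


def annotate_sdf(src: TextIO, dst: TextIO, field: str,
                 compute: Callable[[str], str]) -> int:
    """Add the data field `field` to every record, streaming src to dst.

    Only one record is held in memory at a time. Returns the number of
    records written.
    """
    count = 0
    for record in iter_records(src):
        value = compute(record)
        # An SDF data item is "> <NAME>", the value, then a blank line;
        # it is appended after any existing data items of the record.
        dst.write(record)
        dst.write(f"> <{field}>\n{value}\n\n$$$$\n")
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical placeholder for an InChI calculation: copy the title line.
    def title(record: str) -> str:
        return record.splitlines()[0]

    src = io.StringIO("mol1\nM  END\n$$$$\nmol2\nM  END\n> <ID>\n2\n\n$$$$\n")
    dst = io.StringIO()
    print(annotate_sdf(src, dst, "TITLE", title))  # 2
    print(dst.getvalue())
```

With real files, `src` and `dst` would be opened with `open(input_path)` and `open(output_path, "w")` (paths hypothetical); the loop never materializes more than one record, which is the property that keeps the direct-on-file approach fast.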

As a side note: working with very large SDFiles is generally not recommended. When StructureDB (a relational database for chemistry) is released for Bioclipse, we expect a dramatic performance boost when dealing with large collections of molecules.


Bioclipse 2.0 Release Candidate 5

Today, Bioclipse 2.0 Release Candidate 5 (versioned 2.0.0.RC5) was released. It primarily contains a fix to the atom typing performed when editing chemical structures, and less strict handling of SDFiles. The Bioclipse help is now also available as a standalone version. The release requires a fresh download from Sourceforge, and we kindly ask beta-testers to report bugs at bugs.bioclipse.net.