Towards standards for QSAR data setup

I have started to explore the possibilities of Bioclipse2 when it comes to projects with natures and autobuilds. This gave me the idea of automatic QSAR descriptor calculations, based on my own and Egon Willighagen's work with QSAR in Bioclipse 1.x. The idea is to have a file (qsar.xml) that defines molecules and descriptors and might look something like the example below (note: this is a demonstration example; the information and URLs are made up):

<qsar>
  <molecule id="http://www.mol.repo.org/molecule?abc123" ns="myNS"/>
  <molecule id="http://www.mol.repo.org/molecule?abc234" ns="myNS"/>
  <descriptor id="http://www.cdk.sf.net/descriptors/xlogp:implementation1" name="XlogP">
    <parameter key="cutoff" value="10"/>
  </descriptor>
  <descriptor id="http://www.cdk.sf.net/descriptors/ZagrebIndex:02" name="Zagreb Index"/>
</qsar>
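To make the format concrete, here is a minimal Python sketch that reads such a file with the standard library's xml.etree.ElementTree. It assumes the element and attribute names of the made-up example above (molecule, descriptor, parameter, id, name, key, value) and wraps them in a hypothetical <qsar> root element, since a well-formed XML file needs a single root. The Descriptor and QsarSetup classes and the parse_setup function are illustrative names only, not part of Bioclipse or CDK.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Demonstration input mirroring the made-up example; the <qsar> root element
# is an assumption, since a well-formed XML file needs exactly one root.
SAMPLE = """<qsar>
  <molecule id="http://www.mol.repo.org/molecule?abc123" ns="myNS"/>
  <molecule id="http://www.mol.repo.org/molecule?abc234" ns="myNS"/>
  <descriptor id="http://www.cdk.sf.net/descriptors/xlogp:implementation1" name="XlogP">
    <parameter key="cutoff" value="10"/>
  </descriptor>
  <descriptor id="http://www.cdk.sf.net/descriptors/ZagrebIndex:02" name="Zagreb Index"/>
</qsar>"""


@dataclass
class Descriptor:
    """Hypothetical in-memory form of one <descriptor> element."""
    id: str
    name: str
    params: dict = field(default_factory=dict)


@dataclass
class QsarSetup:
    """Hypothetical in-memory form of a whole qsar.xml file."""
    molecules: list    # molecule IDs (URIs), in document order
    descriptors: list  # Descriptor objects, in document order


def parse_setup(xml_text: str) -> QsarSetup:
    """Read molecule IDs and descriptor definitions from a qsar.xml string."""
    root = ET.fromstring(xml_text)
    # iter() searches the whole tree, so this does not depend on the root's name.
    molecules = [m.get("id") for m in root.iter("molecule")]
    descriptors = [
        Descriptor(
            id=d.get("id"),
            name=d.get("name", ""),
            # Collect the descriptor's <parameter key=... value=...> children.
            params={p.get("key"): p.get("value") for p in d.iter("parameter")},
        )
        for d in root.iter("descriptor")
    ]
    return QsarSetup(molecules, descriptors)


if __name__ == "__main__":
    setup = parse_setup(SAMPLE)
    print(len(setup.molecules), [d.name for d in setup.descriptors])
```

Note that parameter values come back as strings ("10"); converting them to numbers would be up to each descriptor implementation.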

The implementation would be an automatic build that calculates all descriptors for each molecule whenever qsar.xml changes, but only for the deltas (i.e., a partial build that skips values that have already been calculated). This means that if you add a new descriptor to qsar.xml, it will be calculated for all molecules on save. This would be done in the background and produce a descriptor matrix (dataset.csv). Any plots of this matrix would also be updated.
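The delta logic described above can be sketched as follows, assuming a simple in-memory cache keyed by (molecule ID, descriptor ID) pairs. The incremental_build and write_matrix functions and the calculate callback are hypothetical names; the actual descriptor calculation (presumably via CDK in Bioclipse2) is left abstract. This is a sketch under those assumptions, not the Bioclipse2 builder itself.

```python
import csv
import io
from typing import Callable, Dict, List, TextIO, Tuple

# Hypothetical cache of already-computed values, keyed by (molecule ID, descriptor ID).
Cache = Dict[Tuple[str, str], float]


def incremental_build(
    molecules: List[str],
    descriptors: List[str],
    cache: Cache,
    calculate: Callable[[str, str], float],
) -> int:
    """Compute only the (molecule, descriptor) pairs missing from the cache.

    Returns the number of new calculations performed.
    """
    computed = 0
    for mol in molecules:
        for desc in descriptors:
            if (mol, desc) not in cache:  # skip values that are already built
                cache[(mol, desc)] = calculate(mol, desc)
                computed += 1
    return computed


def write_matrix(
    molecules: List[str], descriptors: List[str], cache: Cache, out: TextIO
) -> None:
    """Write the descriptor matrix as CSV: one row per molecule, one column per descriptor."""
    writer = csv.writer(out)
    writer.writerow(["molecule", *descriptors])
    for mol in molecules:
        writer.writerow([mol, *(cache[(mol, desc)] for desc in descriptors)])


if __name__ == "__main__":
    def fake_calculate(mol: str, desc: str) -> float:
        # Hypothetical stand-in for a real descriptor calculation.
        return float(len(mol) + len(desc))

    mols = ["abc123", "abc234"]
    cache: Cache = {}
    print(incremental_build(mols, ["xlogp"], cache, fake_calculate))  # 2
    # Adding a descriptor triggers calculations only for the new column.
    print(incremental_build(mols, ["xlogp", "zagreb"], cache, fake_calculate))  # 2
    buf = io.StringIO()
    write_matrix(mols, ["xlogp", "zagreb"], cache, buf)
    print(buf.getvalue())
```

Because write_matrix uses the current molecule and descriptor lists, values for descriptors removed from qsar.xml stay in the cache but simply drop out of dataset.csv; a real builder might also prune them.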

I have started to implement this in Bioclipse2 and intend to also create a public repository for storing these QSAR setups.

So, what is missing? Having unique IDs that follow the REST architecture for both descriptors and molecules would make this a candidate standard for setting up QSAR data matrices. I shall explore this with the CDK people. Comments on the project's design and implementation are very welcome.

1 comment:

  1. I appreciate open standards in all ways. Please note that not everybody has the time to get to know the actual code, interfaces, and bindings.
    In other words, if you want feedback from other projects and even commercial partners, you might want to explain things in more detail and ask for feedback. I think many people might be willing to contribute, even those who do not know Bioclipse implementation details. People who attend the regular workshops are probably the most experienced, but others might be able to catch up at least a little, if you just post enough about it!

    On the other hand, you could just take the pragmatic route and create your own standards. Please just make sure others have had a fair chance to give reasonable, informed feedback. This means I would personally appreciate being informed about any progress in this direction without having to follow the CVS and source code changes. So, please post more about it here or on the Wiki and let people know what is going on.

    I must admit that at the moment it is easier for me to add information to a Wiki than to edit source code.