Wednesday, August 21, 2013

Utility script for launching bare JAR files

Torsten Seemann compiled a list of minimum standards for bioinformatics command line tools, things like printing help when no commands are specified, including version info, avoid hardcoded paths, etc. These should be obvious to any seasoned software engineer, but many of these standards are not followed in bioinformatics.

#8 on the list was "Don't distribute bare JAR files." This is particularly annoying, requiring a user to invoke the software using something like: java -Xmx1000m -jar /path/on/my/system/to/software.jar . There are a few notable offenders in bioinformatics out there (I'm looking at you, Trimmomatic, snpEff, GATK...).

A very simple solution to the bare JAR file problem is distributing your java tool with a shell script wrapper that makes it easier for your users to invoke. E.g., if I have GATK installed in ~/bin/ngs/gatk/GenomeAnalysisTK.jar, I can create this shell script at ~/bin/ngs/gatk/gatk (replace GenomeAnalysisTK.jar with someOtherBioTool.jar):

Once I make that script executable and include that directory in my path, calling GATK is much simpler:

Yes, I'm fully aware that making my own JAR launcher utility scripts for existing software will make my code less reproducible, but for quick testing and development I don't think it matters. The tip has the best results when JAR files are distributed from the developer with utility scripts for invoking them.

See the post below for more standards that should be followed in bioinformatics software development.

Torsten Seeman: Minimum standards for bioinformatics command line tools

No comments:

Post a Comment

Creative Commons License
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.