log commands - create workflows

man log

log is a simple Python script that takes a snapshot of your system before and after you run a command.
To use it, prepend 'log' to the command. For example,
> samtools index data.bam
becomes
> log samtools index data.bam

In doing so, log will record the following details about the execution event:
- the command & parameters
- execution start time
- username
- user permissions (e.g. if run as root)
- hostname
- execution duration
- the output
- an ID unique to this execution event
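To make the list above concrete, here is a minimal sketch of how such an execution record could be gathered in Python. It is not log's actual code: the function names `run_and_record` and `current_username` and the dictionary keys are hypothetical, chosen only for illustration.

```python
import getpass
import os
import socket
import subprocess
import time
import uuid


def current_username():
    """Best-effort username lookup; falls back to 'unknown' if it cannot be resolved."""
    try:
        return getpass.getuser()
    except Exception:
        return "unknown"


def run_and_record(argv):
    """Run argv as a subprocess and return a dict describing the execution event.

    Hypothetical helper: the field names are assumptions, not log's real schema.
    """
    start_time = time.time()        # wall-clock start, for the record
    started = time.monotonic()      # monotonic clock, for the duration
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # capture stderr together with stdout
        text=True,
    )
    duration = time.monotonic() - started
    return {
        "id": uuid.uuid4().hex,     # ID unique to this execution event
        "command": list(argv),      # the command & parameters
        "start_time": start_time,
        "username": current_username(),
        # geteuid() only exists on Unix-like systems
        "is_root": hasattr(os, "geteuid") and os.geteuid() == 0,
        "hostname": socket.gethostname(),
        "duration_seconds": duration,
        "output": proc.stdout,
    }
```

Calling `run_and_record(["samtools", "index", "data.bam"])` would return one such record, assuming samtools is installed and on the PATH.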

log will also try to determine from your command which resources (programs & input files) were:
- used
- created
- modified
- deleted
either explicitly (mentioned in the command itself) or implicitly (changed on the surrounding filesystem over the course of the execution).
This is accomplished by comparing the MD5 checksums of all the resources before and after execution. We then log all of this information in a graph database that links together input files and execution events. With this graph, all we need is a file's MD5 checksum to find it in our database. From there we can walk through any pipelines that created, used, modified, or deleted this file, and see when that happened, what the output was, and so on.
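The before/after checksum comparison can be sketched as follows. This is an illustration under assumptions, not log's implementation: the helpers `md5_of_file`, `snapshot` and `diff_snapshots` are hypothetical names, and the sketch simply walks one directory. Note that comparing checksums reveals files that were created, modified, or deleted, but not files that were only read.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 checksum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(directory):
    """Map every regular file under directory to its MD5 checksum."""
    checksums = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path):
                checksums[path] = md5_of_file(path)
    return checksums


def diff_snapshots(before, after):
    """Classify paths as created, modified, or deleted between two snapshots."""
    before_paths, after_paths = set(before), set(after)
    return {
        "created": sorted(after_paths - before_paths),
        "modified": sorted(
            path for path in before_paths & after_paths
            if before[path] != after[path]
        ),
        "deleted": sorted(before_paths - after_paths),
    }
```

A wrapper would call `snapshot()` once before running the command and once after, then pass both results to `diff_snapshots()`.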

log can also be used to back up all unique resources below a set file size.
This is great not only for backing up gene lists or other small intermediate files during the course of an analysis, but also for scripts and programs in various stages of development. With the exact command line parameters and versions of the programs/scripts backed up, whole pipelines can be reprocessed years after they were run with very little effort. Please note that backups are only ever stored locally!
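One way to picture "unique resources below a set file size" is a local store keyed by checksum, so identical content is only ever saved once. The sketch below assumes that design; the function name `backup_if_small`, its parameters, and the store layout are hypothetical and not taken from log itself.

```python
import hashlib
import os
import shutil


def backup_if_small(path, backup_dir, max_bytes):
    """Copy path into backup_dir, named by its MD5 checksum.

    Returns the backup path, or None if the file is not below max_bytes.
    Files whose content is already stored are not copied again.
    Hypothetical helper: log's real backup layout may differ.
    """
    if os.path.getsize(path) >= max_bytes:
        return None  # only files below the size limit are backed up
    with open(path, "rb") as handle:
        checksum = hashlib.md5(handle.read()).hexdigest()
    os.makedirs(backup_dir, exist_ok=True)
    target = os.path.join(backup_dir, checksum)
    if not os.path.exists(target):
        shutil.copy2(path, target)  # copy2 also preserves timestamps
    return target
```

Because the stored file is named after its checksum, a later lookup by MD5 leads straight to the backed-up copy.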

Finally, log offers the user a growing number of helper functions, such as:
- suppress command output (but still log it)
- run the command via screen
- email the user after execution
- call/SMS the user after execution
All of these can be set as defaults if desired, or used as a one-off by passing parameters to log itself, such as:
> log +call callTo=004412345678 samtools sort data.bam
These helper functions are so useful that many people run log without logging, just for the sake of using them.

./configure ; make ; make install

In keeping with the AC.GT philosophy, log is incredibly easy to install:
- create an account on log.bio
- download the latest version of log here
- run log for the first time, which starts the interactive installer
Of course, you can customize your installation to run your own authentication server and your own logging databases rather than use the public ones. For instructions on how to do this, check out the videos below.

Get involved!

Like every other project we host, log is totally free and open source.
Anyone can read the code and submit updates & improvements to get recognition in the source code :)
If you know a bit of Python, JavaScript, C, CSS or HTML, check out the code below to submit improvements or, alternatively, get in contact with us via the chat or contact pages to discuss what you want to see in future versions of log!


log code

Written in Python, the log client runs commands as subprocesses, tries to identify files that were created / used / modified / deleted, and submits the log to the log server. We desperately need a C programmer to improve the detection of open(2)'d files.

server code

Written in JavaScript for Node.js, the log server is a RESTful API that accepts logs and adds them to a Neo4j graph database. It also requires a bunch of dependencies from npm, but nothing too obscure!

website code

Written in a mix of HTML, CSS, and Angular.js, this is the face of log.bio :)