Have you ever had so many simulations to run that at some point you forgot which ones had finished and which had not? Have you ever logged in to the computing cluster many times a day to check on simulation status and restart interrupted simulations? Or perhaps you have just downloaded a lot of raw observational data from a telescope, and you need to run a pipeline to process it as efficiently as possible.
If you are facing any of these situations, you are in luck. We can let the computer do these tedious jobs for us! We just need to write a script that repeatedly checks the status of a set of simulations or pipelines. If some of them have just finished, then great: we can use the CPU cores they were using to run new jobs. If some were interrupted, we can parse their output files to detect the error and restart the jobs accordingly. Once all jobs are done, the script sends an email to notify the user.
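The idea can be illustrated with a minimal Python sketch. This is not SiMon's actual implementation: the names `Job`, `Status`, `schedule_once`, and `monitor`, as well as the `check_status`, `start`, and `notify` callbacks, are hypothetical and stand in for whatever your cluster and simulation code need. For simplicity, the sketch restarts interrupted jobs unconditionally instead of parsing the error first.

```python
import time
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    NEW = "new"
    RUNNING = "running"
    DONE = "done"
    INTERRUPTED = "interrupted"


@dataclass
class Job:
    name: str
    cores: int                  # CPU cores the job occupies while running
    status: Status = Status.NEW


def schedule_once(jobs, total_cores, check_status, start, notify):
    """Run one monitoring pass; return True once every job is done."""
    # Refresh the status of the jobs that were running during the last pass.
    for job in jobs:
        if job.status is Status.RUNNING:
            job.status = check_status(job)

    # Cores released by finished or interrupted jobs become available again.
    busy = sum(j.cores for j in jobs if j.status is Status.RUNNING)
    free = total_cores - busy

    # Start new jobs and restart interrupted ones while cores remain.
    for job in jobs:
        if job.status in (Status.NEW, Status.INTERRUPTED) and job.cores <= free:
            start(job)
            job.status = Status.RUNNING
            free -= job.cores

    if all(j.status is Status.DONE for j in jobs):
        notify("All jobs are done")
        return True
    return False


def monitor(jobs, total_cores, check_status, start, notify, interval=60):
    """Repeat the monitoring pass every `interval` seconds until all jobs finish."""
    while not schedule_once(jobs, total_cores, check_status, start, notify):
        time.sleep(interval)
```

In practice, `start` might launch a process, `check_status` might inspect an output file, and `notify` might send an email; they are left as callbacks here so that the scheduling logic stays independent of any particular simulation code.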
Simulation Monitor (SiMon) was developed based on this idea. It is implemented in Python and is modular and lightweight. You can easily extend it to support your own code by telling it how to run your simulations and where to check their status. SiMon also gives you an overview of all managed simulations, so you no longer have to switch back and forth between directories to check on them manually.
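To give a feel for what "how to run" and "where to check the status" could look like, here is a hypothetical sketch. It is not SiMon's real interface (see the repository for that): the class `CodeDescription`, its fields, and the marker strings are all assumptions made up for illustration.

```python
import shlex
from dataclasses import dataclass
from pathlib import Path


@dataclass
class CodeDescription:
    """Hypothetical description of one simulation code (not SiMon's API)."""
    start_command: str            # how to run the simulation
    output_file: str              # where to check the status
    done_marker: str              # text assumed to appear on normal completion
    error_marker: str = "ERROR"   # text assumed to appear when a run fails

    def command_args(self):
        # Split the command line into an argument list, e.g. for subprocess.Popen.
        return shlex.split(self.start_command)

    def status(self, sim_dir):
        """Classify a simulation by scanning the output file in its directory."""
        path = Path(sim_dir) / self.output_file
        if not path.exists():
            return "new"
        text = path.read_text(errors="replace")
        if self.done_marker in text:
            return "done"
        if self.error_marker in text:
            return "interrupted"
        # No marker yet: assume the run is still in progress. A real check
        # would also confirm that the process is alive.
        return "running"
```

Applying `status` to every simulation directory in turn would yield the kind of overview described above without visiting each directory by hand.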
So start your simulations before your vacation, and have your data ready when you get back!
Source code is available on GitHub: https://github.com/maxwelltsai/SiMon