.. include:: ../../global.inc .. include:: chapter_numbers.inc .. _manual.tracing_pipeline_parameters: ################################################################################################################# |manual.tracing_pipeline_parameters.chapter_num|: Tracing pipeline parameters ################################################################################################################# * :ref:`Manual overview ` .. index:: pair: pipeline_printout; Manual pair: Tracing pipeline parameters; Manual pair: Debugging; Manual The trickiest part of developing pipelines is understanding how your data flows through the pipeline. In **Ruffus**, your data is passed from one task function to another down the pipeline by the chain of linked parameters. Sometimes, it may be difficult to choose the right **Ruffus** syntax at first, or to understand which parameters in what format are being passed to your function. Whether you are learning how to use **ruffus**, or trying out a new feature in **ruffus**, or just have a horrendously complicated pipeline to debug (we have colleagues with >100 criss-crossing pipelined stages), your best friend is :ref:`pipeline_printout(...) `. **pipeline_printout** displays the parameters which would be passed to each task function for each job in your pipeline. In other words, it traces how each of the functions in the pipeline are called in detail. It makes good sense to alternate between calls to **pipeline_printout** and **pipeline_run** in the development of **Ruffus** pipelines (perhaps with the use of a command-line option), so that you always know exactly how the pipeline is being invoked. ======================================= Printing out which jobs will be run ======================================= **pipeline_printout** is called in exactly the same way as **pipeline_run** but instead of running the pipeline, just prints the tasks which are and are not up-to-date. The ``verbose`` parameter controls how much detail is displayed. :: verbose = 0 : prints nothing verbose = 1 : logs warnings and tasks which are not up-to-date and which will be run verbose = 2 : logs doc strings for task functions as well verbose = 3 : logs job parameters for jobs which are out-of-date verbose = 4 : logs list of up-to-date tasks but parameters for out-of-date jobs verbose = 5 : logs parameters for all jobs whether up-to-date or not verbose = 10: logs messages useful only for debugging ruffus pipeline code Let us take the two step :ref:`pipeline` from the tutorial. :ref:`Pipeline_printout(...) ` by default merely lists the two tasks which will be run in the pipeline: .. image:: ../../images/simple_tutorial_pipeline_printout1.png .. :: :: >>> pipeline_printout(sys.stdout, [second_task]) ________________________________________ Tasks which will be run: Task = first_task Task = second_task ________________________________________ To see the input and output parameters of out-of-date jobs in the pipeline, we can increase the verbosity from the default (``1``) to ``3``: .. image:: ../../images/simple_tutorial_pipeline_printout2.png This is very useful for checking that the input and output parameters have been specified correctly. ============================================= Determining which jobs are out-of-date or not ============================================= It is often useful to see which tasks are or are not up-to-date. For example, if we were to run the pipeline in full, and then modify one of the intermediate files, the pipeline would be partially out of date. Let us start by run the pipeline in full but then modify ``job1.stage`` so that the second task is no longer up-to-date:: pipeline_run([second_task]) # modify job1.stage1 open("job1.stage1", "w").close() At a verbosity of ``5``, even jobs which are up-to-date will be displayed. We can now see that the there is only one job in ``second_task(...)`` which needs to be re-run because ``job1.stage1`` has been modified after ``job1.stage2`` (highlighted in blue): .. image:: ../../images/simple_tutorial_pipeline_printout3.png