Got some sequencing data? Many powerful tools to analyse them are based on the command line, and this is part of a series of short but essential posts that help you get started. I assume that you are working on a UNIX-based operating system (‘Mac’ or ‘Linux’ computer).
What if your data analysis on a remote server takes several hours, days, or even weeks, to finish? No worries, you don’t need to be connected to the remote server while the data are being analysed. Here, I introduce you to the tools that allow you to start an analysis, disconnect from the server, and then look at the progress or the results at a later time point.
The nohup tool allows you to run a process in the background, which means that, while the analysis is running, you can do other tasks in parallel or log off from the remote server. Strictly speaking, the trailing & puts the process in the background, and nohup keeps it running after you log off.
Imagine the nohup tool as a bracket that encloses the command that you want to run in the background:
nohup ... &
nohup always precedes, and & always follows, the command that you want to run in the background (shown here as ...). Let’s say you want to run the command ls -lhcrt (which lists all files and subdirectories in your current directory) in the background:
nohup ls -lhcrt &
When you hit ENTER, the terminal prints out some information:
[1] 21118
nohup: ignoring input and appending output to 'nohup.out'
The number 21118 (which will differ in your case) in the first line is the process-ID of your background process. The second line informs you that all ‘results’ that would normally be printed in the terminal window are now redirected to the file nohup.out.
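The process-ID is printed only once, so it is easy to lose track of it. As a minimal sketch, the shell variable $! (the process-ID of the most recently started background job) can be saved to a file right after starting the job. The file name analysis.pid and the stand-in command sleep 2 are assumptions for illustration only.

```shell
# Stand-in for a long analysis; replace 'sleep 2' with your own command.
nohup sleep 2 &

# $! holds the process-ID of the most recently started background job.
job_pid=$!

# Store the ID in a file (hypothetical name) so you can look it up later.
echo "$job_pid" > analysis.pid

echo "Background job started with process-ID $(cat analysis.pid)"
```

When you log in again later, cat analysis.pid gives you the number back.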
Let’s first have a closer look at the process-ID. What’s the use of this number?
If you have started a process that takes several hours, or longer, to finish, then you can use the process-ID to see if the process is still running. For this, you can use the ps command with the -p option, which reports the status of the process with a given process-ID. To see the status of the process I started above, I would use:
ps -p 21118
The output is
PID TTY TIME CMD
Since this is only the header line of the process specifications, the process must have finished. Here:
PID indicates the process-ID
TTY indicates the controlling terminal
TIME shows the CPU time that the process has used so far
CMD shows the command name
If the process were still running, you would get a line similar to:
PID TTY TIME CMD
21118 ? 00:00:04 ls
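Reading the ps output by eye works well at the terminal. Inside a script, a similar check can be sketched with kill -0, which is a different technique from ps -p: it sends no signal and only reports whether the process exists. The stand-in command sleep 30 and the variable names are assumptions for this demo.

```shell
# Stand-in for a long analysis (an assumption for this demo).
sleep 30 &
job_pid=$!

# kill -0 sends no signal; it only reports whether the process exists
# (and whether you are allowed to signal it).
if kill -0 "$job_pid" 2> /dev/null; then
    state="running"
else
    state="finished"
fi
echo "Process $job_pid is $state"

# Clean up the demo process.
kill "$job_pid"
```

Note that the check also reports ‘finished’ for a process that belongs to another user, because you are not allowed to signal it.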
The process-ID also allows you to cancel the process before it finishes. Cancelling a process comes in handy when you realise that you started it with the wrong parameters or input files and want to restart it with different settings. The kill command allows you to cancel a specific process:
kill 21118
This would cancel the process that we started in the background before. If you can’t remember the process-ID but want to cancel all ls processes, then you could use the pkill command in the following way:
pkill ls
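Unlike the kill command, the pkill command allows you to specify the command name instead of the process-ID of the running process that you want to cancel. Be aware that pkill treats the name as a pattern, so pkill ls also cancels processes whose names merely contain ‘ls’; use pkill -x ls to match the name exactly.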
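To confirm that a cancelled process is really gone, a script can wait for it and look at its exit status. This is a minimal sketch with sleep 60 as a stand-in for the analysis (an assumption). It only works in the same shell that started the process, because wait can only collect the shell's own background jobs.

```shell
# Stand-in for a long analysis started with the wrong settings.
nohup sleep 60 &
job_pid=$!

# kill sends SIGTERM (signal 15) by default.
kill "$job_pid"

# wait returns the exit status of the background job.
wait "$job_pid"
exit_status=$?

echo "Job $job_pid ended with exit status $exit_status"
```

In bash and most sh-like shells, a process ended by a signal reports 128 plus the signal number, so a status of 143 here means the process was stopped by SIGTERM.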
By default, the nohup command redirects all output that would appear in the terminal window to the nohup.out file. If the file already exists, it is not overwritten; all new output is appended to the end of the file. With the > operator, you can redirect the output to a different file. For example, to redirect the output of the ls command to the file Directory-Listing.txt, I use the command:
nohup ls -lhcrt > Directory-Listing.txt &
So, the redirection operator (>) is followed by the name of the target file and precedes the closing & of the nohup command. If you want to save the output to a file in a different directory, just put the full path in front of the file name, like:
nohup ls -lhcrt > /home/alj/Documents/DirectoryListing.txt &
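If you prefer to keep results and error messages apart, the shell's 2> operator can send the error messages to a second file. The following is a minimal sketch; the file names analysis.out and analysis.err, the temporary directory, and the small sh -c stand-in command are all assumptions for illustration.

```shell
# Temporary directory so the demo does not clutter your working directory.
workdir=$(mktemp -d)

# The stand-in command prints one line of results and one warning.
# '>' captures the results, '2>' captures the error messages.
nohup sh -c 'echo "result line"; echo "warning line" >&2' \
    > "$workdir/analysis.out" 2> "$workdir/analysis.err" &

# Wait for the background job so that both files are complete.
wait $!

echo "Results:"
cat "$workdir/analysis.out"
echo "Messages:"
cat "$workdir/analysis.err"
```

When started from a terminal, nohup's own notice about ignoring input may also end up in the error file.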
The screen tool provides another way (besides nohup) to keep a process running on a remote server when you log off, or to run different processes in parallel.
You can imagine the screen tool as a command-line way to open several terminals as ‘sessions’ in parallel and to run different processes in each of them. To start a new session, you can use:
screen -S Testscreen
The option -S allows you to give the session a name (here Testscreen). Once you hit ENTER, you will be faced with a new (clean) terminal window. This is your Testscreen session. You can execute any commands in it, and while a process is running, you can detach from the session by first pressing ‘CTRL+A’ on your keyboard and then hitting the letter ‘d’ (for ‘detach’). I then get the message:
[detached from 970.Testscreen]
This means that I detached from the Testscreen session, which has the process-ID 970. The processes in this session, however, continue to run. You can log off from the remote server and get back to the Testscreen session the next time you log in. That is very handy.
To get an overview of all sessions that are running in parallel, use:
screen -ls
In my case, I get:
There are screens on:
970.Testscreen (14. feb. 2015 kl. 19.53 +0100) (Detached)
31995.pts-9.alj-Inspiron-5537 (14. feb. 2015 kl. 19.47 +0100) (Detached)
2 Sockets in /var/run/screen/S-alj.
You see that I have two sessions running, and both are detached. To re-attach to our Testscreen session, just enter:
screen -r Testscreen
The option -r (for re-attach) is followed by the name of the session that you would like to re-attach to.
When I now check the running sessions, I get:
screen -ls
There are screens on:
970.Testscreen (14. feb. 2015 kl. 19.53 +0100) (Attached)
31995.pts-9.alj-Inspiron-5537 (14. feb. 2015 kl. 19.47 +0100) (Detached)
2 Sockets in /var/run/screen/S-alj.
To stop a session, you have two options: either attach to the session and enter exit in the terminal window, or use the kill command with the process-ID of the session. To stop the Testscreen session, for example, I would use:
kill 970
When using the screen tool, be aware that, unlike with the nohup tool, all results are printed to the session’s terminal and not to a file.