nettee (1) Linux Manual Page
NAME
nettee – a network "tee" program
SYNOPSIS
nettee [options]
DESCRIPTION
nettee passes a data stream to one or more child nodes using a daisy-chain method. On each node nettee may also direct the stream to a file or pipe. nettee allows large amounts of data to be distributed quickly to multiple nodes on a network, at a rate limited only by the network bandwidth. The distribution chain is typically linear for each network switch but may branch when nodes use multiple switches. For maximum throughput, only one instance of nettee should use each network interface.
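To make the linear topology concrete, the Python sketch below builds one nettee argument list per node for a single-switch chain. It uses only the -in, -out, -next and -v options described under OPTIONS. The helper name, node names and file paths are hypothetical, and launching the commands (by a script, rsh, ssh, and so forth) is not shown.

```python
from typing import Dict, List


def linear_chain_commands(source_file: str, nodes: List[str],
                          dest_file: str) -> Dict[str, List[str]]:
    """Hypothetical helper: one nettee argv per node for a linear chain.

    nodes[0] is the top node, which reads source_file; every other node
    takes the stream from its upstream neighbour (the default -in) and
    saves it to dest_file.
    """
    if not nodes:
        raise ValueError("a chain needs at least one node")
    commands: Dict[str, List[str]] = {}
    for i, node in enumerate(nodes):
        argv = ["nettee"]
        if i == 0:
            # Top node: read the file, keep no local copy, log errors and messages.
            argv += ["-in", source_file, "-out", "none", "-v", "5"]
        else:
            # Downstream node: default -in (upstream node), write to a local file.
            argv += ["-out", dest_file]
        # Point at the next node, or mark End of Chain on the last one.
        argv += ["-next", nodes[i + 1] if i + 1 < len(nodes) else "_EOC_"]
        commands[node] = argv
    return commands


if __name__ == "__main__":
    chain = linear_chain_commands("/data/image.tar",
                                  ["node1", "node2", "node3"],
                                  "/scratch/image.tar")
    for node, argv in chain.items():
        print(node + ":", " ".join(argv))
```

Only the top node is given -v 5 here, following the advice that typically only the node reading the source logs messages; the others keep the default -v value of 1.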
When nettee starts, it waits for a connection from the upstream node before attempting to connect to its downstream nodes. Consequently, nettee may be started on the nodes in any order (by a script, rsh, ssh, and so forth). Typically only the node that reads the data stream from stdin or a file is set to log messages, so that the progress of the transfer can be monitored. Transmission errors are detected by comparing the total number of bytes read by each child node with the number of bytes transmitted to that child.
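The byte-count comparison can be illustrated with a minimal Python sketch. In it, a hypothetical child reads the stream to end of data and then reports its total back to the parent. The socketpair transport, the 8-byte big-endian report on the same socket and all function names are assumptions made for the example; this page does not describe nettee's actual status protocol. A mismatch maps to the BBC code listed under Error Handling.

```python
import socket
import struct
import threading


def status_for(bytes_sent: int, bytes_reported: int) -> str:
    """Map the byte-count comparison onto the codes listed under Error Handling."""
    return "NONE" if bytes_sent == bytes_reported else "BBC"


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes first."""
    buf = b""
    while len(buf) < n:
        part = sock.recv(n - len(buf))
        if not part:
            raise ConnectionError("child closed before reporting its byte count")
        buf += part
    return buf


def child_node(sock: socket.socket) -> None:
    """Hypothetical child: count bytes until end of data, then report the total."""
    total = 0
    while True:
        data = sock.recv(65536)
        if not data:  # parent shut down its write side: end of data
            break
        total += len(data)
    sock.sendall(struct.pack("!Q", total))  # assumed 8-byte big-endian report
    sock.close()


def transmit_and_check(payload: bytes) -> str:
    """Send payload to a child thread and compare the reported byte count."""
    parent, child = socket.socketpair()
    worker = threading.Thread(target=child_node, args=(child,))
    worker.start()
    parent.sendall(payload)
    parent.shutdown(socket.SHUT_WR)  # tell the child no more data is coming
    (reported,) = struct.unpack("!Q", recv_exact(parent, 8))
    worker.join()
    parent.close()
    return status_for(len(payload), reported)


if __name__ == "__main__":
    print(transmit_and_check(b"x" * 1_000_000))  # prints NONE
```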
Error Handling
By default, severe errors cause the entire chain to abort. With the -conwf and -colwf options, nettee can be instructed to do its best to continue processing after certain write failures of the data stream. Note that failures which occur while the distribution chain is forming are still fatal. To allow the program to continue with a truncated or alternate chain when chain formation errors are encountered, use the -connf option, and optionally specify alternate targets in each hostlist. If the node above the failed node is allowed to emit messages and errors (for instance, -v 5), messages similar to these will be sent to the log destination (-log):
Failures detected in child 0 [node34]: NWF
Failures detected in child 1 [node35]: NONE
Failures detected in chain: NWF
The first type of message describes the failures that were detected in the named child node, that is, one of the nodes named in the -next option. The second type describes failures that were detected anywhere further down the chain. The error codes currently defined are:
    NONE   no errors
    NWF    network write failure
    LWF    local write failure
    BBC    child returned incorrect byte count
    BSTAT  child returned unknown or bad status
    NNF    could not connect to (one or more) downstream chain nodes
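When logs from a long transfer need to be scanned, the failure lines shown above can be parsed mechanically. The Python sketch below is only an illustration under stated assumptions: it expects exactly the "Failures detected in child N [name]: CODES" and "Failures detected in chain: CODES" forms shown above, treats CODES as whitespace-separated, and uses re.search so that a prepended node name (-v bit 16) does not prevent a match. The exact prefix format is not documented here.

```python
import re
from typing import Dict, List, Optional

ERROR_CODES = {
    "NONE": "no errors",
    "NWF": "network write failure",
    "LWF": "local write failure",
    "BBC": "child returned incorrect byte count",
    "BSTAT": "child returned unknown or bad status",
    "NNF": "could not connect to (one or more) downstream chain nodes",
}

# Matches "Failures detected in child 0 [node34]: NWF" and
# "Failures detected in chain: NWF"; search() tolerates a prefix such as a node name.
FAILURE_RE = re.compile(
    r"Failures detected in "
    r"(?:child (?P<index>\d+) \[(?P<node>[^\]]*)\]|(?P<chain>chain))"
    r":\s*(?P<codes>.*)$"
)


def parse_failure_line(line: str) -> Optional[Dict[str, object]]:
    """Return the scope and codes of a failure line, or None for other lines."""
    m = FAILURE_RE.search(line)
    if m is None:
        return None
    codes: List[str] = m.group("codes").split()
    unknown = [c for c in codes if c not in ERROR_CODES]
    if unknown:
        raise ValueError(f"unknown error code(s): {unknown}")
    return {
        "scope": "chain" if m.group("chain") else "child",
        "index": None if m.group("chain") else int(m.group("index")),
        "node": m.group("node"),
        "codes": codes,
    }


if __name__ == "__main__":
    for text in ("Failures detected in child 0 [node34]: NWF",
                 "Failures detected in child 1 [node35]: NONE",
                 "Failures detected in chain: NWF"):
        info = parse_failure_line(text)
        print(info["scope"], info["node"],
              [ERROR_CODES[c] for c in info["codes"]])
```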
Exit Status
nettee normally exits with EXIT_SUCCESS status (0 on Unix). This is true even if errors were detected and handled in the node itself or in a child node. nettee exits with EXIT_FAILURE status if it was forced to close by an unhandled event such as a timeout, write failure, or unexpected socket closure.
OPTIONS
-h
    Print help information.
-hexamples
    Print examples.
-herrors
    Print error status codes.
-i
    Print version, license, and copyright information.
-in <SRC>
    Reads data from <SRC>, which may have one of four values:
        nettee    reads from the upstream node;
        -         reads from stdin;
        socket    reads the output of a command from a socket;
        filename  reads from a file.
    If no -in option is present, the program reads data from the upstream node.
-out <DST>
    Writes data locally to <DST>, which may have one of four values:
        none      writes nothing locally;
        -         writes to stdout;
        socket    writes the data stream to a command through a socket;
        filename  writes to a file.
    If no -out option is present, the program writes data to stdout.
-next <HOSTLISTS>
    Writes data to downstream destination[s]:
        hostlist1(,hostlist2(,hostlist3(…)))
    where the hostlist entries are separated by commas or spaces. A hostlist consists of either a single hostname or a comma-separated list of hostnames enclosed in square brackets. Example: node1,[node2,node3],[node4,node5,node6],node7. The bracketed form allows automatic failover when unreachable nodes are encountered, provided -connf is specified: the first hostname in the list is tried, then the next, and so on. There may be 1-8 hostlists, and their number controls the topology of the distribution chain. Use a linear distribution chain (a single hostlist) when all nodes share a single network switch. Use a forked distribution chain (multiple hostlists) when nodes are connected to two or more network switches. The End of Chain condition (no downstream write) is indicated by a <HOSTLISTS> value of ".", "", or "_EOC_". An End of Chain condition is also indicated by the absence of a -next option. If End of Chain is indicated, no other hostlists may be specified.
-cmd <COMMAND>
    Specifies the command to use in conjunction with an -in socket or -out socket option. Since only a single <COMMAND> may be specified, socket may not be applied to both -in and -out at the same time. When -cmd is used with -in socket, a child process running <COMMAND> reads data from a disk or other device and writes the resulting data stream to stdout. When -cmd is used with -out socket, a child process running <COMMAND> reads the data stream from stdin and writes the processed data to a disk or other device. Typically the <COMMAND> string invokes tar or some other archiving program. In some instances using sockets and -cmd will be faster than using the same command in a pipe, due to the larger buffer size used for the socket. Run nettee -hexamples to see a usage example.
-stm <EOS>
    Streams text through a nettee chain until the string <EOS> is encountered, then exits. This allows short text messages to traverse the chain without waiting for a buffer to fill. Since the text message can traverse the nettee chain very rapidly, it can be piped into execinput (or any other program that will execute its stdin as commands) to produce essentially simultaneous execution on all target nodes. The <EOS> string is not passed through the data chain and its length is ignored. When this mode is used to start further nettee processes on the target nodes, <PORT> values must be chosen to avoid interference. While this mode may be convenient for setting up Beowulf nodes, it is exceedingly dangerous for general use, since any command introduced into the command stream will execute on all chain nodes as if submitted by the owner of the nettee process on that node. Run nettee -hexamples to see a usage example.
-name <STRING>
    Specifies the node name used in messages (<=127 characters). If it is not supplied, the environment variables MYHOSTNAME and HOSTNAME are checked first, and if those are not defined, the result of a gethostname() call is used.
-log <LDST>
    Errors and messages are written to <LDST>, which may have one of two values:
        -         writes to stderr;
        filename  writes to a file.
    If no -log option is present, the program writes messages to stderr.
-p, -port <PORT>
    First of two consecutive ports used for communication. If no -port option is present, the program uses the default value of 9997.
-v <VERBOSE>
    <VERBOSE> is a bit mask which controls the types of warning and error messages that are sent to the -log destination. Bit values indicate:
        1   show error messages;
        2   show command line settings;
        4   show messages;
        8   show periodic status messages during transfer;
        16  prepend nodename to all messages.
    Use a <VERBOSE> value of 0 to eliminate all messages. If no -v is present, the program uses a default <VERBOSE> value of 1.
-q
    Suppress "ignored signal" messages.
-t <WAIT>
    Wait up to <WAIT> seconds for a connection from upstream in the chain to form or for data to be received. If neither event occurs, exit with an error. A value of 0 waits forever and will only exit on an end of data condition. If no -t is present, the program uses a default <WAIT> value of 0. The -connf <WAIT> and -w options control timeouts for downstream connections.
-w
    Wait for the next node to boot or attach to the network. If -w is not specified and the next node is not reachable, nettee will exit with an error no matter what the -t <WAIT> and -connf <WAIT> timeout values are.
-colwf
    Continue on Local Write Failure. Normally the failure of a write of the data stream to the local output is fatal and the entire distribution chain collapses immediately. (Typically this happens when data is written to disk and a partition fills or there is an ownership problem. A complete disk failure may initially present this way but often goes on to crash the node, which also results in a network write failure.) When -colwf is set and a local write failure occurs on a node, that node will continue to relay data down the chain. The failed node will not have correctly processed the data stream locally, but all other nodes will be unaffected by the failure. The top node will emit an error message when this occurs, so that a subsequent analysis with other tools can locate the node(s) which failed. This option may only be employed on a node that reads data from an upstream node.
-conwf
    Continue on Network Write Failure. Normally the failure of a write of the data stream to the next node is fatal and the entire distribution chain collapses immediately. (Typically this happens when a node crashes while nettee is running.) When -conwf is set and a network write failure occurs on a node (indicating that the next node has failed), the node will continue to process the data stream locally but will make no further attempts to transfer data to the next node in the chain. This allows the data transfer to complete on the chain down to the node above the failed node. The top node will emit an error message when this occurs, so that a subsequent analysis with other tools can locate the node(s) which failed. This option may only be employed on a node that reads data from an upstream node.
-connf <WAIT>
    Continue on Next Node Failure. The first host in each hostlist is given <WAIT> seconds to join the chain. If it does not join, each successive host in that hostlist is given <WAIT> seconds in turn, and if none succeed, no data will be sent to any of those hosts. If -connf is not specified or the wait time is set to zero seconds, the program will wait forever for a connection to the first node in each hostlist.
-progress <INTERVAL>
    If -v 8 is used, a status message is emitted every <INTERVAL> bytes transferred. The default value of 10000000 will be too small for a very fast network, where it produces many messages per second.
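The -next hostlist syntax can be checked with a small parser. This Python sketch is not nettee's own parser. It follows only the rules stated for -next: commas or spaces between hostlists, bracketed failover lists, at most 8 hostlists, and "." or "_EOC_" for End of Chain. An empty result stands for End of Chain. The function name is hypothetical.

```python
from typing import List

MAX_HOSTLISTS = 8
EOC_MARKERS = {".", "_EOC_"}


def parse_next(arg: str) -> List[List[str]]:
    """Split a -next value into hostlists; an empty result means End of Chain."""
    hostlists: List[List[str]] = []
    i, n = 0, len(arg)
    while i < n:
        ch = arg[i]
        if ch in ", ":
            i += 1  # separator between hostlists
        elif ch == "[":
            end = arg.find("]", i)
            if end == -1:
                raise ValueError("unterminated '[' in -next value")
            # Bracketed form: failover candidates, tried in order.
            names = [h.strip() for h in arg[i + 1:end].split(",") if h.strip()]
            if not names:
                raise ValueError("empty bracketed hostlist")
            hostlists.append(names)
            i = end + 1
        elif ch == "]":
            raise ValueError("unmatched ']' in -next value")
        else:
            j = i
            while j < n and arg[j] not in ", []":
                j += 1
            hostlists.append([arg[i:j]])  # single-hostname hostlist
            i = j
    names = [h for hl in hostlists for h in hl]
    if any(h in EOC_MARKERS for h in names):
        if len(names) != 1:
            raise ValueError("End of Chain may not be combined with other hostlists")
        return []
    if len(hostlists) > MAX_HOSTLISTS:
        raise ValueError("at most 8 hostlists are allowed")
    return hostlists


if __name__ == "__main__":
    print(parse_next("node1,[node2,node3],[node4,node5,node6],node7"))
```

Applied to the example value from the -next description, the parser yields four hostlists. A node configured this way therefore feeds four downstream branches, two of which have failover candidates.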
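The -stm behaviour copies text until <EOS> appears and does not forward <EOS> itself. It can be sketched in Python as follows. The function name, the byte-oriented interface and the buffer size are hypothetical. The sketch keeps the last len(eos) - 1 bytes of each read so that an <EOS> split across two reads is still found, and it flushes after each write so short messages are not held back.

```python
import io
from typing import BinaryIO


def stream_until_eos(src: BinaryIO, dst: BinaryIO, eos: bytes,
                     bufsize: int = 4096) -> bool:
    """Copy src to dst until eos appears; eos itself is not forwarded.

    Returns True if eos was seen, False if src ended without it.
    """
    if not eos:
        raise ValueError("eos must not be empty")
    keep = len(eos) - 1  # bytes that might be the start of a split eos
    pending = b""
    while True:
        chunk = src.read(bufsize)
        if not chunk:  # end of input without eos: forward the remainder
            dst.write(pending)
            dst.flush()
            return False
        pending += chunk
        idx = pending.find(eos)
        if idx != -1:
            dst.write(pending[:idx])  # everything before eos, then stop
            dst.flush()
            return True
        if len(pending) > keep:
            dst.write(pending[:len(pending) - keep])
            dst.flush()  # pass short messages on at once
            pending = pending[len(pending) - keep:]


if __name__ == "__main__":
    # "END-OF-COMMANDS" is a hypothetical marker chosen for this example.
    src = io.BytesIO(b"echo hello\nuptime\nEND-OF-COMMANDS\nnot forwarded\n")
    out = io.BytesIO()
    stream_until_eos(src, out, b"END-OF-COMMANDS", bufsize=8)
    print(out.getvalue().decode(), end="")
```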
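The -name fallback order can be expressed as a short Python function. This sketch makes two assumptions. It takes "defined" to mean present in the environment, even if empty. It also rejects -name values longer than 127 characters rather than truncating them. The function name is hypothetical.

```python
import os
import socket
from typing import Optional

MAX_NAME_LEN = 127


def node_name(cli_name: Optional[str] = None) -> str:
    """Name used in messages: -name, then MYHOSTNAME, HOSTNAME, gethostname()."""
    if cli_name is not None:
        if len(cli_name) > MAX_NAME_LEN:
            raise ValueError("-name is limited to 127 characters")
        return cli_name
    for var in ("MYHOSTNAME", "HOSTNAME"):
        if var in os.environ:  # "defined" taken to mean present, even if empty
            return os.environ[var]
    return socket.gethostname()


if __name__ == "__main__":
    print(node_name())
```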
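Because <VERBOSE> is a bit mask, a value such as the -v 5 used in the Error Handling example combines bits 1 and 4. The Python sketch below decodes a mask into the meanings listed for -v and builds a mask from individual bits. The helper names are hypothetical. Bits above 16 are simply ignored by this sketch; how nettee treats them is not stated here.

```python
from typing import List

# Bit values of -v <VERBOSE>, as listed under OPTIONS.
VERBOSE_BITS = {
    1: "show error messages",
    2: "show command line settings",
    4: "show messages",
    8: "show periodic status messages during transfer",
    16: "prepend nodename to all messages",
}


def describe_verbose(mask: int) -> List[str]:
    """List the message types enabled by a -v mask (unknown bits are ignored)."""
    if mask < 0:
        raise ValueError("VERBOSE must be non-negative")
    return [text for bit, text in VERBOSE_BITS.items() if mask & bit]


def verbose_mask(*bits: int) -> int:
    """Combine bit values into a -v argument, e.g. verbose_mask(1, 4) == 5."""
    mask = 0
    for bit in bits:
        if bit not in VERBOSE_BITS:
            raise ValueError(f"{bit} is not a documented VERBOSE bit")
        mask |= bit
    return mask


if __name__ == "__main__":
    print(describe_verbose(5))  # ['show error messages', 'show messages']
```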
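The difference between the default behaviour and -colwf / -conwf can be sketched as a relay loop. In this hypothetical Python illustration, write_local and write_next stand in for the local output and the downstream connection. After a handled failure, the node stops writing to the failed destination. This page states that behaviour for -conwf; for -colwf it is an assumption. The loop returns the LWF/NWF codes described under Error Handling.

```python
from typing import Callable, Iterable, List


class ChainCollapse(Exception):
    """Raised when a write fails and the matching continue option is off."""


def relay_stream(chunks: Iterable[bytes],
                 write_local: Callable[[bytes], object],
                 write_next: Callable[[bytes], object],
                 colwf: bool = False, conwf: bool = False) -> List[str]:
    """Relay chunks locally and downstream; return the failure codes seen."""
    failures: List[str] = []
    local_ok = next_ok = True
    for chunk in chunks:
        if local_ok:
            try:
                write_local(chunk)
            except OSError as exc:
                if not colwf:
                    raise ChainCollapse("LWF") from exc  # default: fatal
                failures.append("LWF")
                local_ok = False  # assumption: stop local output, keep relaying
        if next_ok:
            try:
                write_next(chunk)
            except OSError as exc:
                if not conwf:
                    raise ChainCollapse("NWF") from exc  # default: fatal
                failures.append("NWF")
                next_ok = False  # no further attempts to reach the next node
    return failures or ["NONE"]


if __name__ == "__main__":
    sent = []

    def full_disk(chunk: bytes) -> None:
        raise OSError("No space left on device")

    print(relay_stream([b"a", b"b"], full_disk, sent.append, colwf=True), sent)
```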
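The -connf failover order can be sketched in Python. Each host in a hostlist is tried in order and gets up to <WAIT> seconds; when <WAIT> is 0 or -connf is absent, the wait on the first host is unlimited. The sketch simplifies the real behaviour: it retries on any OSError and does not model -w. The port defaults to 9997 only for illustration, since this page does not say which of the two consecutive ports carries the downstream connection. The clock, sleep and connect parameters exist only so the logic can be tested without a network, and the function name is hypothetical.

```python
import socket
import time
from typing import Callable, List, Optional, Tuple


def connect_with_failover(hostlist: List[str], port: int = 9997,
                          wait: float = 0.0, retry_delay: float = 0.5,
                          connect: Callable = socket.create_connection,
                          clock: Callable[[], float] = time.monotonic,
                          sleep: Callable[[float], None] = time.sleep
                          ) -> Optional[Tuple[str, object]]:
    """Return (host, connection) for the first host in hostlist that accepts,
    or None if every host exhausts its <WAIT> seconds."""
    if not hostlist:
        raise ValueError("empty hostlist")
    if wait <= 0:
        # No -connf or -connf 0: keep retrying the first host forever.
        while True:
            try:
                return hostlist[0], connect((hostlist[0], port), timeout=None)
            except OSError:
                sleep(retry_delay)
    for host in hostlist:
        deadline = clock() + wait  # each host gets its own <WAIT> seconds
        while True:
            remaining = deadline - clock()
            if remaining <= 0:
                break  # give up on this host, try the next one
            try:
                return host, connect((host, port), timeout=remaining)
            except OSError:
                sleep(min(retry_delay, max(0.0, deadline - clock())))
    return None  # no host joined: nothing will be sent to this hostlist
```

For example, with the hostlist [node2,node3] and -connf 30, node2 is tried for up to 30 seconds before node3 gets its own 30 seconds.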
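The remark about the -progress default is simple arithmetic. The number of status messages per second is the transfer rate divided by <INTERVAL>. The sketch below uses nominal line rates and ignores protocol overhead: 125,000,000 bytes/s for gigabit Ethernet, and ten times that for 10 Gbit/s. The helper names are hypothetical.

```python
def messages_per_second(bytes_per_second: float, interval: int = 10_000_000) -> float:
    """Status messages per second printed with -v 8 for a given -progress interval."""
    if interval <= 0:
        raise ValueError("INTERVAL must be positive")
    return bytes_per_second / interval


def interval_for(bytes_per_second: float, target_per_second: float = 1.0) -> int:
    """-progress value giving roughly target_per_second messages at this rate."""
    return max(1, int(bytes_per_second / target_per_second))


if __name__ == "__main__":
    gigabit = 125_000_000        # 1 Gbit/s in bytes/s, ignoring overhead
    ten_gigabit = 1_250_000_000  # 10 Gbit/s in bytes/s
    print(messages_per_second(gigabit))      # 12.5
    print(messages_per_second(ten_gigabit))  # 125.0
    print(interval_for(ten_gigabit))         # 1250000000
```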
RELATED PROGRAMS
netcat(1).
nettee is derived from Felix Rauch’s dolly which is available here: http://www.cs.inf.ethz.ch/CoPs/patagonia/#dolly
The nettee home page is: http://saf.bio.caltech.edu/nettee.html
COPYRIGHTS
Copyright: 2008 David Mathog and Caltech. Copyright: Felix Rauch and ETH Zurich
LICENSE
Freely distributed under version 2 of the GNU General Public License (GPL 2).
AUTHOR
David Mathog, Biology Division, Caltech
