New Software; Summer 2003

Lately, we've run a lot more 256-and-up processor jobs.  Jobs this size
test the limits of many aspects of conventional programs, and perfectly
good 32-node programs don't work anymore.  Below are some experiences and
hints for dealing with things that won't scale, like print statements,
file access, tips for scripts, etc.
(EDITING - 7/1/03...)

Printing to the screen - stdout and stderr limits

One of the advantages, and disadvantages, of our bproc-based clusters is that the standard input, output, and error connections of a process on any node are forwarded back to the cluster head. The good thing about this is that, from any programming language, you can use any standard print or write statement, and the output will automatically appear on your screen. Or you can redirect this output to a file and have one log for the output of all your tasks in one place. This makes debugging a lot simpler, particularly with smaller programs.

The downside of this approach is that it really isn't scalable to larger problems. Once hundreds of tasks start vying to print on your terminal at the same time, the system runs out of buffering capability. Experimentally, I've been able to determine that the breaking point happens somewhere around 200 tasks trying to send 100 lines of output each in quick succession. That's 20,000 lines of output produced in less than a second, probably more than you will ever read! Tests of 150 tasks with 100 lines each have run successfully. I *think* we've seen problems with fewer tasks (around 90) doing lots of output, but it's sporadic.

Most things limited to a few hundred or a few thousand lines of output will do OK. I should add that these problems can vary with timing; even the 20,000 line case will work about one out of every three times.

Symptoms:

Well, your program will crash. Usually, one task fails first with some sort of "p4: net recv failed" error, and this sets off a large cascade of other messages, such as "p4 error: timeout on socket".

Solutions:

Don't have every task print stuff out unless you absolutely have to! Use a block like if(myrank==0) print(whatever) to limit the amount of output. If you actually need some output from every task, use MPI, which is built to be scalable. Have tasks send messages with their output to be aggregated on one task (or a small set of tasks), and have only those tasks print out all the information. Huge amounts can be printed, as long as only a few tasks are vying for the resources simultaneously. Alternatively, you may wish to use conditional compilation. Many people leave lots of print statements in their programs for debugging; wrapping them in a compile-time switch lets you turn them all off for big runs without deleting them.
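Here is a rough sketch of the rank-0 and conditional compilation ideas in C. The names root_printf, DEBUG_PRINTF, and DEBUG_OUTPUT are made up for this example and are not part of MPI or any library on our systems. To keep the sketch free of MPI headers, the rank is passed in as a plain int; in a real program you would get myrank from MPI_Comm_rank.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical compile-time switch: build with -DDEBUG_OUTPUT to keep the
 * debugging prints, or leave it off so they vanish from the program.
 */
#ifdef DEBUG_OUTPUT
#define DEBUG_PRINTF(...) printf(__VA_ARGS__)
#else
#define DEBUG_PRINTF(...) ((void)0)
#endif

/*
 * Hypothetical helper: print only when called by task 0.
 * Returns the number of characters written, or 0 when the call is
 * suppressed because the caller is not task 0.
 */
int root_printf(int myrank, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (myrank != 0)
        return 0;            /* every other task stays quiet */

    va_start(ap, fmt);
    n = vprintf(fmt, ap);    /* same formatting rules as printf */
    va_end(ap);
    return n;
}
```

A call like root_printf(myrank, "step %d done\n", step) then produces one line per step instead of one line per task per step, and DEBUG_PRINTF("x = %d\n", x) costs nothing in a large run built without -DDEBUG_OUTPUT.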

NFS and other I/O limits
