Lately, we've run a lot more jobs on 256 or more processors. Jobs this size
test the limits of many aspects of conventional programs, and perfectly
good 32-node programs don't work anymore. Below are some experiences and
hints for dealing with things that won't scale, like print statements,
file access, tips for scripts, etc.
- Printing to the screen - stdout and stderr limits
- NFS and other I/O limits (File Access)
**Added 7/2/03
- Some hints for scripting
(EDITING - 7/1/03...)
Printing to the screen - stdout and stderr limits
One of the advantages, and disadvantages, of our bproc-based clusters is that the
standard input, output, and error connections of a process on any node are forwarded
back to the cluster head.
The good thing about this is, from any programming language, you can use any standard
print or write statement, and the output will automatically appear on your screen.
Or, you can redirect this output to a file, and have one log for all the output
of all your tasks in one place. This makes debugging a lot simpler, particularly
with smaller programs.
The downside of this approach is that it really isn't scalable to larger problems.
Once hundreds of tasks start vying to print on your terminal at the same time,
the system runs out of buffering capability.
Experimentally, I've been able to determine that the breaking point happens somewhere
around 200 tasks trying to send 100 lines of output in quick succession. That's
20,000 lines of output produced in less than a second -- probably more than
you will ever read! Tests of 150 tasks with 100 lines have run successfully.
I *think* we've seen problems with fewer tasks (around 90) doing lots of output,
but it's sporadic. Most things limited to a few hundred or a few thousand
lines of output will do OK.
I should add that these problems can vary with timing; even the 20,000 line
case will work about one out of every three times.
Symptoms:
Well, your program will crash. Usually, one task
will fail first with some sort of "p4: net recv failed" error, and
this will set off a large cascade of other messages, such as:
"p4 error: timeout on socket".
Solutions:
Don't have every task print stuff out unless you absolutely have to!
Use a block like if(myrank==0) print(whatever) to limit the amount of output.
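Here is a minimal sketch of that rank check in C. It assumes the task's rank has already been looked up (in an MPI program, usually with MPI_Comm_rank). The helper name print_rank0 is made up for this example and is not part of any library.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: print only on the task whose rank is 0.
 * Returns the number of characters written, or 0 on every other rank. */
int print_rank0(int myrank, FILE *out, const char *fmt, ...)
{
    if (myrank != 0)
        return 0;               /* all other tasks stay silent */

    va_list args;
    va_start(args, fmt);
    int n = vfprintf(out, fmt, args);
    va_end(args);
    return n;
}
```

A call like print_rank0(myrank, stdout, "step %d done\n", step) then produces one line per step instead of one line per task.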
If you actually need some output from every task, use MPI, which is built to be
scalable. Have tasks send messages with their output to be aggregated on one task
(or a small set of tasks), and have only those tasks print out all the information.
Huge amounts can be printed, as long as only a few tasks are vying for the resources
simultaneously.
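One way to do the aggregation described above is with MPI_Gather. This is only a sketch: it assumes each task's output fits in a single fixed-length line, and LINE_LEN and the message text are arbitrary choices for illustration. It needs an MPI implementation to build (typically with mpicc).

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINE_LEN 128   /* assumed upper bound on one task's output line */

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each task formats its output into a zero-padded, fixed-size buffer
     * instead of printing it. */
    char line[LINE_LEN];
    memset(line, 0, sizeof line);
    snprintf(line, sizeof line, "task %d of %d: all done", rank, size);

    /* Only rank 0 needs room for everybody's lines. */
    char *all = NULL;
    if (rank == 0) {
        all = malloc((size_t)size * LINE_LEN);
        if (all == NULL)
            MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* Collect one LINE_LEN-byte record from every task onto rank 0. */
    MPI_Gather(line, LINE_LEN, MPI_CHAR,
               all, LINE_LEN, MPI_CHAR, 0, MPI_COMM_WORLD);

    /* Rank 0 is the only task that touches stdout. */
    if (rank == 0) {
        for (int i = 0; i < size; i++)
            puts(all + (size_t)i * LINE_LEN);
        free(all);
    }

    MPI_Finalize();
    return 0;
}
```

The output comes out in rank order, and only one task ever writes to the forwarded stdout connection. For variable-length output, MPI_Gatherv or plain MPI_Send/MPI_Recv to rank 0 would be the usual alternatives.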
Alternatively, you may wish to use conditional compilation. Many people leave
lots of print statements in their programs for debugging. Conditional compilation
lets you keep them in the source but compile them out of large runs.
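A small sketch of the idea in C, assuming a preprocessor symbol called DEBUG selects the debugging build. The macro name DEBUG_PRINT and the routine sum_to are invented for this example.

```c
#include <stdio.h>

/* Hypothetical macro: expands to nothing useful unless built with -DDEBUG. */
#ifdef DEBUG
#define DEBUG_PRINT(...) fprintf(stderr, __VA_ARGS__)
#else
#define DEBUG_PRINT(...) ((void)0)
#endif

/* Example routine with a debugging print left in place. */
int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
        DEBUG_PRINT("i=%d total=%d\n", i, total);
    }
    return total;
}
```

Compiling with -DDEBUG turns the prints on for small test runs; a build without it contains no print calls at all, so nothing is sent back to the cluster head.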
NFS and other I/O limits