[PVFS2-users] MPICH2 + PVFS2 + Help needed urgently.
Rob Ross
rross at mcs.anl.gov
Wed Jun 1 16:48:24 EDT 2005
Hi Michael,
This is actually a problem with the handling of subarray datatypes in
MPICH2 (which is my fault). We know about it, and we're working on a
fix. I believe that we'll have a patch for ROMIO and/or a new ROMIO
release available in the next two weeks. I can email you personally
when that is done if you like.
Regards,
Rob
Michael Gauckler wrote:
> Dear Lists,
>
> I am having problems with the performance of MPICH2 and PVFS2.
>
> The program included below should write 136 MB chunks of data to a
> 2.7 GB file on a PVFS2 mount.
>
> Unfortunately, the performance is so poor that my program never
> finishes. PVFS2 write performance is not great, but at 122 MB/s
> (see below) it is fast enough that 136 MB chunks should finish quickly.
>
> If someone could run a test on their machine and give me an estimate
> of the runtime, or hints about where the problem might be, I would be
> more than happy! I need to locate the problem: my code, MPICH2, ROMIO,
> or PVFS2.
>
> Sincerely yours,
> Michael
>
>
> ___
>
> System configuration
>
> 40 dual-Xeon 3.0 GHz nodes, all acting as PVFS2 data servers.
> Gigabit Ethernet. Software RAID on 2 SCSI disks.
> Debian Sarge: Linux 2.6.8-2-686-smp #1 SMP Mon Jan 24 02:32:52 EST
> 2005 i686 GNU/Linux
> ___
>
> Performance of PVFS2:
>
> mpdrun -np 2 ./mpi-io-test
> # Using mpi-io calls.
> nr_procs = 2, nr_iter = 1, blk_sz = 16777216
> # total_size = 33554432
> # Write: min_t = 0.045768, max_t = 0.274489, mean_t = 0.160128, var_t = 0.026157
> # Read: min_t = 0.023897, max_t = 0.038090, mean_t = 0.030993, var_t = 0.000101
> Write bandwidth = 122.243300 Mbytes/sec
> Read bandwidth = 880.925184 Mbytes/sec
>
> ___
>
> Command line to run the program given below:
>
> mpdrun -1 -np 2 ./mpicube
> ___
>
> Program "mpicube.cpp":
>
> #include "mpi.h"
> #include <stdio.h>
> #include <stdexcept>
> #include <stdlib.h>
> #include <sstream>
> #include <iostream>
>
> char filename[] = "pvfs2:/mnt/pvfs2/mpicube_testfile.dat";
>
> // The following lines might not be needed if not linked with the
> // boost library.
> namespace boost
> {
>     void assertion_failed(char const * expr, char const * function,
>                           char const * file, long line)
>     {
>         std::ostringstream ss;
>         ss << "BOOST_ASSERT failed for expr " << expr << ", function "
>            << function << " in file " << file << " at line " << line
>            << std::endl;
>         throw std::runtime_error(ss.str());
>     }
> }
>
> int main( int argc, char *argv[] )
> {
>     int rank;
>     int err;
>     int worldsize;
>     MPI_Offset headerOffset = 0;
>     MPI_File fh;
>     MPI_Datatype filetype;
>     MPI_Datatype datatype = MPI_DOUBLE;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &worldsize);
>     printf("Hello world from process %d of %d with filename %s\n",
>            rank, worldsize, filename);
>
>     int iterations = 10;
>     int extent0 = 600;
>     int extent1 = 12;
>     int extent2 = 10;
>     int numSamples = 5000;
>     int numSamplesPerBlock = numSamples / worldsize / iterations;
>     int numIterConcurrent = 1;
>     int numFinalConcurrent = 0;
>     int groupColor = 0;
>     int current;
>
>     int gsizes[4];
>     int lsizes[4];
>     int indices[4];
>
>     gsizes[0] = extent0;
>     gsizes[1] = extent1;
>     gsizes[2] = extent2;
>     gsizes[3] = numSamples;
>     lsizes[0] = extent0;
>     lsizes[1] = extent1;
>     lsizes[2] = extent2;
>     lsizes[3] = numSamplesPerBlock;
>     indices[0] = 0;
>     indices[1] = 0;
>     indices[2] = 0;
>
>     MPI_Comm groupcomm = MPI_COMM_WORLD;
>
>     std::cout << "opening file <" << filename << ">" << std::flush
>               << std::endl;
>     MPI_File_open(groupcomm, filename, MPI_MODE_RDWR |
>                   MPI_MODE_CREATE | MPI_MODE_UNIQUE_OPEN, MPI_INFO_NULL, &fh);
>     std::cout << "opened file" << std::flush << std::endl;
>
>     // number of doubles to be stored; widen to long long before
>     // multiplying so larger extents cannot overflow int
>     long long lcubesize = (long long)lsizes[0]*lsizes[1]*lsizes[2]*lsizes[3];
>     long long gcubesize = (long long)gsizes[0]*gsizes[1]*gsizes[2]*gsizes[3];
>
>     std::cout << "local cube size * 8 = " << (long long)lcubesize /
>         1024 / 1024 * 8 << " MB " << std::flush << std::endl;
>     std::cout << "global cube size * 8 = " << (long long)gcubesize /
>         1024 / 1024 * 8 << " MB " << std::flush << std::endl;
>
>     double *cube = new double[extent0 * extent1 * extent2 *
>                               numSamplesPerBlock];
>     for(int j = 0; j < extent0 * extent1 * extent2 *
>             numSamplesPerBlock; j++)
>         cube[j] = 3.1415;
>
>
>     for(int i = 0; i < iterations; i++){
>
>         indices[3] = (i + rank*iterations)*numSamplesPerBlock;
>
>         std::cout << "iteration = " << i << std::endl;
>         std::cout << "indices[3] = " << indices[3] << std::endl;
>
>         // create a data type to get desired view of file
>         err = MPI_Type_create_subarray(4, gsizes, lsizes, indices,
>                                        MPI_ORDER_C, MPI_DOUBLE, &filetype);
>         if (err != MPI_SUCCESS)
>             std::cerr << "could not create subarray" << std::endl;
>
>         err = MPI_Type_commit(&filetype);
>         if (err != MPI_SUCCESS)
>             std::cerr << "could not commit datatype" << std::endl;
>
>         std::cout << "writeSubCube: setting view" << std::endl;
>
>         // store the view into file
>         err = MPI_File_set_view(fh, 0, datatype, filetype, "native",
>                                 MPI_INFO_NULL);
>         if (err != MPI_SUCCESS)
>             std::cerr << "could not set view" << std::endl;
>
>         std::cout << "allocating cube" << std::endl;
>
>         std::cout << "starting write all" << std::endl;
>
>         err = MPI_File_write_all(fh, &cube[0], lcubesize, datatype,
>                                  MPI_STATUS_IGNORE);
>
>         if (err != MPI_SUCCESS)
>             std::cerr << "could not write to file" << std::endl;
>
>         std::cout << "done write all" << std::endl;
>
>         err = MPI_Type_free(&filetype);
>         if (err != MPI_SUCCESS)
>             std::cerr << "could not free datatype" << std::endl;
>
>     }
>
>     MPI_File_close(&fh);
>
>     std::cout << "closed file" << std::flush << std::endl;
>
>     delete[] cube;   // release the write buffer allocated above
>     MPI_Finalize();
>     return 0;
> }
>
>
> _______________________________________________
> PVFS2-users mailing list
> PVFS2-users at beowulf-underground.org
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>