Standalone Solver
MPI provides a mechanism to dynamically spawn worker processes via MPI_Comm_spawn.
By using this feature, it is possible to separate the main application from the solver: the main application (the caller) spawns one or more solver processes and exchanges data with them through an inter-communicator.
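The following is a minimal sketch of the underlying mechanism using the raw MPI C API; the executable name "solver" and the worker count of four are placeholders, not part of the ezp implementation, which relies on the mpl and mpi4py wrappers shown below.

    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);

        // spawn four worker processes running the (placeholder) "solver" executable;
        // the returned inter-communicator connects the caller with the workers
        MPI_Comm workers;
        MPI_Comm_spawn("solver", MPI_ARGV_NULL, 4, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &workers, MPI_ERRCODES_IGNORE);

        // on the worker side, MPI_Comm_get_parent() returns the same inter-communicator

        MPI_Finalize();
        return 0;
    }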

The ezp solvers are also available as standalone executables that follow this design.
Such a form is particularly beneficial in the following cases.

- The main application is not suitable for the distributed-memory model due to, for example, latency constraints.
- The main application adopts a different parallelism pattern.
- A clear boundary between the main application and the solver(s) is desired.

However, such a clear boundary has drawbacks as well.
For example, the root process needs to hold the full matrices in memory, which is subject to physical limits.
Since the spawned processes are not persistent, solving the same system with different right-hand sides may incur unnecessary data communication.
If the matrices need to be read from IO in a distributed manner, finer control over how the data is prepared may also be required, likely by calling BLACS/ScaLAPACK/MPI functions directly.
Backbone
Here, we use the pgesv solver as an example to illustrate some critical steps in implementing the above architecture.
Instead of the raw MPI functions, the mpl library is used.

It is necessary to obtain the default communicator and the parent inter-communicator.
The latter is only valid if the executable has been spawned by a host application, which the full implementation below checks via parent.is_valid().
solver.pgesv.cpp:52:55

    #include <mpl/mpl.hpp>

    const auto& comm_world{mpl::environment::comm_world()};
    const auto& parent = mpl::inter_communicator::parent();
Since mpl handles the initialisation and finalisation of the MPI environment, it is necessary to tell ezp not to finalise the MPI environment itself.
The function ezp::blacs_env::do_not_manage_mpi() must be called whenever MPI is managed by external tools.
solver.pgesv.cpp:83

    ezp::blacs_env<int>::do_not_manage_mpi();
The parameters of the linear system are first broadcast by the main application.
solver.pgesv.cpp:90:98

    const auto all = mpl::communicator(parent, mpl::communicator::order_high);

    int config[3]{};
    all.bcast(0, config);

    const auto N = config[0];
    const auto NRHS = config[1];
    const auto FLOAT = config[2];
Knowing the problem sizes N and NRHS, the root process can initialise the containers and receive the contents of the two matrices.
The third entry, FLOAT, encodes the data type and is used at the end of the program to dispatch to the matching precision.
solver.pgesv.cpp:60:70

    if(0 == comm_world.rank()) {
        A.resize(N * N);
        B.resize(N * NRHS);

        mpl::irequest_pool requests;
        requests.push(parent.irecv(A, 0, mpl::tag_t{0}));
        requests.push(parent.irecv(B, 0, mpl::tag_t{1}));
        requests.waitall();
    }
Solving the system follows the conventional approach: create a solver object and call the solve method with the data wrapped.
The solution is then sent back to the caller.
solver.pgesv.cpp:72:77

    const auto error = ezp::pgesv<DT, IT>().solve({N, N, A.data()}, {N, NRHS, B.data()});

    if(0 == comm_world.rank()) {
        parent.send(error, 0);
        if(0 == error) parent.send(B, 0);
    }
The Caller
From the above code snippets, one may observe that the caller needs to

- broadcast config,
- send A and B,
- receive the error code and the solution X if no error occurred.

To this end, one can declare the corresponding containers using std::vector.
runner.cpp:38:39

    std::vector<double> A, B(N * NRHS, 1.);
    std::vector<int> config;
The following creates a diagonal matrix for illustration.
runner.cpp:45:50

    solver = "solver.pgesv";
    config = {N, NRHS, 1};

    A.resize(N * N, 0.);
    for(auto I = 0; I < N; ++I) A[I * N + I] = I + 1.;
Now that all the data is ready to be communicated, the actual communication is concise and straightforward.
Note that the caller merges the inter-communicator with order_low while the solver uses order_high, so the caller's processes come first in the merged communicator and rank 0, the broadcast root, belongs to the caller.
runner.cpp:95:110

    const auto& comm_world{mpl::environment::comm_world()};

    const auto worker = comm_world.spawn(0, argc < 3 ? 1 : std::abs(std::stoi(argv[2])), {solver});
    const auto all = mpl::communicator(worker, mpl::communicator::order_low);

    all.bcast(0, config.data(), mpl::contiguous_layout<int>(config.size()));

    mpl::irequest_pool requests;
    requests.push(worker.isend(A, 0, mpl::tag_t{0}));
    requests.push(worker.isend(B, 0, mpl::tag_t{1}));
    requests.waitall();

    int error = -1;
    worker.recv(error, 0);
    if(0 == error) worker.recv(B, 0);
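As a quick sanity check for the diagonal "ge" example built above, the received solution can be compared against the exact values. The following fragment is a sketch only, assumed to be appended after the receive above with <algorithm> and <cmath> available; it is not part of the reference implementation.

    // for the "ge" example, A = diag(1, ..., N) and b = 1, so the exact solution is x[i] = 1 / (i + 1)
    auto max_deviation = 0.;
    for(auto I = 0; I < N; ++I) max_deviation = std::max(max_deviation, std::abs(B[I] - 1. / (I + 1.)));
    printf("max deviation from exact solution: %.3e\n", max_deviation);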
Python Caller
Since there is a clear boundary between the main application and the solver, it is possible to use any other MPI-compatible tool to send/receive the problem.
For example, one can use mpi4py to call the solvers.
It has to be pointed out that the column-major memory layout must be preserved.
runner.py

    from array import array
    import sys

    from mpi4py import MPI
    import numpy


    def run(nprocs: int, N: int, NRHS: int):
        comm = MPI.COMM_WORLD

        # spawn the solver
        worker = comm.Spawn("./solver.pgesv", maxprocs=nprocs)
        all = worker.Merge()

        # broadcast the problem configuration
        all.Bcast(array("i", [N, NRHS, 1]), root=0)

        # send the matrices
        # ! numpy arrays are row-major by default
        # ! column-major order is required by (Sca)LAPACK
        # ! thus need to specify the order as "F"
        A = numpy.zeros((N, N), dtype=numpy.float64, order="F")
        for i in range(N):
            A[i, i] = i + 1
        print("A:\n", A)
        worker.Send(A, dest=0, tag=0)

        B = numpy.ones((N, NRHS), dtype=numpy.float64, order="F")
        for i in range(NRHS):
            B[:, i] = i + 1
        print("B:\n", B)
        worker.Send(B, dest=0, tag=1)

        # receive the error code and the solution if no error
        error = array("i", [-1])
        worker.Recv(error)
        if 0 == error[0]:
            X = numpy.ones((N, NRHS), dtype=numpy.float64, order="F")
            worker.Recv(X)
            print("Solution:\n", X)

        worker.Disconnect()
        all.Disconnect()


    if __name__ == "__main__":
        if len(sys.argv) < 4:
            print("Usage: runner.py <nprocs> <N> <NRHS>")
            sys.exit(1)

        nprocs = int(sys.argv[1])
        N = int(sys.argv[2])
        NRHS = int(sys.argv[3])

        run(nprocs, N, NRHS)
Calling the above script with, for example, N = 6 and NRHS = 2 yields the following output.
Since A is diagonal, each entry of the solution is the corresponding entry of B divided by the matching diagonal entry of A, which is exactly what is printed, so the result is correct.
    A:
     [[1. 0. 0. 0. 0. 0.]
     [0. 2. 0. 0. 0. 0.]
     [0. 0. 3. 0. 0. 0.]
     [0. 0. 0. 4. 0. 0.]
     [0. 0. 0. 0. 5. 0.]
     [0. 0. 0. 0. 0. 6.]]
    B:
     [[1. 2.]
     [1. 2.]
     [1. 2.]
     [1. 2.]
     [1. 2.]
     [1. 2.]]
    Solution:
     [[1.         2.        ]
     [0.5        1.        ]
     [0.33333333 0.66666667]
     [0.25       0.5       ]
     [0.2        0.4       ]
     [0.16666667 0.33333333]]
Full Reference Implementation
The following is a full reference implementation of a standalone solver and the corresponding caller logic.
runner.cpp

    #include <cstdio>
    #include <cstdlib>
    #include <iostream>
    #include <string>
    #include <vector>

    #include <mpl/mpl.hpp>

    int main(int argc, char** argv) {
        if(argc < 2) {
            std::cout << "Usage: runner ge|po|gb|db|pb [n]\n";
            std::cout << "Example: runner ge 3\n";
            return 0;
        }

        constexpr auto N = 6, NRHS = 1;

        std::vector<double> A, B(N * NRHS, 1.);
        std::vector<int> config;

        const auto type = std::string(argv[1]);

        std::string solver;
        if("ge" == type) {
            solver = "solver.pgesv";
            config = {N, NRHS, 1};
            A.resize(N * N, 0.);
            for(auto I = 0; I < N; ++I) A[I * N + I] = I + 1.;
        }
        else if("po" == type) {
            solver = "solver.pposv";
            config = {N, NRHS, 1};
            A.resize(N * N, 0.);
            for(auto I = 0; I < N; ++I) A[I * N + I] = I + 1.;
        }
        else if("gb" == type) {
            solver = "solver.pgbsv";
            constexpr auto KL = 1, KU = 1;
            config = {N, KL, KU, NRHS, 1};
            A.resize(N * (KL + KU + 1), 0.);
            for(auto I = 0; I < N; ++I) A[KU + I * (KL + KU + 1)] = I + 1.;
        }
        else if("db" == type) {
            solver = "solver.pdbsv";
            constexpr auto KL = 1, KU = 1;
            config = {N, KL, KU, NRHS, 1};
            A.resize(N * (KL + KU + 1), 0.);
            for(auto I = 0; I < N; ++I) A[KU + I * (KL + KU + 1)] = I + 1.;
        }
        else if("pb" == type) {
            solver = "solver.ppbsv";
            constexpr auto KLU = 1;
            config = {N, KLU, NRHS, 1};
            A.resize(N * (KLU + 1), 0.);
            for(auto I = 0; I < N; ++I) A[I + I * KLU] = I + 1.;
        }
        else {
            std::cout << "Usage: runner ge|po|gb|db|pb [n]\n";
            return 0;
        }

        const auto& comm_world{mpl::environment::comm_world()};

        const auto worker = comm_world.spawn(0, argc < 3 ? 1 : std::abs(std::stoi(argv[2])), {solver});
        const auto all = mpl::communicator(worker, mpl::communicator::order_low);

        all.bcast(0, config.data(), mpl::contiguous_layout<int>(config.size()));

        mpl::irequest_pool requests;
        requests.push(worker.isend(A, 0, mpl::tag_t{0}));
        requests.push(worker.isend(B, 0, mpl::tag_t{1}));
        requests.waitall();

        int error = -1;
        worker.recv(error, 0);
        if(0 == error) worker.recv(B, 0);

        for(auto I = 0; I < N; ++I) printf("x[%d] = %+.6f\n", I, B[I]);

        return 0;
    }
solver.pgesv.cpp

    #include <cstdio>
    #include <vector>

    #include <ezp/pgesv.hpp>
    #include <mpl/mpl.hpp>

    const auto& comm_world{mpl::environment::comm_world()};
    const auto& parent = mpl::inter_communicator::parent();

    template<ezp::data_t DT, ezp::index_t IT> int run(const int N, const int NRHS) {
        std::vector<DT> A, B;

        if(0 == comm_world.rank()) {
            A.resize(N * N);
            B.resize(N * NRHS);

            mpl::irequest_pool requests;
            requests.push(parent.irecv(A, 0, mpl::tag_t{0}));
            requests.push(parent.irecv(B, 0, mpl::tag_t{1}));
            requests.waitall();
        }

        const auto error = ezp::pgesv<DT, IT>().solve({N, N, A.data()}, {N, NRHS, B.data()});

        if(0 == comm_world.rank()) {
            parent.send(error, 0);
            if(0 == error) parent.send(B, 0);
        }

        return 0;
    }

    int main(int, char**) {
        ezp::blacs_env<int>::do_not_manage_mpi();

        if(!parent.is_valid()) {
            printf("This program must be invoked by the host application.\n");
            return 0;
        }

        const auto all = mpl::communicator(parent, mpl::communicator::order_high);

        int config[3]{};
        all.bcast(0, config);

        const auto N = config[0];
        const auto NRHS = config[1];
        const auto FLOAT = config[2];

        if(FLOAT >= 10) return run<complex16, int_t>(N, NRHS);
        if(FLOAT >= 0) return run<double, int_t>(N, NRHS);
        if(FLOAT > -10) return run<float, int_t>(N, NRHS);
        return run<complex8, int_t>(N, NRHS);
    }