Syntax:
fix ID group-ID balance Nfreq dimstr Niter thresh keyword value ...
out arg = filename filename = output file to write each processor's sub-domain to
Examples:
fix 2 all balance 1000 x 10 1.05 fix 2 all balance 0 xy 20 1.1 out tmp.balance
Description:
This command adjusts the size of processor sub-domains within the simulation box dynamically as a simulation runs, to attempt to balance the number of particles and thus the computational cost (load) evenly across processors. The load balancing is "dynamic" in the sense that rebalancing is performed periodically during the simulation. To perform "static" balancing, before of between runs, see the balance command.
Load-balancing is only useful if the particles in the simulation box have a spatially-varying density distribution. E.g. a model of a vapor/liquid interface, or a solid with an irregular-shaped geometry containing void regions. In this case, the LAMMPS default of dividing the simulation box volume into a regular-spaced grid of processor sub-domain, with one equal-volume sub-domain per procesor, may assign very different numbers of particles per processor. This can lead to poor performance in a scalability sense, when the simulation is run in parallel.
Note that the processors command gives you some control over how the box volume is split across processors. Specifically, for a Px by Py by Pz grid of processors, it lets you choose Px, Py, and Pz, subject to the constraint that Px * Py * Pz = P, the total number of processors. This can be sufficient to achieve good load-balance for some models on some processor counts. However, all the processor sub-domains will still be the same shape and have the same volume.
This command does not alter the topology of the Px by Py by Pz grid or processors. But it shifts the cutting planes between processors (in 3d, or lines in 2d), which adjusts the volume (area in 2d) assigned to each processor, as in the following 2d diagram. The left diagram is the default partitioning of the simulation box across processors (one sub-box for each of 16 processors); the right diagram is after balancing.
 
IMPORTANT NOTE: This command attempts to minimize the imbalance factor, as defined above. But because of the topology constraint that only the cutting planes (lines) between processors are moved, there are many irregular distributions of particles, where this factor cannot be shrunk to 1.0, particuarly in 3d. Also, computational cost is not strictly proportional to particle count, and changing the relative size and shape of processor sub-domains may lead to additional computational and communication overheads, e.g. in the PPPM solver used via the kspace_style command. Thus you should benchmark the run times of your simulation with and without balancing.
The group-ID is currently ignored. In the future it may be used to determine what particles are considered for balancing. Normally it would only makes sense to use the all group. But in some cases it may be useful to balance on a subset of the particles, e.g. when modeling large nanoparticles in a background of small solvent particles.
The Nfreq setting determines how often a rebalance is performed. If Nfreq > 0, then rebalancing will occur every Nfreq steps. Each time a rebalance occurs, a reneighboring is triggered, so you should not make Nfreq too small. If Nfreq = 0, then rebalancing will be done every time reneighboring normally occurs, as determined by the the neighbor and neigh_modify command settings.
On rebalance steps, rebalancing will only be attempted if the current imbalance factor, as defined above, exceeds the thresh setting.
The dimstr argument is a string of characters, each of which must be an "x" or "y" or "z". Eacn character can appear zero or one time, since there is no advantage to balancing on a dimension more than once. You should normally only list dimensions where you expect there to be a density variation in the particles.
Balancing proceeds by adjusting the cutting planes in each of the dimensions listed in dimstr, one dimension at a time. For a single dimension, the balancing operation (described below) is iterated on up to Niter times. After each dimension finishes, the imbalance factor is re-computed, and the balancing operation halts if the thresh criterion is met.
A rebalance operation in a single dimension is performed using a density-dependent recursive multisectioning algorithm, where the position of each cutting plane (line in 2d) in the dimension is adjusted independently. This is similar to a recursive bisectioning (RCB) for a single value, except that the bounds used for each bisectioning take advantage of information from neighboring cuts if possible, as well as counts of particles at the bounds on either side of each cuts, which themselves were cuts in previous iterations. The latter is used to infer a density of pariticles near each of the current cuts. At each iteration, the count of particles on either side of each plane is tallied. If the counts do not match the target value for the plane, the position of the cut is adjusted based on the local density. The low and high bounds are adjusted on each iteration, using new count information, so that they become closer together over time. Thus as the recustion progresses, the count of particles on either side of the plane gets closer to the target value.
The density-dependent part of this algorithm is often an advantage when you rebalance a system that is already nearly balanced. It typically converges more quickly than the geometric bisectioning algorithm used by the balance command. However, if can be a disadvants if you attempt to rebalance a system that is far from balanced, and converge more slowly. In this case you probably want to use the balance command before starting a run, so that you begin the run with a balanced system.
Once the rebalancing is complete and final processor sub-domains assigned, particles migrate to their new owning processor as part of the normal reneighboring procedure.
IMPORTANT NOTE: At each rebalance operation, the RCB operation for each cutting plane (line in 2d) typcially starts with low and high bounds separated by the extent of a processor's sub-domain in one dimension. The size of this bracketing region shrinks based on the local density, as described above, which should typically be 1/2 or more every iteration. Thus if Niter is specified as 10, the cutting plane will typically be positioned to better than 1 part in 1000 accuracy (relative to the perfect target position). For Niter = 20, it will be accurate to better than 1 part in a million. Thus there is no need to set Niter to a large value. This is especially true if you are rebalancing often enough that each time you expect only an incremental adjustement in the cutting planes is necessary. LAMMPS will check if the threshold accuracy is reached (in a dimension) is less iterations than Niter and exit early.
IMPORTANT NOTE: If a portion of your system is a perfect lattice, e.g. a frozen substrate, then the balancer may be unable to achieve exact balance. I.e. entire lattice planes will be owned or not owned by a single processor. So you you should not expect to achieve perfect balance in this case. Nor will it be helpful to use a large value for Niter, since it will simply cause the balancer to iterate until Niter is reached, without improving the imbalance factor.
The out keyword writes a text file to the specified filename with the results of each rebalancing operation. The file contains the bounds of the sub-domain for each processor after the balancing operation completes. The format of the file is compatible with the Pizza.py mdump tool which has support for manipulating and visualizing mesh files. An example is shown here for a balancing by 4 processors for a 2d problem:
ITEM: TIMESTEP 1000 ITEM: NUMBER OF SQUARES 4 ITEM: SQUARES 1 1 1 2 7 6 2 2 2 3 8 7 3 3 3 4 9 8 4 4 4 5 10 9 ITEM: TIMESTEP 1000 ITEM: NUMBER OF NODES 10 ITEM: BOX BOUNDS -153.919 184.703 0 15.3919 -0.769595 0.769595 ITEM: NODES 1 1 -153.919 0 0 2 1 7.45545 0 0 3 1 14.7305 0 0 4 1 22.667 0 0 5 1 184.703 0 0 6 1 -153.919 15.3919 0 7 1 7.45545 15.3919 0 8 1 14.7305 15.3919 0 9 1 22.667 15.3919 0 10 1 184.703 15.3919 0
The "SQUARES" lists the node IDs of the 4 vertices in a rectangle for each processor (1 to 4). The first SQUARE 1 (for processor 0) is a rectangle of type 1 (equal to SQUARE ID) and contains vertices 1,2,7,6. The coordinates of all the vertices are listed in the NODES section. Note that the 4 sub-domains share vertices, so there are only 10 unique vertices in total.
For a 3d problem, the syntax is similar with "SQUARES" replaced by "CUBES", and 8 vertices listed for each processor, instead of 4.
Each time rebalancing is performed a new timestamp is written with new NODES values. The SQUARES of CUBES sections are not repeated, since they do not change.
Restart, fix_modify, output, run start/stop, minimize info:
No information about this fix is written to binary restart files. None of the fix_modify options are relevant to this fix.
This fix computes a global scalar which is the imbalance factor after the most recent rebalance and a global vector of length 3 with additional information about the most recent rebalancing. The 3 values in the vector are as follows:
As explained above, the imbalance factor is the ratio of the maximum number of particles on any processor to the average number of particles per processor.
These quantities can be accessed by various output commands. The scalar and vector values calculated by this fix are "intensive".
No parameter of this fix can be used with the start/stop keywords of the run command. This fix is not invoked during energy minimization.
Restrictions: none
Related commands:
Default: none