Parallelism
Knitro offers several features to exploit parallel computations on shared memory multi-processor machines. These features are implemented using OpenMP.
Note
The parallel features offered through Knitro are not available through all interfaces. Check with your modeling language vendor to see if these features are included. The parallel features are included in the AMPL interface, the object-oriented interfaces, and through the callable library. Parallel features are also available through the MATLAB interface, but some may be less efficient in this environment.
Knitro offers the following parallel features:
Parallel Finite-Difference Gradients
As described in Derivatives, if you are unable to provide the exact
first derivatives, Knitro offers the option to approximate
first derivatives using either a forward or central finite-difference approach,
by setting the option gradopt
. Knitro will compute these finite
difference gradient values in parallel if the user specifies that Knitro should
use multiple threads through the option findiff_numthreads
.
This parallel feature only applies to first derivative finite-difference
evaluations. Using more than one thread for the finite-difference gradient evaluation is only
beneficial when concurrent_evals
=1
).
Note
In the Knitro-MATLAB interface, the parallel finite-difference feature is
controlled by the UseParallel MATLAB option, rather than the Knitro
findiff_numthreads
option. See Knitro / MATLAB reference for
more information.
Parallel Multi-Start
The multi-start procedure described in Multi-Start
can run in parallel by setting either numthreads
or
ms_numthreads
to use multiple threads.
When the multi-start procedure is run in parallel, Knitro will produce the same sequence of initial points and solves that you see when running multi-start sequentially (though, perhaps, not in the same order).
Therefore, as long as you run multi-start to completion (ms_terminate
=0)
and use the deterministic option (ms_deterministic
=1),
you should visit
the same initial points encountered when running multi-start sequentially,
and get the same final solution. By default ms_terminate
=0
and ms_deterministic
=1 so that the parallel multi-start produces
the same solution as the sequential multi-start.
However, if ms_deterministic
=0, or ms_terminate
>0,
there is no guarantee that the final solution reported by multi-start will
be the same when run in parallel compared to the solution when run sequentially,
and even the parallel solution may change when run at different times.
Parallel Algorithms
If the user option alg
is set to multi
, then Knitro
will run all four algorithms (see Algorithms). When
numthreads
is set to use multiple threads, the four Knitro
algorithms will run in parallel. The termination of the parallel algorithms
procedure is controlled by the user option ma_terminate
. See
Algorithms for more details on the multi algorithm procedure.
Parallel Mixed-Integer Branch-and-Bound
The branch-and-bound method for mixed-integer programming
is designed to execute in parallel by default. This can result in
significant speedups by solving multiple node subproblems or heuristics in parallel.
The option mip_numthreads
can be used to customize the number
of threads used for the branch-and-bound algorithm (mip_method
=1).
Parallel Tuning
The Knitro-Tuner can help you identify some non-default options settings that may improve performance on a particular model or set of models.
When numthreads
is set to use multiple threads, Knitro will test the Tuner options in parallel.
Parallel Basic Linear Algebra Subroutine (BLAS)
The Knitro algorithms - in particular the interior-point/barrier algorithms - rely heavily on BLAS operations (e.g. dot products of vectors, dense matrix-matrix and matrix-vector products, etc.). For large-scale problems, these operations may often take 35%-50% of the overall solution time, and sometimes more.
These operations can be computed in parallel using multiple threads by
setting the user option blas_numthreads
>1. This option is currently only active when
using the default Intel BLAS (blasoption
=1) provided with Knitro.
Parallel Sparse Linear System Solves
The primary computational cost each iteration in the Knitro interior-point
algorithms is the solution of a linear system of equations. The linsolver
user option specifies the linear system, solver to use. You can
use the multi-threaded Intel MKL PARDISO solver in Knitro by
choosing linsolver
=6. By default the Intel MKL
PARDISO solver will use one thread, however, it can solve linear systems in
parallel by choosing linsolver_numthreads
>1 (in combination with
linsolver
=6). It is also possible to use linsolver_numthreads
>1
with the linear solvers MA86 and MA97.
Note
Generally you should not use BOTH parallel BLAS and a parallel linear solver
as they may conflict with each other. If blas_numthreads
>1
one should set linsolver_numthreads
=1 and vice versa.
Parallel Options
Option |
Meaning |
---|---|
Specifies the max number of threads to use for all parallel features. You can just set this and let Knitro decide how to distribute the threads. |
|
Specifies whether or not to allow concurrent evaluations. |
|
Specifies the number of threads to use for parallel BLAS
(when |
|
Specifies the number of threads to use for finite-difference gradients. |
|
Specifies the number of threads to use for parallel linear
system solves (when |
|
Specifies the number of threads to use for the mixed-
integer branch-and-bound method ( |
|
Specifies the number of threads to use for the parallel multi-start procedure. |
The user option numthreads
is used to determine the number of threads
Knitro can use for all parallel computations. Knitro will decide how to apply the threads.
If numthreads
> 0, then
the number of threads is determined by the value of numthreads
.
If numthreads
= 0, then the number of threads is determined by the
value of the environment variables OMP_NUM_THREADS
. If
numthreads
= 0 and OMP_NUM_THREADS
is not set, then the
number of threads to use will be automatically determined by OpenMP. If
numthreads
< 0, Knitro will automatically determine the number of
threads to use.
Generally, if you are unsure of how best to apply parallel threads in Knitro
you should just set the general option numthreads
to the maximum
number of threads you want Knitro to use, and leave blas_numthreads
,
linsolver_numthreads
, and ms_numthreads
at their default values.
Then Knitro will try to allocate work to these different
threads in the most sensible way. Typically, if you are performing a single solve,
the threads will get applied to the BLAS operations. If, for example,
you are using multi-start then the multi-start solves are run in
parallel but BLAS is sequential (typically applying 2 layers of parallelism is
not good). You should then compare this with using one thread to see if you
achieve any speedup from parallelism. Using parallelism will not always result in a
speedup since sometimes the overhead work associated with using parallelism will
outweigh any improvements achieved through parallel computations inside Knitro.
This may be especially likely on small problems that are solved quickly.
The options blas_numthreads
, findiff_numthreads
, linsolver_numthreads
,
mip_numthreads
and ms_numthreads
allow
the expert user more fine-grained control over parallelism of these specific features.
The user option blas_numthreads
is used to determine the number of threads
Knitro can use for parallel BLAS computations. This option is only active when using
the default Intel BLAS (blasoption
=1).
The domain specific blas_numthreads
,
will override the general thread setting specified by numthreads
for BLAS operations.
The user option findiff_numthreads
is used to determine the number of threads
to use for finite-difference gradient evaluations when gradopt
=2 or
gradopt
=3.
The user option linsolver_numthreads
is used to determine the number of threads
Knitro can use for parallel linear system solves. This option is only active when using
the Intel MKL PARDISO linear solver (linsolver
=6), the HSL MA97 linear solver
(linsolver
=7) and the HSL MA86 linear solver (linsolver
=8).
The domain specific linsolver_numthreads
,
will override the general thread setting specified by numthreads
for linear system solve operations.
The user option mip_numthreads
is used to determine the number of threads
to use for the branch-and-bound algorithm for solving mixed-integer models when
mip_method
=1.
The user option ms_numthreads
is used to determine the number of threads
to use for the multi-start procedure. See Multi-Start for more details.
The user option concurrent_evals
determines whether or not the user
provided callback functions used for function and derivative evaluations can
take place concurrently in parallel (for possibly different values of “x”).
If it is not safe to have concurrent evaluations, then setting
concurrent_evals
=0, will put these evaluations in a critical region
so that only one evaluation can take place at a time. If concurrent_evals
=1
then concurrent evaluations are allowed when Knitro is run in parallel, and it
is the responsibility of the user to ensure that these evaluations are stable.
Preventing concurrent evaluations will decrease the efficiency of the parallel features, particularly when the evaluations are expensive or there are many threads and these evaluations create a bottleneck.
AMPL example
Let us consider again our AMPL example from Section Getting started with AMPL and run it with the parallel multi algorithm procedure. We specify that Knitro should run in parallel with four threads (one for each algorithm):
1 2 3 4 5 | ampl: reset;
ampl: option solver knitroampl;
ampl: option knitro_options "algorithm=5 ma_terminate=0 numthreads=4";
ampl: model testproblem.mod;
ampl: solve;
|
The Knitro log printed to the screen shows the results of each algorithm (one per line):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 | =======================================
Commercial License
Artelys Knitro 14.1.0
=======================================
Knitro presolve eliminated 0 variables and 0 constraints.
algorithm: 5
concurrent_evals: 0
datacheck: 0
hessian_no_f: 1
ma_terminate: 0
numthreads: 4
Problem Characteristics ( Presolved)
-----------------------
Objective goal: Minimize
Objective type: quadratic
Number of variables: 3 ( 3)
bounded below only: 3 ( 3)
bounded above only: 0 ( 0)
bounded below and above: 0 ( 0)
fixed: 0 ( 0)
free: 0 ( 0)
Number of constraints: 2 ( 2)
linear equalities: 1 ( 1)
quadratic equalities: 0 ( 0)
gen. nonlinear equalities: 0 ( 0)
linear one-sided inequalities: 0 ( 0)
quadratic one-sided inequalities: 1 ( 1)
gen. nonlinear one-sided inequalities: 0 ( 0)
linear two-sided inequalities: 0 ( 0)
quadratic two-sided inequalities: 0 ( 0)
gen. nonlinear two-sided inequalities: 0 ( 0)
Number of nonzeros in Jacobian: 6 ( 6)
Number of nonzeros in Hessian: 5 ( 5)
Knitro running multiple algorithms in parallel with 4 threads.
Alg Status Objective FeasError OptError Real Time
-------- --------- -------------- ---------- ---------- ----------
2 0 9.360000e+02 0.000e+00 1.945e-07 0.001
3 0 9.360000e+02 0.000e+00 0.000e+00 0.002
1 0 9.360000e+02 0.000e+00 1.250e-09 0.002
4 0 9.360000e+02 0.000e+00 0.000e+00 0.006
Multiple algorithms stopping, all solves have completed.
EXIT: Locally optimal solution found.
Final Statistics
----------------
Final objective value = 9.36000000000000e+02
Final feasibility error (abs / rel) = 0.00e+00 / 0.00e+00
Final optimality error (abs / rel) = 0.00e+00 / 0.00e+00
# of iterations = 21
# of CG iterations = 12
# of function evaluations = 0
# of gradient evaluations = 0
# of Hessian evaluations = 0
Total program time (secs) = 0.00798 ( 0.028 CPU time)
===============================================================================
Knitro 14.1.0: Locally optimal or satisfactory solution.
objective 936; feasibility error 0
21 iterations; 0 function evaluations
|
As can be seen, all four Knitro algorithms solve the problem and find the same
local solution. However, the interior-point conjugate gradient algorithm (alg
=2
)
is the fastest.
C example
As an example, the C example can also be easily modified to enable
parallel multi-algorithms by adding the following lines before the
call to KN_solve()
:
// parallelism
if (KN_set_int_param_by_name (kc, "algorithm", KN_ALG_MULTI) != 0)
exit( -1 );
if (KN_set_int_param_by_name (kc, "ma_terminate", 0) != 0)
exit( -1 );
if (KN_set_int_param_by_name (kc, "numthreads", 4) != 0)
exit( -1 );
Again, running this example we get a Knitro log that looks simlar to what we observed with AMPL.