Wednesday, November 7, 2012

[OvO] Reached max waiting intervals for policy (OpC30-3405). No opcmon value received and reached max waiting intervals (OpC30-3404)


The OVO console was flooded with a huge stream of warnings carrying this error message:


Error: Reached max waiting intervals for policy <policy name> (OpC30-3405).No opcmon value received and reached max waiting intervals (OpC30-3404)


The cause is either policies that take a long time to run or timeouts triggered by the long wait.

All of this can be fixed by adding a few lines to a configuration file. Open a command prompt with Administrator privileges on the affected node and run the command ovconfchg -edit. A file opens in Notepad. There, under the eaagt namespace, add the following entries:


OPC_KILL_SCHEDULE=true
OPC_KILL_SCHEDULE_TIMEOUT=-1
FAILED_COLLECTION_RETRIES=-1
FAILED_POLICY_TIME_TO_REACTIVATE=-1
MAX_RETRIES_UNTIL_POLICY_FAILED=-1
POLICY_MIN_INTERVALS_WAIT=-1
POLICY_MIN_TIME_WAIT=-1



The meaning of each of these entries is explained below.

Sample ovconfchg syntax (run on the node itself)
# ovconfchg -ns eaagt -set OPC_KILL_SCHEDULE true
Sample ovconfpar syntax (from the management server against the target node)
# ovconfpar -change -host <hostname> -ns eaagt -set OPC_KILL_SCHEDULE true
Run the command once for each parameter.
Sample syntax to clear a setting
# ovconfchg -ns eaagt -clear OPC_KILL_SCHEDULE
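Since each command sets a single parameter, the command lines for all seven entries listed earlier can be generated instead of typed by hand. The following Python sketch only builds and prints the command strings, using the ovconfchg and ovconfpar syntax shown above; it does not run anything. The names SETTINGS and build_commands are made up for this illustration.

```python
# Values copied from the entries listed earlier in this post.
SETTINGS = {
    "OPC_KILL_SCHEDULE": "true",
    "OPC_KILL_SCHEDULE_TIMEOUT": "-1",
    "FAILED_COLLECTION_RETRIES": "-1",
    "FAILED_POLICY_TIME_TO_REACTIVATE": "-1",
    "MAX_RETRIES_UNTIL_POLICY_FAILED": "-1",
    "POLICY_MIN_INTERVALS_WAIT": "-1",
    "POLICY_MIN_TIME_WAIT": "-1",
}


def build_commands(settings, namespace="eaagt", host=None):
    """Return one command line per parameter (hypothetical helper).

    host=None  -> ovconfchg form, to be run on the node itself.
    host="..." -> ovconfpar form, to be run from the management server.
    """
    commands = []
    for key, value in settings.items():
        if host is None:
            commands.append(f"ovconfchg -ns {namespace} -set {key} {value}")
        else:
            commands.append(
                f"ovconfpar -change -host {host} -ns {namespace} -set {key} {value}"
            )
    return commands


if __name__ == "__main__":
    # Print the local (ovconfchg) form; review the output before pasting it.
    for line in build_commands(SETTINGS):
        print(line)
```

Passing host="<hostname>" yields the ovconfpar form for the management server instead.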
-- The rest of this section contains a detailed explanation of the parameters.
Keyword     : OPC_KILL_SCHEDULE
Flags       : external
Description : When a new request to start a process for a scheduled action arrives at the action agent, it first checks whether a process already started from the same policy is still running.
              IF YES
              THEN
                 It checks whether the process has already run longer than the configurable timeout (default 55 seconds).
                 IF YES
                 THEN
                   The old process is killed. A warning is written to the opcerror log. No end or failed message is sent to the management server. The new process is started.
                 ELSE
                   The new process is not started. A warning is written to the opcerror log. No start message is sent to the management server.
                 ENDIF
              ELSE
                 The new process is started.
              ENDIF
              OPC_KILL_SCHEDULE can be used to disable the new functionality.
namespace: eaagt (>=OVO8)
Keyword     : OPC_KILL_SCHEDULE_TIMEOUT
Flags       : external
Description : Defines the timeout period that is used to check whether an old process is killed or the new one not started.
              (See also OPC_KILL_SCHEDULE)
              It can be set to
                 0  Default of 55 seconds is used.
                >0  Number is used as the timeout period in seconds.
                <0  No timeout check is done, the old process is killed immediately.
Type/Unit   : int
Default     : 55
namespace: eaagt (>=OVO8)
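The decision flow described for OPC_KILL_SCHEDULE and OPC_KILL_SCHEDULE_TIMEOUT can be modeled as a small Python function. This is only a sketch of the documented behavior, not agent code: the names effective_timeout and decide_scheduled_start are invented for illustration, and the case where OPC_KILL_SCHEDULE switches the feature off is not modeled because the text above does not describe it.

```python
DEFAULT_TIMEOUT_SECONDS = 55  # documented default


def effective_timeout(setting):
    """Translate OPC_KILL_SCHEDULE_TIMEOUT into seconds.

    Returns None when no timeout check is done (setting < 0).
    """
    if setting == 0:
        return DEFAULT_TIMEOUT_SECONDS
    if setting < 0:
        return None
    return setting


def decide_scheduled_start(old_running, old_runtime_seconds, timeout_setting=0):
    """Return what happens when a new scheduled-action request arrives."""
    if not old_running:
        return "start_new"
    timeout = effective_timeout(timeout_setting)
    # setting < 0: the old process is killed immediately, without a check.
    if timeout is None or old_runtime_seconds > timeout:
        return "kill_old_start_new"
    # Old process is still within its timeout: the new one is not started.
    return "skip_new"
```

With the value used in this post (OPC_KILL_SCHEDULE_TIMEOUT=-1), decide_scheduled_start(True, 1, -1) returns "kill_old_start_new": a process that is still running is always replaced by the new one.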
Keyword     : FAILED_COLLECTION_RETRIES
Flags       : external
Description : Specifies whether startup of a failed collection should be restarted for an Advanced Monitor Policy.
              0  : No retries should be done.
              -1 : This failure is ignored - the policy does not go into failed state (This is like it was with agents before A.07.26).
Type/Unit   : int
Default     : 3
namespace: eaagt (>=OVO8)
Keyword     : FAILED_POLICY_TIME_TO_REACTIVATE
Flags       : external
Description : You can automatically attempt to reactivate a policy in the failed state.
              The time before reactivation is attempted can be specified with this variable.
              The time is specified in hours. Use 0, if no reactivation should be done.
Type/Unit   : int
Default     : 24
namespace: eaagt (>=OVO8)
Keyword     : MAX_RETRIES_UNTIL_POLICY_FAILED
Flags       : external
Description : This is the number of attempts that a policy makes to collect data.
              This is important for usage with external program sources. If an external program has a problem, policy handling should not be stopped immediately. Therefore, external collection is stopped and restarted a number of times.
              This variable specifies how often the collection is started. Use 1 if no retries should be made.
Type/Unit   : int
Default     : 3
namespace: eaagt (>=OVO8)
Keyword     : POLICY_MIN_TIME_WAIT
Flags       : external
Description : Minimum time to wait before stopping a policy if it does not receive any data. The time is specified in minutes.
              Important for program sources where the execution time of an external program depends on the current system performance.
              If the system is very busy, it is possible that the execution takes longer than the configured interval. Reconfiguring the time interval that the monitor agent waits for external programs to finish can be helpful.
              To specify the number of intervals, see POLICY_MIN_INTERVALS_WAIT.
Type/Unit   : int
Default     : 2
namespace: eaagt (>=OVO8)
Keyword     : POLICY_MIN_INTERVALS_WAIT
Flags       : external
Description : Minimum number of wait intervals before stopping a policy if it does not receive any data.
              Important for program sources where the execution time of an external program depends on the current system performance.
              If the system is very busy, it is possible that the execution takes longer than the configured interval. Reconfiguring the time interval that the monitor agent waits for the external programs to finish can be helpful.
              Use -1 if POLICY_MIN_TIME_WAIT should be used.
              Use  0 if the policy should not wait.
Type/Unit   : int
Default     : -1
namespace: eaagt (>=OVO8)
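How POLICY_MIN_INTERVALS_WAIT and POLICY_MIN_TIME_WAIT interact can be sketched in Python as follows. This is an interpretation of the descriptions above, not the agent's actual algorithm. Two assumptions are made: waiting N intervals is taken to mean N times the policy's polling interval, and a negative POLICY_MIN_TIME_WAIT is treated as "not described" because the reference text does not say what it does. The function name minimum_wait_seconds is hypothetical.

```python
def minimum_wait_seconds(interval_seconds, min_intervals_wait=-1, min_time_wait=2):
    """Minimum wait before a policy without data is stopped (sketch).

    interval_seconds   : polling interval of the policy, in seconds
    min_intervals_wait : POLICY_MIN_INTERVALS_WAIT (default -1)
    min_time_wait      : POLICY_MIN_TIME_WAIT in minutes (default 2)
    Returns None when the behavior is not covered by the reference text.
    """
    if min_intervals_wait == 0:
        return 0  # the policy should not wait
    if min_intervals_wait > 0:
        # Assumption: N wait intervals = N * polling interval.
        return min_intervals_wait * interval_seconds
    if min_intervals_wait == -1:
        # -1 means: use POLICY_MIN_TIME_WAIT (minutes) instead.
        if min_time_wait < 0:
            return None  # negative value is not described in the reference
        return min_time_wait * 60
    return None  # other negative values are not described either
```

With the documented defaults, minimum_wait_seconds(300) gives 120, that is, the 2 minutes of POLICY_MIN_TIME_WAIT regardless of the polling interval.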


This fix works well for many people; for others it will only soften the avalanche of alerts of this kind. If you really do not want to receive any of these alerts, just add one more entry to the pile: OPCMONA_ERRORMSG_ONLY_OPCERROR=true.

WARNING: Adding this last entry suppresses all messages from OpC30-3400 through OpC30-3409.
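To see which messages fall inside that range, the IDs can be pulled out of a message text and checked. The Python sketch below assumes the range is inclusive at both ends (3400 through 3409); the helper names message_ids and is_suppressed are invented for this example.

```python
import re

# Assumption: the suppressed range is inclusive, OpC30-3400 .. OpC30-3409.
SUPPRESSED_NUMBERS = range(3400, 3410)


def message_ids(text):
    """Return every OpC30-NNNN identifier found in a message text."""
    return re.findall(r"OpC30-\d+", text)


def is_suppressed(message_id):
    """True if the ID lies in the range affected by the extra entry."""
    number = int(message_id.split("-")[1])
    return number in SUPPRESSED_NUMBERS


if __name__ == "__main__":
    warning = (
        "Reached max waiting intervals for policy <policy name> (OpC30-3405)."
        "No opcmon value received and reached max waiting intervals (OpC30-3404)"
    )
    for mid in message_ids(warning):
        print(mid, "suppressed" if is_suppressed(mid) else "still shown")
```

Both IDs from the warning discussed in this post (OpC30-3405 and OpC30-3404) lie inside the range, but so do eight other message numbers, which is why the entry should be added with care.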
