Output post processor monitoring
For starters, it has been quite a while since my last post, and this one isn't a technical post; it is more my thoughts on the topic.
Recently a client asked me to start monitoring the Output Post Processor (OPP) concurrent manager. The reason was that there had been a few incidents lately where the OPP didn't pick up requests, or where the Java garbage collector hung and stopped the OPP from working. Either way, DBA action is needed to bring the services back. There can be various reasons for the OPP to fail (I won't discuss them here), but since the DBA is usually third-level support, some time may pass before you find out that the system has problems.
The first thing you probably think of is: let's check the log files, they always tell the truth. But not this time. For example, if the system is facing the “The Output Post-processor is running but has not picked up this request” error, the actual error is written to the request log, while the OPP log looks fine, without a single sign of failure. The OPP log file is useful for checking requests that have problems. In stable, non-development systems, the Java errors there will most probably point to templates that can't process the selected data, or to data that was entered incorrectly for a particular template.
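Since the error only shows up in the request log, a monitoring script would have to look there rather than in the OPP log. Below is a minimal sketch of that check, assuming the request log text has already been read from the file (how you locate and fetch the file is left out). The function name opp_not_picked_up is hypothetical; only the message text comes from the error quoted above.

```python
# Hypothetical helper: flag request logs containing the OPP "not picked up" message.
# Assumption: the request log text has already been read from the request log file.

OPP_NOT_PICKED_UP = (
    "The Output Post-processor is running but has not picked up this request"
)


def opp_not_picked_up(request_log_text: str) -> bool:
    """Return True if the request log reports that the OPP did not pick up the request."""
    # Case-insensitive match so capitalisation differences don't hide the error.
    return OPP_NOT_PICKED_UP.lower() in request_log_text.lower()


if __name__ == "__main__":
    # Illustrative text only, not a real request log.
    sample = "illustrative request log text\n" + OPP_NOT_PICKED_UP + ".\n"
    print(opp_not_picked_up(sample))  # True
    print(opp_not_picked_up("request completed normally"))  # False
```

A plain substring match keeps the check cheap; it only tells you the symptom, not why the OPP missed the request.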
The next thing that comes to mind is to monitor the processes from the OS side and also run some queries against database tables such as FND_CONCURRENT_QUEUES, FND_CONCURRENT_PROCESSES or FND_CP_GSM_OPP_AQTBL. But again, there could be situations where, for example, the Java garbage collector leaves the OPP Java process hung and not working. The log file will then have entries like:
[GC 5906K->4624K(6784K), 0.0054799 secs]
[GC 6597K->5039K(7040K), 0.0072827 secs]
[Full GC 5039K->4335K(7040K), 0.1910211 secs]
The actual process on the OS will look fine and running, and Concurrent -> Manager -> Administer will show the OPP process as OK until the Restart button is pressed, but requests will end up Completed/Warning.
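One way to turn those log entries into a signal is to parse them and look for a run of Full GCs that free almost nothing, which is what a JVM stuck collecting garbage tends to look like. This is a minimal sketch, assuming the OPP log contains entries in exactly the format shown above; the function names, the "three Full GCs in a row" rule and the 10% threshold are hypothetical choices for illustration, not an Oracle-defined check.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches entries like "[GC 5906K->4624K(6784K), 0.0054799 secs]"
# and "[Full GC 5039K->4335K(7040K), 0.1910211 secs]".
GC_ENTRY = re.compile(r"\[(Full GC|GC) (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]")


@dataclass
class GcEvent:
    full: bool        # True for "Full GC", False for a minor "GC"
    before_kb: int    # heap used before the collection
    after_kb: int     # heap used after the collection
    heap_kb: int      # total heap size reported in parentheses
    seconds: float    # pause time


def parse_gc_events(text: str) -> List[GcEvent]:
    """Extract every GC entry from a chunk of OPP log text, in order."""
    return [
        GcEvent(
            full=m.group(1) == "Full GC",
            before_kb=int(m.group(2)),
            after_kb=int(m.group(3)),
            heap_kb=int(m.group(4)),
            seconds=float(m.group(5)),
        )
        for m in GC_ENTRY.finditer(text)
    ]


def looks_gc_bound(
    events: List[GcEvent], min_full_gcs: int = 3, max_free_ratio: float = 0.1
) -> bool:
    """Hypothetical heuristic: the last `min_full_gcs` events are all Full GCs
    and each freed less than `max_free_ratio` of the heap, i.e. the JVM keeps
    collecting without getting memory back."""
    tail = events[-min_full_gcs:]
    if len(tail) < min_full_gcs:
        return False
    return all(
        e.full and (e.before_kb - e.after_kb) < max_free_ratio * e.heap_kb
        for e in tail
    )


if __name__ == "__main__":
    sample = (
        "[GC 5906K->4624K(6784K), 0.0054799 secs] "
        "[GC 6597K->5039K(7040K), 0.0072827 secs] "
        "[Full GC 5039K->4335K(7040K), 0.1910211 secs]"
    )
    events = parse_gc_events(sample)
    print(len(events))             # 3
    print(looks_gc_bound(events))  # False: only the last entry is a Full GC
```

The thresholds would need tuning per system, and a quiet log with no GC entries at all proves nothing, so this can only complement the request-level checks discussed next.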
NB! OS process monitoring is still needed (as part of concurrent manager process/count monitoring), because in most cases it will notify you about the problem, but this post focuses on OPP health in more detail.
So in both cases we actually found the clues in the request log details. Most probably end users will also report that a particular request isn't executing properly. So it is the requests that should be monitored; the only question is how. The first idea that comes to mind is to identify and monitor all requests that are supposed to use the OPP and, for example, raise a warning if two requests in a row end in Error or Warning. This could work after some time, once the monitoring has been tuned to the system's behaviour, but I suspect it would generate a lot of spam and would not be a very accurate measure of OPP health.
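The "two in a row" rule is simple enough to sketch. This assumes you already have, in completion order, the requests that are expected to use the OPP together with their completion status; the status strings and the function name consecutive_failure_alerts are hypothetical, and mapping them from the actual concurrent request status codes is left out.

```python
from typing import Iterable, List, Tuple

# Completion statuses treated as "bad" (assumption: requests hit by an OPP
# problem end in Warning or Error, as described above).
BAD_STATUSES = {"Warning", "Error"}


def consecutive_failure_alerts(
    requests: Iterable[Tuple[int, str]], threshold: int = 2
) -> List[int]:
    """Return the request IDs at which `threshold` bad completions in a row were reached.

    `requests` must be in completion order and contain only requests that
    are expected to use the OPP.
    """
    alerts: List[int] = []
    streak = 0
    for request_id, status in requests:
        if status in BAD_STATUSES:
            streak += 1
            if streak == threshold:
                alerts.append(request_id)  # fire once per streak, not on every failure
        else:
            streak = 0  # a normal completion resets the streak
    return alerts


if __name__ == "__main__":
    history = [
        (101, "Normal"),
        (102, "Warning"),
        (103, "Error"),
        (104, "Error"),
        (105, "Normal"),
        (106, "Warning"),
    ]
    print(consecutive_failure_alerts(history))  # [103]
```

Firing only once per streak reduces some of the spam, but it does not solve the accuracy problem: a bad template or bad data also produces a run of Warnings even when the OPP itself is healthy.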
The only option left in my mind is to regularly schedule a very small report that uses the OPP and check the actual output: is it what it should be, and not XML? This creates extra load on the database and the OPP, and it is definitely not the best solution, as I always think monitoring shouldn't put any load on the system. But for now it is the best way I know to make sure the OPP is healthy and to be the first to notice problems. One more thing: if concurrent processing runs on multiple servers, this test should be run on each node.
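The output check itself can be as small as looking at the first bytes of the canary report's output file. This is a minimal sketch, assuming the small report uses a PDF layout, so healthy output starts with the PDF signature while raw XML means the template was never applied; classify_output and opp_output_ok are hypothetical names, and finding the output file of the latest run is left out.

```python
import codecs
from pathlib import Path


def classify_output(data: bytes) -> str:
    """Classify a report output by its leading bytes: "pdf", "xml" or "unknown".

    Assumption: the canary report uses a PDF layout, so healthy output
    starts with "%PDF-"; output starting with "<" is raw XML data.
    """
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]  # drop a UTF-8 byte order mark
    head = data.lstrip()  # ignore leading whitespace
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head.startswith(b"<"):  # covers "<?xml ...?>" and a bare root element
        return "xml"
    return "unknown"


def opp_output_ok(path: Path) -> bool:
    """Return True if the canary output at `path` looks post-processed (PDF, not XML)."""
    return classify_output(path.read_bytes()) == "pdf"


if __name__ == "__main__":
    print(classify_output(b"%PDF-1.4\n"))                            # pdf
    print(classify_output(b'<?xml version="1.0"?>\n<DATA></DATA>'))  # xml
```

With concurrent processing on several nodes, the same check would be run against the output of the canary request on each node.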
Do you have any better ideas?