java.lang.Object
java.lang.Thread
org.apache.uima.collection.impl.cpm.engine.CPMEngine
All Implemented Interfaces:
Runnable

public class CPMEngine extends Thread
Responsible for creating and initializing processing threads. This instance manages the lifecycle of the CPE components. It exposes API for plugging in components programmatically instead of declaratively. Running in its own thread, this components creates seperate Processing Pipelines for Analysis Engines and Cas Consumers, launches configured CollectionReader and attaches all of those components to form a pipeline from source to sink. The Collection Reader feeds Processing Threads containing Analysis Engines, and Analysis Engines feed results of analysis to Cas Consumers.
  • Field Details

    • MAX_WAIT_ON_QUEUE

      private static final int MAX_WAIT_ON_QUEUE
      See Also:
    • CAS_PROCESSED_MSG

      private static final int CAS_PROCESSED_MSG
      See Also:
    • SINGLE_THREADED_MODE

      private static final String SINGLE_THREADED_MODE
      See Also:
    • casPool

      public CPECasPool casPool
    • lockForPause

      public final Object lockForPause
    • collectionReader

      private BaseCollectionReader collectionReader
    • pause

      protected boolean pause
    • isRunning

      protected volatile boolean isRunning
    • stopped

      protected volatile boolean stopped
    • killed

      protected volatile boolean killed
    • pauseOnException

      private boolean pauseOnException
    • annotatorList

      private LinkedList annotatorList
    • annotatorDeployList

      private LinkedList annotatorDeployList
    • consumerList

      private LinkedList consumerList
    • consumerDeployList

      private LinkedList consumerDeployList
    • numToProcess

      private long numToProcess
    • poolSize

      private int poolSize
    • procTr

      private ProcessTrace procTr
    • stats

      private Map stats
    • statusCbL

      private ArrayList statusCbL
    • readerFetchSize

      private int readerFetchSize
    • inputQueueSize

      private int inputQueueSize
    • outputQueueSize

      private int outputQueueSize
    • concurrentThreadCount

      private int concurrentThreadCount
    • analysisEngines

      private Hashtable analysisEngines
    • consumers

      private Hashtable consumers
    • casprocessorList

      private CasProcessor[] casprocessorList
    • producer

      private ArtifactProducer producer
    • cpeFactory

      private CPEFactory cpeFactory
    • processingUnits

      protected ProcessingUnit[] processingUnits
    • casConsumerPU

      private ProcessingUnit casConsumerPU
    • outputQueue

      protected BoundedWorkQueue outputQueue
    • workQueue

      protected BoundedWorkQueue workQueue
    • checkpointData

      private CheckpointData checkpointData
    • mixedCasProcessorTypeSupport

      private boolean mixedCasProcessorTypeSupport
    • mPerformanceTuningSettings

      private Properties mPerformanceTuningSettings
    • dbgCtrlThread

      private DebugControlThread dbgCtrlThread
    • pca

    • activeProcessingUnits

      private int activeProcessingUnits
    • hardKill

      private boolean hardKill
    • skippedDocs

      private Hashtable skippedDocs
    • definedCapabilities

      private Capability[] definedCapabilities
    • needsTCas

      private boolean needsTCas
    • crFetchTime

      private long crFetchTime
    • readerState

      private int readerState
    • dropCasOnExceptionPolicy

      private boolean dropCasOnExceptionPolicy
    • singleThreadedCPE

      private boolean singleThreadedCPE
    • nonThreadedProcessingUnit

      private NonThreadedProcessingUnit nonThreadedProcessingUnit
    • nonThreadedCasConsumerProcessingUnit

      private NonThreadedProcessingUnit nonThreadedCasConsumerProcessingUnit
    • initial_cp_list

      private LinkedList initial_cp_list
    • casProcessorsDeployed

      private boolean casProcessorsDeployed
    • consumerThreadStarted

      private boolean consumerThreadStarted
    • readerThreadStarted

      private boolean readerThreadStarted
    • processingThreadsState

      private int[] processingThreadsState
  • Constructor Details

    • CPMEngine

      public CPMEngine(CPMThreadGroup aThreadGroup, CPEFactory aCpeFactory, ProcessTrace aProcTr, CheckpointData aCheckpointData) throws Exception
      Initializes Collection Processing Engine. Assigns this thread and all processing threads created by this component to a common Thread Group.
      Parameters:
      aThreadGroup - - contains all CPM related threads
      aCpeFactory - - CPE factory object responsible for parsing cpe descriptor and creating components
      aProcTr - - instance of the ProcessTrace where the CPM accumulates stats
      aCheckpointData - - checkpoint object facillitating restart from the last known point
      Throws:
      Exception
  • Method Details

    • getProcessingContainers

      public LinkedList getProcessingContainers()
      Returns a list of Processing Containers for Analysis Engines. Each CasProcessor is managed by its own container.
    • getAllProcessingContainers

      public LinkedList getAllProcessingContainers()
      Returns a list of All Processing Containers. Each CasProcessor is managed by its own container.
    • getThreadCount

      public int getThreadCount() throws ResourceConfigurationException
      Returns number of processing threads
      Returns:
      - number of processing threads
      Throws:
      ResourceConfigurationException - -
    • setStats

      public void setStats(Map aMap)
      Plugs in a map where the engine stores perfomance info at runtime
      Parameters:
      aMap - - map for runtime stats and totals
    • getStats

      public Map getStats()
      Returns CPE stats
      Returns:
      Map containing CPE stats
    • setPauseOnException

      public void setPauseOnException(boolean aPause)
      Sets a global flag to indicate to the CPM that it should pause whenever exception occurs
      Parameters:
      aPause - - true if pause is requested on exception, false otherwise
    • isPauseOnException

      public boolean isPauseOnException()
      Returns if the CPM should pause when exception occurs
      Returns:
      - true if the CPM pauses when exception occurs, false otherwise
    • setInputQueueSize

      public void setInputQueueSize(int aInputQueueSize)
      Defines the size of inputQueue. The queue stores this many entities read from the CollectionReader. Every processing pipeline thread will read its entities from this input queue. The CollectionReader is decoupled from the consumer of entities, and continuously replenishes the input queue.
      Parameters:
      aInputQueueSize - the size of the batch.
    • setOutputQueueSize

      public void setOutputQueueSize(int aOutputQueueSize)
      Defines the size of outputQueue. The queue stores this many entities enqueued by every processing pipeline thread.The results of analysis are dumped into this queue for consumer thread to consume its contents.
      Parameters:
      aOutputQueueSize - the size of the batch.
    • setPoolSize

      public void setPoolSize(int aPoolSize)
      Defines the size of Cas Pool.
      Parameters:
      aPoolSize - the size of the Cas pool.
    • getPoolSize

      public int getPoolSize()
    • setConcurrentThreadSize

      public void setConcurrentThreadSize(int aConcurrentThreadSize)
      Defines number of threads executing the processing pipeline concurrently.
      Parameters:
      aConcurrentThreadSize - the size of the batch.
    • addStatusCallbackListener

      public void addStatusCallbackListener(BaseStatusCallbackListener aListener)
    • getCallbackListeners

      public ArrayList getCallbackListeners()
      Returns a list of ALL callback listeners currently registered with the CPM
      Returns:
      -
    • removeStatusCallbackListener

      public void removeStatusCallbackListener(BaseStatusCallbackListener aListener)
      Unregisters given listener from the CPM
      Parameters:
      aListener - - instance of BaseStatusCallbackListener to unregister
    • isKilled

      public boolean isKilled()
      Returns true if this engine has been killed
      Returns:
      true if this engine has been killed
    • dumpState

      private void dumpState()
      Dumps some internal state of the CPE. Used for debugging.
    • killIt

      public void killIt()
      Kill CPM the hard way. None of the entities in the queues will be processed. This methof simply empties all queues and at the end adds EOFToken to the work queue so that all threads go away.
    • isHardKilled

      public boolean isHardKilled()
      Returns if the CPE was killed hard. Soft kill allows the CPE to finish processing all in-transit CASes. Hard kill causes the CPM to stop processing and to throw away all unprocessed CASes from its queues.
      Returns:
      true if the CPE was killed hard
    • asynchStop

      @Deprecated public void asynchStop()
      Deprecated.
    • stopIt

      public void stopIt()
      Stops execution of the Processing Pipeline and this thread.
    • getIndexInList

      private int getIndexInList(List aDeployList, String aName)
      Returns index to a CasProcessor with a given name in a given List
      Parameters:
      aDeployList - - List of CasConsumers to be searched
      aName - - name of the CasConsumer we want to find
      Returns:
      0 - if a CasConsumer is not found in a list, else returns a position in the list where the CasConsumer can found
    • getPositionInListIfExists

      private int getPositionInListIfExists(String aName, List aList)
      Find the position in the list of the Cas Processor with a given name
      Parameters:
      aName -
      aList -
      Returns:
      the position in the list of the Cas Processor with a given name
    • isMultipleDeploymentAllowed

      private boolean isMultipleDeploymentAllowed(String aDescPath, String aCpName, boolean isConsumer) throws Exception
      Parses Cas Processor descriptor and checks if it is parallelizable.
      Parameters:
      aDescPath - - fully qualified path to a CP descriptor
      aCpName - - name of the CP
      isConsumer - - true if the CP is a Cas Consumer, false otherwise
      Returns:
      - true if CP is parallelizable, false otherwise
      Throws:
      Exception - -
    • isParallizable

      public boolean isParallizable(CasProcessor aProcessor, String aCpName) throws Exception
      Determines if a given Cas Processor is parallelizable. Remote Cas Processors are by default parallelizable. For integrated and managed the CPM consults Cas Processor's descriptor to determine if it is parallelizable.
      Parameters:
      aProcessor - - Cas Processor being checked
      aCpName - - name of the CP
      Returns:
      - true if CP is parallelizable, false otherwise
      Throws:
      Exception - -
    • addCasConsumer

      private void addCasConsumer(CasProcessor aProcessor, String aCpName) throws Exception
      Adds Cas Processor to a single-threaded pipeline. This pipeline is fed by the output queue and typicall contains Cas Consumers. AEs can alos be part of this pipeline.
      Parameters:
      aProcessor - - Cas Processor to add to single-threaded pipeline
      aCpName - - name of the Cas Processor
      Throws:
      Exception - -
    • addParallizableCasProcessor

      private void addParallizableCasProcessor(CasProcessor aProcessor, String aCpName) throws Exception
      Add Cas Processor to a list of CPs that are to be run in the parallelizable pipeline. The fact that the CP is in parallelizable pipeline does not mean that there will be instance per pipeline of CP. Its allowed to have a single instance, shareable CP running in multi-threaded pipeline.
      Parameters:
      aProcessor - - CP to add to parallelizable pipeline
      aCpName - - name of the CP
      Throws:
      Exception - -
    • classifyCasProcessors

      private void classifyCasProcessors() throws Exception
      Classify based on Cas Processor capability to run in parallel. Some Cas Processors need to run as single instance only. It scans the list of Cas Processors backwords and moves those Cas Processors that are not parallelizable to a separate single-threade pipeline. This process of moving CPs continues until the first parallelizable Cas Processor is found. Beyond this all Cas Processors are moved to a parallelizable pipeline. If the non-parallelizable CP is in the parallelizable pipeline there simply will be a single instance of it that will be shared by all processing threads.
      Throws:
      Exception - -
    • addCasProcessor

      public void addCasProcessor(CasProcessor aCasProcessor) throws ResourceConfigurationException
      Adds a CASProcessor to the processing pipeline. If a CasProcessor already exists and its status=DISABLED this method will re-enable the CasProcesser.
      Parameters:
      aCasProcessor - CASProcessor to be added to the processing pipeline
      Throws:
      ResourceConfigurationException
    • addCasProcessor

      public void addCasProcessor(CasProcessor aCasProcessor, int aIndex) throws ResourceConfigurationException
      Adds a CASProcessor to the processing pipeline at a given place in the processing pipeline
      Parameters:
      aCasProcessor - CASProcessor to be added to the processing pipeline
      aIndex - - insertion point for a given CasProcessor
      Throws:
      ResourceConfigurationException
    • removeCasProcessor

      public void removeCasProcessor(int aCasProcessorIndex)
      Removes a CASProcessor from the processing pipeline
      Parameters:
      aCasProcessorIndex - - CasProcessor position in processing pipeline
    • disableCasProcessor

      public void disableCasProcessor(int aCasProcessorIndex)
      Disable a CASProcessor in the processing pipeline
      Parameters:
      aCasProcessorIndex - CASProcessor to be added to the processing pipeline
    • disableCasProcessor

      public void disableCasProcessor(String aCasProcessorName)
      Disable a CASProcessor in the processing pipeline
      Parameters:
      aCasProcessorName - CASProcessor to be added to the processing pipeline
    • enableCasProcessor

      public void enableCasProcessor(String aCasProcessorName)
      Disable a CASProcessor in the processing pipeline
      Parameters:
      aCasProcessorName - CASProcessor to be added to the processing pipeline
    • getCasProcessors

      public CasProcessor[] getCasProcessors()
      Returns all CASProcesors in the processing pipeline
    • deployConsumers

      private void deployConsumers() throws AbortCPMException
      Deploys all Cas Consumers
      Throws:
      AbortCPMException - -
    • redeployAnalysisEngine

      public void redeployAnalysisEngine(ProcessingContainer aProcessingContainer) throws Exception
      Deploys CasProcessor and associates it with a ProcessingContainer
      Parameters:
      aProcessingContainer -
      Throws:
      Exception
    • deployAnalysisEngines

      private void deployAnalysisEngines() throws AbortCPMException
      Deploys All Analysis Engines. Analysis Engines run in a replicated processing units seperate from Cas Consumers.
      Throws:
      AbortCPMException - -
    • deployCasProcessors

      public void deployCasProcessors() throws AbortCPMException
      Starts CASProcessor containers one a time. During this phase the container deploys a TAE as local,remote, or integrated CasProcessor.
      Throws:
      AbortCPMException
    • restoreFromCheckpoint

      private void restoreFromCheckpoint(String component, String aEvType)
      Restores named events from the checkpoint
      Parameters:
      component - - component name to restore named event for
      aEvType - - event to restore
    • copyComponentEvents

      private void copyComponentEvents(String aEvType, List aList, ProcessTrace aPTr) throws IOException
      Copy given component events
      Parameters:
      aEvType - - event type
      aList - - list of events to copy
      aPTr - -
      Throws:
      IOException - -
    • isRunning

      public boolean isRunning()
      Returns a global flag indicating if this Thread is in processing state
    • isPaused

      public boolean isPaused()
      Returns a global flag indicating if this Thread is in pause state
    • pauseIt

      public void pauseIt()
      Pauses this thread
    • resumeIt

      public void resumeIt()
      Resumes this thread
    • setCollectionReader

      public void setCollectionReader(BaseCollectionReader aCollectionReader)
      Sets CollectionReader to use during processing
      Parameters:
      aCollectionReader - aCollectionReader
    • setNumToProcess

      public void setNumToProcess(long aNumToProcess)
      Defines the size of the batch
    • getLastProcessedDocId

      public String getLastProcessedDocId()
      Returns Id of the last document processed
    • getLastDocRepository

      public String getLastDocRepository()
    • producePU

      private ProcessingUnit producePU(String aClassName) throws Exception
      Instantiate custom Processing Pipeline
      Parameters:
      aClassName - - name of a class that extends ProcessingUnit
      Returns:
      - an instance of the ProcessingUnit
      Throws:
      Exception - -
    • startDebugControlThread

      private void startDebugControlThread()
    • createOutputQueue

      private BoundedWorkQueue createOutputQueue(int aQueueSize) throws Exception
      Instantiate custom Output Queue
      Parameters:
      aQueueSize - - max size of the queue
      Returns:
      - new instance of the output queue
      Throws:
      Exception - -
    • notifyListenersWithException

      private void notifyListenersWithException(Exception e)
      Notify listeners of a given exception
      Parameters:
      e - - en exception to be sent to listeners
    • pipelineKilled

      public void pipelineKilled(String aPipelineThreadName)
      Callback method used to notify the engine when a processing pipeline is killed due to excessive errors. This method is only called if the processing pipeline is unable to acquire a connection to remote service and when configuration indicates 'kill-pipeline' as the action to take on excessive errors. When running with multiple pipelines, routine decrements a global pipeline counter and tests if there are no more left. When all pipelines are killed as described above, the CPM needs to terminate. Since pipelines are prematurely killed, there are artifacts (CASes) in the work queue. These must be removed from the work queue and disposed of (released) back to the CAS pool so that the Collection Reader thread properly exits.
      Parameters:
      aPipelineThreadName - - name of the pipeline thread exiting from its run() method
    • run

      public void run()
      Using given configuration creates and starts CPE processing pipeline. It is either single-threaded or a multi-threaded pipeline. Which is actually used depends on the configuration defined in the CPE descriptor. In multi-threaded mode, the CPE starts number of threads: 1) ArtifactProducer Thread - this is a thread containing a Collection Reader. It runs asynchronously and it fills a WorkQueue with CASes. 2) CasConsumer Thread - this is an optional thread. It is only instantiated if there Cas Consumers in the pipeline 3) Processing Threads - one or more processing threads, configured identically, that are performing analysis How many threads are started depends on configuration in CPE descriptor All threads started here are placed in a ThreadGroup. This provides a catch-all mechanism for errors that may occur in the CPM. If error is thrown, the ThreadGroup is notified. The ThreadGroup than notifies all registers listeners to give an application a chance to report the error and do necessary cleanup. This routine manages all the threads and makes sure that all of them are cleaned up before returning. The ThreadGroup must cleanup all threads under its control otherwise a memory leak occurs. Even those threads that are not started must be cleaned as they end up in the ThreadGroup when instantiated. The code uses number of state variables to make decisions during cleanup.
      Specified by:
      run in interface Runnable
      Overrides:
      run in class Thread
    • forcePUShutdown

      private void forcePUShutdown()
      Place EOF Token onto a work queue to force thread exit
    • getTimer

      private UimaTimer getTimer() throws Exception
      Return timer to measure performace of the cpm. The timer can optionally be configured in the CPE descriptor. If none defined, the method returns default timer.
      Returns:
      - customer timer or JavaTimer (default)
      Throws:
      Exception - -
    • cleanup

      public void cleanup()
      Null out fields of this object. Call this only when this object is no longer needed.
    • registerTypeSystemsWithCasManager

      private void registerTypeSystemsWithCasManager() throws Exception
      Registers Type Systems of all components with the CasManager.
      Throws:
      Exception
    • callTypeSystemInit

      private void callTypeSystemInit() throws ResourceInitializationException
      Call typeSystemInit method on each component
      Throws:
      ResourceInitializationException
    • stopCasProcessors

      public void stopCasProcessors(boolean kill) throws CasProcessorDeploymentException
      Stops All Cas Processors and optionally changes the status according to kill flag
      Parameters:
      kill - - true if CPE has been stopped before completing normally
      Throws:
      CasProcessorDeploymentException
    • getProgress

      public Progress[] getProgress()
      Returns collectionReader progress.
    • getStatForContainer

      private HashMap getStatForContainer(ProcessingContainer aContainer)
    • saveStat

      private void saveStat(String aStatLabel, String aStatValue, ProcessingContainer aContainer)
    • isProcessorReady

      private boolean isProcessorReady(int aStatus)
      Check if the CASProcessor status is available for processing
    • invalidateCASes

      public void invalidateCASes(CAS[] aCASList)
    • releaseCASes

      public void releaseCASes(CAS[] aCASList)
      Releases given cases back to pool.
      Parameters:
      aCASList - - cas list to release
    • setPerformanceTuningSettings

      public void setPerformanceTuningSettings(Properties aPerformanceTuningSettings)
      Overrides the default performance tuning settings for this CPE. This affects things such as CAS sizing parameters.
      Parameters:
      aPerformanceTuningSettings - the new settings
      See Also:
    • getPerformanceTuningSettings

      public Properties getPerformanceTuningSettings()
      Returns:
      Returns the PerformanceTuningSettings.
    • setProcessControllerAdapter

      public void setProcessControllerAdapter(ProcessControllerAdapter aPca)
      Parameters:
      aPca -
    • getCpeConfig

      protected CpeConfiguration getCpeConfig() throws Exception
      Throws:
      Exception
    • processingUnitShutdown

      void processingUnitShutdown(ProcessingUnit unit)
      Called from the ProcessingUnits when they shutdown due to having received the EOFToken. When all ProcessingUnits have shut down, we put an EOFToken on the output queue so that The CAS Consumers will also shut down. -Adam
    • dropCasOnException

      public boolean dropCasOnException()
    • getCasWithSOFA

      private Object getCasWithSOFA(Object entity, ProcessTrace pTrTemp)
    • needsView

      private boolean needsView()
      Returns:
      true if needsTCas
    • bootstrapCPE

      private void bootstrapCPE() throws Exception
      Initialize the CPE
      Throws:
      Exception - -
    • setupProcessingPipeline

      private void setupProcessingPipeline() throws Exception
      Setup single threaded pipeline
      Throws:
      Exception - -
    • setupConsumerPipeline

      private void setupConsumerPipeline() throws Exception
      Setup Cas Consumer pipeline as single threaded
      Throws:
      Exception - -
    • skipDroppedDocument

      private boolean skipDroppedDocument(Object[] entity)
      Determines if a given CAS should be skipped
      Parameters:
      entity - - container for CAS
      Returns:
      true if a given CAS should be skipped
    • runSingleThreaded

      public void runSingleThreaded() throws Exception
      Runs the CPE in a single thread without queues.
      Throws:
      Exception - -
    • endOfProcessingReached

      private boolean endOfProcessingReached(long entityCount)
      Determines if the CPM processed all documents
      Parameters:
      entityCount - - number of documents processed so far
      Returns:
      true if all documents processed, false otherwise
    • handleException

      private void handleException(Throwable t, Object[] entity, ProcessTrace aPTrace)
      Handle given exception
      Parameters:
      t - - exception to handle
      entity - - CAS container
      aPTrace - - process trace
    • notifyListeners

      private void notifyListeners(int aMsgType, Object[] entity, ProcessTrace aPTrace)
      Parameters:
      aMsgType -
      entity -
      aPTrace -
    • notifyListeners

      private void notifyListeners(int aMsgType, Object[] entity, ProcessTrace aPTrace, Throwable t)
    • callEntityProcessCompleteWithCAS

      public static void callEntityProcessCompleteWithCAS(StatusCallbackListener statCL, CAS cas, EntityProcessStatus eps)
      Internal use only, public for crss package access. switches class loaders and locks cas
      Parameters:
      statCL - status call back listener
      cas - cas
      eps - entity process status
    • getProcessTrace

      private ProcessTrace getProcessTrace() throws Exception
      Throws:
      Exception
    • tearDownCPE

      private void tearDownCPE()
      Stop and cleanup single-threaded CPE.
    • waitForCpmToResumeIfPaused

      private void waitForCpmToResumeIfPaused()