Class ArtifactProducer

java.lang.Object
java.lang.Thread
org.apache.uima.collection.impl.cpm.engine.ArtifactProducer
All Implemented Interfaces:
Runnable

public class ArtifactProducer extends Thread
Component responsible for continuously filling a work queue with bundles containing Cas'es. The queue is shared with a Processing Pipeline that consumes bundles of Cas. As soon as the the bundle is removed from the queue, this component fetches data from configured Collection Reader and enques it onto the queue. This component facilitates asynchronous reading and processing of CAS by seperate threads running in the CPE. When end of processing is reached due to CPM shutdown or max number of entities are processed a special token, called EOFToken is placed onto a queue. It marks end of processing for Processing Units. No more data is expected to be placed on the work queue. The Processing Threads upon seeing the EOFToken are expected to complete processing and do necessary cleanup.
  • Field Details

    • threadState

      public int threadState
    • casPool

      private CPECasPool casPool
    • workQueue

      private BoundedWorkQueue workQueue
    • collectionReader

      private BaseCollectionReader collectionReader
    • readerFetchSize

      private int readerFetchSize
    • casList

      private CAS[] casList
    • entityCount

      private long entityCount
    • maxToProcess

      private long maxToProcess
    • cpm

      private CPMEngine cpm
    • cpmStatTable

      private Map cpmStatTable
    • lastDocId

      private String[] lastDocId
    • totalFetchTime

      private long totalFetchTime
    • timer

      private UimaTimer timer
    • callbackListeners

      private ArrayList callbackListeners
    • timedoutDocs

      private Hashtable timedoutDocs
    • isRunning

      private boolean isRunning
    • globalSharedProcessTrace

      private ProcessTrace globalSharedProcessTrace
  • Constructor Details

    • ArtifactProducer

      public ArtifactProducer(CPMEngine acpm)
      Instantiates and initializes this instance.
      Parameters:
      acpm -
    • ArtifactProducer

      public ArtifactProducer(CPMEngine acpm, CPECasPool aPool)
      Construct instance of this class with a reference to the cpe engine and a pool of cas'es.
      Parameters:
      acpm - - reference to the cpe
      aPool - - pool of cases
  • Method Details

    • isRunning

      public boolean isRunning()
    • setUimaTimer

      public void setUimaTimer(UimaTimer aTimer)
      Plug in Custom Timer to time events
      Parameters:
      aTimer - - custom timer
    • setProcessTrace

      public void setProcessTrace(ProcessTrace aProcTrace)
    • getCollectionReaderTotalFetchTime

      public long getCollectionReaderTotalFetchTime()
      Returns total time spent when fetching entities from a CollectionReader. This provides a way of gauging throughput of a particular CR.
      Returns:
      total time spent when fetching entities. -1 when the fetch time is unknown.
    • cleanup

      public void cleanup()
      Null out fields of this object. Call this only when this object is no longer needed.
    • setNumEntitiesToProcess

      public void setNumEntitiesToProcess(long aNumToProcess)
      Assign total number of entities to process
      Parameters:
      aNumToProcess - - number of entities to read from the Collection Reader
    • setCollectionReader

      public void setCollectionReader(BaseCollectionReader aCollectionReader)
      Assign CollectionReader to be used for reading
      Parameters:
      aCollectionReader - - collection reader as source of data
    • setWorkQueue

      public void setWorkQueue(BoundedWorkQueue aQueue)
      Assigns a queue where the artifacts produced by this component will be deposited
      Parameters:
      aQueue - - queue for the artifacts this class is producing
    • setCPMStatTable

      public void setCPMStatTable(Map aStatTable)
      Add table that will contain statistics gathered while reading entities from a Collection This table is used for non-uima reports.
      Parameters:
      aStatTable -
    • endOfProcessingReached

      private boolean endOfProcessingReached()
      Determines if the CPM has processed configured number of entities. Called after each fetch from the Collection Reader.
      Returns:
      true - all configurted entities processed, false otherwise
    • fillQueue

      public void fillQueue() throws Exception
      Fills the queue up to capacity. This is called before activating ProcessingPipeline as means of optimizing processing. When pipelines start up there are already entities in the work queue to process.
      Throws:
      Exception
    • readNext

      private Object[] readNext(int fetchSize) throws IOException, CollectionException
      Reads next set of entities from the CollectionReader. This method may return more than one Cas at a time.
      Returns:
      - The Object returned from the method depends on the type of the CollectionReader. Either CASData[] or CASObject[] initialized with document metadata and content is returned. If the CollectionReader has no more entities (EOF), null is returned.
      Throws:
      IOException - - error while reading corpus
      CollectionException - -
    • run

      public void run()
      Runs this thread until the CPM halts or the CollectionReader has no more entities. It continuously fills the work queue with entities returned by the CollectionReader.
      Specified by:
      run in interface Runnable
      Overrides:
      run in class Thread
    • notifyListeners

      private void notifyListeners(CAS aCas, Exception anException)
      Notify registered callback listeners of a given exception.
      Parameters:
      anException - - exception to propagate to callback listeners
    • placeEOFToken

      private void placeEOFToken()
      Place terminating EOFToken into a Work Queue. Any thread reading this token from the queue is responsible for terminating itself.
    • getLastDocId

      public String getLastDocId()
    • invalidate

      public void invalidate(CAS[] aCasList)