Overview - Workers
Introduction
In Voyager Search, workers carry out distributed processing across the components of the platform. Each worker executes a defined task, such as indexing, extracting metadata, processing files, or monitoring activity, asynchronously and in parallel with other workers. They serve both the administrative (HQ) and user-facing (Navigo) environments.
1. Core Concepts and Terminology
Worker
A worker is a lightweight, modular processing unit that performs a specific task in a distributed queue-based architecture. Workers run as background processes or services and are managed by the HQ backend.
Job
A job is a unit of work assigned to a worker. It includes instructions and payload data needed to complete a task, such as indexing a document, extracting metadata, or generating previews.
Queue
Jobs are distributed to workers through queues. Voyager uses either in-memory queues or an external service such as Amazon SQS, which suits scalable job distribution in cloud environments.
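The relationship between workers, jobs, and a queue can be illustrated with a minimal sketch that uses only Python's standard library. It is not Voyager's implementation: the `Job` fields, the `run_worker` and `process` functions, and the `path` payload key are hypothetical names chosen for the example, and an in-memory `queue.Queue` stands in for whichever queue a deployment uses.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record: a task type plus the payload it needs."""
    job_type: str
    payload: dict


def run_worker(jobs: "queue.Queue", results: list, lock: threading.Lock) -> None:
    """Pull jobs until a None sentinel arrives, recording one result per job."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work for this worker
            jobs.task_done()
            break
        outcome = f"{job.job_type}:{job.payload['path']}"  # stand-in for real work
        with lock:
            results.append(outcome)
        jobs.task_done()


def process(job_list, worker_count=2):
    """Run the jobs on a small pool of background workers sharing one queue."""
    jobs = queue.Queue()
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=run_worker, args=(jobs, results, lock))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for job in job_list:
        jobs.put(job)
    for _ in threads:  # one sentinel per worker so each one stops
        jobs.put(None)
    jobs.join()
    for thread in threads:
        thread.join()
    return sorted(results)  # completion order is not deterministic, so sort
```

Calling `process([Job("index", {"path": "a.pdf"})])` returns one result string per job. Because the workers run in parallel, results are sorted before being returned rather than relying on completion order.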
2. Worker Architecture in Voyager
Voyager Search implements a microservice-style approach to workers:
| Component | Description |
|---|---|
| Job Dispatcher | Assigns jobs to available workers based on type and availability. |
| Worker Types | Specialized workers, each dedicated to one kind of task (see section 3, Worker Types and Their Functions). |
| Messaging System | Supports job dispatch, retry logic, and status reporting (internal, or via SQS). |
| Monitoring and Logs | Worker activities are logged and exposed for monitoring via the HQ UI or diagnostic endpoints. |
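
The dispatcher's role of matching jobs to workers by type and availability can be sketched as follows. This is an illustrative model under stated assumptions, not Voyager's dispatcher: `WorkerSlot`, `Dispatcher`, `assign`, and `release` are hypothetical names, and availability is reduced to a single `busy` flag.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class WorkerSlot:
    """Hypothetical view of one worker: its name, the job type it handles, its state."""
    name: str
    job_type: str
    busy: bool = False


class Dispatcher:
    """Assigns a job to the first idle worker that handles the job's type."""

    def __init__(self, workers: Iterable[WorkerSlot]):
        self.workers = list(workers)

    def assign(self, job_type: str) -> Optional[str]:
        """Return the chosen worker's name, or None if no matching worker is idle."""
        for worker in self.workers:
            if worker.job_type == job_type and not worker.busy:
                worker.busy = True
                return worker.name
        return None  # the caller would leave the job queued in this case

    def release(self, name: str) -> None:
        """Mark a worker idle again once its job has finished."""
        for worker in self.workers:
            if worker.name == name:
                worker.busy = False
```

Returning `None` instead of raising keeps the "no worker available" case an ordinary outcome, which fits a queue-based design where an unassigned job waits for the next dispatch pass.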
3. Worker Types and Their Functions
| Worker Type | Purpose |
|---|---|
| IndexWorker | Executes indexing jobs on supported file types, extracting metadata and content. |
| ExtractWorker | Handles data extraction (e.g., zip, shapefile components). |
| PreviewWorker | Generates visual previews and thumbnails for indexed items. |
| ThumbnailWorker | Specifically processes images to generate thumbnails. |
| AuditWorker | Logs and audits processing history. |
| NotificationWorker | Sends status or alert notifications based on job outcomes. |
| Custom Workers | Organizations can build custom workers for specialized pipelines (e.g., Python module integrations). |
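
One common way to let specialized and custom workers coexist is a registry that maps a job type to a handler function. The sketch below shows that pattern only. It does not reproduce Voyager's extension API: `HANDLERS`, `register`, `handle`, both example handlers, and the `word_count` job type are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical registry: job type name -> function that processes a payload.
HANDLERS: Dict[str, Callable[[dict], dict]] = {}


def register(job_type: str):
    """Decorator that records a handler under the given job type."""
    def decorator(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
        HANDLERS[job_type] = func
        return func
    return decorator


@register("thumbnail")
def make_thumbnail(payload: dict) -> dict:
    # Stand-in for image processing: only derives an output file name.
    return {"item": payload["item"], "thumbnail": payload["item"] + ".thumb.png"}


@register("word_count")  # an organization-specific, custom job type
def count_words(payload: dict) -> dict:
    return {"item": payload["item"], "words": len(payload["text"].split())}


def handle(job_type: str, payload: dict) -> dict:
    """Route a job to its registered handler, failing loudly on unknown types."""
    if job_type not in HANDLERS:
        raise KeyError(f"no worker registered for job type {job_type!r}")
    return HANDLERS[job_type](payload)
```

With this layout, adding a custom worker means adding one decorated function and leaves the routing code unchanged.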
4. Worker Operation in Navigo
Navigo, Voyager Search’s end-user interface, relies on the results produced by workers. Here’s how workers indirectly support Navigo:
Search Results: IndexWorker and ExtractWorker ensure content is discoverable by populating the Solr/Lucene index used in Navigo.
Faceted Search & Metadata: Workers enrich indexed documents with metadata, enabling advanced filtering and sorting.
Thumbnails & Previews: PreviewWorker and ThumbnailWorker support user-friendly browsing experiences in Navigo.
One-Time Downloads: Workers may generate temporary output files made available through Navigo’s download APIs.
Real-Time Updates: Automated job queues allow near real-time indexing and refresh of Navigo’s data catalog.
5. Worker Lifecycle
Job Submission: Triggered by user action or system schedule.
Job Queueing: Added to the appropriate processing queue.
Job Dispatch: Dispatched by HQ to an available worker.
Execution: Worker completes the task, possibly generating additional sub-jobs.
Result Storage: Outputs stored in index, database, or job logs.
Monitoring/Retry: Jobs failing under certain conditions are retried or flagged.
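The stages above can be sketched as a small state progression with a retry loop. The retry policy here (re-queue on any exception, up to `max_attempts`, then flag the job as failed) is an assumption made for illustration, since the conditions under which Voyager retries are not specified in this overview. `State` and `run_job` are hypothetical names, and sub-jobs and result storage are left out.

```python
from enum import Enum


class State(Enum):
    SUBMITTED = "submitted"
    QUEUED = "queued"
    DISPATCHED = "dispatched"
    SUCCEEDED = "succeeded"
    FAILED = "failed"  # flagged for an administrator to review


def run_job(task, max_attempts=3):
    """Walk one job through the lifecycle.

    Returns (final_state, attempts_used, history_of_states).
    """
    history = [State.SUBMITTED]
    attempts = 0
    while attempts < max_attempts:
        history.append(State.QUEUED)
        history.append(State.DISPATCHED)
        attempts += 1
        try:
            task()  # execution by the worker
        except Exception:
            continue  # assumed policy: put the job back on the queue
        history.append(State.SUCCEEDED)
        return State.SUCCEEDED, attempts, history
    history.append(State.FAILED)
    return State.FAILED, attempts, history
```

A task that raises on its first two calls and succeeds on the third finishes in `State.SUCCEEDED` after three attempts. A task that always raises ends in `State.FAILED` once `max_attempts` is exhausted.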
6. Scalability and Deployment
Local Execution: Workers can run locally on the same instance as Voyager HQ.
Clustered Deployment: For large-scale processing, workers can run across multiple machines or containers, sharing a common queue (e.g., Amazon SQS).
Elastic Scaling: In cloud environments, the number of workers can scale based on queue depth or resource thresholds.
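Scaling on queue depth usually comes down to a simple calculation: divide the backlog by the number of jobs one worker is expected to absorb, then clamp the result to a configured range. The sketch below assumes that rule. The function name and the default thresholds are illustrative values, not Voyager settings.

```python
import math


def desired_workers(queue_depth, jobs_per_worker=10, min_workers=1, max_workers=20):
    """Return how many workers to run for the current backlog.

    All parameters are hypothetical tuning knobs for this example.
    """
    if queue_depth < 0:
        raise ValueError("queue_depth must not be negative")
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    needed = math.ceil(queue_depth / jobs_per_worker)
    # Clamp so the pool never drops below the floor or exceeds the ceiling.
    return max(min_workers, min(max_workers, needed))
```

The lower bound keeps at least one worker available when the queue is empty, and the upper bound caps resource use during a spike.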
7. Diagnostics and Monitoring
VPID Tracking: Each job processed by a worker includes a VPID (Voyager Process ID) to trace execution context.
HQ Job Viewer: Administrators can review job history, worker status, and retry failures from the Voyager HQ interface.
System Logs: All job activities are logged for auditability and troubleshooting.
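Tracing by process ID works best when every log line for a job carries the same identifier. The sketch below shows one way to do that with Python's standard `logging.LoggerAdapter`. The VPID format is not documented in this overview, so the `uuid`-based fallback, the `make_job_logger` helper, and the log line layout are assumptions.

```python
import io
import logging
import uuid


def make_job_logger(stream, vpid=None):
    """Return (logger, vpid) where every record written includes the VPID.

    The identifier format is an assumption: a random hex string is generated
    when the caller does not supply one.
    """
    vpid = vpid or uuid.uuid4().hex
    logger = logging.getLogger(f"worker.{vpid}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the example's output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s vpid=%(vpid)s %(message)s"))
    logger.addHandler(handler)
    # LoggerAdapter injects the extra dict into every record it emits.
    return logging.LoggerAdapter(logger, {"vpid": vpid}), vpid


if __name__ == "__main__":
    buffer = io.StringIO()
    log, _ = make_job_logger(buffer, vpid="demo-0001")
    log.info("job started")
    log.error("job failed: file not found")
    print(buffer.getvalue(), end="")
```

Filtering the combined log output on a single `vpid=` value then yields the full history of one job, even when many workers write to the same log.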
Conclusion
Voyager Search’s worker system underpins the platform’s distributed, asynchronous architecture. It lets the platform process diverse data at scale while keeping Navigo and other interfaces responsive and supplied with current data. Whether deployed locally or in a distributed cloud configuration, workers handle data indexing, enrichment, and delivery.