Overview - Repositories
What Are Repositories in Voyager Search?
In Voyager Search, a repository represents a defined connection to a data source. Repositories are the foundation of the indexing pipeline, allowing Vose to discover, ingest, and enrich content across local and remote systems.
Repositories serve as the starting point for indexing, and can be configured to target:
Local file systems
Cloud storage (e.g., S3, Azure)
Web services (e.g., WFS, CSW)
Enterprise platforms (e.g., SharePoint, ArcGIS Portal)
Repository Role in the Indexing Pipeline
Repositories define what data is indexed and how it flows through the Vose pipeline — which includes stages such as:
Discovery – Locating content in the repository
Extraction – Reading metadata and file contents
Processing – Enriching or transforming metadata
Indexing – Committing data to the Flex Index
These stages are managed and executed by Agents, and orchestrated by HQ.
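The four stages above can be pictured as a chain of small functions. The following is a minimal sketch only: the function names, the in-memory `SAMPLE_REPOSITORY`, and the dictionary standing in for the index are all hypothetical and do not reflect how Agents or HQ implement the pipeline.

```python
from pathlib import PurePosixPath

# Hypothetical in-memory "repository": path -> raw content.
SAMPLE_REPOSITORY = {
    "/data/maps/roads.tif": b"\x00" * 2048,
    "/data/docs/report.docx": b"quarterly report",
}


def discover(repository):
    """Discovery: locate content in the repository."""
    return sorted(repository)


def extract(repository, path):
    """Extraction: read basic metadata from the file contents."""
    content = repository[path]
    return {
        "path": path,
        "name": PurePosixPath(path).name,
        "extension": PurePosixPath(path).suffix.lower(),
        "size_bytes": len(content),
    }


def process(record):
    """Processing: enrich the extracted metadata with a derived field."""
    enriched = dict(record)
    enriched["format"] = enriched["extension"].lstrip(".") or "unknown"
    return enriched


def index(records):
    """Indexing: commit records to a store (a plain dict stands in for the index)."""
    return {record["path"]: record for record in records}


def run_pipeline(repository):
    """Run all four stages in order and return the resulting store."""
    records = [process(extract(repository, path)) for path in discover(repository)]
    return index(records)


index_store = run_pipeline(SAMPLE_REPOSITORY)
```

Keeping each stage as a separate function mirrors the idea that the same repository can feed different processing steps without changing how content is discovered.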
🧩 Repositories are reusable and modular. Once added, they can be scheduled, edited, and linked to multiple pipelines.
Creating and Configuring Repositories
To create a new repository in HQ, you’ll typically go through these five key steps:
1. Selecting the Repository Type
Navigate to Manage > Repositories and click Add Repository. You’ll be prompted to select the type of data source, such as:
File System
Amazon S3
ArcGIS Portal
GeoServer
HTTP/S Endpoint
WFS, WMS, or CSW
Each type determines the available configuration options and pipeline stages that follow.
🔗 Selecting the Repository Type
2. Configuring Connection Parameters
After selecting the type, you'll enter connection details such as:
Hostname or file path
API key or authentication token
Protocol settings
Filters (e.g., file types, time windows)
These fields ensure secure and scoped access to only relevant content.
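To make the connection details above concrete, here is a minimal sketch of how they might be grouped and sanity-checked before use. `RepositoryConfig`, its field names, and the validation rules are illustrative assumptions, not Voyager's actual configuration schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class RepositoryConfig:
    """Hypothetical container for connection details (not Voyager's schema)."""

    name: str
    location: str                      # hostname or file path
    protocol: str = "file"             # protocol setting, e.g. "file" or "https"
    auth_token: Optional[str] = None   # API key or authentication token
    file_types: tuple = ()             # filter: allowed extensions, e.g. (".tif",)
    modified_after: Optional[datetime] = None  # filter: start of the time window

    def validate(self):
        """Return a list of problems; an empty list means the config looks usable."""
        problems = []
        if not self.name.strip():
            problems.append("name is required")
        if not self.location.strip():
            problems.append("location is required")
        # Assumption for this sketch: remote protocols always need a token.
        if self.protocol in ("http", "https") and not self.auth_token:
            problems.append("remote protocols need an auth token")
        for ext in self.file_types:
            if not ext.startswith("."):
                problems.append(f"file type {ext!r} should start with a dot")
        return problems


local_repo = RepositoryConfig(
    name="Imagery archive",
    location="/data/maps",
    file_types=(".tif",),
)
print(local_repo.validate())
```

Returning a list of problems rather than raising on the first one lets a form show every issue at once.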
🔗 Configuring Connection Parameters
3. Previewing the Repository
Before committing, use the Preview button to:
Test the connection
View discovered content
Ensure filters are working as intended
This step is especially helpful for cloud and API-based repositories where authentication or schema mismatches can occur.
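Conceptually, a preview is a dry run: the filters are applied to the discovered items and the result is summarized, but nothing is indexed. The sketch below illustrates that idea with a hypothetical `preview` function and made-up item dictionaries; it is not the logic behind the Preview button.

```python
from datetime import datetime


def preview(items, file_types=(), modified_after=None, limit=5):
    """Dry-run the filters over discovered items without indexing anything.

    Each item is a hypothetical dict with "path" and "modified" keys.
    """
    allowed = tuple(ext.lower() for ext in file_types)
    matched, skipped = [], []
    for item in items:
        # An empty filter means "accept everything".
        type_ok = not allowed or item["path"].lower().endswith(allowed)
        time_ok = modified_after is None or item["modified"] >= modified_after
        (matched if type_ok and time_ok else skipped).append(item["path"])
    return {
        "matched_count": len(matched),
        "skipped_count": len(skipped),
        "sample": matched[:limit],
    }


discovered = [
    {"path": "/data/maps/roads.TIF", "modified": datetime(2024, 5, 1)},
    {"path": "/data/docs/report.docx", "modified": datetime(2023, 1, 15)},
    {"path": "/data/tmp/scratch.log", "modified": datetime(2024, 6, 2)},
]

summary = preview(
    discovered,
    file_types=(".tif", ".docx"),
    modified_after=datetime(2024, 1, 1),
)
print(summary)
```

A skipped count that is unexpectedly high (or zero) is usually the quickest sign that a filter is too strict or is not being applied.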
4. Assigning Pipeline Stages
Once your repository is connected, assign it to a pipeline to determine what happens to the data after discovery. Pipeline stages may include:
Metadata enrichment
Geo-tagging
Tagging by file type or size
Publishing to the Flex Index
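As an illustration of what a stage such as tagging by file type or size might do to a record, consider the sketch below. The `tag_record` function, the tag format, and the 100 MiB threshold are assumptions made for this example, not built-in Voyager behavior.

```python
from pathlib import PurePosixPath

# Hypothetical threshold; choose one that suits your content.
LARGE_FILE_BYTES = 100 * 1024 * 1024  # 100 MiB


def tag_record(record):
    """Return a copy of the record with tags derived from file type and size."""
    tags = set(record.get("tags", []))
    extension = PurePosixPath(record["path"]).suffix.lower()
    if extension:
        tags.add(f"type:{extension.lstrip('.')}")
    if record["size_bytes"] >= LARGE_FILE_BYTES:
        tags.add("size:large")
    else:
        tags.add("size:small")
    # Sort so the output is stable regardless of set ordering.
    return {**record, "tags": sorted(tags)}


tagged = tag_record({"path": "/data/maps/roads.tif", "size_bytes": 250 * 1024 * 1024})
print(tagged["tags"])
```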
🔗 Pipeline Stages & Indexing in HQ
5. Saving and Running the Repository Job
Save the repository and either:
Schedule it to run at intervals (e.g., nightly)
Run it immediately via the Index Now button
Once executed, results can be monitored in the Jobs tab in HQ.
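For a nightly schedule, the underlying calculation is simply "the next occurrence of a fixed time of day". The sketch below shows that arithmetic with a hypothetical `next_nightly_run` helper and an assumed 02:00 start time; HQ's scheduler may work differently.

```python
from datetime import datetime, time, timedelta


def next_nightly_run(now, run_at=time(2, 0)):
    """Return the next datetime at which a nightly job should start."""
    candidate = datetime.combine(now.date(), run_at)
    # If today's slot has already passed (or is right now), use tomorrow's.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


print(next_nightly_run(datetime(2024, 5, 1, 14, 0)))
```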
Repository Best Practices
Name Clearly: Use clear and descriptive names for each repository.
Use Filters: Limit scope to avoid excessive indexing (e.g., only index .tif and .docx files).
Secure Credentials: Always use secure methods (e.g., tokens over basic auth).
Monitor Jobs: Use HQ’s logging and job viewer to track issues and performance.
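One common way to follow the credentials advice in scripts that talk to a token-protected source is to read the token from the environment rather than hard-coding it. The sketch below assumes a generic HTTP bearer-token scheme and a hypothetical `REPOSITORY_TOKEN` variable; check your data source's documentation for the scheme it actually expects.

```python
import os


def bearer_headers(env_var="REPOSITORY_TOKEN", environ=None):
    """Build an Authorization header from a token stored in the environment."""
    environ = os.environ if environ is None else environ
    token = environ.get(env_var, "").strip()
    if not token:
        # Fail early instead of sending an unauthenticated request.
        raise RuntimeError(f"set {env_var} before connecting")
    return {"Authorization": f"Bearer {token}"}


headers = bearer_headers(environ={"REPOSITORY_TOKEN": "example-token"})
```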
📑 Summary Table
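| Step | Description |
|---|---|
| 1. Selecting the Repository Type | Choose from supported source types |
| 2. Configuring Connection Parameters | Enter connection and filter settings |
| 3. Previewing the Repository | Test access and verify discovered content |
| 4. Assigning Pipeline Stages | Link to indexing and enrichment stages |
| 5. Saving and Running the Repository Job | Begin indexing manually or automatically |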