Overview - Repositories

What Are Repositories in Voyager Search?

In Voyager Search, a repository represents a defined connection to a data source. Repositories are the foundation of the indexing pipeline, allowing Vose to discover, ingest, and enrich content across local and remote systems.

Repositories serve as the starting point for indexing, and can be configured to target:

Local file systems
Cloud storage (e.g., S3, Azure)
Web services (e.g., WFS, CSW)
Enterprise platforms (e.g., SharePoint, ArcGIS Portal)

Repository Role in the Indexing Pipeline

Repositories define what data is indexed and how it flows through the Vose pipeline, which includes stages such as:

Discovery – Locating content in the repository
Extraction – Reading metadata and file contents
Processing – Enriching or transforming metadata
Indexing – Committing data to the Flex Index

These stages are managed and executed by Agents, and orchestrated by HQ.
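The four stages above can be pictured as a simple chain of functions. The sketch below is only an illustration of that flow: the `Record` class, the function names, and the in-memory dictionary standing in for the index are all hypothetical and are not part of any Voyager Search API.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class Record:
    """Hypothetical stand-in for one item moving through the pipeline."""
    path: str
    metadata: dict = field(default_factory=dict)


def discover(paths: Iterable[str]) -> list[Record]:
    # Discovery: locate content in the repository.
    return [Record(path=p) for p in paths]


def extract(record: Record) -> Record:
    # Extraction: read basic metadata (here, only the file extension).
    name = record.path.rsplit("/", 1)[-1]
    record.metadata["extension"] = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    return record


def process(record: Record) -> Record:
    # Processing: enrich metadata with a coarse category derived from the extension.
    raster = {"tif", "tiff", "png"}
    record.metadata["category"] = "raster" if record.metadata["extension"] in raster else "other"
    return record


def run_pipeline(paths: Iterable[str]) -> dict[str, dict]:
    # Indexing: commit each processed record to a dict that stands in for an index.
    index: dict[str, dict] = {}
    for record in discover(paths):
        record = process(extract(record))
        index[record.path] = record.metadata
    return index


index = run_pipeline(["/data/ortho.TIF", "/data/report.docx"])
```

Keeping each stage as its own function mirrors the idea that stages can be swapped or extended independently of the repository that feeds them.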

🧩 Repositories are reusable and modular. Once added, they can be scheduled, edited, and linked to multiple pipelines.

Creating and Configuring Repositories

To create a new repository in HQ, you’ll typically go through these five key steps:

1. Selecting the Repository Type

Navigate to Manage > Repositories and click Add Repository. You’ll be prompted to select the type of data source, such as:

File System
Amazon S3
ArcGIS Portal
GeoServer
HTTP/S Endpoint
WFS, WMS, or CSW

Each type determines the available configuration options and pipeline stages that follow.

🔗 Selecting the Repository Type

2. Configuring Connection Parameters

After selecting the type, you'll enter connection details such as:

Hostname or file path
API key or authentication token
Protocol settings
Filters (e.g., file types, time windows)

These settings keep access secure and limit indexing to relevant content.
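To make the connection details above concrete, here is a minimal sketch of how such settings could be modeled, including a file-type filter and a time-window filter. `RepositoryConfig`, its field names, and the `accepts` method are assumptions made for illustration; they do not reflect the actual fields shown in HQ.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class RepositoryConfig:
    """Hypothetical container for the connection details listed above."""
    location: str                               # hostname or file path
    token: Optional[str] = None                 # API key or authentication token
    protocol: str = "file"                      # protocol setting
    extensions: frozenset = frozenset()         # file-type filter; empty means "all types"
    modified_after: Optional[datetime] = None   # start of the time window, if any

    def accepts(self, path: str, modified: datetime) -> bool:
        # Reject items whose extension is outside the file-type filter.
        ext = PurePosixPath(path).suffix.lower()
        if self.extensions and ext not in self.extensions:
            return False
        # Reject items last modified before the time window starts.
        if self.modified_after is not None and modified < self.modified_after:
            return False
        return True


config = RepositoryConfig(
    location="/mnt/imagery",
    extensions=frozenset({".tif", ".docx"}),
    modified_after=datetime(2024, 1, 1),
)
```

With this configuration, `config.accepts("/mnt/imagery/a.TIF", datetime(2024, 6, 1))` passes both filters, while a `.csv` file or a file last modified in 2023 is skipped.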

🔗 Configuring Connection Parameters

3. Previewing the Repository

Before committing, use the Preview button to:

Test the connection
View discovered content
Ensure filters are working as intended

This step is especially helpful for cloud and API-based repositories where authentication or schema mismatches can occur.
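Conceptually, a preview is a dry run: check that the source is reachable, list a small sample of what would be discovered, and confirm the filters behave as expected, all without indexing anything. The sketch below shows that idea for a local folder. The `preview` function and its parameters are hypothetical and are not how HQ implements the Preview button.

```python
import tempfile
from itertools import islice
from pathlib import Path


def preview(root: str, extensions: set[str], limit: int = 10) -> list[str]:
    """Return up to `limit` matching file paths under `root` without indexing anything."""
    base = Path(root)
    if not base.is_dir():
        # For a local source, this plays the role of a failed connection test.
        raise FileNotFoundError(f"Cannot reach repository root: {root}")
    matches = (
        str(p) for p in sorted(base.rglob("*"))
        if p.is_file() and p.suffix.lower() in extensions
    )
    return list(islice(matches, limit))


# Demo against a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as root:
    for name in ("a.tif", "b.docx", "c.csv"):
        (Path(root) / name).touch()
    sample = [Path(p).name for p in preview(root, {".tif", ".docx"})]
    first_only = [Path(p).name for p in preview(root, {".tif", ".docx"}, limit=1)]
```

Capping the sample with `limit` keeps the check fast even when the source holds a very large number of items.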

🔗 Previewing Repositories

4. Assigning Pipeline Stages

Once your repository is connected, assign it to a pipeline to determine what happens to the data after discovery. Pipeline stages may include:

Metadata enrichment
Geo-tagging
Tagging by file type or size
Publishing to the Flex Index
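As a small illustration of tagging by file type or size, the sketch below derives tags from a file name and a byte count. The function name, the tag format, and the 100 MiB threshold are all assumptions chosen for the example, not values used by the product.

```python
def tag_item(name: str, size_bytes: int, large_threshold: int = 100 * 1024 * 1024) -> list[str]:
    """Return hypothetical tags based on an item's extension and size."""
    tags = []
    # Tag by file type when the name has an extension.
    ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    if ext:
        tags.append(f"type:{ext}")
    # Tag by size relative to the (assumed) threshold.
    tags.append("size:large" if size_bytes >= large_threshold else "size:small")
    return tags
```

For example, `tag_item("scene.TIF", 250 * 1024 * 1024)` yields a type tag for `tif` plus the large-size tag.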

🔗 Pipeline Stages & Indexing in HQ

5. Saving and Running the Repository Job

Save the repository and either:

Schedule it to run at intervals (e.g., nightly)
Run it immediately via the Index Now button

Once executed, results can be monitored in the Jobs tab in HQ.
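To show what a nightly interval implies, the sketch below computes the next run time for a job that fires once a day at a fixed hour. The function name and the 02:00 default are hypothetical; HQ's own scheduler options may differ.

```python
from datetime import datetime, time, timedelta


def next_nightly_run(now: datetime, run_at: time = time(2, 0)) -> datetime:
    """Return the next occurrence of `run_at` strictly after `now`."""
    candidate = datetime.combine(now.date(), run_at)
    if candidate <= now:
        # Today's slot has already passed (or is exactly now), so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate
```

Running a job immediately corresponds to skipping this calculation and starting the job right away.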

🔗 Adding Repositories

Repository Best Practices

Name Clearly: Give each repository a descriptive name that identifies its source and scope.
Use Filters: Limit scope to avoid excessive indexing (e.g., index only .tif and .docx files).
Secure Credentials: Prefer secure authentication methods (e.g., tokens over basic auth).
Monitor Jobs: Use HQ’s logging and job viewer to track issues and performance.

📑 Summary Table

Step	Description
Select Repository Type	Choose from supported source types
Configure Parameters	Enter connection and filter settings
Preview	Test access and verify discovered content
Assign to Pipeline	Link to indexing and enrichment stages
Run or Schedule	Begin indexing manually or automatically