Overview - Repositories

What Are Repositories in Voyager Search?

In Voyager Search, a repository represents a defined connection to a data source. Repositories are the foundation of the indexing pipeline, allowing Vose to discover, ingest, and enrich content across local and remote systems.

Repositories serve as the starting point for indexing, and can be configured to target:

  • Local file systems

  • Cloud storage (e.g., S3, Azure)

  • Web services (e.g., WFS, CSW)

  • Enterprise platforms (e.g., SharePoint, ArcGIS Portal)


Repository Role in the Indexing Pipeline

Repositories define what data is indexed and how it flows through the Vose pipeline, which includes these stages:

  1. Discovery – Locating content in the repository

  2. Extraction – Reading metadata and file contents

  3. Processing – Enriching or transforming metadata

  4. Indexing – Committing data to the Flex Index

These stages are managed and executed by Agents, and orchestrated by HQ.

🧩 Repositories are reusable and modular. Once added, they can be scheduled, edited, and linked to multiple pipelines.
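The four stages listed above can be pictured as a chain of functions, each consuming the previous stage's output. The following is only a conceptual sketch in plain Python, not Voyager's implementation or API: the `Record` class, the function names, and the "raster"/"document" tagging rule are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class Record:
    """One item moving through the pipeline (hypothetical shape)."""
    path: str
    metadata: dict = field(default_factory=dict)


def discover(paths: Iterable[str]) -> Iterator[Record]:
    # Discovery: locate content in the repository.
    for path in paths:
        yield Record(path=path)


def extract(records: Iterable[Record]) -> Iterator[Record]:
    # Extraction: read basic metadata (here, only the file extension).
    for record in records:
        name = record.path.rsplit("/", 1)[-1]
        ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
        record.metadata["extension"] = ext
        yield record


def process(records: Iterable[Record]) -> Iterator[Record]:
    # Processing: enrich metadata (here, a made-up format tag).
    for record in records:
        ext = record.metadata["extension"]
        record.metadata["format"] = "raster" if ext in {"tif", "tiff"} else "document"
        yield record


def index(records: Iterable[Record]) -> dict:
    # Indexing: commit records to a store (a dict stands in for the index).
    return {record.path: record.metadata for record in records}


result = index(process(extract(discover(["/data/dem.tif", "/data/report.docx"]))))
```

Because each stage is a generator, records stream through one at a time, which mirrors the idea that discovery does not have to finish before later stages begin. Whether the real pipeline behaves this way is not stated here.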


Creating and Configuring Repositories

To create a new repository in HQ, you’ll typically go through these five key steps:

1. Selecting the Repository Type

Navigate to Manage > Repositories and click Add Repository. You’ll be prompted to select the type of data source, such as:

  • File System

  • Amazon S3

  • ArcGIS Portal

  • GeoServer

  • HTTP/S Endpoint

  • WFS, WMS, or CSW

Each type determines the available configuration options and pipeline stages that follow.

🔗 Selecting the Repository Type


2. Configuring Connection Parameters

After selecting the type, you'll enter connection details such as:

  • Hostname or file path

  • API key or authentication token

  • Protocol settings

  • Filters (e.g., file types, time windows)

These settings control how the repository authenticates to the source and limit indexing to relevant content.
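To illustrate how filters on file type and time window narrow what gets indexed, here is a minimal sketch. It does not reflect how HQ stores or evaluates filters: `RepositoryFilter`, its fields, and the `accepts` method are assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class RepositoryFilter:
    """Hypothetical filter settings: allowed extensions and a time window."""
    extensions: frozenset
    modified_after: Optional[datetime] = None

    def accepts(self, path: str, modified: datetime) -> bool:
        # Compare extensions case-insensitively, so ".TIF" matches ".tif".
        if PurePosixPath(path).suffix.lower() not in self.extensions:
            return False
        # Apply the time window only when one is configured.
        return self.modified_after is None or modified >= self.modified_after


flt = RepositoryFilter(frozenset({".tif", ".docx"}), modified_after=datetime(2024, 1, 1))
recent_raster = flt.accepts("/data/DEM.TIF", datetime(2024, 6, 1))
archive = flt.accepts("/data/archive.zip", datetime(2024, 6, 1))
old_doc = flt.accepts("/data/report.docx", datetime(2023, 6, 1))
```

Here `recent_raster` is accepted, while `archive` fails the extension check and `old_doc` falls outside the time window.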

🔗 Configuring Connection Parameters


3. Previewing the Repository

Before committing, use the Preview button to:

  • Test the connection

  • View discovered content

  • Ensure filters are working as intended

This step is especially helpful for cloud and API-based repositories, where authentication failures or schema mismatches can occur.
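The idea behind a preview (check access, then sample a few matching items rather than scanning everything) can be sketched for a local file system as follows. This is an illustration only and says nothing about how the Preview button works internally. The `preview` and `iter_matching` functions and the `limit` parameter are hypothetical.

```python
import os
from itertools import islice
from typing import Iterator, List, Set


def iter_matching(root: str, extensions: Set[str]) -> Iterator[str]:
    # Walk the tree lazily so a preview never has to scan everything.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in extensions:
                yield os.path.join(dirpath, name)


def preview(root: str, extensions: Set[str], limit: int = 10) -> List[str]:
    # "Test the connection": fail early if the source is not reachable.
    if not os.path.isdir(root):
        raise FileNotFoundError(f"Repository root not accessible: {root}")
    # Return only the first `limit` matches as a sample.
    return list(islice(iter_matching(root, extensions), limit))
```

Calling `preview("/some/folder", {".tif", ".docx"}, limit=5)` would return up to five matching paths, or raise if the folder cannot be read.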

🔗 Previewing Repositories


4. Assigning Pipeline Stages

Once your repository is connected, assign it to a pipeline to determine what happens to the data after discovery. Pipeline stages may include:

  • Metadata enrichment

  • Geo-tagging

  • Tagging by file type or size

  • Publishing to the Flex Index

🔗 Pipeline Stages & Indexing in HQ


5. Saving and Running the Repository Job

Save the repository and either:

  • Schedule it to run at intervals (e.g., nightly)

  • Run it immediately via the Index Now button

Once executed, results can be monitored in the Jobs tab in HQ.
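As a small worked example of what "nightly" scheduling means in practice, the sketch below computes the next run time for a job that fires once per day. The 02:00 default and the `next_nightly_run` function are assumptions for illustration. HQ's own scheduler settings are not described here.

```python
from datetime import datetime, time, timedelta


def next_nightly_run(now: datetime, run_at: time = time(2, 0)) -> datetime:
    # Candidate run time on the same calendar day as `now`.
    candidate = datetime.combine(now.date(), run_at)
    # If that time has already passed (or is exactly now), roll over to tomorrow.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


late_evening = next_nightly_run(datetime(2024, 5, 1, 23, 30))
early_morning = next_nightly_run(datetime(2024, 5, 1, 1, 0))
```

With these inputs, `late_evening` is 2024-05-02 02:00 and `early_morning` is 2024-05-01 02:00.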

🔗 Adding Repositories


Repository Best Practices

  • Name Clearly: Give each repository a descriptive name that identifies its source and scope.

  • Use Filters: Limit scope to avoid excessive indexing (e.g., index only .tif and .docx files).

  • Secure Credentials: Prefer secure authentication methods (e.g., tokens over basic auth).

  • Monitor Jobs: Use HQ’s logging and job viewer to track issues and performance.
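For any scripts you write around a repository's data source, the "Secure Credentials" practice could look like the sketch below: the token is read from the environment rather than hard-coded, and sent as a bearer token. The variable name `REPO_API_TOKEN` and both helper functions are hypothetical. Whether a given source accepts bearer tokens depends on that source.

```python
import os


def load_token(env_var: str = "REPO_API_TOKEN") -> str:
    # Read the token from the environment instead of hard-coding it.
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return token


def auth_header(token: str) -> dict:
    # Bearer-token header, as opposed to sending a username and password.
    return {"Authorization": f"Bearer {token}"}
```

Failing loudly when the variable is missing avoids silently falling back to unauthenticated or default access.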


📑 Summary Table

Step                        | Description
----------------------------|------------------------------------------
1. Select Repository Type   | Choose from supported source types
2. Configure Parameters     | Enter connection and filter settings
3. Preview                  | Test access and verify discovered content
4. Assign to Pipeline       | Link to indexing and enrichment stages
5. Run or Schedule          | Begin indexing manually or automatically