
Configuring yente

The Yente service is built to require a minimum of configuration, but several environment variables can be used to change its behavior.

| Env. variable | Default | Description |
| --- | --- | --- |
| YENTE_INDEX_URL | http://index:9200 | The URL of your search index provider backend. |
| YENTE_INDEX_USERNAME | - | Username for the search provider. Required when connecting to Elastic Cloud. |
| YENTE_INDEX_PASSWORD | - | Password for the search provider. Required when connecting to Elastic Cloud. |
| YENTE_INDEX_NAME | yente | The prefix that will be used for the search index name. |
| YENTE_INDEX_TYPE | elasticsearch | Either elasticsearch or opensearch, depending on which provider you use. |
| YENTE_ELASTICSEARCH_CLOUD_ID | - | Set this if you are using Elastic Cloud and want to connect with the cloud ID rather than the endpoint URL. |
| YENTE_OPENSEARCH_REGION | - | Specifies your region if you are using AWS-hosted OpenSearch. |
| YENTE_OPENSEARCH_SERVICE | - | Should be aoss if you are using Amazon OpenSearch Serverless and es if you are using the default Amazon OpenSearch Service. |
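
To make the defaults above concrete, here is a minimal Python sketch of how a deployment script might read the index variables and fall back to the documented defaults. It is not Yente's own settings code: the `IndexSettings` class and the `load_index_settings` helper are hypothetical names introduced only for this illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class IndexSettings:
    """Hypothetical container for the index-related variables."""

    url: str
    username: Optional[str]
    password: Optional[str]
    name: str
    type: str


def load_index_settings(env: Mapping[str, str] = os.environ) -> IndexSettings:
    """Read the index variables, applying the defaults from the table."""
    index_type = env.get("YENTE_INDEX_TYPE", "elasticsearch")
    # The table lists only these two providers as valid values.
    if index_type not in ("elasticsearch", "opensearch"):
        raise ValueError(f"Unsupported YENTE_INDEX_TYPE: {index_type!r}")
    return IndexSettings(
        url=env.get("YENTE_INDEX_URL", "http://index:9200"),
        username=env.get("YENTE_INDEX_USERNAME"),  # no default
        password=env.get("YENTE_INDEX_PASSWORD"),  # no default
        name=env.get("YENTE_INDEX_NAME", "yente"),
        type=index_type,
    )
```

Calling `load_index_settings({})` returns the documented defaults, which can be handy for checking a deployment's environment before the service starts.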

Managing data updates

Yente features various configuration options related to data refresh and re-indexing. See [Managing data updates](reindex.md).
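
The /updatez endpoint listed under Other settings forces an immediate re-index when called with a POST request and the update token. The Python sketch below only builds such a request, without sending it. Two details are assumptions rather than facts stated on this page: the base URL `http://localhost:8000`, and passing the token as a `token` query parameter. `build_update_request` is a hypothetical helper name.

```python
import os
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(base_url: str, token: str) -> Request:
    """Build (but do not send) a POST request to the /updatez endpoint."""
    # Assumption: the token travels as a `token` query parameter.
    query = urlencode({"token": token})
    return Request(f"{base_url.rstrip('/')}/updatez?{query}", method="POST")


# Assumption: the service is reachable on localhost port 8000.
request = build_update_request(
    "http://localhost:8000",
    os.environ.get("YENTE_UPDATE_TOKEN", "unsafe-default"),
)
```

Passing `request` to `urllib.request.urlopen` would actually send it. Check how your deployment expects the token to be supplied before relying on this.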

Other settings

| Env. variable | Default | Description |
| --- | --- | --- |
| YENTE_UPDATE_TOKEN | unsafe-default | Should be set to a secret string. The token is used with a POST request to the /updatez endpoint to force an immediate re-indexing of the data. |
| YENTE_HTTP_PROXY | - | Set a proxy for Yente's outgoing HTTP requests. |
| YENTE_MAX_BATCH | 100 | The maximum number of entities to accept in a /match batch. |
| YENTE_MATCH_PAGE | 5 | How many results to return per /match query by default. |
| YENTE_MAX_MATCHES | 500 | The maximum number of results to return per /match query. |
| YENTE_MATCH_CANDIDATES | 10 | How many candidates to retrieve from the search, as a multiplier of the /match limit. Note that increasing this parameter will also increase query cost, as each of these candidates is scored after retrieval from the index. |
| YENTE_MATCH_FUZZY | true | Whether to run expensive Levenshtein queries inside Elasticsearch. |
| YENTE_QUERY_CONCURRENCY | 10 | How many match and search queries to run against the search index in parallel. |
| YENTE_DELTA_UPDATES | true | When set to false, Yente will download the entire dataset when refreshing the index. |
| YENTE_STREAM_LOAD | true | If set to false, Yente will download the full data before indexing it. This can improve the stability of the indexer, especially when the network connection is unreliable, but requires some local disk cache space. |
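
The interaction between YENTE_MATCH_PAGE, YENTE_MAX_MATCHES and YENTE_MATCH_CANDIDATES is easier to see as arithmetic. The Python sketch below is an illustration under stated assumptions, not Yente's implementation: the function names are hypothetical, and capping an oversized limit at the maximum is an assumption (the service may reject such a request instead).

```python
from typing import Optional

MATCH_PAGE = 5          # documented default of YENTE_MATCH_PAGE
MAX_MATCHES = 500       # documented default of YENTE_MAX_MATCHES
MATCH_CANDIDATES = 10   # documented default of YENTE_MATCH_CANDIDATES


def effective_limit(requested: Optional[int]) -> int:
    """Use the default page size when no limit is given, else cap it."""
    if requested is None:
        return MATCH_PAGE
    # Assumption: limits above the maximum are capped rather than rejected.
    return min(requested, MAX_MATCHES)


def candidate_count(requested: Optional[int]) -> int:
    """Candidates retrieved from the index and scored for one query."""
    return effective_limit(requested) * MATCH_CANDIDATES


print(candidate_count(None))  # 5 * 10 = 50
print(candidate_count(20))    # 20 * 10 = 200
```

With the defaults, each /match query retrieves and scores 50 candidates. Doubling YENTE_MATCH_CANDIDATES doubles that number, which is where the extra query cost comes from.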