Configuring yente

The Yente service is built to require a minimum of configuration, but several environment variables can be used to change its behavior.

Env. variable	Default	Description
`YENTE_INDEX_URL`	`http://index:9200`	The URL of your search index provider backend.
`YENTE_INDEX_USERNAME`	-	Username for the search provider. Required if connecting to Elastic Cloud.
`YENTE_INDEX_PASSWORD`	-	Password for the search provider. Required if connecting to Elastic Cloud.
`YENTE_INDEX_NAME`	`yente`	The prefix name that will be used for the search index.
`YENTE_INDEX_TYPE`	`elasticsearch`	Should be one of `elasticsearch` or `opensearch`, depending on what provider you use.
`YENTE_ELASTICSEARCH_CLOUD_ID`	-	If you are using Elastic Cloud and want to use the ID rather than endpoint URL.
`YENTE_OPENSEARCH_REGION`	-	Specifies your region if you are using AWS hosted OpenSearch.
`YENTE_OPENSEARCH_SERVICE`	-	Should be `aoss` if you are using Amazon OpenSearch Serverless Service and `es` if you are using the default Amazon OpenSearch Service.
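
As a rough illustration of how these settings relate to each other, the sketch below checks a set of environment values against the constraints stated in the table. The function `check_index_settings` is hypothetical and not part of yente; it only encodes the allowed values listed above and treats a configured Cloud ID as a sign that Elastic Cloud is in use.

```python
from typing import Dict, List

# Allowed values as listed in the table above.
ALLOWED_INDEX_TYPES = {"elasticsearch", "opensearch"}
ALLOWED_OPENSEARCH_SERVICES = {"aoss", "es"}


def check_index_settings(env: Dict[str, str]) -> List[str]:
    """Return a list of problems found in the index-related settings."""
    problems: List[str] = []

    # The index type falls back to the documented default when unset.
    index_type = env.get("YENTE_INDEX_TYPE", "elasticsearch")
    if index_type not in ALLOWED_INDEX_TYPES:
        problems.append("YENTE_INDEX_TYPE must be 'elasticsearch' or 'opensearch'")

    # The service name is optional, but only two values are documented.
    service = env.get("YENTE_OPENSEARCH_SERVICE")
    if service is not None and service not in ALLOWED_OPENSEARCH_SERVICES:
        problems.append("YENTE_OPENSEARCH_SERVICE must be 'aoss' or 'es'")

    # Assumption: a Cloud ID implies Elastic Cloud, which needs credentials.
    if env.get("YENTE_ELASTICSEARCH_CLOUD_ID"):
        for name in ("YENTE_INDEX_USERNAME", "YENTE_INDEX_PASSWORD"):
            if not env.get(name):
                problems.append(f"{name} is required when using Elastic Cloud")

    return problems
```

Calling `check_index_settings(dict(os.environ))` before starting the service would surface these mistakes early; an empty list means none of the checked constraints is violated.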

Managing data updates

Yente features various configuration options related to data refresh and re-indexing. See Managing data updates.
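
A forced re-index, as described for `YENTE_UPDATE_TOKEN` in the table below, might be triggered along these lines. This is a minimal sketch: the base URL `http://localhost:8000` is a placeholder, the helper `build_update_request` is hypothetical, and passing the token as a `token` query parameter is an assumption that should be checked against your yente version.

```python
import os
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(base_url: str, token: str) -> Request:
    """Build (but do not send) a POST request that asks yente to re-index."""
    # Assumption: the update token is passed as a `token` query parameter.
    query = urlencode({"token": token})
    return Request(f"{base_url.rstrip('/')}/updatez?{query}", method="POST")


if __name__ == "__main__":
    # Fall back to the documented default, which should be replaced in production.
    token = os.environ.get("YENTE_UPDATE_TOKEN", "unsafe-default")
    request = build_update_request("http://localhost:8000", token)
    print(request.get_method(), request.full_url)
```

The request object can be passed to `urllib.request.urlopen` to actually send it. Building it separately keeps the secret handling and the network call apart, which makes the URL easy to inspect first.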

Env. variable	Default	Description
`YENTE_UPDATE_TOKEN`	`unsafe-default`	Should be set to a secret string. The token is used with a `POST` request to the `/updatez` endpoint to force an immediate re-indexing of the data.
`YENTE_HTTP_PROXY`	-	Set a proxy for Yente's outgoing HTTP requests.
`YENTE_MAX_BATCH`	`100`	How many entities to accept in a `/match` batch at most.
`YENTE_MATCH_PAGE`	`5`	How many results to return per `/match` query by default.
`YENTE_MAX_MATCHES`	`500`	How many results to return per `/match` query at most.
`YENTE_MATCH_CANDIDATES`	`10`	How many candidates to retrieve from the search as a multiplier of the `/match` limit. Note that increasing this parameter will also increase query cost, as each of these candidates is scored after retrieval from the index.
`YENTE_MATCH_FUZZY`	`true`	Whether to run expensive Levenshtein queries inside Elasticsearch.
`YENTE_QUERY_CONCURRENCY`	`10`	How many match and search queries to run against Elasticsearch (ES) in parallel.
`YENTE_DELTA_UPDATES`	`true`	When set to `false`, Yente will download the entire dataset when refreshing the index.
`YENTE_STREAM_LOAD`	`true`	If set to `false`, Yente will download the full data before indexing it. This can improve the stability of the indexer, especially when the network connection is unreliable, but requires some local disk cache space.
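
To make the cost note on `YENTE_MATCH_CANDIDATES` concrete, the sketch below estimates how many candidates are retrieved and scored for a single `/match` query. The function `candidates_for_limit` is hypothetical, its defaults mirror the table above, and capping the limit at `YENTE_MAX_MATCHES` is an assumption about how an oversized limit is handled.

```python
def candidates_for_limit(limit: int, multiplier: int = 10, max_matches: int = 500) -> int:
    """Estimate how many index candidates are retrieved and scored for one query."""
    # Assumption: a requested limit above YENTE_MAX_MATCHES is capped at that value.
    effective_limit = min(limit, max_matches)
    # Candidates are retrieved as a multiple of the /match limit.
    return effective_limit * multiplier


if __name__ == "__main__":
    print(candidates_for_limit(5))    # default page size: 5 * 10 = 50
    print(candidates_for_limit(500))  # maximum page size: 500 * 10 = 5000
```

With the defaults, a full batch of 100 queries would score roughly 100 × 50 = 5,000 candidates, so raising the multiplier increases the work for every query in the batch.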