Defining an Elasticsearch cluster lifecycle

eBay's Pronto, our implementation of the "Elasticsearch as service" (ES-AAS) platform, provides fully managed Elasticsearch clusters for various search use cases. Our ES-AAS platform is hosted in a private internal cloud environment based on OpenStack. The platform currently manages more than 35 clusters and supports multiple data center deployments. This blog provides guidelines on the different pieces needed to create a cluster lifecycle that allows streamlined management of Elasticsearch clusters. All Elasticsearch clusters deployed within the eBay infrastructure follow our defined Elasticsearch lifecycle depicted in the figure below.

This lifecycle stage begins when a new use case is being onboarded onto our ES-AAS platform.

On-boarding information

Customers' requirements are captured on an onboarding template that contains information such as document size, retention policy, and read/write throughput requirements. Infrastructure sizing is performed based on the inputs provided by the customer, and it draws on historic learnings from our benchmarking exercises. On-boarding information has helped us in cluster planning and in defining SLAs for customer commitments.

We collect the following information from customers before any use case is onboarded:

Use case details: Consists of queries relating to use case description and significance.
Sizing information: Captures the number of documents, their average document size, and year-on-year growth estimation.
Data read/write information: Consists of expected indexing/search rate, mode of ingestion (batch mode or individual documents), data freshness, average number of users, and specific search queries containing any aggregation, pagination, or sorting operations.
Data source/retention: Original data source information (such as Oracle, MySQL, etc.) is captured on the onboarding template. If the indices are time-based, then an index purge strategy is logged. Typically, we do not use Elasticsearch as the source of data for critical applications.

Before undertaking any benchmarking exercise, it's really important to understand the underlying infrastructure that hosts your VMs. This is especially true in a cloud-based environment, where such information is usually abstracted from end users. Be aware of potential noisy-neighbor issues, especially on multi-tenant infrastructure. Like most folks, we have also performed extensive benchmarking exercises on existing hardware infrastructure and image flavors.
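To make the onboarding inputs described in this post more concrete, here is a minimal sketch of how such a template might be represented and how the sizing fields (document count, average document size, year-on-year growth) could feed a rough volume projection. The class name, field names, and the compound-growth formula are all assumptions made for illustration; the post does not describe the actual template format or sizing method used by Pronto.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OnboardingTemplate:
    """Hypothetical subset of the onboarding fields; names are illustrative."""
    use_case: str
    document_count: int
    avg_document_bytes: int
    yearly_growth_rate: float            # 0.25 means 25% year-on-year growth
    expected_index_rate_per_sec: float
    expected_search_rate_per_sec: float
    batch_ingestion: bool                # False means individual documents
    original_data_source: str            # e.g. "Oracle", "MySQL"
    purge_after_days: Optional[int] = None  # only for time-based indices


def projected_raw_bytes(template: OnboardingTemplate, years: int) -> int:
    """Estimate raw source data volume after `years` of compound growth.

    Assumption: simple compound growth on document count. This ignores
    replicas, index overhead, and compression, which a real sizing
    exercise would need to account for.
    """
    docs = template.document_count * (1 + template.yearly_growth_rate) ** years
    return round(docs * template.avg_document_bytes)


example = OnboardingTemplate(
    use_case="order-search",
    document_count=1_000_000,
    avg_document_bytes=2048,
    yearly_growth_rate=0.5,
    expected_index_rate_per_sec=200.0,
    expected_search_rate_per_sec=50.0,
    batch_ingestion=True,
    original_data_source="MySQL",
    purge_after_days=90,
)
```

Calling `projected_raw_bytes(example, years=2)` gives the raw volume two years out under these assumptions. A figure like this is only a starting point that would then be checked against benchmark results for the chosen hardware and image flavor.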