Tune data aggregation and query performance with Elasticsearch

A primary reason for using Elasticsearch is to support searches through data. Users should be able to quickly locate the information they are looking for. Additionally, the system must enable users to ask questions of the data, seek correlations, and come to conclusions that can drive business decisions. This processing is what differentiates data from information.

This document summarizes options that you can consider when determining the best way to optimize your system for query and search performance.

All performance recommendations depend largely on the scenarios that apply to your situation, the volume of data that you are indexing, and the rate at which applications and users query your data. You should carefully test the results of any change in configuration or indexing structure using your own data and workloads to assess the benefits to your specific scenarios. To this end, this document also describes a number of benchmarks that were performed for one specific scenario implemented using different configurations. The following queries were performed as a batch by each iteration of the tests. The names in italics are used to refer to these queries in the remainder of this document. You can adapt the approach taken to assess the performance of your own systems. The details of these tests are described in the appendix.

This section describes some common factors that you should think about when designing indexes that need to support fast querying and searching.

Storing multiple types in an index

An Elasticsearch index can contain multiple types.
It may be better to avoid this approach and create a separate index for each type. Consider the following points:

- Different types might specify different analyzers, and it is not always clear which analyzer Elasticsearch should use if a query is performed at the index level rather than at the type level. See Avoiding Type Gotchas for details.
- Shards for indexes that hold multiple types will likely be bigger than those for indexes that contain a single type. The bigger a shard, the more effort is required by Elasticsearch to filter data when performing queries.
- If there is a significant mismatch between data volumes for the types, information for one type can become sparsely distributed across many shards, reducing the efficiency of searches that retrieve this data.

Figure: The effects of sharing an index between types

In the upper part of the diagram, the same index is shared by documents of type A and type B. There are many more documents of type A than type B. Searches for type A will involve querying all four shards. The lower part of the diagram shows the effect if separate indexes are created for each type. In this case, searches for type A will only require accessing two shards.

- Small shards can be more evenly distributed than large shards, making it easier for Elasticsearch to spread the load across nodes.
- Different types might have different retention periods. It can be difficult to archive old data that shares shards with active data.

However, under some circumstances sharing an index across types can be efficient if:

- Searches regularly span types held in the same index.
- The types only have a small number of documents each.
Maintaining a separate set of shards for each type can become a significant overhead in this case.

Optimizing index types

An Elasticsearch index contains a copy of the original JSON documents that were used to populate it. This information is held in the _source field of each indexed item. This data is not searchable, but by default is returned by get and search requests. However, this field incurs overhead and occupies storage, making shards larger and increasing the volume of I/O performed. You can disable the _source field on a per-type basis in the mappings for the index (PUT my_index).

Disabling this field also removes the ability to perform the following operations:

- Updating data in the index by using the update API.
- Performing searches that return highlighted data.
- Reindexing from one Elasticsearch index directly to another.
- Changing mappings or analysis settings.
- Debugging queries by viewing the original document.

Reindexing data

The number of shards available to an index ultimately determines the capacity of the index. You can take an initial (and informed) guess at how many shards will be required, but you should always consider your document reindexing strategy up front. In many cases, reindexing may be an intended task as data grows. For the sake of search optimization, you may not want to allocate a large number of shards to an index initially, but instead allocate new shards as the volume of data expands. In other cases, reindexing might need to be performed on a more ad hoc basis if your estimates about data volume growth simply prove to be inaccurate.

Note: Reindexing might not be necessary for data that ages quickly. In this case, an application might create a new index for each period of time. Examples include performance logs or audit data, which could be stored in a fresh index each day.

Reindexing effectively involves creating a new index from the data in an old one, and then removing the old index.
If an index is large, this process can take time, and you may need to ensure that the data remains searchable during this period. For this reason, you should create an alias for each index, and queries should retrieve data through these aliases. While reindexing, keep the alias pointing at the old index, and then switch it to reference the new index once reindexing is complete. This approach is also useful for accessing time-based data that creates a new index each day. To access the current data, use an alias that rolls over to the new index as it is created.

Managing mappings

Elasticsearch uses mappings to determine how to interpret the data that occurs in each field in a document. Each type has its own mapping, which effectively defines a schema for that type. Elasticsearch uses this information to generate inverted indexes for each field in the documents in a type. In any document, each field has a datatype (such as string, date, or long) and a value. You can specify the mappings for an index when the index is first created, or they can be inferred by Elasticsearch when new documents are added to a type. However, consider the following points:

- Mappings generated dynamically can cause errors depending on how fields are interpreted when documents are added to an index. For example, document 1 could contain a field A that holds a number and causes Elasticsearch to add a mapping that specifies that this field is a long. If a subsequent document is added in which field A contains nonnumeric data, then indexing that document will fail. In this case, field A should probably have been interpreted as a string when the first document was added. Specifying this mapping when the index is created can help to prevent such problems.
- Design your documents to avoid generating excessively large mappings, as this can add significant overhead when performing searches, consume lots of memory, and also cause queries to fail to find data.
Adopt a consistent naming convention for fields in documents that share the same type. For example, don't use field names such as "first_name", "FirstName", and "forename" in different documents. Use the same field name in each document. Additionally, do not attempt to use values as keys (this is a common approach in column-family databases, but can cause inefficiencies and failures with Elasticsearch). For more information, see Mapping Explosion.
- Use not_analyzed to avoid tokenization where appropriate. For example, if a document contains a string field named data that holds the value "ABC-DEF", then you might attempt to search for all documents that match this value by sending a request to GET /myindex/mydata/_search that looks for "ABC-DEF" in the data field. However, this search will fail to return the expected results due to the way in which the string ABC-DEF is tokenized when it is indexed. It will be effectively split into two tokens, ABC and DEF, by the hyphen. This feature is designed to support full text searching, but if you want the string to be interpreted as a single atomic item you should disable tokenization when the document is added to the index. You can do this by specifying a mapping for the data field when the index is created (PUT /myindex). For more information, see Finding Exact Values.

Using doc values

Many queries and aggregations require that data is sorted as part of the search operation. Sorting requires being able to map one or more terms to a list of documents. To assist in this process, Elasticsearch can load all of the values for a field used as a sort key into memory. This information is known as fielddata.
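
To make the not_analyzed guidance earlier in this section more concrete, the following Python sketch shows why an exact-value search for "ABC-DEF" can miss an analyzed field, and what request bodies for the fix might look like. It is only an illustration under stated assumptions: `simple_tokenize` is a hypothetical stand-in for an analyzer (it splits on non-alphanumeric characters and lowercases, which is a simplification of what Elasticsearch actually does), and the mapping body assumes the older string type with `"index": "not_analyzed"` that this document refers to. The index, type, and field names (myindex, mydata, data) come from the example in the text; the helper function names are invented for this sketch.

```python
import json
import re


def simple_tokenize(text):
    """Hypothetical, simplified analyzer: split on non-alphanumerics and lowercase."""
    return [token.lower() for token in re.split(r"[^0-9A-Za-z]+", text) if token]


def not_analyzed_mapping(type_name, field_name):
    """Build a create-index body that stores a string field as a single term.

    Assumes the older mapping syntax ("string" type with "not_analyzed").
    """
    return {
        "mappings": {
            type_name: {
                "properties": {
                    field_name: {"type": "string", "index": "not_analyzed"}
                }
            }
        }
    }


def term_query(field_name, value):
    """Build a search body that looks for one exact, unanalyzed term."""
    return {"query": {"term": {field_name: value}}}


def term_matches(indexed_terms, value):
    """A term lookup only succeeds if the exact term exists in the index."""
    return value in indexed_terms


# With an analyzed field, the hyphenated value is stored as separate tokens.
analyzed_terms = simple_tokenize("ABC-DEF")
# With a not_analyzed field, the whole value is stored as one term.
exact_terms = ["ABC-DEF"]

print("analyzed terms:", analyzed_terms)
print("match on analyzed field:", term_matches(analyzed_terms, "ABC-DEF"))
print("match on not_analyzed field:", term_matches(exact_terms, "ABC-DEF"))

# Candidate bodies for PUT /myindex and GET /myindex/mydata/_search.
print(json.dumps(not_analyzed_mapping("mydata", "data"), indent=2))
print(json.dumps(term_query("data", "ABC-DEF"), indent=2))
```

The sketch does not contact a cluster; it only builds dictionaries that could be serialized and sent with whichever HTTP client you prefer. Check the mapping syntax against the Elasticsearch version you are running before relying on it.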