Cheat sheet of AEM index definition structure
List (~100) of available Oak query index definition nodes and properties.

Summary


Oak does not index as much content by default as Jackrabbit 2 does. You need to create custom indexes when necessary, much as in a traditional RDBMS. This article is intended as a cheat sheet of the index definition structure.

Notes


  1. For up-to-date information and more details, refer to the official documentation
  2. Get the latest Oak hotfix
  3. Must-watch video recording
  4. Useful tools
  5. Special thanks to Tommaso Teofili, Chetan Mehrotra, Alex Parvulescu, Andrew Khoury, Thomas Mueller, Davide Giannella, Eren Aydin, Varun Mehrotra, Goran Brodnik and Vikas Saurabh for their willingness to help me through some challenging index tasks.
Node/Property Name | Type | Default | Description
oak:index | nt:unstructured
indexName | oak:QueryIndexDefinition | | Lucene async index for full-text search, property constraints, sorting, etc.
compatVersion | long | | By default Oak uses an older Lucene index implementation that does not support property restrictions, index-time aggregation, etc. Set it to 2 to use these features.
type | String | | Set to lucene. A Lucene index can be used to evaluate property constraints, full-text constraints, path restrictions, and sorting.
async | String | | Set to async. This sends the index update process to a background thread, so the index may lag behind the current repository state when a query runs.
name | String | | Name of the index, used in log messages.
blobSize | long | | Size in bytes used for splitting the index files when storing them in the NodeStore.
evaluatePathRestrictions | boolean | | If enabled, the index can evaluate path restrictions.
includedPaths | String[] | / | Since Oak 1.0.14, 1.2.3. List of paths that should be included in indexing.
excludedPaths | String[] | empty | Since Oak 1.0.14, 1.2.3. List of paths that should be excluded from indexing.
maxFieldLength | long | 10000 | Number of terms indexed per field.
codec | String | | By default, if the index involves full-text indexing, Oak Lucene uses OakCodec, which disables compression, so the index may grow large. To enable compression, set the codec to Lucene46. Refer to OAK-2853 for details.
indexPath | String | | Path of the index definition in the repository. To speed up indexing with CopyOnWrite, set indexPath to the path of the index in the repository; for example, if the index is defined at /oak:index/lucene, set indexPath to /oak:index/lucene. This lets the indexer perform reads locally during indexing and avoids costly remote reads. For more details refer to OAK-2247. This feature can be enabled via the Lucene index provider service configuration.
functionName | String | | Name used to enable index usage with native query support.
queryPaths | | | See https://issues.apache.org/jira/browse/OAK-2599
reindex | boolean
persistence | String | | To store the Lucene index in the file system, set persistence to file on the Lucene index definition node and set path to the directory where the index should be stored.
path | String | | Directory where the index should be stored when persistence is set to file on the Lucene index definition node.
indexRules | nt:unstructured
ruleName | nt:unstructured | | An index configuration can define one or more indexing rules for different node types. The rule node is named after the node type, for example nt:base.
inherited | boolean | true | Determines whether the rule applies only on an exact node type match or also when the match is based on node type inheritance.
indexNodeName | boolean | false | Since Oak 1.0.20, 1.2.5. If set to true, the node name is also indexed, which enables faster evaluation of queries with constraints on the node name. Example: select [jcr:path] from [nt:base] where NAME() = 'kite'
includePropertyTypes | String[] | all types (for a full-text index) | Applicable when the index is enabled for full-text indexing. String array of property types that should be indexed.
costPerExecution | Double | | For each query, the overhead is one operation.
costPerEntry | Double | | For each entry in the index, the cost is one.
properties | nt:unstructured | | Each index rule consists of one or more property definitions defined under properties.
propertyName | nt:unstructured | | Can be any name; generally the property name is used.
name | String | | Property name. If not defined, the property name is set to the node name. If isRegexp is true, it defines the regular expression, which is applied only to immediate properties. Can also be set to a relative property.
propertyIndex | boolean | | Whether the index for this property is used for equality conditions, ordering, and is not null conditions.
isRegexp | boolean | | If set to true, the property name is interpreted as a regular expression and the definition applies to all matching property names. The expression should be structured so that it does not match '/'. Examples: .* applies to all properties of the given node; jcr:content/metadata/.* applies to all properties of the child node jcr:content/metadata.
nodeScopeIndex | boolean | | Controls whether the value of a property should be part of the full-text index. That is, jcr:contains(., 'foo') returns nodes that have a string property containing the word foo. Example: //element(*, app:Asset)[jcr:contains(., 'image')]
boost | double | | Since Oak 1.2.5. If the property is included in nodeScopeIndex, defines the boost applied to the indexed value of the given property.
index | boolean | | Determines whether this property should be indexed. Mostly useful for full-text indexes where some properties need to be excluded from indexing.
useInExcerpt | boolean | | Controls whether the value of a property should be used to create an excerpt. When set to false, the value is still full-text indexed but never shows up in an excerpt for its parent node. When set to true, the value is stored separately within the index, which increases the index size, so set it to true only if you use the excerpt feature.
analyzed | boolean | | Set to true if the property is used as part of contains. Example: //element(*, app:Asset)[jcr:contains(type, 'image')]
ordered | boolean | | Set to true if the property is used in an order by clause to perform sorting. Enable it only for that purpose, because it increases the index size. Example: //element(*, app:Asset)[jcr:contains(type, 'image')] order by @size
type | String | | JCR property type. Can be one of Date, Boolean, Double, or Long. Mostly inferred from the indexed value; however, where the same property type is not used consistently across nodes, it is recommended to specify the type explicitly.
nullCheckEnabled | boolean | | Since Oak 1.0.12. Set to true if the property is checked for is null. Enable it only for node types that are not generic, because it creates an index entry for every node of that type where the property is not set. Example: //element(*, app:Asset)[not(jcr:content/@excludeFromSearch)]. It is better to use a query that checks for property existence or for specific values, because such queries can use the index without any extra storage cost.
useInSuggest | boolean | | Since Oak 1.1.17, 1.0.15. Controls which properties supply the terms used for suggestions.
useInSpellcheck | boolean | | Since Oak 1.1.17, 1.0.13. Controls which properties supply the terms used for spellcheck corrections.
facets | boolean | | Since Oak 1.3.14. Used for retrieving facets; to do so, facets must be set to true on the property definition.
facets | nt:unstructured | | Since Oak 1.3.14. By default the Lucene property index always performs ACL checks on facets; this can be avoided by setting the property secure to false on the facets configuration node.
secure | boolean
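
The Lucene rows above describe a node tree rather than a flat list. As a rough illustration only, the sketch below models such a definition as nested Python dicts and prints it in a "+ node / - property" outline. The index name assetIndex and the render helper are hypothetical; app:Asset, type and size are borrowed from the example queries in the table, and nothing here calls a real Oak API.

```python
# Sketch only: models an index definition as nested dicts, not a real Oak API.

def render(name, node, depth=0):
    """Render a node tree as '+ child' and '- property = value' lines."""
    pad = "  " * depth
    lines = [f"{pad}+ {name}"]
    for key, value in node.items():
        if isinstance(value, dict):
            # Nested dicts stand for child nodes.
            lines.extend(render(key, value, depth + 1))
        else:
            # Everything else stands for a property on the current node.
            lines.append(f"{pad}  - {key} = {value!r}")
    return lines


# Hypothetical index; only the structural keys (type, async, compatVersion,
# indexRules, properties, analyzed, ordered, ...) come from the table.
asset_index = {
    "jcr:primaryType": "oak:QueryIndexDefinition",
    "type": "lucene",
    "async": "async",
    "compatVersion": 2,
    "evaluatePathRestrictions": True,
    "indexRules": {
        "jcr:primaryType": "nt:unstructured",
        "app:Asset": {
            "properties": {
                # Supports jcr:contains(type, 'image')
                "type": {"name": "type", "analyzed": True, "nodeScopeIndex": True},
                # Supports 'order by @size'
                "size": {"name": "size", "propertyIndex": True,
                         "ordered": True, "type": "Long"},
            },
        },
    },
}

if __name__ == "__main__":
    print("\n".join(render("assetIndex", asset_index)))
```

Under these assumptions, such a definition would sit below /oak:index and target a query like //element(*, app:Asset)[jcr:contains(type, 'image')] order by @size.
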
aggregates | nt:unstructured | | Includes the contents of descendant nodes into a single node, to make it easier to search content that is scattered across multiple nodes.
ruleName | nt:unstructured | | An index configuration can define one or more aggregates for different node types. The rule node is named after the node type, for example nt:base.
reaggregateLimit | long | 5 | See JCR-2989 for details.
aggregateNodeInclude | nt:unstructured
path | String | | Path pattern to include. Examples: jcr:content (name explicitly specified); * (any child node at depth 1); */* (any child node at depth 2).
primaryType | String | | Restricts the included nodes to a certain type. The restriction is applied to the last node in the given path.
relativeNode | boolean | | Indicates that a query can be performed against the specific node.
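
The include path patterns above (jcr:content, *, */*) can be read as one pattern segment per path level. The following is a minimal sketch of that reading, assuming each * matches exactly one node name; matches_include and the include0/include1 node names are hypothetical, and this is not Oak's actual matching code.

```python
def matches_include(pattern: str, relative_path: str) -> bool:
    """Return True if relative_path matches an aggregate include pattern.

    Each '*' segment matches exactly one node name; any other segment must
    equal the node name. Illustration only, not Oak's implementation.
    """
    pattern_parts = pattern.split("/")
    path_parts = relative_path.split("/")
    if len(pattern_parts) != len(path_parts):
        return False
    return all(p == "*" or p == n for p, n in zip(pattern_parts, path_parts))


# Hypothetical aggregate rule for app:Asset with two include nodes.
aggregates = {
    "app:Asset": {
        "include0": {"path": "jcr:content"},
        "include1": {"path": "*/*"},
    }
}


def aggregated(node_type: str, relative_path: str) -> bool:
    """Check whether any include of the given rule matches relative_path."""
    includes = aggregates.get(node_type, {}).values()
    return any(matches_include(inc["path"], relative_path) for inc in includes)
```

With this rule, jcr:content and jcr:content/metadata would be aggregated into the app:Asset node, while a node three levels down would not.
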
analyzers | nt:unstructured | | Since Oak 1.2.0.
default | nt:unstructured
class | String | | Example: org.apache.lucene.analysis.standard.StandardAnalyzer
luceneMatchVersion | String | | To conform to a specific Lucene version, specify it via luceneMatchVersion; otherwise Oak uses a default that depends on the Lucene version it ships with. Example: LUCENE_47
stopwords | nt:file
charFilters | nt:unstructured | | The filters need to be ordered.
HTMLStrip
Mapping
tokenizer | nt:unstructured
name
filters | nt:unstructured | | The filters need to be ordered.
LowerCase
Stop | nt:unstructured
words
stopx.txt | nt:file | | One or more file nodes; x can be 1 to n.
PorterStem | nt:unstructured
Synonym | nt:unstructured
synonyms
synonym.txt | nt:file | | One or more file nodes.
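
The note that filters need to be ordered matters because each filter sees the output of the previous one. The toy sketch below shows this with plain Python stand-ins for a lowercase filter and a case-sensitive stop filter; these functions are hypothetical and are not Lucene classes. The analyzers dict mirrors the node names from the table, with the tokenizer name "Standard" and the stop word file names as assumed example values.

```python
def lower_case(tokens):
    """Toy stand-in for a lowercase filter."""
    return [t.lower() for t in tokens]


def stop(tokens, words=frozenset({"the", "of"})):
    """Toy stand-in for a case-sensitive stop word filter."""
    return [t for t in tokens if t not in words]


def run_chain(tokens, filters):
    """Apply filters in the given order, feeding each the previous output."""
    for f in filters:
        tokens = f(tokens)
    return tokens


tokens = "The Index of Oak".split()

# Lowercasing first lets the stop filter catch "The".
print(run_chain(tokens, [lower_case, stop]))  # ['index', 'oak']
# Stopping first misses "The" because the toy filter is case-sensitive.
print(run_chain(tokens, [stop, lower_case]))  # ['the', 'index', 'oak']

# Composed analyzer as nested dicts; dict insertion order models node order.
analyzers = {
    "default": {
        "charFilters": {"HTMLStrip": {}, "Mapping": {}},
        "tokenizer": {"name": "Standard"},  # assumed example value
        "filters": {
            "LowerCase": {},
            "Stop": {"words": "stop1.txt, stop2.txt"},  # assumed file names
            "PorterStem": {},
            "Synonym": {"synonyms": "synonym.txt"},
        },
    }
}
```
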
pathText | nt:unstructured
tika | nt:unstructured
maxExtractLength | long
config.xml | nt:file | | See https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/resources/org/apache/jackrabbit/oak/plugins/index/lucene/tika-config.xml
suggestion | nt:unstructured
suggestUpdateFrequencyMinutes | long
suggestAnalyzed | boolean | | Analyzed suggestions can be enabled by setting suggestAnalyzed to true.
indexName | oak:QueryIndexDefinition | | Synchronous property index.
type | String | | Set to property. Useful whenever a query has a property constraint that is not full-text.
propertyNames | Name[] | | Index one property per index. If multiple properties are indexed within one index, the index contains all nodes that have any one of the properties, which can make the query less efficient and can make the query pick the wrong index.
unique | boolean | | Adds a uniqueness constraint on this property. Make sure to set declaringNodeTypes; otherwise all nodes of the repository are affected (most likely not what you want) and you cannot version the node.
declaringNodeTypes | Name[] | | The index applies only to the specified node types.
reindex | boolean
includedPaths | String[] | / | The index is only used if the query has a path restriction that is not excluded and is part of the included paths.
excludedPaths | String[] | none | The index is only used if the query has a path restriction that is not excluded and is part of the included paths.
entryCount | Long | | Estimated number of path entries in the index, used to override the cost estimation (a high entry count means a high cost).
keyCount | Long | | Estimated number of keys in the index, used to override the cost estimation (a high key count means a lower cost, and a low key count means a high cost, when searching for specific keys; it has no effect when searching for "is not null").
reindex-async | boolean | | Pushes the property index updates to a background job; when indexing is done, the property definition is switched back to synchronous update mode. The dedicated background job must be started via a JMX call to the PropertyIndexAsyncReindex#startPropertyIndexAsyncReindex MBean. (future)
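
The includedPaths and excludedPaths rule above can be sketched as a simple check on the query's path restriction. This is a simplified illustration, not Oak's actual index selection logic; the covers helper, the statusIndex definition, and the app:status property are hypothetical.

```python
def covers(query_path, included_paths=("/",), excluded_paths=()):
    """Return True if query_path is under an included path and not excluded.

    Simplified illustration of the includedPaths/excludedPaths rule.
    """
    def under(path, root):
        # A path is under a root if it equals it or is one of its descendants.
        return root == "/" or path == root or path.startswith(root.rstrip("/") + "/")

    if any(under(query_path, ex) for ex in excluded_paths):
        return False
    return any(under(query_path, inc) for inc in included_paths)


# Hypothetical property index definition using keys from the table.
status_index = {
    "jcr:primaryType": "oak:QueryIndexDefinition",
    "type": "property",
    "propertyNames": ["app:status"],       # one property per index
    "declaringNodeTypes": ["app:Asset"],
    "includedPaths": ["/content"],
    "excludedPaths": ["/content/private"],
    "reindex": True,
}


def usable_for(index, query_path):
    """Apply the path check using the index's own path settings."""
    return covers(query_path,
                  index.get("includedPaths", ("/",)),
                  index.get("excludedPaths", ()))
```

Under these assumptions, a query restricted to /content/dam could use statusIndex, while one restricted to /content/private/reports or /etc could not.
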
indexName | oak:QueryIndexDefinition | | Deprecated. The ordered index is an extension of the property index that keeps the order of the indexed property persistent in the repository.
type | String | | Set to ordered. Useful to speed up queries with ORDER BY, equality, and range clauses.
propertyNames | Name | | Must be a simple value list of type Name.
async | String | | The index can be defined as asynchronous by setting the async property to async.
direction | String | ascending | Sorting direction, configured by adding the direction property. Can be ascending or descending.
reindex | boolean | | If set to true, triggers a full content re-index.
indexName | oak:QueryIndexDefinition | | The purpose of the Solr index is mainly full-text search, but it can also be used to index search by path, property restrictions, and primary type restrictions. This means the Solr index in Oak can be used for any type of JCR query.
type | String | | Set to solr.
async | String | | Set to async.
reindex | boolean | | If set to true, triggers a full content re-index.