Search configuration

Robert Bossy edited this page Jul 26, 2017 · 8 revisions

File structure

<alvisir-search
    index="..."
    expander-index="..."
    default-field="..."
    query-expansion="..."
>

    <normalization field="...">...</normalization>

    <field-alias alias="...">
        ...
    </field-alias>

    <field-fragments field="..." fragments="..."/>

    <mandatory-field>...</mandatory-field>

    <expansion method="path" type="..." max-depth="..." max-sub-paths="..."/>

    <global-facet name="..." term-prefix="..." field="..." sort="..." cutoff="..." max-facets="..."/>

    <document-facet name="..." term-prefix="..." field="..." sort="..." label-query="..."/>

    <search-counts facets="..." snippets="..."/>
</alvisir-search>

General properties

<alvisir-search
    index="..."
    expander-index="..."
    default-field="..."
    query-expansion="..."
>

index (mandatory)

Path to the AlvisIR index directory. This index must have been created with AlvisNLP.

If the path is relative, it will be resolved from the directory where the search configuration file was read.

expander-index (optional)

Path to the expander index.

If the path is relative, it will be resolved from the directory where the search configuration file was read.

If this property is not specified, the queries will not be expanded.

default-field (mandatory)

Default query field. Terms without a field qualifier are searched in this field.

query-expansion (optional)

Query expansion method. This property accepts two values:

  • basic: each term and each phrase is expanded individually.
  • advanced: term sequences are expanded; phrases are never expanded.
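
As an illustration, a root element that sets all four properties might look like the sketch below. The two paths and the field name are hypothetical placeholders, not values defined by AlvisIR.

```xml
<!-- Hypothetical values: "index", "expander" and "text" are placeholders. -->
<alvisir-search
    index="index"
    expander-index="expander"
    default-field="text"
    query-expansion="advanced"
>
    <!-- normalization, field-alias, facet and other elements go here -->
</alvisir-search>
```

Because both paths in this sketch are relative, they would be resolved from the directory containing the search configuration file.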

Field aliases

<field-alias alias="ALIAS">
    <field>FIELD1</field>
    <field>FIELD2</field>
    ...
</field-alias>

Declare the field alias ALIAS. ALIAS will be treated as a field in queries and configuration.

FIELD1, FIELD2, ... must be indexed fields. A search for a term in ALIAS is equivalent to a search for that term in any of FIELD1, FIELD2, ...
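
For example, assuming the index contains two fields named title and abstract (hypothetical names), an alias covering both could be declared as follows.

```xml
<!-- "title" and "abstract" are hypothetical indexed fields. -->
<field-alias alias="text">
    <field>title</field>
    <field>abstract</field>
</field-alias>
```

With this declaration, a term searched in text would match documents where it occurs in title or in abstract.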

Normalization filters

<normalization field="FIELD">FILTER</normalization>

Applies a normalization filter to terms and phrases. Normalization is performed before expansion.

The filter applies to terms and phrases of the specified field or alias (FIELD). If the field attribute is omitted, the filter applies to all fields.

FILTER is the filter specification. It accepts the following values:

  • case, lowercase, lower, case-folding: convert all characters to lowercase, which makes the search case-insensitive.
  • ascii, ascii-folding: transliterate all characters to ASCII, which makes the search diacritics-insensitive.
  • english, english-stemming: replace terms by their English stem.
  • french, french-stemming: replace terms by their French stem.

To apply several filters in sequence, separate them with commas (,).
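
The two fragments below are independent sketches of the forms described above. The field name abstract is hypothetical; the filter names are taken from the list above.

```xml
<!-- Sketch 1: no field attribute, so the filter applies to all fields. -->
<normalization>lowercase</normalization>

<!-- Sketch 2: three filters chained on a hypothetical field "abstract". -->
<normalization field="abstract">lowercase,ascii-folding,english-stemming</normalization>
```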

Field fragments

<field-fragments field="FIELD" fragments="FRAGMENTS" annotation="SENT" />

Specify how to build the snippet fragments for the specified field or alias (FIELD).

FRAGMENTS is the fragment building method. It accepts one of the following values:

  • silent: never build a fragment for the specified field.
  • whole: build a single fragment with the whole contents of the field.
  • sentence: build fragments that respect sentence boundaries. In this case, the annotation attribute (SENT) specifies the token that indicates a sentence boundary.
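
A minimal sketch of the three methods, one per field. The field names and the annotation value are hypothetical and depend on how the index was built.

```xml
<!-- "title", "abstract" and "pmid" are hypothetical field names. -->
<field-fragments field="title" fragments="whole"/>

<!-- annotation="sentence" is an assumed sentence boundary token name. -->
<field-fragments field="abstract" fragments="sentence" annotation="sentence"/>

<field-fragments field="pmid" fragments="silent"/>
```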

Mandatory fields

<mandatory-field>FIELD</mandatory-field>

Always return a fragment from the specified field, even if it does not contain a search term. Mandatory field fragments always contain the whole field contents; this overrides field-fragments.
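
For instance, to always show the contents of a field named title (a hypothetical field name) in the results:

```xml
<!-- "title" is a hypothetical field name. -->
<mandatory-field>title</mandatory-field>
```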

Expansions

<expansion type="TYPE" method="METHOD" max-depth="DEPTH" max-sub-paths="MAX" />

Specify the expansion parameters for TYPE entities.

METHOD is the expansion method. It accepts two values:

  • term: terms are expanded as entity canonical representations.
  • path: terms are expanded as concept paths. In this case, DEPTH and MAX limit the sub-concept listing in the explanation slot.
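
The sketch below shows one declaration per method. The entity types gene and taxon and the numeric limits are hypothetical. Leaving out max-depth and max-sub-paths for the term method is an assumption based on the description above, which only mentions them for path.

```xml
<!-- "gene" and "taxon" are hypothetical entity types. -->
<expansion type="gene" method="term"/>

<!-- The limits 2 and 10 are arbitrary illustration values. -->
<expansion type="taxon" method="path" max-depth="2" max-sub-paths="10"/>
```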

Facets

<global-facet
    name="NAME"
    field="FIELD"
    term-prefix="PREFIX"

    regexp="REGEXP"
    group="GROUP"
    capitalize="CAPS"
    upper-case="UPPER"
    lower-case="LOWER"

    query-type="QTYPE"
    query-field="QFIELD"
    label-query="QLABEL"

    cutoff="CUTOFF"
    max-facets="MAX"
    sort="SORT"
/>
<document-facet
    name="NAME"
    field="FIELD"
    term-prefix="PREFIX"

    regexp="REGEXP"
    group="GROUP"
    capitalize="CAPS"
    upper-case="UPPER"
    lower-case="LOWER"

    query-type="QTYPE"
    query-field="QFIELD"
    label-query="QLABEL"

    cutoff="CUTOFF"
    max-facets="MAX"
    sort="SORT"
/>

global-facet produces facets for the response set.

document-facet produces facets for each document in the response set.

name (mandatory)

Name of the facet. The name is displayed on screen.

field (mandatory)

Field or alias from which to extract facet terms.

term-prefix (mandatory)

Prefix of the terms to include in the facet.

Facet label properties (optional)

These properties control the facet label.

regexp and group

regexp specifies a regular expression. The facet term label is the capturing group specified by group. If group is omitted, then the label is the whole match. If the regular expression has no match in the term, then the term is used as is.

If regexp is omitted, the label is the canonical form of the expansion of the facet term. If the expansion yields no result, then the facet label is equal to the facet term.
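
As a sketch of label extraction, suppose facet terms look like taxon/Bacillus (a hypothetical term format). The declaration below would keep only the part after the slash as the label. It assumes that the regular expression follows the usual Java syntax and that group takes a group number; neither is stated on this page.

```xml
<!-- Hypothetical facet: names, field and term format are placeholders. -->
<global-facet
    name="Taxa"
    field="taxon"
    term-prefix="taxon/"
    regexp="^taxon/(.+)$"
    group="1"
    query-field="taxon"
/>
```

A term that does not match the expression would be displayed unchanged, as described above.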

capitalize (boolean)

Capitalize the term.

upper-case (boolean)

Turn all characters into upper case.

lower-case (boolean)

Turn all characters into lower case.

Query refinement properties

These properties control how queries are refined when the facet is activated.

query-type (optional)

Type of the query. This property accepts three values:

  • term: the facet term is queried as a single term.
  • prefix: the facet term is queried as a term prefix.
  • phrase: the facet term is queried as a phrase.

If omitted, the default query type is phrase.

query-field (mandatory)

Field name or alias where the facet term is searched.

label-query (boolean)

Query with the facet label instead of the facet term. If omitted, then the refinement query uses the facet term.

Facet list properties

sort (optional)

Sort criterion for the facets. This property accepts two values:

  • term: number of occurrences of the facet term.
  • document: number of documents where the facet term occurs.

If omitted, the default sort criterion is term.

cutoff (number, optional)

Exclude facet terms whose count is lower than this value. The count used is the one selected by sort. If omitted, the cutoff is 0.

max-facets (number, optional)

Maximum number of facet terms to include. If omitted, the number of facet terms is unlimited.
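
Putting the query refinement and list properties together, a per-document facet might be sketched as follows. The facet name, field names, prefix and numeric values are all hypothetical.

```xml
<!-- Hypothetical facet: "gene" and "gene/" are placeholders. -->
<document-facet
    name="Genes"
    field="gene"
    term-prefix="gene/"
    query-type="term"
    query-field="gene"
    sort="document"
    cutoff="2"
    max-facets="20"
/>
```

In this sketch, facet terms are ranked by the number of documents in which they occur, terms with a count below 2 are dropped, and at most 20 terms are listed.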

Search counts (optional)

<search-counts
    facets="FACETS"
    snippets="SNIPPETS"
/>

facets (number)

Number of documents to scan for global facets. If omitted, all documents in the response set are scanned. This is not recommended because it may slow down the response.

snippets (number)

Number of snippets to return. If omitted, then 10 snippets are produced.
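
For example, to scan a bounded number of documents for global facets and return more snippets than the default (both numbers are arbitrary illustration values):

```xml
<!-- 1000 and 20 are arbitrary values chosen for illustration. -->
<search-counts facets="1000" snippets="20"/>
```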