This page documents how to use Zettair to query an inverted index. There are two executables for querying indexes built by Zettair: zet and zet_trec.
The following options can be used with either zet or zet_trec to change the similarity metric used by Zettair.
Use the Okapi BM25 metric.
Set the k1 parameter for the Okapi BM25 metric to the specified floating point value.
Set the b parameter for the Okapi BM25 metric to the specified floating point value.
Set the k3 parameter for the Okapi BM25 metric to the specified floating point value.
Use the pivoted cosine metric, with the pivot provided as a floating point value.
Use the cosine metric.
Use Dave Hawking's adaptation of the Okapi BM25 metric, with the alpha value provided as a floating point number.
Use Anh and Moffat's impact-ordered query evaluation, which includes its own similarity metric. The index must have been built with --anh-impact for impact-ordered query evaluation to be used.
Use the Dirichlet-smoothed query-likelihood language-modelling metric, with the smoothing parameter mu given as an unsigned integer.
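To show what the k1, b and k3 parameters and the mu value control, here is a minimal sketch of the textbook Okapi BM25 and Dirichlet-smoothed query-likelihood formulas, scoring a single document. It assumes simple in-memory statistics passed as hypothetical arguments (doc_tf, doc_len, avg_doc_len, df, num_docs, coll_prob), and the default parameter values are common choices from the literature rather than Zettair's own defaults. Zettair may use a different IDF variant or normalisation, so treat this as an illustration, not Zettair's code.

```python
import math
from collections import Counter


def bm25(query_terms, doc_tf, doc_len, avg_doc_len, df, num_docs,
         k1=1.2, b=0.75, k3=7.0):
    """Textbook Okapi BM25 score of one document (illustrative sketch).

    k1 controls term-frequency saturation, b controls document-length
    normalisation, and k3 controls how much repeated query terms count.
    """
    score = 0.0
    qtf = Counter(query_terms)  # query-term frequencies, used with k3
    # Length-normalisation factor shared by every term in this document
    K = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    for term, q in qtf.items():
        f = doc_tf.get(term, 0)
        if f == 0 or term not in df:
            continue  # term absent from the document or the collection
        # Robertson-Sparck Jones IDF; negative for terms in over half the docs
        idf = math.log((num_docs - df[term] + 0.5) / (df[term] + 0.5))
        score += idf * ((k1 + 1) * f / (K + f)) * ((k3 + 1) * q / (k3 + q))
    return score


def dirichlet_lm(query_terms, doc_tf, doc_len, coll_prob, mu=2000):
    """Dirichlet-smoothed query log-likelihood of one document (sketch).

    coll_prob maps each term to its probability in the whole collection;
    mu controls how strongly the document model is smoothed towards it.
    """
    score = 0.0
    for term in query_terms:
        p_c = coll_prob.get(term)
        if not p_c:
            continue  # term unseen in the collection: skip it
        p = (doc_tf.get(term, 0) + mu * p_c) / (doc_len + mu)
        score += math.log(p)
    return score
```

Setting b to 0 in this sketch removes document-length normalisation entirely, while larger mu values pull every document's term probabilities closer to the collection-wide probabilities.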
Usage: zet [query1 ... queryN]
Index querying options:
Give the name (prefix) of the index to use. If no name is given, 'index' is used by default. The prefix may contain directory path elements.
Sets the maximum number of results returned in response to each query. The default is 20.
Instructs Zettair to read queries from the given file, instead of from stdin.
Uses the words contained in the given filename as stop words (not evaluated) during querying. If no filename is given, a default stop list is loaded.
Instructs Zettair to use approximately 500MB of memory while querying. The default memory usage should be around 20MB.
Sets the number of results to skip for each query. This is useful for retrieving further results for a query without repeating those already obtained. The default is 0.
Choose the type of document summarisation to perform. none, the default, means that no document summaries are provided with the query results. The other alternatives specify how the search terms are highlighted in the summary: plain does not highlight them, capitalise highlights them by capitalising them, and tag highlights them by surrounding them with <b> tags.
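The three highlighting styles can be illustrated with a small sketch that marks query terms in an existing summary string. The function name highlight, its case-insensitive word-by-word matching, and the reading of capitalise as converting the whole word to upper case are all assumptions for illustration; Zettair's summariser also selects which passage of the document to show, which this sketch does not attempt, and the none mode is not modelled because it produces no summary.

```python
import re


def highlight(summary, query_terms, mode="plain"):
    """Mark query terms in a summary in the plain, capitalise or tag style.

    An illustrative sketch, not Zettair's summariser. Assumes capitalise
    means upper-casing the whole matching word.
    """
    if mode not in ("plain", "capitalise", "tag"):
        raise ValueError("unknown summary mode: " + mode)
    terms = {t.lower() for t in query_terms}

    def mark(match):
        word = match.group(0)
        if mode == "plain" or word.lower() not in terms:
            return word  # leave non-query words (and plain mode) unchanged
        if mode == "capitalise":
            return word.upper()
        return "<b>" + word + "</b>"

    # Rewrite each alphanumeric word, leaving punctuation and spacing as-is
    return re.sub(r"\w+", mark, summary)
```

For example, highlight("Okapi ranking works", ["okapi"], "tag") returns "<b>Okapi</b> ranking works".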
For searching, the given queries (query1 ... queryN) are Google-like queries that are used to search the index. Queries consist of keywords and phrases (enclosed in double quotes, "like this"), optionally separated by the operators AND and OR, which MUST be capitalised. The default operator is OR. Search is case-insensitive, except for the recognition of AND and OR. Stopping and stemming are not performed. All results are ranked by relevance. Note that the Google '-' operator and Google-style modifiers are not currently supported.
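As a rough illustration of this syntax, the following sketch splits a query into lower-cased keywords, quoted phrases and the capitalised operators AND and OR, inserting the default OR between adjacent keywords or phrases. The parse_query function and its token tuples are hypothetical; this is not Zettair's query parser.

```python
import re

# A quoted phrase, or a run of characters that are neither spaces nor quotes
TOKEN = re.compile(r'"([^"]*)"|([^\s"]+)')


def parse_query(query):
    """Return ('term', word), ('phrase', words) and ('op', op) tokens,
    with OR inserted wherever no operator is given (illustrative sketch)."""
    tokens = []
    for phrase, word in TOKEN.findall(query):
        if word in ("AND", "OR"):
            tok = ("op", word)  # operators only count when capitalised
        elif word:
            tok = ("term", word.lower())  # search is case-insensitive
        elif phrase.split():
            tok = ("phrase", tuple(phrase.lower().split()))
        else:
            continue  # skip an empty phrase such as ""
        # Default operator: OR between two adjacent operands
        if tok[0] != "op" and tokens and tokens[-1][0] != "op":
            tokens.append(("op", "OR"))
        tokens.append(tok)
    return tokens
```

Note that a lower-case "and" is treated as an ordinary keyword, matching the rule that operators must be capitalised.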
If no queries are found in the command line, Zettair will start in interactive mode. In this mode queries are read from standard input and executed. Interactive mode exits once it can no longer read from standard input. You can cause it to exit by entering the end-of-file control character, typically control-d.
Print version information
Print a help message
Sample Command Line:
Example queries:
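Note that if you enter queries on the command line, you will probably have to escape the double quotes around phrases (using a backslash or other means) so that the shell passes them through to Zettair. For example, the phrase "like this" would be typed as \"like this\".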
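If you build zet command lines from a script, Python's standard shlex module can handle this quoting for you. This is a minimal sketch assuming a POSIX shell and Python 3.8 or later (for shlex.join); it only constructs the command string and does not run zet. shlex wraps the whole query in single quotes rather than backslash-escaping each double quote, but both approaches deliver the same argument to zet.

```python
import shlex

# A query with a phrase and an operator, exactly as zet should receive it
query = '"information retrieval" AND evaluation'

# shlex.join quotes each argument so a POSIX shell passes it through unchanged
command = shlex.join(["zet", query])
print(command)  # zet '"information retrieval" AND evaluation'
```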
Usage: zet_trec index
TREC querying options:
Add the TREC topic file topic_file to the list of topic files to process.
Add the files listed in file to the list of topic files to process.
Output run_id as the identifier for this run (run_id is a text field in trec_eval output).
Number of results to output per query.
Use topic titles in queries (this is the default if none of -t, -a or -d is specified).
Use topic descriptions in queries.
Use topic narratives in queries.
Print queries to stderr as they are constructed from the topic file and resolved.
Print the total time taken in querying to stderr after all topics have been resolved. The time printed excludes index loading time.
Insert dummy entries for topics that have no answers in the results set. This has been required for TREC terabyte submissions in the past.
Don't stop if a query cannot be constructed from a topic. Useful when running large, noisy query logs.
Uses the words contained in the given filename as stop words (not evaluated) during querying. If no filename is given, a default stop list is loaded.
Instructs Zettair to use approximately 500MB of memory while querying. The default memory usage should be around 20MB.
Instead of printing search results in TREC format, evaluate the results against the given qrels file (in TREC qrels format) and produce trec_eval-like output.
The name of the index that is queried using the TREC topic files.
Print help message
Print version information
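TREC topic files group each topic into a <top> block containing <num>, <title>, <desc> and <narr> fields. The sketch below, built around the hypothetical functions parse_topics and topic_query, shows how the title, description and narrative options correspond to fields of a topic. It assumes the common layout in which field tags are not closed, and it is not zet_trec's own topic parser.

```python
import re

# Each field runs from its opening tag to the next field tag or the block end
FIELD = re.compile(r"<(num|title|desc|narr)>(.*?)(?=<(?:num|title|desc|narr)>|$)", re.S)
# Leading labels such as "Number:" or "Description:"
LABEL = re.compile(r"^\s*(Number|Description|Narrative):", re.I)


def parse_topics(text):
    """Yield a dict of fields for each <top>...</top> block (sketch)."""
    for block in re.findall(r"<top>(.*?)</top>", text, re.S):
        fields = {}
        for name, value in FIELD.findall(block):
            value = re.sub(r"</\w+>", " ", value)  # drop any closing tags
            value = LABEL.sub("", value)  # drop "Number:" and similar labels
            fields[name] = " ".join(value.split())  # normalise whitespace
        yield fields


def topic_query(fields, use_title=True, use_desc=False, use_narr=False):
    """Join the selected topic fields into one query string.

    Using only the title mirrors the default described above.
    """
    chosen = (("title", use_title), ("desc", use_desc), ("narr", use_narr))
    parts = [fields.get(name, "") for name, use in chosen if use]
    return " ".join(p for p in parts if p)
```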
Sample Command Line:
The file query.log can then be evaluated with trec_eval against existing relevance judgements (qrels).
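For reference, TREC run files conventionally contain six whitespace-separated columns (topic id, the literal Q0, document number, rank, score and run_id), and qrels files contain a topic id, an iteration field, a document number and a relevance grade. The following sketch, using the hypothetical helpers format_run and precision_at_k and a hypothetical run_id of "zettair", writes run lines in that layout and computes mean precision at k against qrels. trec_eval itself orders documents by score and reports many more measures; this sketch simply trusts the rank column.

```python
from collections import defaultdict


def format_run(results, run_id):
    """Format ranked results as TREC run lines: qid Q0 docno rank score run_id.

    results maps each topic id to a list of (docno, score) pairs, best first.
    """
    lines = []
    for qid, ranked in results.items():
        for rank, (docno, score) in enumerate(ranked, start=1):
            lines.append(f"{qid} Q0 {docno} {rank} {score:.4f} {run_id}")
    return lines


def precision_at_k(run_lines, qrels_lines, k=10):
    """Mean precision at rank k over the topics in the run (sketch).

    qrels_lines have the form: qid iteration docno relevance.
    """
    relevant = defaultdict(set)
    for line in qrels_lines:
        qid, _iteration, docno, rel = line.split()
        if int(rel) > 0:
            relevant[qid].add(docno)

    ranked = defaultdict(list)
    for line in run_lines:
        qid, _q0, docno, rank, _score, _run_id = line.split()
        ranked[qid].append((int(rank), docno))

    scores = []
    for qid, docs in ranked.items():
        top = [docno for _rank, docno in sorted(docs)[:k]]
        hits = sum(1 for docno in top if docno in relevant[qid])
        scores.append(hits / k)
    return sum(scores) / len(scores) if scores else 0.0
```

For example, a run that returns one relevant and one non-relevant document for a topic scores 0.5 at k = 2.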