Explanation of the Log Matching Statistics

Recording detailed match statistics is available for both the extract and tag subcommands. The logs of the two subcommands are mostly identical, with the exception of the paired_end_reads_statistics section, which is only present in the extract subcommand log after processing paired-end reads, and an additional field tag in the JSON log of the tag subcommand. The examples on this page are based on the extract subcommand.

There are two types of log output format available: plain text and JSON. An example of each is provided below. Except for minor differences in formatting, the information contained in both log formats is the same. Plain text and JSON logs are written to files, provided with the -l and -j arguments, respectively. If only the flag are present, the log will be written to the terminal (stdout).

When processing paired-end reads, additional statistics are collected. If a single file is processed, the number of extracted records equals the number of distinct records with a hit. In paired-end reads, however, a match in one read of a pair will extract both of them.

Plain Text Log

The log in plain text format is meant to be human-readable. Lines starting with a # contain meta-information and matching statistics. The log also contains a tab-delimited table with an entry for each pattern found in a record, listing the file name, record ID, query sequence and zero-based position of the match. These lines are not prefixed with a # to facilitate inspection and further analysis, e.g. using grep -v '^#' to extract a tab-delimited table.

After the table, patterns and their number of occurences are listed. This is followed by summary statistics.

#MerKurio extract log
#2024-09-23T00:13:17+02:00[Europe/Vienna]
#Running MerKurio version 1.0.0
#Command line: merkurio extract -i reads_1.fastq -2 reads_2.fastq -s ATCG -l -r
#Searching for 2 patterns
#
#File           Record       Pattern  Position (zero-based)
reads_1.fastq   record 1/1   ATCG     0
reads_2.fastq   record 1/2   CGAT     147
reads_1.fastq   record 57/1  ATCG     13
#
#Number of patterns found: 2/2 (100.00 %)
#Pattern     Count
#ATCG        2
#CGAT        1
#
#Total number of records searched: 10000
#Total number of characters searched: 150000
#Total number of hits: 3
#Number of distinct records with a hit: 3

When processing paired-end reads, the log will contain an additional block of statistics:

#
#Total number of hits in file 1: 2
#Total number of hits in file 2: 1
#Number of distinct records with a hit in file 1: 2
#Number of distinct records with a hit in file 2: 1
#Total number of extracted records: 4

Keyboard shortcuts

MerKurio Manual

Explanation of the Log Matching Statistics

Plain Text Log