Useful jq Commands

jq is a lightweight and flexible command-line JSON processor with its own filter language.

This page is a work-in-progress list of jq commands for analysing MerKurio's JSON logs on the command line. Replace log.json with the path to your JSON file.
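Before running the longer commands, it can help to check which sections a given log actually contains. The following is a minimal sketch, not part of MerKurio itself: log_sections is a hypothetical helper name, and it only assumes that the log is a single JSON object.

```shell
# Hypothetical helper (not part of MerKurio): print every top-level key of a
# JSON log together with the jq type of its value, one tab-separated pair per line.
log_sections() {
  # keys[] yields the keys in sorted order, so the output is deterministic.
  jq -r 'keys[] as $k | "\($k)\t\(.[$k] | type)"' "$1"
}
```

Calling log_sections log.json lists sections such as meta_information or matching_statistics if they are present, which makes it easier to see whether the paired-end commands apply to a particular log.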

Quality metrics

For paired-end reads:

jq -r '. | "Number of reads with matches: \(.matching_statistics.number_of_distinct_records_with_a_hit)
Number of extracted reads: \(.paired_end_reads_statistics.number_of_extracted_records)
Total number of reads: \(.matching_statistics.number_of_records_searched)
Match rate: \((.matching_statistics.number_of_distinct_records_with_a_hit/.matching_statistics.number_of_records_searched)*100*1000|round/1000)% of reads contained target k-mers
Extraction rate: \((.paired_end_reads_statistics.number_of_extracted_records/.matching_statistics.number_of_records_searched)*100*1000|round/1000)% of reads were extracted"' log.json
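The rate calculations divide by the number of searched records, and jq typically reports an error when the divisor is zero (for example, for a log of an empty input file). The following sketch shows one way to guard against that, using only the matching_statistics fields so that it also applies to single-file logs. The names match_rate and pct are made up for this illustration.

```shell
# Hypothetical helper: print the match rate of a log, rounded to three decimals.
match_rate() {
  jq -r '
    # Percentage of $n in $d, rounded to three decimals; 0 when $d is 0,
    # which avoids a division-by-zero error.
    def pct($n; $d):
      if $d == 0 then 0 else ($n / $d * 100000 | round) / 1000 end;

    .matching_statistics
    | "Match rate: \(pct(.number_of_distinct_records_with_a_hit; .number_of_records_searched))% of reads contained target k-mers"
  ' "$1"
}
```

Run it as match_rate log.json. Reporting 0 for an empty log is a design choice of this sketch; printing "n/a" instead would be equally reasonable.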

Summary reports

For single files:

jq -r '
"--- MerKurio Extraction Summary ---",
"Run: " + .meta_information.timestamp,
"Algorithm: " + .meta_information."search-algorithm",
"",
"SEARCH RESULTS:",
"• Patterns searched: " + (.matching_statistics.number_of_patterns_searched | tostring),
"• Patterns found: " + (.matching_statistics.number_of_patterns_found | tostring) + " (" + ((.matching_statistics.number_of_patterns_found/.matching_statistics.number_of_patterns_searched)*100*100 | round/100 | tostring) + "%)",
"• Total matches: " + (.matching_statistics.number_of_matches | tostring),
"• Reads with hits: " + (.matching_statistics.number_of_distinct_records_with_a_hit | tostring) + "/" + (.matching_statistics.number_of_records_searched | tostring),
""
' log.json

For paired-end reads:

jq -r '
"--- MerKurio Extraction Summary ---",
"Run: " + .meta_information.timestamp,
"Algorithm: " + .meta_information."search-algorithm",
"",
"SEARCH RESULTS:",
"• Patterns searched: " + (.matching_statistics.number_of_patterns_searched | tostring),
"• Patterns found: " + (.matching_statistics.number_of_patterns_found | tostring) + " (" + ((.matching_statistics.number_of_patterns_found/.matching_statistics.number_of_patterns_searched)*100*100 | round/100 | tostring) + "%)",
"• Total matches: " + (.matching_statistics.number_of_matches | tostring),
"• Reads with hits: " + (.matching_statistics.number_of_distinct_records_with_a_hit | tostring) + "/" + (.matching_statistics.number_of_records_searched | tostring),
"",
"PAIRED-END DETAILS:",
"• R1 hits: " + (.paired_end_reads_statistics.number_of_hits_in_file_1 | tostring) + " in " + (.paired_end_reads_statistics.number_of_distinct_records_with_a_hit_in_file_1 | tostring) + " reads",
"• R2 hits: " + (.paired_end_reads_statistics.number_of_hits_in_file_2 | tostring) + " in " + (.paired_end_reads_statistics.number_of_distinct_records_with_a_hit_in_file_2 | tostring) + " reads",
"• Extracted pairs: " + (.paired_end_reads_statistics.number_of_extracted_records | tostring),
""
' log.json

Data export

Export k-mers with at least one match to a FASTA file, writing the match count into each sequence header:

jq -r '
  .pattern_hit_counts
  | to_entries
  | map(select(.value > 0))
  | to_entries
  | map(">kmer"+((.key + 1)|tostring)+"|count="+(.value.value|tostring)+"\n"+.value.key)
  | .[]
' log.json > matched_kmers.fasta

Pattern hit counts to TSV:

jq -r '.pattern_hit_counts | to_entries | ["Pattern", "Count"], (.[] | [.key, .value]) | @tsv' log.json > kmer_counts.tsv
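A variation that is often convenient is to sort the table so that the most frequent patterns come first. The sketch below assumes, as the command above implies, that pattern_hit_counts is an object mapping each pattern to a numeric count. kmer_counts_sorted is a hypothetical helper name.

```shell
# Hypothetical helper: write pattern hit counts as TSV, highest count first.
kmer_counts_sorted() {
  jq -r '
    .pattern_hit_counts
    | to_entries               # object -> [{key: pattern, value: count}, ...]
    | sort_by(-.value)         # negate the count to sort in descending order
    | ["Pattern", "Count"], (.[] | [.key, .value])
    | @tsv
  ' "$1"
}
```

For example, kmer_counts_sorted log.json > kmer_counts_sorted.tsv writes the sorted table to a file.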

Pattern positions to TSV:

jq -r '.matching_records | ["Pattern", "Position", "File", "ReadID"], (.[] | [.pattern, .position, .file, .record_id]) | @tsv' log.json > kmer_positions.tsv

Multiple hits of the same pattern

The following commands list reads in which the same pattern matches more than once.

Compact output (nothing is printed if no such read is found):

jq -r '
.matching_records
| group_by(.pattern)
| .[]
| . as $pattern_group
| ($pattern_group | group_by(.record_id) | map(select(length > 1))) as $multi_hits
| if ($multi_hits | length) > 0 then
    "   Pattern: " + $pattern_group[0].pattern,
    "   Reads with multiple hits:",
    ($multi_hits[] | "   • " + .[0].record_id + " (" + (length | tostring) + " hits at positions: " + ([.[].position | tonumber] | sort | map(tostring) | join(", ")) + ")")
  else empty end
' log.json

Display information line-wise:

jq -r '
.matching_records
| group_by(.pattern)
| .[]
| . as $pattern_group
| ($pattern_group | group_by(.record_id) | map(select(length > 1))) as $multi_hits
| if ($multi_hits | length) > 0 then
    "",
    "=" * 60,
    "PATTERN: " + $pattern_group[0].pattern,
    "=" * 60,
    ($multi_hits[] |
      "Read: " + .[0].record_id + " (File: " + .[0].file + ")",
      "Hits: " + (length | tostring),
      (. | sort_by(.position | tonumber) | .[] | "  Position " + (.position | tostring)),
      ""
    )
  else empty end
' log.json

Less detailed output:

jq -r '
.matching_records
| group_by(.pattern)
| map({
    pattern: .[0].pattern,
    multi_hit_reads: (. | group_by(.record_id) | map(select(length > 1)) | length),
    total_multi_hits: (. | group_by(.record_id) | map(select(length > 1)) | map(length) | add // 0)
  })
| map(select(.multi_hit_reads > 0))
| if length > 0 then
    .[] | (if (.pattern | length) > 30 then .pattern[0:30] + "..." else .pattern end) + " : " + (.multi_hit_reads | tostring) + " read(s) with " + (.total_multi_hits | tostring) + " total multi-hits"
  else
    "No reads found with multiple hits of the same pattern"
  end
' log.json
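The commands in this section all share one grouping step: collect the records of each pattern, then the records of each read within it, and keep groups with more than one entry. As a rough sketch, the same idea can be written with a single group_by on the pair of pattern and read ID, producing a TSV table that is easy to process further. multi_hit_pairs is a hypothetical helper name, and the sketch assumes that matching_records is an array of objects with pattern and record_id fields, as the commands above imply. Like those commands, it does not distinguish between the two files of a paired-end run.

```shell
# Hypothetical helper: one TSV row per (pattern, read) pair with more than one hit.
multi_hit_pairs() {
  jq -r '
    .matching_records
    | group_by([.pattern, .record_id])   # one group per (pattern, read) pair
    | map(select(length > 1))            # keep pairs with at least two hits
    | ["Pattern", "ReadID", "Hits"], (.[] | [.[0].pattern, .[0].record_id, length])
    | @tsv
  ' "$1"
}
```

Run it as multi_hit_pairs log.json. If no read has multiple hits of the same pattern, only the header row is printed.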