Features

KCFTOOLS provides a set of commands for genomic analysis built on k-mer counting. The key commands are summarized below:

getVariations

  • Identifies variations between a reference genome and a query sample by counting k-mer presence/absence.
  • Supports different windowing strategies (fixed-length, gene models, or transcript features).
  • Outputs a KCF file containing the variations together with per-window identity scores.
  • Supports memory-efficient processing and multi-threading.
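The fixed-length windowing idea can be illustrated with a small sketch. It assumes that a window's identity score is simply the fraction of reference k-mers in the window that also occur in the query k-mer set. The function names `kmers` and `window_identity` and this scoring rule are hypothetical illustrations, not the actual KCFTOOLS scoring, which is not specified here.

```python
def kmers(seq, k):
    """Return the set of all k-mers in seq (no reverse-complement handling)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def window_identity(ref, query_kmers, k, window):
    """Yield (start, end, total, observed, score) for fixed-length windows.

    Hypothetical scoring: score = observed / total, where total is the number
    of reference k-mers starting in the window and observed is how many of
    them are present in the query k-mer set.
    """
    for start in range(0, len(ref), window):
        end = min(start + window, len(ref))
        # Reference k-mers whose start position lies inside this window.
        win = [ref[i:i + k] for i in range(start, min(end, len(ref) - k + 1))]
        total = len(win)
        observed = sum(1 for km in win if km in query_kmers)
        score = observed / total if total else 0.0
        yield start, end, total, observed, score
```

For example, `list(window_identity("ACGTACGTAAAA", kmers("ACGTACGT", 4), 4, 6))` scores the first window as fully present and the second as absent.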

findIBS

  • Identifies Identity-by-State (IBS) windows or Variable Regions (VR) in a KCF file.
  • Calculates identity scores for each window and filters them based on user-defined thresholds.
  • Merges consecutive windows into larger identical or variable regions.
  • Outputs a summary file with detailed information about the IBS/VR windows.
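The filter-and-merge step can be sketched as follows. This is a minimal illustration, assuming windows arrive sorted by contig and start position and that "consecutive" means directly adjacent (no gap) on the same contig. The function name `merge_windows` and the `keep` predicate are hypothetical: pass `lambda s: s >= t` for IBS windows or `lambda s: s < t` for variable regions.

```python
def merge_windows(windows, keep):
    """Filter windows by score and merge adjacent survivors.

    windows: list of (chrom, start, end, score), sorted by chrom then start.
    keep:    predicate on the score (e.g. lambda s: s >= 0.95 for IBS).
    Returns a list of (chrom, start, end, n_windows) regions.
    """
    regions = []
    for chrom, start, end, score in windows:
        if not keep(score):
            continue
        # Extend the previous region only if it is on the same contig and
        # ends exactly where this window starts (i.e. no window was dropped).
        if regions and regions[-1][0] == chrom and regions[-1][2] == start:
            c, s, _, n = regions[-1]
            regions[-1] = (c, s, end, n + 1)
        else:
            regions.append((chrom, start, end, 1))
    return regions
```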

cohort

  • Combines multiple KCF files (sample-wise) into a single cohort file.
  • Facilitates cohort-level analyses by aggregating variations across multiple samples.
  • Preserves the number of windows and the reference sequence information for each sample.
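Conceptually, combining samples means aligning each sample's windows on a shared key. The sketch below assumes every window is identified by `(chrom, start, end)` and carries one score per sample. A window absent from a sample is reported as `None`. The function `combine` and its data layout are hypothetical and do not reflect the on-disk cohort KCF layout.

```python
def combine(samples):
    """Merge per-sample window scores into one cohort table.

    samples: dict mapping sample name -> {(chrom, start, end): score}.
    Returns (names, rows); each row is (chrom, start, end, [score per sample]),
    with None where a sample lacks that window.
    """
    names = sorted(samples)
    keys = sorted(set().union(*(samples[n].keys() for n in names)))
    rows = [(c, s, e, [samples[n].get((c, s, e)) for n in names])
            for c, s, e in keys]
    return names, rows
```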

kcf2gt

  • Converts a KCF file into a genotype matrix format.
  • Outputs a matrix with one row per sample, where each entry is an allele code (0, 1, 2, or -1).
  • Outputs a map file that maps each numeric chromosome code to its reference contig/chromosome name.
  • The output files are suitable for further analyses in tools such as PLINK, TASSEL, or GAPIT.
  • Supports filtering based on allele frequency and the proportion of missing data.
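The map file and the filters can be illustrated with a short sketch. It assumes the common dosage convention: 0 and 2 are the two homozygous states, 1 is heterozygous, and -1 is missing. It also assumes contigs are numbered from 1 in the order first seen. The function names and thresholds (`max_missing`, `min_maf`) are hypothetical, not kcf2gt options.

```python
def build_chrom_map(chroms):
    """Assign 1-based numeric codes to contig names in first-seen order."""
    codes = {}
    for c in chroms:
        if c not in codes:
            codes[c] = len(codes) + 1
    return codes


def passing_columns(matrix, max_missing, min_maf):
    """Return indices of columns that pass missing-data and MAF filters.

    matrix: list of per-sample rows of allele codes (0, 1, 2; -1 = missing).
    """
    n = len(matrix)
    keep = []
    for j in range(len(matrix[0])):
        calls = [row[j] for row in matrix if row[j] != -1]
        if not calls or (n - len(calls)) / n > max_missing:
            continue
        # Allele frequency assuming codes count copies of one allele (0..2).
        freq = sum(calls) / (2 * len(calls))
        if min(freq, 1 - freq) >= min_maf:
            keep.append(j)
    return keep
```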

kcf2tsv

  • Converts a KCF file into a tab-separated values (TSV) format.
  • Outputs a file with per-window information, including start and end positions, total k-mers, observed k-mers, variations, and kmer_distance, in the same format as IBSpy output.
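Writing such a table is straightforward with Python's `csv` module. The column names below are assumptions modeled on the fields listed above, plus a hypothetical `chrom` column. How `kmer_distance` is computed is not described here, so the sketch just writes whatever value each record already holds.

```python
import csv
import io

# Assumed column names, modeled on the fields listed for kcf2tsv.
COLUMNS = ["chrom", "start", "end", "total_kmers", "observed_kmers",
           "variations", "kmer_distance"]


def write_tsv(windows, out):
    """Write one tab-separated row per window dict to a text stream."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for w in windows:
        writer.writerow(w)
```

Usage: `write_tsv(records, open("out.tsv", "w", newline=""))`, or pass an `io.StringIO` to inspect the output in memory.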

getAttributes

  • Extracts attributes from a KCF file and outputs them in a tab-separated format.
  • Retrieves specific information about the variations, such as k-mer counts and identity scores.
  • The output can be used directly for plotting and visualizing variation data.
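Attribute extraction can be sketched by treating KCF as a VCF-like, tab-separated file. This is an assumption. Header lines are assumed to start with `#`, and per-window attributes are assumed to be stored as semicolon-separated `KEY=VALUE` pairs in column 8 (index 7, as in a VCF INFO column). The attribute keys used below are hypothetical.

```python
def parse_attributes(field):
    """Parse 'KEY=VALUE;KEY=VALUE' into a dict (flags map to '')."""
    attrs = {}
    for item in field.split(";"):
        if item:
            key, _, value = item.partition("=")
            attrs[key] = value
    return attrs


def extract(lines, keys, info_col=7):
    """Yield [chrom, pos, value per key] for each non-header record.

    info_col is an assumed 0-based index of the attribute column.
    Missing attributes are reported as 'NA'.
    """
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        attrs = parse_attributes(cols[info_col])
        yield [cols[0], cols[1]] + [attrs.get(k, "NA") for k in keys]
```

Each yielded row can be joined with `"\t".join(row)` to produce a plotting-friendly table.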

splitKCF

  • Splits a KCF file into multiple smaller files based on chromosome or contig.
  • Facilitates parallel processing of large KCF files by breaking them down into manageable chunks.
  • Maintains the integrity of the KCF file format while allowing for efficient data handling.
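Splitting by contig while keeping each chunk a valid file can be sketched as copying the header into every output. This assumes header lines start with `#` and that the contig name is the first tab-separated field of each record. Both are assumptions about the KCF layout. A real implementation might also drop header lines that describe other contigs. The function name is hypothetical.

```python
def split_by_chrom(lines):
    """Group records by contig, prefixing every group with all header lines.

    Returns dict mapping contig name -> list of lines (header + records),
    with records kept in their original order.
    """
    header, chunks = [], {}
    for line in lines:
        if line.startswith("#"):
            header.append(line)
        else:
            chrom = line.split("\t", 1)[0]
            chunks.setdefault(chrom, []).append(line)
    return {c: header + recs for c, recs in chunks.items()}
```

Each value can then be written to its own file, e.g. one per contig, and processed in parallel.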

increaseWindow

  • Increases the window size of a KCF file by a specified factor.
  • Useful for adjusting the resolution of the variations detected in the KCF file.
  • Maintains the original KCF file structure while expanding the analysis window.
  • Supports options for adjusting the effective length and identity score calculations based on the new window size.
  • Allows for re-evaluation of variations with the new window size, potentially avoiding the need to re-run the getVariations command.
  • Outputs a new KCF file with the adjusted window size and recalculated identity scores.
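Enlarging windows without recounting k-mers amounts to aggregating stored per-window counts. The sketch below groups every `factor` consecutive windows on the same contig, sums their total and observed k-mer counts, and recomputes the score. It assumes the hypothetical score `observed / total`. KCFTOOLS' effective-length and scoring adjustments are not reproduced here, and the function name is illustrative.

```python
def increase_window(windows, factor):
    """Merge every `factor` consecutive windows on the same contig.

    windows: list of (chrom, start, end, total, observed), sorted by position.
    Returns (chrom, start, end, total, observed, score) tuples, where the
    score is recomputed from the summed counts (hypothetical scoring).
    """
    merged, group = [], []

    def flush():
        if group:
            total = sum(w[3] for w in group)
            observed = sum(w[4] for w in group)
            score = observed / total if total else 0.0
            merged.append((group[0][0], group[0][1], group[-1][2],
                           total, observed, score))
            group.clear()

    for w in windows:
        # Start a new group at a contig boundary or when the group is full.
        if group and (w[0] != group[0][0] or len(group) == factor):
            flush()
        group.append(w)
    flush()
    return merged
```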

scoreRecalc

  • Recalculates identity scores in a KCF file based on updated weights.
  • Allows for dynamic adjustment of scoring criteria without re-running the entire variation detection process.
  • Converts a KCF file into PLINK format files (.ped and .map).
  • Facilitates the use of KCF data in PLINK for further genetic analyses.
  • Supports options for filtering based on allele frequency and missing data.
  • Outputs files that are compatible with PLINK.
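Score recalculation under scoreRecalc works on values already stored per window, so new weights can be applied without re-running variation detection. The actual KCFTOOLS score formula and weight names are not given here. As a stand-in, this sketch assumes the score is a weighted average of stored per-window metrics in the range 0 to 1. The metric names and the function `recalc_scores` are hypothetical.

```python
def recalc_scores(windows, weights):
    """Recompute per-window scores from stored metrics with new weights.

    windows: list of dicts of stored metrics (hypothetical names, each 0..1).
    weights: dict mapping metric name -> non-negative weight.
    Returns one weighted-average score per window.
    """
    total_w = sum(weights.values())
    if total_w == 0:
        raise ValueError("weights must not all be zero")
    return [sum(w * win[m] for m, w in weights.items()) / total_w
            for win in windows]
```

For example, with weights `{"kmer_ratio": 3, "var_ratio": 1}`, a window with `kmer_ratio=1.0` and `var_ratio=0.5` scores 0.875.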