Rules specific to Hastings

create_cov_excel

Python script that creates an Excel file summarising coverage per gene

🐍 Rule

rule create_cov_excel:
    input:
        bedfile=config["reference"]["coverage_bed"],
        cov_regions="qc/mosdepth_bed/{sample}_{type}.regions.bed.gz",
        cov_thresh="qc/mosdepth_bed/{sample}_{type}.thresholds.bed.gz",
        duplication_file="qc/picard_collect_duplication_metrics/{sample}_{type}.duplication_metrics.txt",
        genepanels=config["reference"]["genepanels"],
        low_cov="qc/mosdepth_bed/{sample}_{type}.mosdepth.lowCov.regions.txt",
        summary="qc/mosdepth_bed/{sample}_{type}.mosdepth.summary.txt",
    output:
        out=temp("qc/create_cov_excel/{sample}_{type}.coverage.xlsx"),
    log:
        "qc/create_cov_excel/{sample}_{type}.log",
    benchmark:
        repeat(
            "qc/create_cov_excel/create_cov_excel_{sample}_{type}.benchmark.tsv",
            config.get("create_cov_excel", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("create_cov_excel", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("create_cov_excel", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("create_cov_excel", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("create_cov_excel", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("create_cov_excel", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("create_cov_excel", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("create_cov_excel", {}).get("container", config["default_container"])
    message:
        "{rule}: Get coverage analysis per gene into excel, with tab for each panel and one for all genes in bed"
    script:
        "../scripts/create_excel.py"
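
The `threads`, `resources` and `container` entries in the rule all use the same lookup: take the rule-specific value from the config if it is set, otherwise fall back to the shared default. The sketch below reproduces that lookup in plain Python. The config values and the helper name `rule_setting` are made up for illustration and are not part of the pipeline.

```python
# Hypothetical config values, for illustration only.
config = {
    "default_resources": {"threads": 1, "mem_mb": 6144, "time": "4:00:00"},
    "create_cov_excel": {"mem_mb": 8192},
}


def rule_setting(config, rule, key):
    """Return the rule-specific value if set, else the default_resources value."""
    # Same expression shape as in the rule definition above.
    return config.get(rule, {}).get(key, config["default_resources"][key])


mem_mb = rule_setting(config, "create_cov_excel", "mem_mb")    # rule-specific override
threads = rule_setting(config, "create_cov_excel", "threads")  # falls back to the default
tsv_mem = rule_setting(config, "tsv2vcf", "mem_mb")            # rule missing from config
print(mem_mb, threads, tsv_mem)
```

Note that the default is looked up eagerly, so `default_resources` has to define the key even when the rule overrides it.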

↔ input / output files

| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | bedfile | config["reference"]["coverage_bed"] | BED file with the regions used for the coverage analysis |
| | cov_regions | "qc/mosdepth_bed/{sample}_{type}.regions.bed.gz" | BED file with per-region coverage from mosdepth_bed |
| | cov_thresh | "qc/mosdepth_bed/{sample}_{type}.thresholds.bed.gz" | threshold file from mosdepth_bed |
| | duplication_file | "qc/picard_collect_duplication_metrics/{sample}_{type}.duplication_metrics.txt" | text file with duplication metrics from picard_collect_duplication_metrics |
| | genepanels | config["reference"]["genepanels"] | list of the gene panels to include |
| | low_cov | "qc/mosdepth_bed/{sample}_{type}.mosdepth.lowCov.regions.txt" | text file with low-coverage regions from mosdepth_bed |
| | summary | "qc/mosdepth_bed/{sample}_{type}.mosdepth.summary.txt" | text file with the summary from mosdepth_bed |
| output | out | "qc/create_cov_excel/{sample}_{type}.coverage.xlsx" | Excel file with one tab per gene panel, each holding the coverage analysis per gene |
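
To make the `{sample}` and `{type}` placeholders concrete, the following sketch fills a few of the path patterns above with example wildcard values. The values `sampleA` and `N` are hypothetical, and `str.format` is used here only as a stand-in for Snakemake's own wildcard substitution, which behaves the same way for simple patterns like these.

```python
# Hypothetical wildcard values, for illustration only.
wildcards = {"sample": "sampleA", "type": "N"}

# A subset of the path patterns from the rule.
patterns = {
    "cov_regions": "qc/mosdepth_bed/{sample}_{type}.regions.bed.gz",
    "cov_thresh": "qc/mosdepth_bed/{sample}_{type}.thresholds.bed.gz",
    "out": "qc/create_cov_excel/{sample}_{type}.coverage.xlsx",
}

# Substitute the wildcards into every pattern.
resolved = {key: pattern.format(**wildcards) for key, pattern in patterns.items()}

for key, path in resolved.items():
    print(f"{key}: {path}")
```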

🔧 Configuration

Software settings (config.yaml)

| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | number of times the benchmark should be repeated |
| container | string | name of or path to a docker/singularity container |
| extra | string | parameters that should be forwarded |
| covLimits | string | coverage depths for which the percentage of bases at or above that depth is calculated, default value "10 20 30" |
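
As an illustration of what `covLimits` expresses, the sketch below parses the space-separated string and computes, for each limit, the percentage of bases with that depth or more. This is not the code in `create_excel.py`: the function names and the per-base depth list are assumptions made for the example.

```python
def parse_cov_limits(cov_limits):
    """Split a string such as "10 20 30" into a list of integer depths."""
    return [int(value) for value in cov_limits.split()]


def pct_at_or_above(depths, limits):
    """Percentage of bases whose depth is at or above each limit."""
    total = len(depths)
    if total == 0:
        return {limit: 0.0 for limit in limits}
    return {
        limit: 100.0 * sum(depth >= limit for depth in depths) / total
        for limit in limits
    }


limits = parse_cov_limits("10 20 30")  # the documented default value
depths = [5, 12, 25, 31, 40]           # made-up per-base depths
result = pct_at_or_above(depths, limits)
print(result)
```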

Resources settings (resources.yaml)

| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |

tsv2vcf

Convert ExomeDepth calls in TSV format to VCF

🐍 Rule

rule tsv2vcf:
    input:
        tsv="cnv_sv/exomedepth_call/{sample}_{type}.txt",
        ref=config["reference"]["fasta"],
    output:
        vcf="cnv_sv/exomedepth_call/{sample}_{type}.vcf",
    params:
        extra=config.get("tsv2vcf", {}).get("extra", ""),
    log:
        "cnv_sv/exomedepth_call/{sample}_{type}.vcf.gz.log",
    benchmark:
        repeat(
            "cnv_sv/exomedepth_call/{sample}_{type}.vcf.gz.benchmark.tsv", config.get("tsv2vcf", {}).get("benchmark_repeats", 1)
        )
    threads: config.get("tsv2vcf", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("tsv2vcf", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("tsv2vcf", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("tsv2vcf", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("tsv2vcf", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("tsv2vcf", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("tsv2vcf", {}).get("container", config["default_container"])
    message:
        "{rule}: convert {input.tsv} to VCF"
    script:
        "../scripts/tsv2vcf.sh"
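
The conversion itself is done by the shell script `tsv2vcf.sh`, which is not shown here. To give a feel for the kind of mapping involved, the following Python sketch turns tab-separated CNV calls into VCF records with symbolic alleles. It is an illustration only: the column names (`chromosome`, `start`, `end`, `type`), the example calls, and the placeholder `N` reference base (instead of a base read from the reference fasta) are all assumptions.

```python
import csv
import io

# Hypothetical two-call input; the column names are assumptions for this sketch.
TSV = (
    "chromosome\tstart\tend\ttype\n"
    "1\t1000\t2000\tdeletion\n"
    "2\t500\t900\tduplication\n"
)

# Map the call type to a VCF symbolic allele.
SVTYPE = {"deletion": "DEL", "duplication": "DUP"}


def tsv_to_vcf_lines(handle):
    """Convert tab-separated CNV calls into a list of VCF lines."""
    lines = [
        "##fileformat=VCFv4.2",
        '##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">',
        '##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant">',
        '##ALT=<ID=DEL,Description="Deletion">',
        '##ALT=<ID=DUP,Description="Duplication">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for row in csv.DictReader(handle, delimiter="\t"):
        svtype = SVTYPE[row["type"]]
        info = f"SVTYPE={svtype};END={row['end']}"
        # "N" stands in for the reference base, which this sketch does not look up.
        fields = [row["chromosome"], row["start"], ".", "N", f"<{svtype}>", ".", ".", info]
        lines.append("\t".join(fields))
    return lines


vcf_lines = tsv_to_vcf_lines(io.StringIO(TSV))
print("\n".join(vcf_lines))
```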

↔ input / output files

| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | tsv | "cnv_sv/exomedepth_call/{sample}_{type}.txt" | ExomeDepth calls in TSV format |
| | ref | config["reference"]["fasta"] | reference genome fasta file |
| output | vcf | "cnv_sv/exomedepth_call/{sample}_{type}.vcf" | ExomeDepth calls in VCF format |

🔧 Configuration

Software settings (config.yaml)

| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | number of times the benchmark should be repeated |
| container | string | name of or path to a docker/singularity container |
| extra | string | parameters that should be forwarded |

Resources settings (resources.yaml)

| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |