Rules specific to Hastings
create_cov_excel
Python script that creates an Excel file summarizing coverage per gene
Rule
rule create_cov_excel:
    input:
        bedfile=config["reference"]["coverage_bed"],
        cov_regions="qc/mosdepth_bed/{sample}_{type}.regions.bed.gz",
        cov_thresh="qc/mosdepth_bed/{sample}_{type}.thresholds.bed.gz",
        duplication_file="qc/picard_collect_duplication_metrics/{sample}_{type}.duplication_metrics.txt",
        genepanels=config["reference"]["genepanels"],
        low_cov="qc/mosdepth_bed/{sample}_{type}.mosdepth.lowCov.regions.txt",
        summary="qc/mosdepth_bed/{sample}_{type}.mosdepth.summary.txt",
    output:
        out=temp("qc/create_cov_excel/{sample}_{type}.coverage.xlsx"),
    log:
        "qc/create_cov_excel/{sample}_{type}.log",
    benchmark:
        repeat(
            "qc/create_cov_excel/create_cov_excel_{sample}_{type}.benchmark.tsv",
            config.get("create_cov_excel", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("create_cov_excel", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("create_cov_excel", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("create_cov_excel", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("create_cov_excel", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("create_cov_excel", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("create_cov_excel", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("create_cov_excel", {}).get("container", config["default_container"])
    message:
        "{rule}: Get coverage analysis per gene into excel, with tab for each panel and one for all genes in bed"
    script:
        "../scripts/create_excel.py"
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | bedfile | config["reference"]["coverage_bed"] | bed file with the regions used for the coverage analysis |
| | cov_regions | "qc/mosdepth_bed/{sample}_{type}.regions.bed.gz" | bed file with coverage for the regions from mosdepth_bed |
| | cov_thresh | "qc/mosdepth_bed/{sample}_{type}.thresholds.bed.gz" | threshold file from mosdepth_bed |
| | duplication_file | "qc/picard_collect_duplication_metrics/{sample}_{type}.duplication_metrics.txt" | text file from picard_collect_duplication_metrics with duplication metrics |
| | genepanels | config["reference"]["genepanels"] | list of the gene panels to use |
| | low_cov | "qc/mosdepth_bed/{sample}_{type}.mosdepth.lowCov.regions.txt" | text file with low coverage regions from mosdepth_bed |
| | summary | "qc/mosdepth_bed/{sample}_{type}.mosdepth.summary.txt" | text file with the summary from mosdepth_bed |
| output | out | "qc/create_cov_excel/{sample}_{type}.coverage.xlsx" | Excel file with one tab per gene panel, each with coverage analysis per gene |
Configuration
Software settings (config.yaml)
| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |
| covLimits | string | coverage depths for which the percentage of bases covered at that depth or above is calculated, default value "10 20 30" |
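To illustrate what covLimits controls, here is a minimal sketch of the calculation it describes. It is not the code in create_excel.py. It assumes each row mirrors a line of the mosdepth thresholds file: a region start, a region end, and one count of bases at or above each threshold. The function name `percent_at_or_above` and the example rows are hypothetical.

```python
def percent_at_or_above(threshold_rows, cov_limits="10 20 30"):
    """Return {depth: percent of bases covered at >= depth} across all rows.

    threshold_rows: iterable of (start, end, counts), where counts holds one
    base count per depth in cov_limits (assumed layout, see lead-in).
    """
    depths = [int(d) for d in cov_limits.split()]
    total_bases = 0
    covered = {d: 0 for d in depths}
    for start, end, counts in threshold_rows:
        total_bases += end - start
        for depth, count in zip(depths, counts):
            covered[depth] += count
    if total_bases == 0:
        # No regions at all: report zero rather than divide by zero.
        return {d: 0.0 for d in depths}
    return {d: 100.0 * covered[d] / total_bases for d in depths}


# Two hypothetical 100 bp regions with base counts at 10x, 20x and 30x.
rows = [
    (100, 200, (100, 90, 50)),
    (300, 400, (80, 60, 10)),
]
result = percent_at_or_above(rows)
print(result)
```

Changing covLimits to, for example, "20 50" would only make sense if the thresholds file was produced with the same depths, since the counts are matched to the limits by position.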
Resources settings (resources.yaml)
| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |
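The rule resolves each resource with the pattern `config.get(rule, {}).get(key, config["default_resources"][key])`. The sketch below reproduces that lookup in plain Python to show how a rule-specific value overrides the default. The helper name `rule_setting` and all values in the example dictionary are hypothetical.

```python
def rule_setting(config, rule, key):
    """Return the rule-specific value for key, else the default resource.

    Mirrors the expression used in the rule. The default is looked up
    eagerly, so config["default_resources"] must contain key even when
    the rule overrides it; otherwise a KeyError is raised.
    """
    return config.get(rule, {}).get(key, config["default_resources"][key])


# Hypothetical merged configuration: only mem_mb is overridden for the rule.
config = {
    "default_resources": {"threads": 1, "mem_mb": 6144, "time": "4:00:00"},
    "create_cov_excel": {"mem_mb": 12288},
}

mem_mb = rule_setting(config, "create_cov_excel", "mem_mb")    # rule override
threads = rule_setting(config, "create_cov_excel", "threads")  # default
print(mem_mb, threads)
```

The helper covers only the resource keys. The container falls back to `config["default_container"]` instead, and `benchmark_repeats` falls back to 1.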
tsv2vcf
Converts ExomeDepth calls from tsv format to VCF
Rule
rule tsv2vcf:
    input:
        tsv="cnv_sv/exomedepth_call/{sample}_{type}.txt",
        ref=config["reference"]["fasta"],
    output:
        vcf="cnv_sv/exomedepth_call/{sample}_{type}.vcf",
    params:
        extra=config.get("tsv2vcf", {}).get("extra", ""),
    log:
        "cnv_sv/exomedepth_call/{sample}_{type}.vcf.gz.log",
    benchmark:
        repeat(
            "cnv_sv/exomedepth_call/{sample}_{type}.vcf.gz.benchmark.tsv", config.get("tsv2vcf", {}).get("benchmark_repeats", 1)
        )
    threads: config.get("tsv2vcf", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("tsv2vcf", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("tsv2vcf", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("tsv2vcf", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("tsv2vcf", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("tsv2vcf", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("tsv2vcf", {}).get("container", config["default_container"])
    message:
        "{rule}: convert {input.tsv} to VCF"
    script:
        "../scripts/tsv2vcf.sh"
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | tsv | "cnv_sv/exomedepth_call/{sample}_{type}.txt" | ExomeDepth calls in tsv format |
| | ref | config["reference"]["fasta"] | reference genome fasta file |
| output | vcf | "cnv_sv/exomedepth_call/{sample}_{type}.vcf" | ExomeDepth calls in VCF format |
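The conversion itself is done by the shell script tsv2vcf.sh, which is not shown here. As a rough illustration of the kind of mapping involved, the Python sketch below turns one call into a VCF data line with a symbolic allele. It assumes the tsv has ExomeDepth-style columns named `chromosome`, `start`, `end` and `type`, with `type` being "deletion" or "duplication". The function name `call_to_vcf_record` is hypothetical, and the sketch writes `N` as REF instead of looking up the base in the reference fasta.

```python
# Map the assumed ExomeDepth call types to VCF structural variant types.
SVTYPES = {"deletion": "DEL", "duplication": "DUP"}


def call_to_vcf_record(call):
    """Build one tab-separated VCF data line from a call given as a dict."""
    svtype = SVTYPES[call["type"]]
    start = int(call["start"])
    end = int(call["end"])
    fields = [
        call["chromosome"],  # CHROM
        str(start),          # POS (call start used as-is in this sketch)
        ".",                 # ID
        "N",                 # REF placeholder; no fasta lookup here
        f"<{svtype}>",       # ALT as a symbolic allele
        ".",                 # QUAL
        ".",                 # FILTER
        f"SVTYPE={svtype};END={end}",  # INFO
    ]
    return "\t".join(fields)


# Hypothetical call, as it might be read from one row of the tsv.
call = {"chromosome": "1", "start": "1000", "end": "2000", "type": "deletion"}
record = call_to_vcf_record(call)
print(record)
```

A stricter conversion would set POS to the base preceding the event and take REF from the reference fasta, which is presumably why the rule takes `ref` as input.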
Configuration
Software settings (config.yaml)
| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |
Resources settings (resources.yaml)
| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |