Run S-LDSC Pipeline: Calculate disease cell score using hg19 BED and Baseline v2.2

This function runs the stratified LD score regression (S-LDSC) pipeline using PolyFun and LDSC scripts for:

- Munging summary statistics ("munge_polyfun_sumstats.py"),
- Generating annotation files from a user-supplied BED (hg19) via "make_annot.py",
- Computing LD scores with "ldsc.py --l2",
- Performing the final partitioned heritability estimation ("ldsc.py --h2") by combining the custom annotation with the baseline v2.2 annotations.
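Each step above calls an external Python script. The following is a minimal sketch, not the package's implementation, of how one such call could be run from R with base system2() and stopped loudly on a non-zero exit status. run_step() is a hypothetical helper; the interpreter name and script paths in the usage comment are placeholders.

```r
# Hypothetical helper (not part of this package): run one pipeline step as an
# external command and stop with an informative error if it fails.
run_step <- function(cmd, args) {
  # system2() returns the command's exit status when output is not captured
  status <- system2(cmd, args = args)
  if (status != 0) {
    stop(sprintf("Step failed (exit status %s): %s %s",
                 status, cmd, paste(args, collapse = " ")))
  }
  invisible(status)
}

# Example call (placeholder paths; arguments are shell-quoted):
# run_step("python3", shQuote(c("/path/polyfun/munge_polyfun_sumstats.py",
#                               "--sumstats", "/path/sumstats/SIM_sumstats.txt.gz",
#                               "--n", "60000", "--out", "results/SIM.parquet")))
```

Failing on the first non-zero status avoids running the LD score and heritability steps on incomplete intermediate files.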

run_sldsc(
  chrs,
  polyfun_code_dir,
  ldsc_code_dir,
  sumstats_path,
  n,
  trait,
  onekg_path,
  bed_dir,
  baseline_dir,
  frqfile_pref,
  hm3_snps,
  weights_pref,
  out_dir
)

Arguments

chrs

Integer vector of chromosomes to process (default: 1:22).

sumstats_path

Directory containing the cleaned GWAS summary statistics file (format: <trait>_sumstats.txt.gz).
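A small sketch of how the expected input file could be located from sumstats_path and trait, following the documented <trait>_sumstats.txt.gz naming and failing early if the file is missing. sumstats_file() is a hypothetical helper for illustration only.

```r
# Hypothetical helper: build "<sumstats_path>/<trait>_sumstats.txt.gz"
# and optionally check that it exists before the pipeline starts.
sumstats_file <- function(sumstats_path, trait, must_exist = TRUE) {
  # Drop trailing slashes so "/path/sumstats/" and "/path/sumstats" agree
  f <- file.path(sub("/+$", "", sumstats_path),
                 paste0(trait, "_sumstats.txt.gz"))
  if (must_exist && !file.exists(f)) {
    stop("Summary statistics file not found: ", f)
  }
  f
}

# sumstats_file("/path/sumstats/", "SIM", must_exist = FALSE)
```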

n

Sample size of the GWAS.

trait

GWAS trait name (e.g., "SIM", "IBD").

onekg_path

Prefix to 1000G plink files in hg19 (e.g., "/LDSCORE/zenodo/1000G_EUR_Phase3_plink/1000G.EUR.QC.").

bed_dir

Directory containing the user-supplied BED file(s) in hg19. Only the first .bed found is used.
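Because only the first .bed file is used, it can help to see which file that will be. A sketch under the assumption that "first" means first in alphabetical order (how the function actually orders files is not documented here); first_bed() is hypothetical, and the pattern matches uncompressed .bed files only.

```r
# Hypothetical helper: list .bed files in bed_dir, warn if there are several,
# and return the first one in alphabetical order.
first_bed <- function(bed_dir) {
  beds <- sort(list.files(bed_dir, pattern = "\\.bed$", full.names = TRUE))
  if (length(beds) == 0) stop("No .bed file found in ", bed_dir)
  if (length(beds) > 1) {
    warning("Multiple .bed files found; using ", basename(beds[1]))
  }
  beds[1]
}
```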

baseline_dir

Prefix to 1000G baseline v2.2 annotation in hg19 (e.g., "/LDSCORE/zenodo/1000G_Phase3_baselineLD_v2.2_ldscores/baselineLD.").

frqfile_pref

Prefix to the 1000G .frq or .afreq files in hg19 (e.g., "/LDSCORE/zenodo/1000G_Phase3_frq/1000G.EUR.QC.").

hm3_snps

Path to the HapMap3 no-MHC SNP list in hg19 (e.g., "/LDSCORE/zenodo/hm3_no_MHC.list.txt").

weights_pref

Prefix to the 1000G regression weights in hg19 (e.g., "/LDSCORE/zenodo/1000G_Phase3_weights_hm3_no_MHC/weights.hm3_noMHC.").
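onekg_path, baseline_dir, frqfile_pref and weights_pref are prefixes rather than complete paths. LDSC's per-chromosome options (such as --ref-ld-chr) append the chromosome number to the prefix. The sketch below expands a prefix into per-chromosome file names so missing inputs can be spotted before a long run. chr_files() and missing_chr_files() are hypothetical helpers, and the suffixes in the comments are the usual LDSC/PLINK extensions, which may not match your files exactly.

```r
# Hypothetical helpers: expand "<prefix><chr><suffix>" for each chromosome
# and report which of those files do not exist.
chr_files <- function(prefix, chrs, suffix) {
  paste0(prefix, chrs, suffix)
}

missing_chr_files <- function(prefix, chrs, suffix) {
  f <- chr_files(prefix, chrs, suffix)
  f[!file.exists(f)]
}

# missing_chr_files("/LDSCORE/zenodo/1000G_EUR_Phase3_plink/1000G.EUR.QC.", 1:22, ".bim")
# missing_chr_files("/LDSCORE/zenodo/1000G_Phase3_frq/1000G.EUR.QC.", 1:22, ".frq")
```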

out_dir

Directory where outputs are saved. The function creates annotations/<trait> and results subdirectories.
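The output layout described above can be reproduced in a few lines. This is a sketch assuming the subdirectories are <out_dir>/annotations/<trait> and <out_dir>/results, as stated; make_out_dirs() is hypothetical and not taken from the package source.

```r
# Hypothetical helper: create the documented output subdirectories
# (no error if they already exist) and return their paths.
make_out_dirs <- function(out_dir, trait) {
  dirs <- c(annot   = file.path(out_dir, "annotations", trait),
            results = file.path(out_dir, "results"))
  for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
  dirs
}
```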

polyfun_code_dir

Directory containing the PolyFun scripts (e.g., munge_polyfun_sumstats.py); the scripts must be Python 3-compatible.

ldsc_code_dir

Directory containing the LDSC scripts (e.g., make_annot.py, ldsc.py).

Details

Step-by-step:

Munge the summary statistics file <trait>_sumstats.txt.gz into a parquet file with munge_polyfun_sumstats.py.
For each chromosome in chrs (1 to 22 by default):
- Create the annotation with make_annot.py from the user .bed and that chromosome's .bim file.
- Compute LD scores for the annotation with ldsc.py --l2.
Finally, run ldsc.py --h2, referencing both the newly created annotation (out_dir/annotations/<trait>/<trait>.<chr>) and baseline_dir, with frqfile_pref supplying the allele frequency files, weights_pref supplying the regression weights, and hm3_snps used for SNP filtering.
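The per-chromosome and final LDSC steps can be sketched as argument vectors built (but not executed) in R. The flags below follow the public LDSC partitioned-heritability tutorial (--bed-file, --bimfile, --annot-file, --l2, --ld-wind-cm, --thin-annot, --print-snps, --h2, --ref-ld-chr, --w-ld-chr, --overlap-annot, --frqfile-chr); run_sldsc() itself may use different or additional options. In that tutorial the SNP list is passed to the --l2 step via --print-snps, which is where it is placed here. sldsc_commands(), annot_prefix, sumstats_file and out_prefix are hypothetical names.

```r
# Hypothetical sketch: assemble LDSC command-line arguments for each step.
sldsc_commands <- function(ldsc_code_dir, bed_file, onekg_path, hm3_snps,
                           annot_prefix, baseline_dir, weights_pref,
                           frqfile_pref, sumstats_file, out_prefix,
                           chrs = 1:22) {
  per_chr <- lapply(chrs, function(chr) {
    annot_gz <- paste0(annot_prefix, chr, ".annot.gz")
    list(
      # 1) BED + chromosome .bim -> thin annotation file
      make_annot = c(file.path(ldsc_code_dir, "make_annot.py"),
                     "--bed-file", bed_file,
                     "--bimfile", paste0(onekg_path, chr, ".bim"),
                     "--annot-file", annot_gz),
      # 2) LD scores for that annotation, printed for the listed SNPs only
      l2 = c(file.path(ldsc_code_dir, "ldsc.py"), "--l2",
             "--bfile", paste0(onekg_path, chr),
             "--ld-wind-cm", "1",
             "--annot", annot_gz, "--thin-annot",
             "--out", paste0(annot_prefix, chr),
             "--print-snps", hm3_snps)
    )
  })
  # 3) Partitioned h2: custom annotation + baseline, comma-separated prefixes
  h2 <- c(file.path(ldsc_code_dir, "ldsc.py"), "--h2", sumstats_file,
          "--ref-ld-chr", paste(annot_prefix, baseline_dir, sep = ","),
          "--w-ld-chr", weights_pref,
          "--overlap-annot",
          "--frqfile-chr", frqfile_pref,
          "--out", out_prefix)
  list(per_chr = per_chr, h2 = h2)
}
```

Each vector could then be run with something like system2("python", shQuote(cmds$h2)); the interpreter name depends on how LDSC is installed.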

Value

No object is returned. All intermediate files (.annot.gz, .ldscore.gz) and final LDSC results (.log, .results) are written to out_dir.
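Since results are written to disk rather than returned, a typical follow-up is to load the .results table. A sketch assuming the standard tab-separated LDSC output with an Enrichment_p column; read_ldsc_results() is hypothetical, and the exact results file name produced by run_sldsc() is not specified here.

```r
# Hypothetical helper: read an LDSC .results file (tab-separated) and order
# categories by enrichment p-value. check.names = FALSE keeps column names
# such as "Coefficient_z-score" unchanged.
read_ldsc_results <- function(results_file) {
  res <- read.delim(results_file, check.names = FALSE,
                    stringsAsFactors = FALSE)
  res[order(res$Enrichment_p), , drop = FALSE]
}

# read_ldsc_results("path/to/output.results")
```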

Examples

if (FALSE) { # \dontrun{
run_sldsc(chrs = 1:22,
          polyfun_code_dir = "/path/polyfun/",
          ldsc_code_dir = "/path/ldsc/",
          sumstats_path = "/path/sumstats/",
          n = 60000, trait = "SIM",
          onekg_path = "/LDSCORE/zenodo/1000G_EUR_Phase3_plink/1000G.EUR.QC.",
          bed_dir = "/path/my_bedfiles/",
          baseline_dir = "/LDSCORE/zenodo/1000G_Phase3_baselineLD_v2.2_ldscores/baselineLD.",
          frqfile_pref = "/LDSCORE/zenodo/1000G_Phase3_frq/1000G.EUR.QC.",
          hm3_snps = "/LDSCORE/zenodo/hm3_no_MHC.list.txt",
          weights_pref = "/LDSCORE/zenodo/1000G_Phase3_weights_hm3_no_MHC/weights.hm3_noMHC.",
          out_dir = "results/")
} # }