scads: A package for using scATAC-seq data and GWAS sumstats to calculate disease cell score

scads(
  count_matrix,
  nTopics = 10,
  n_s = 1000,
  n_c = 8,
  baseline_method = "gc",
  bl_celltype = NULL,
  bl_celltype_peak_file = NULL,
  sumstats_dir,
  gwas_nsamps,
  gwas_trait,
  outdir,
  polyfun_code_dir,
  ldsc_code_dir,
  onekg_path,
  baseline_dir,
  frqfile_pref,
  hm3_snps,
  weights_pref,
  continuous_topic_annot = FALSE,
  n_cores_ldsc = NULL,
  chrs = 1:22,
  genome = "hg19",
  save_intermediates = TRUE,
  resume_from_step = NULL,
  verbose = TRUE,
  ...
)

Arguments

count_matrix: A peak-by-cell count matrix.
nTopics: The number of topics (clusters) to fit. Default is 10.
n_s: Number of simulations for fastTopics-DE.
n_c: Number of cores to use for parallelization.
baseline_method: Method for calculating baseline f_kj0 (Default is gc; other options: constant, average, estimate)
bl_celltype: Use only if baseline_method == "estimate", otherwise NULL
bl_celltype_peak_file: Use only if baseline_method == "estimate", otherwise NULL
sumstats_dir: Directory containing GWAS sumstats (file format: TRAIT.sumstats.txt.gz).
gwas_nsamps: Sample size of GWAS.
gwas_trait: GWAS trait name (e.g., "SIM", "IBD").
outdir: Directory where outputs are saved.
polyfun_code_dir: Directory to access polyFUN (LDSC) code scripts; must be python3-compatible.
ldsc_code_dir: Directory to access LDSC code scripts.
onekg_path: Prefix to 1000G plink files in hg19 (/LDSCORE/zenodo/1000G_EUR_Phase3_plink/1000G.EUR.QC.). (use hg38 version if input data is hg38).
baseline_dir: Prefix to 1000G baseline v2.2 annotation in hg19 (e.g. "/LDSCORE/zenodo/1000G_Phase3_baselineLD_v2.2_ldscores/baselineLD."). (use hg38 version if input data is hg38).
frqfile_pref: Prefix to the 1000G .frq or .afreq files in hg19 (e.g. "/LDSCORE/zenodo/1000G_Phase3_frq/1000G.EUR.QC."). (use hg38 version if input data is hg38).
hm3_snps: Path to the HapMap3 no‐MHC SNP list in hg19 (e.g. "/LDSCORE/zenodo/hm3_no_MHC.list.txt").(use hg38 version if input data is hg38).
weights_pref: Path to the 1000G weights in hg19 (e.g. "/LDSCORE/zenodo/1000G_Phase3_weights_hm3_no_MHC/weights.hm3_noMHC.").(use hg38 version if input data is hg38).
continuous_topic_annot: Logical. If TRUE, create continuous topic annotations (via get_continuous_annot_beds()) and run run_sldsc_cont(). If FALSE, use the binary annotation approach (get_annot_beds() and run_sldsc()). Default is FALSE.
n_cores_ldsc: Number of cores for parallel S-LDSC runs across topics. By default (NULL), uses min(detectCores(), nTopics, 5). Set to 1 to run topics sequentially.
chrs: Number of chromosomes for running S-LDSC (Default is chr1-22)
genome: Genome build (Default is hg19)
save_intermediates: Logical. If TRUE, saves intermediate results (fastTopics, BED dirs, cell scores) to outdir as RDS files. Default is TRUE.
resume_from_step: Integer (1-4). If specified, resumes the pipeline from this step, loading prior intermediate results from outdir. Default is NULL (run all steps).
verbose: Logical. If TRUE, prints progress messages. Default is TRUE.
...: Additional arguments passed to internal functions.

Value

A list containing the fitted topic model (out1) and the cell scores (cs_res) results.

Examples

if (FALSE) { # \dontrun{
result <- scads(count_matrix, nTopics = 15, sumstats_dir="/some/sumstats", gwas_nsamps=60000,
                gwas_trait="SIM", outdir="results/", 
                polyfun_code_dir = "/some/path/polyfun/", ldsc_code_dir = "/some/path/ldsc/",
                onekg_path="/some/path/1000G.EUR.QC.",
                baseline_dir="/some/path/1000G_Phase3_baselineLD_v2.2_ldscores/baselineLD.",
                frqfile_pref="/some/path/1000G_Phase3_frq/1000G.EUR.QC.",
                hm3_snps="/some/path/hm3_no_MHC.list.txt", 
                weights_pref="/some/path/1000G_Phase3_weights_hm3_no_MHC/weights.hm3_noMHC.")

} # }