scads: A package for using scATAC-seq data and GWAS sumstats to calculate disease cell score

scads(
  count_matrix,
  nTopics = 10,
  n_s = 1000,
  n_c = 8,
  baseline_method = "gc",
  bl_celltype = NULL,
  bl_celltype_peak_file = NULL,
  sumstats_dir,
  gwas_nsamps,
  gwas_trait,
  outdir,
  polyfun_code_dir,
  ldsc_code_dir,
  onekg_path,
  baseline_dir,
  frqfile_pref,
  hm3_snps,
  weights_pref,
  continuous_topic_annot = FALSE,
  n_cores_ldsc = NULL,
  chrs = 1:22,
  genome = "hg19",
  save_intermediates = TRUE,
  resume_from_step = NULL,
  verbose = TRUE,
  ...
)

Arguments

count_matrix

A peak-by-cell count matrix.

nTopics

The number of topics (clusters) to fit. Default is 10.

n_s

Number of simulations for fastTopics-DE.

n_c

Number of cores to use for parallelization.

baseline_method

Method for calculating baseline f_kj0 (Default is gc; other options: constant, average, estimate)

bl_celltype

Use only if baseline_method == "estimate", otherwise NULL

bl_celltype_peak_file

Use only if baseline_method == "estimate", otherwise NULL

sumstats_dir

Directory containing GWAS sumstats (file format: TRAIT.sumstats.txt.gz).

gwas_nsamps

Sample size of GWAS.

gwas_trait

GWAS trait name (e.g., "SIM", "IBD").

outdir

Directory where outputs are saved.

polyfun_code_dir

Directory to access polyFUN (LDSC) code scripts; must be python3-compatible.

ldsc_code_dir

Directory to access LDSC code scripts.

onekg_path

Prefix to 1000G plink files in hg19 (/LDSCORE/zenodo/1000G_EUR_Phase3_plink/1000G.EUR.QC.). (use hg38 version if input data is hg38).

baseline_dir

Prefix to 1000G baseline v2.2 annotation in hg19 (e.g. "/LDSCORE/zenodo/1000G_Phase3_baselineLD_v2.2_ldscores/baselineLD."). (use hg38 version if input data is hg38).

frqfile_pref

Prefix to the 1000G .frq or .afreq files in hg19 (e.g. "/LDSCORE/zenodo/1000G_Phase3_frq/1000G.EUR.QC."). (use hg38 version if input data is hg38).

hm3_snps

Path to the HapMap3 no‐MHC SNP list in hg19 (e.g. "/LDSCORE/zenodo/hm3_no_MHC.list.txt").(use hg38 version if input data is hg38).

weights_pref

Path to the 1000G weights in hg19 (e.g. "/LDSCORE/zenodo/1000G_Phase3_weights_hm3_no_MHC/weights.hm3_noMHC.").(use hg38 version if input data is hg38).

continuous_topic_annot

Logical. If TRUE, create continuous topic annotations (via get_continuous_annot_beds()) and run run_sldsc_cont(). If FALSE, use the binary annotation approach (get_annot_beds() and run_sldsc()). Default is FALSE.

n_cores_ldsc

Number of cores for parallel S-LDSC runs across topics. By default (NULL), uses min(detectCores(), nTopics, 5). Set to 1 to run topics sequentially.

chrs

Number of chromosomes for running S-LDSC (Default is chr1-22)

genome

Genome build (Default is hg19)

save_intermediates

Logical. If TRUE, saves intermediate results (fastTopics, BED dirs, cell scores) to outdir as RDS files. Default is TRUE.

resume_from_step

Integer (1-4). If specified, resumes the pipeline from this step, loading prior intermediate results from outdir. Default is NULL (run all steps).

verbose

Logical. If TRUE, prints progress messages. Default is TRUE.

...

Additional arguments passed to internal functions.

Value

A list containing the fitted topic model (out1) and the cell scores (cs_res) results.

Examples

if (FALSE) { # \dontrun{
result <- scads(count_matrix, nTopics = 15, sumstats_dir="/some/sumstats", gwas_nsamps=60000,
                gwas_trait="SIM", outdir="results/", 
                polyfun_code_dir = "/some/path/polyfun/", ldsc_code_dir = "/some/path/ldsc/",
                onekg_path="/some/path/1000G.EUR.QC.",
                baseline_dir="/some/path/1000G_Phase3_baselineLD_v2.2_ldscores/baselineLD.",
                frqfile_pref="/some/path/1000G_Phase3_frq/1000G.EUR.QC.",
                hm3_snps="/some/path/hm3_no_MHC.list.txt", 
                weights_pref="/some/path/1000G_Phase3_weights_hm3_no_MHC/weights.hm3_noMHC.")

} # }