scads.Rdscads: A package for using scATAC-seq data and GWAS sumstats to calculate disease cell score
scads(
count_matrix,
nTopics = 10,
n_s = 1000,
n_c = 8,
baseline_method = "gc",
bl_celltype = NULL,
bl_celltype_peak_file = NULL,
sumstats_dir,
gwas_nsamps,
gwas_trait,
outdir,
polyfun_code_dir,
ldsc_code_dir,
onekg_path,
baseline_dir,
frqfile_pref,
hm3_snps,
weights_pref,
continuous_topic_annot = FALSE,
n_cores_ldsc = NULL,
chrs = 1:22,
genome = "hg19",
save_intermediates = TRUE,
resume_from_step = NULL,
verbose = TRUE,
...
)A peak-by-cell count matrix.
The number of topics (clusters) to fit. Default is 10.
Number of simulations for fastTopics-DE.
Number of cores to use for parallelization.
Method for calculating baseline f_kj0 (Default is gc; other options: constant, average, estimate)
Use only if baseline_method == "estimate", otherwise NULL
Use only if baseline_method == "estimate", otherwise NULL
Directory containing GWAS sumstats (file format: TRAIT.sumstats.txt.gz).
Sample size of GWAS.
GWAS trait name (e.g., "SIM", "IBD").
Directory where outputs are saved.
Directory to access polyFUN (LDSC) code scripts; must be python3-compatible.
Directory to access LDSC code scripts.
Prefix to 1000G plink files in hg19 (/LDSCORE/zenodo/1000G_EUR_Phase3_plink/1000G.EUR.QC.). (use hg38 version if input data is hg38).
Prefix to 1000G baseline v2.2 annotation in hg19 (e.g. "/LDSCORE/zenodo/1000G_Phase3_baselineLD_v2.2_ldscores/baselineLD."). (use hg38 version if input data is hg38).
Prefix to the 1000G .frq or .afreq files in hg19 (e.g. "/LDSCORE/zenodo/1000G_Phase3_frq/1000G.EUR.QC."). (use hg38 version if input data is hg38).
Path to the HapMap3 no‐MHC SNP list in hg19 (e.g. "/LDSCORE/zenodo/hm3_no_MHC.list.txt").(use hg38 version if input data is hg38).
Path to the 1000G weights in hg19 (e.g. "/LDSCORE/zenodo/1000G_Phase3_weights_hm3_no_MHC/weights.hm3_noMHC.").(use hg38 version if input data is hg38).
Logical. If TRUE, create continuous topic annotations
(via get_continuous_annot_beds()) and run run_sldsc_cont().
If FALSE, use the binary annotation approach (get_annot_beds() and run_sldsc()).
Default is FALSE.
Number of cores for parallel S-LDSC runs across topics.
By default (NULL), uses min(detectCores(), nTopics, 5).
Set to 1 to run topics sequentially.
Number of chromosomes for running S-LDSC (Default is chr1-22)
Genome build (Default is hg19)
Logical. If TRUE, saves intermediate results (fastTopics, BED dirs,
cell scores) to outdir as RDS files. Default is TRUE.
Integer (1-4). If specified, resumes the pipeline from this step,
loading prior intermediate results from outdir. Default is NULL (run all steps).
Logical. If TRUE, prints progress messages. Default is TRUE.
Additional arguments passed to internal functions.
A list containing the fitted topic model (out1) and the cell scores (cs_res) results.
if (FALSE) { # \dontrun{
result <- scads(count_matrix, nTopics = 15, sumstats_dir="/some/sumstats", gwas_nsamps=60000,
gwas_trait="SIM", outdir="results/",
polyfun_code_dir = "/some/path/polyfun/", ldsc_code_dir = "/some/path/ldsc/",
onekg_path="/some/path/1000G.EUR.QC.",
baseline_dir="/some/path/1000G_Phase3_baselineLD_v2.2_ldscores/baselineLD.",
frqfile_pref="/some/path/1000G_Phase3_frq/1000G.EUR.QC.",
hm3_snps="/some/path/hm3_no_MHC.list.txt",
weights_pref="/some/path/1000G_Phase3_weights_hm3_no_MHC/weights.hm3_noMHC.")
} # }