Bayesian Cell CountsR exampleWe have provided an example of our analysis that is hopefully easy to translate to other examples. This is in: github.com/BayesianCellCounts/R_example This contains an example data file, data.csv. This file is based on the one used in the case study 1 but has been preprocessed, in case study 1 some of the processing is performed by the R script, here, to make the R script more generic, these steps are done in advance, the details are given in data.README. The sample data includes 23 brain regions for 40 animals with 10 animals in each of four cohorts. The file data.csv contains four columns:
Our dataset does not have a count for every (animal,region) pair, there is missing data and this is allowed by the model. However, it does not allow more than one count for each (animal,region); the preprocessing we have done includes a set which sums over repeated counts and duplicates will cause an error. To run the code use:
This will sample the posterior using the model stored as model/model_hs.stan. This will produce four chains stored as fits/fit_hs.rds.
The chains contain multiple samples from the posterior which allows probabalistics statements to be made about the model. Typically this means plotting graphs and most often these are `contrast' graphs, as in Figure 4B/4C in Dimmock et al. (2025). This graph is produced by the plot_results.r and running:
will give options; an example of this command for our sample data is
This computes the contrast between two of our
conditions, Sham_Familiar
and Sham_Novel and orders the regions in the graph by
the absolute value of the frequentist mean. This plotting only works
if the chains are available; the graph is stored in results/. As described under --help there is an option for changing the format of the output file, this probably works but we have only actually tests png and pdf.
To produce the diagnostic plots use plot_diagnostics.r, this has a help file accessed as
and adds the graphs in the diagnostics folder.
Stan modelThe Stan model contains the priors. The business part is the line the exponential of gamma[i] is
the mean of the Poisson process that generates the count for
sample i. The paper makes more of an attempt to
describe this well and at length, but roughly and in short,
the theta[group_idx[i]] is the value
that gamma[i] is expected to take because of the
cohort and region that sample comes from. The rest of the expression
deals with variability around this value and includes the machinery of
the horseshoe prior. The prior for theta is provided by
this implies that our rough a priori guess for the mean of the logged cell count values is 5, plus or minus 2.
If the mean of your log cell counts differ significantly from this, it might be useful to
change these values before running the model on your data.
|