From NorthShore Analytics
  • Write readable code:

    • create a high level function that calls analytical and auxiliary functions as needed;

    • reference packages only where the use of such packages is required at the lowest level;

    • indent your code consistently;

    • use spaces around operators, after commas, after opening and before closing braces and parentheses;

    • wrap long lines at column 80 (remember the punch cards? I’m only partially joking here...);

    • use knitr-style comments (chunk labels of the form ## ---- name ----);

    • in a function, first list the required, then the optional parameters.
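The points above can be sketched in a small example. All function names, column names and data here are hypothetical and chosen only for illustration. The high-level function delegates to one auxiliary and one analytical function, the stats package is referenced only where it is used, and required parameters come before optional ones.

```r
## ---- summarizeCharges ----
## High-level function: median charge per group (hypothetical example)
## Arguments
###  charges - data frame with columns `group` and `charge` (required)
###  na.rm   - drop missing charges before summarizing (optional)
summarizeCharges <- function(charges, na.rm = TRUE) {
  cleaned <- dropInvalidCharges(charges)
  medianByGroup(cleaned, na.rm = na.rm)
}

## ---- dropInvalidCharges ----
## Auxiliary function: keep rows with a known group and a charge that is
## either missing or non-negative
dropInvalidCharges <- function(charges) {
  validCharge <- is.na(charges$charge) | charges$charge >= 0
  charges[!is.na(charges$group) & validCharge, , drop = FALSE]
}

## ---- medianByGroup ----
## Analytical function: the stats package is referenced only here,
## at the lowest level where it is needed
medianByGroup <- function(charges, na.rm = TRUE) {
  tapply(charges$charge, charges$group, stats::median, na.rm = na.rm)
}

charges <- data.frame(group  = c("a", "a", "b", "b", "b"),
                      charge = c(10, 30, 5, -1, 7))
result <- summarizeCharges(charges)
```

Here the negative charge in group "b" is dropped by the auxiliary function before the medians are computed.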

  • Write meaningful comments:

    • in a function

      • explain what the function is for;

      • describe input and output arguments;

      • list any specific parameter values that represent special cases.

      The mode function in the NS.CA.statUtils package is an example:

              ## ---- NS.CA.mode ---- 
              ## Mode(s) of the distribution
              ## Usage
              ###  NS.CA.mode(x, fun=function(y) {y})
              ## Arguments
              ###  x - matrix or data frame containing the distribution(s) (convert to matrix if list)
              ###  fun - function determining which mode to select in the multimodal case
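              NS.CA.mode <- function(x, fun=function(y) {y}) {
                ux <- unique(x)
                # count how often each unique value occurs
                counts <- tabulate(match(x, ux))
                # keep the most frequent value(s); fun picks among ties
                fun(ux[which(counts == max(counts))])
              }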
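A brief usage sketch may help show what the fun argument does. The definition is repeated so that the snippet runs on its own, and plain numeric vectors are used as input purely as a simplifying assumption.

```r
## Definition repeated from the example above so the snippet is runnable
NS.CA.mode <- function(x, fun = function(y) {y}) {
  ux <- unique(x)
  counts <- tabulate(match(x, ux))
  fun(ux[which(counts == max(counts))])
}

## One clear mode: the default fun returns it unchanged
unimodal <- NS.CA.mode(c(1, 2, 2, 3))

## Two values tie for most frequent: the default fun returns both
allModes <- NS.CA.mode(c(1, 1, 2, 2, 3))

## Passing fun = min selects the smallest of the tied modes
smallestMode <- NS.CA.mode(c(1, 1, 2, 2, 3), fun = min)
```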
    • in a loop

      • mark nested long loops if necessary;

      • document “forks” as appropriate.

    • in an if-then-else structure

      • explain what the logic means when necessary;

      • mark matching braces as needed.
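As an illustration of the loop and if-then-else comments described above, here is a minimal sketch on made-up data. The variable names are hypothetical. An explicit loop is used only to have nested structures to annotate.

```r
## Count positive and non-positive entries in each row of a matrix
## (hypothetical example)
values <- matrix(c(1, -2, 3, 0, 5, -6), nrow = 2)
positive <- integer(nrow(values))
nonPositive <- integer(nrow(values))

for (i in seq_len(nrow(values))) {      # loop over rows
  for (j in seq_len(ncol(values))) {    # loop over columns
    if (values[i, j] > 0) {
      ## fork 1: strictly positive entry
      positive[i] <- positive[i] + 1L
    } else {
      ## fork 2: zero or negative entry
      nonPositive[i] <- nonPositive[i] + 1L
    }  # end if (values[i, j] > 0)
  }  # end loop over columns
}  # end loop over rows
```

The trailing comments on the closing braces are what makes it possible to match each brace to its opening statement when the loop body grows long.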

  • Favor sapply, lapply and ddply (from the plyr package) over explicit for loops.
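The difference can be seen in a small sketch with made-up data. Only base R is used here. ddply comes from the plyr package and is left out so that the snippet has no dependencies.

```r
## Hypothetical input: a named list of numeric vectors
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = c(5))

## for-loop version: the result vector has to be managed by hand
meansLoop <- numeric(length(scores))
names(meansLoop) <- names(scores)
for (nm in names(scores)) {
  meansLoop[nm] <- mean(scores[[nm]])
}

## sapply version: one call, names carried over automatically
meansApply <- sapply(scores, mean)

## lapply always returns a list with one element per input
ranges <- lapply(scores, range)
```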

  • Above all, DO NOT copy and paste! If a piece of code is used more than once, turn it into a function instead.

  • Avoid rbind wherever possible, especially inside loops, since repeatedly growing an object can be slow.
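One common reading of this advice is to avoid growing a data frame row by row. The sketch below, on made-up data, contrasts that pattern with two possible alternatives. It is an illustration under that assumption, not a benchmark.

```r
n <- 5

## Slow pattern: the data frame is copied on every iteration
grown <- data.frame()
for (i in seq_len(n)) {
  grown <- rbind(grown, data.frame(id = i, square = i^2))
}

## Alternative 1: build the columns as vectors, create the data frame once
direct <- data.frame(id = seq_len(n), square = seq_len(n)^2)

## Alternative 2: collect the pieces in a list and bind them in one call
pieces <- lapply(seq_len(n), function(i) data.frame(id = i, square = i^2))
bound <- do.call(rbind, pieces)
```

All three produce the same rows. The second alternative still calls rbind, but only once rather than once per iteration.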

  • Reduce early and often, e.g.,

        # aggregated charges and costs
        totalChgAll[[aggregator]] <- Reduce(
          function(...) merge(..., by = totalGroupBy, all = TRUE,
                              suffixes = totSuff),
          totChgCostsNotNull)
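Since the snippet above depends on objects defined elsewhere, a self-contained sketch of the same idea may be useful. The tables and column names below are hypothetical. Reduce folds a list of data frames into one by merging them pairwise on a shared key.

```r
## Hypothetical per-source tables sharing the key column `dept`
charges <- data.frame(dept = c("A", "B"), charge = c(100, 200))
costs   <- data.frame(dept = c("A", "C"), cost = c(60, 90))
visits  <- data.frame(dept = c("B", "C"), visits = c(12, 7))

tables <- list(charges, costs, visits)

## Fold the list pairwise: merge(merge(charges, costs), visits)
## all = TRUE keeps departments that are missing from some tables
combined <- Reduce(function(x, y) merge(x, y, by = "dept", all = TRUE),
                   tables)
```

Adding a fourth table only requires appending it to the list. The merge call itself is written once.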

Accepted naming conventions are listed in Table 6.1.