sas-codebook's Introduction

sas-codebook

This repo contains SAS® code for producing a PDF summary of each variable in a dataset. Each variable summary is confined to a 1" tall strip, allowing for 9 variables to be summarized on a page.

Metadata is provided for each variable in bold text. This metadata includes the variable's name, label, type, and (if applicable) format.

Below the metadata are the number of non-missing values, the number of missing values, and the number of unique values.
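As an illustration of the three counts described above, here is a minimal Python sketch (not the repo's SAS code). It assumes missing values are represented as `None` and that missing values are excluded from the unique count; the function name `count_summary` is hypothetical.

```python
def count_summary(values):
    """Return non-missing, missing, and unique counts for one variable.

    Assumption: None marks a missing value, and missing values are
    not counted as a unique level.
    """
    non_missing = [v for v in values if v is not None]
    return {
        "n": len(non_missing),
        "n_missing": len(values) - len(non_missing),
        "n_unique": len(set(non_missing)),
    }


print(count_summary([3, 1, None, 3, 7, None]))
# {'n': 4, 'n_missing': 2, 'n_unique': 3}
```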

Continuous variables are summarized graphically using a combination histogram/boxplot. Continuous variables are also summarized tabularly using the mean, standard deviation, min, max, median, and quartiles.

sas-codebook continuous example

Categorical variables are summarized graphically using horizontal bar charts (or dot plots). Categorical variables are also summarized tabularly using counts and percents.

sas-codebook categorical example
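The counts and percents for a categorical variable can be sketched as follows. This is a Python illustration rather than the repo's SAS code; it assumes percents are taken over non-missing values and that only the most frequent levels are kept (the default of 5 mirrors the "Top 5" mentioned in the issues). The name `freq_table` is hypothetical.

```python
from collections import Counter


def freq_table(values, top=5):
    """Return (level, count, percent) for the most frequent levels.

    Assumptions: None marks a missing value, and percents use the
    non-missing total as the denominator.
    """
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    return [
        (level, n, round(100 * n / total, 1))
        for level, n in counts.most_common(top)
    ]


print(freq_table(["F", "M", "F", None, "F", "M"]))
# [('F', 3, 60.0), ('M', 2, 40.0)]
```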

For more details about the SAS code, see the Wiki.

sas-codebook's People

Contributors

qspencer, srosanba

sas-codebook's Issues

more metadata [3]

Consider adding more high-level information at the top of each report related to the dataset. For instance:

  • Dataset label or descriptor
  • Replicate the high-level dataset summaries that some R packages produce (percent of missing obs, for instance)

panels: add "obs by panelby" at top [7]

Add a table at the top of page 1 which displays the number of observations in the dataset for each level of the panelby variable. Will likely want to match the sort of the subsequent plots, but this is not a certainty. Might display counts as text or bar charts.

vertical: long VAR= list causes SAS error

When the VAR= list is long enough to cause multiple pages of output within each BY= level, SAS crashes. This is apparently an issue because of the use of STARTPAGE=NOW at the top of each BY= level. Because I am using STARTPAGE=NOW, SAS gives up trying to control page breaks. If there are a lot of variables (more than will fit on a page), it crashes. 😢

Short-term hack for users: don't specify more than 8 variable names in the VAR= parameter. Call the macro multiple times if there are more than 8 variables of interest.
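One way to apply this hack is to split a long variable list into groups of at most 8 before calling the macro. The sketch below is a Python illustration of the splitting step only; the function name `split_var_list` and the example variable names are hypothetical, and each resulting string would be passed as the VAR= value of a separate macro call.

```python
def split_var_list(var_list, max_vars=8):
    """Split a space-separated variable list into chunks of at most max_vars names."""
    names = var_list.split()
    return [
        " ".join(names[i:i + max_vars])
        for i in range(0, len(names), max_vars)
    ]


# Hypothetical variable names, 10 in total, so two macro calls are needed.
for chunk in split_var_list("age sex race height weight bmi sbp dbp hr temp"):
    print("VAR=" + chunk)
# VAR=age sex race height weight bmi sbp dbp
# VAR=hr temp
```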

Possible long-term workarounds include:

  • Control all page breaks by making assumptions about the heights of headers and strips.
  • Possibly rewrite the right half of the report to be an annotated image (thereby controlling the height).
  • Possibly rewrite the header to be an annotated image, again to control the height.
  • Both of the above would likely depend upon the experimental OUTFILE= option.

vertical: add summary of by levels at top of report [5]

Complete list of all combinations of the by variables and count of obs within each level. Could run into width issues if lots of variables are specified. Might need to go landscape for this first page and then flip back to portrait for subsequent pages.
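The proposed summary amounts to counting observations for every observed combination of the BY variables. A minimal Python sketch of that tally follows; it is not the repo's SAS code, and the function name `by_level_counts`, the row layout (one dict per observation), and the example variables are all assumptions made for illustration.

```python
from collections import Counter


def by_level_counts(rows, by_vars):
    """Count observations for each observed combination of the BY variables.

    Assumption: the values within each BY variable are mutually
    comparable (for example, all strings) so the result can be sorted.
    """
    counts = Counter(tuple(row[v] for v in by_vars) for row in rows)
    return sorted(counts.items())


rows = [
    {"arm": "A", "sex": "F"},
    {"arm": "A", "sex": "M"},
    {"arm": "B", "sex": "F"},
    {"arm": "A", "sex": "F"},
]
for combo, n in by_level_counts(rows, ["arm", "sex"]):
    print(combo, n)
# ('A', 'F') 2
# ('A', 'M') 1
# ('B', 'F') 1
```

Only combinations that actually occur are listed; a combination with zero observations would need to be added separately if the report should show it.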

panels: possibly work around latency with one-page-at-a-time rewrite [2-8]

Latencies never seem to appear on page 1, so perhaps create several one-page outputs and stack them together on the back end. Would lose the automatic page numbering, though could possibly manually number the one-at-a-time pages as they are created. Getting the "of Y" part would require more pre-planning/measuring/counting, but again still possible.

generic: add appendix of all (unformatted = formatted) values

To appease those who would say "that's not a codebook", consider adding an appendix at the end which shows more than just the "Top 5" that are shown in the 1" strips. Would likely include counts and maybe percents. Might still need to include some sort of max for cases with 100+ unique values.

If this is implemented, consider including hyperlinks so that folks could jump back and forth from the strips to the appendix and back. Not sure how to do this, but how hard could it be!?

Graphics Don't Output to PDF

I'm running the macro "codebook_generic" and the PDF looks great, but the graphic for each variable doesn't end up on the PDF. Only the graphic for the first variable appears. The graphics just get output to PNG files in my data folder. I've only specified the macro variables "DATA" and "PDFPREFIX". I'm using SAS/STAT 14.1. Do you know what might be going on?

generic: deal with long variable names

The macro creates secondary versions of variables by adding prefixes (e.g., VAR1 becomes CB_CHAR_VAR1). This is problematic when incoming variable names are already approaching the length limit of 32. Need to update the code to somehow get around this issue.
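One possible way around the limit, sketched here in Python rather than SAS, is to truncate the prefixed name to 32 characters and append a numeric suffix whenever the truncated name collides with one already in use. This is only an illustration of the idea, not the approach the macro takes; the function name `prefixed_name` is hypothetical, and in practice the `used` set would also need to be seeded with the dataset's existing variable names.

```python
MAX_NAME_LEN = 32  # SAS variable name length limit cited in the issue


def prefixed_name(name, prefix, used):
    """Build a prefixed name that fits in MAX_NAME_LEN and is not already used.

    `used` is a set of upper-cased names already taken; it is updated
    in place. Names are compared case-insensitively.
    """
    full = prefix + name
    candidate = full[:MAX_NAME_LEN]
    counter = 0
    while candidate.upper() in used:
        counter += 1
        suffix = str(counter)
        # Leave room for the suffix so the result never exceeds the limit.
        candidate = full[:MAX_NAME_LEN - len(suffix)] + suffix
    used.add(candidate.upper())
    return candidate


used = set()
print(prefixed_name("VAR1", "CB_CHAR_", used))
# CB_CHAR_VAR1
long_a = "A" * 31 + "1"
long_b = "A" * 31 + "2"
print(len(prefixed_name(long_a, "CB_CHAR_", used)))
# 32
print(prefixed_name(long_b, "CB_CHAR_", used).endswith("1"))
# True
```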
