

Let's use the aggregate command to obtain a table of mean body mass across the two levels of gender. The continuous variable becomes the first argument. Then the categorical variable appears inside the list command. Finally, the function you wish to apply (in this case you want the mean) becomes the third argument. The calls below refer to the columns directly by name, so they assume the data frame has been attached or that the calls are wrapped in with().

Table1 <- aggregate(MASS, list(GENDER), FUN = mean)

Note that the aggregate command does not return the variable names: the columns of the result are labelled Group.1 and x.

Now we use the aggregate command to obtain a table of mean body mass across the two levels of smoker.

Table2 <- aggregate(MASS, list(SMOKE), FUN = mean)

The aggregate command allows us to create more complex tables, across the levels of several categorical variables.
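As a minimal sketch of two points made here, the missing variable names and grouping by several categorical variables, the example below uses R's built-in mtcars data rather than the patient data. That substitution is an assumption made so the code runs on its own: cyl and am stand in for the categorical variables, and mpg for the continuous one.

```r
# Built-in mtcars stands in for the patient data (illustrative assumption).

# Unnamed list: the result's columns are called Group.1 and x.
by_cyl <- aggregate(mtcars$mpg, list(mtcars$cyl), FUN = mean)
print(names(by_cyl))

# Naming the list entries labels the grouping columns in the output,
# and supplying two entries groups by both variables at once.
by_cyl_am <- aggregate(mtcars$mpg,
                       list(cyl = mtcars$cyl, am = mtcars$am),
                       FUN = mean)

# The summarised column is still called x, so rename it explicitly.
names(by_cyl_am)[names(by_cyl_am) == "x"] <- "mean_mpg"
print(by_cyl_am)
```

Naming the list entries is usually less error-prone than renaming the columns afterwards, since the labels stay next to the variables they describe.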

This is where the aggregate command is so helpful. Here is a data set of patients receiving medical treatment (A, B or C). We have data on their gender, their body mass in kg, whether or not they exercise, whether or not they smoke, and whether or not they recovered after treatment. Cut and paste the following data set into R.

GENDER TREATMENT MASS SMOKE EXERCISE RECOVER
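Pasted, whitespace-separated data with a header row can be read with read.table(). The sketch below shows one way this might look. The four rows are invented placeholders, not the tutorial's data, and the Yes/No coding and the name patients are assumptions made for illustration only.

```r
# Placeholder rows only: these values are invented for illustration
# and are NOT the tutorial's patient data.
patients <- read.table(header = TRUE, stringsAsFactors = TRUE, text = "
GENDER TREATMENT MASS SMOKE EXERCISE RECOVER
Female A 62.5 No  Yes Yes
Male   B 81.0 Yes No  No
Female C 70.2 No  No  Yes
Male   A 77.4 No  Yes Yes
")

# Check that MASS was read as numeric and the rest as factors.
str(patients)
```

Reading the categorical columns as factors makes their levels explicit, which is convenient when they are later used as grouping variables.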

In Part 10, let's look at the aggregate command for creating summary tables using R.

You may have a complex data set that includes categorical variables of several levels, and you may wish to create summary tables for each level of the categorical variable. For example, your data set may include the variable Gender, a two-level categorical variable with levels Male and Female. Your data set may include other categorical variables such as Ethnicity, Hair Colour, the Treatments received by patients in a medical study, or the number of cylinders in motor vehicles. In any case, you may wish to produce summary statistics for each level of the categorical variable.

The doBy package provides much of the functionality of SAS PROC SUMMARY. It defines the desired table using a model formula and a function.

summaryBy(mpg + wt ~ cyl + vs, data = mtcars,
# produces mpg.m wt.m mpg.s wt.
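A grouped summary of this kind, with several statistics for each combination of two grouping variables, can also be sketched in base R. The example below is not the doBy call. It swaps in the formula interface of aggregate() on the built-in mtcars data, and the choice of mean and standard deviation as the two statistics is an assumption.

```r
# Base-R stand-in for a doBy-style grouped summary (not the doBy call itself).
# FUN returns a named vector, so each variable gets a mean (m) and an sd (s).
by_group <- aggregate(cbind(mpg, wt) ~ cyl + vs, data = mtcars,
                      FUN = function(x) c(m = mean(x), s = sd(x)))

# aggregate() stores the two statistics per variable as matrix columns;
# do.call(data.frame, ...) flattens them into ordinary columns.
flat <- do.call(data.frame, by_group)
print(flat)
```

Groups with a single observation get NA for the standard deviation, which is worth checking before using the table further.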

R provides a wide range of functions for obtaining summary statistics. One method of obtaining descriptive statistics is to use the sapply() function with a specified summary statistic. For example, sapply(mydata, mean) gets the means for the variables in data frame mydata. Possible functions used in sapply include mean, sd, var, min, max, median, range, and quantile.

There are also numerous R functions designed to provide a range of descriptive statistics at once. summary() reports the mean, median, 25th and 75th percentiles, minimum and maximum. fivenum() returns Tukey's five-number summary: minimum, lower hinge, median, upper hinge and maximum. In the pastecs package, stat.desc() reports nbr.val, nbr.null, nbr.na, min, max, range, sum, median, mean, SE.mean, CI.mean, var, std.dev and coef.var. In the psych package, describe() reports, among other statistics, median, mad, min, max, skew, kurtosis and se.

Summary Statistics by Group

A simple way of generating summary statistics by grouping variable is available in the psych package.
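The sapply() approach can be sketched as follows, using base R only. The built-in mtcars data and the three columns picked here are assumptions made for illustration; any data frame of numeric variables would do.

```r
# Illustrative subset of numeric columns from the built-in mtcars data.
num_vars <- mtcars[, c("mpg", "wt", "hp")]

# One statistic per variable: a named numeric vector of means.
means <- sapply(num_vars, mean, na.rm = TRUE)
print(means)

# A function that returns several values gives a matrix,
# with one column per variable and one row per quantile.
quarts <- sapply(num_vars, quantile, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
print(quarts)

# Several descriptive statistics at once.
print(summary(num_vars))       # min, quartiles, median, mean, max per column
print(fivenum(num_vars$mpg))   # Tukey's five-number summary for one vector
```

Extra arguments such as na.rm or probs are passed through sapply() to the summary function, so the same pattern works for any of the functions listed.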
