
The summary_ functions summarize data and return metrics that describe it.

summary_cat

The goal of summary_cat is to summarize categorical variables.

set.seed(123)
g <- c(sample(letters, 100, replace = TRUE), NA)

summary_cat(g)
#> # A tibble: 1 × 6
#>       n    na blank_space n_distinct mode  modality
#>   <int> <int>       <int>      <int> <chr>    <int>
#> 1   101     1           0         25 y            1
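The columns above can be made more concrete with a small base-R sketch that computes similar metrics. This is not the package's implementation. The helper cat_summary_sketch() is hypothetical, and three definitions are assumptions: blank_space counts strings that are empty after trimming whitespace, n_distinct excludes NA, and modality is the number of values tied for the highest count.

```r
# Hypothetical base-R sketch of the metrics reported by summary_cat().
# Assumptions: blanks are strings that are empty after trimws(),
# NA is excluded from n_distinct, and modality counts values tied
# for the highest frequency.
cat_summary_sketch <- function(x) {
  x_ok <- x[!is.na(x)]                       # drop missing values
  counts <- table(x_ok)                      # frequency of each value
  top <- names(counts)[counts == max(counts)]
  data.frame(
    n = length(x),                           # total length, NA included
    na = sum(is.na(x)),
    blank_space = sum(trimws(x_ok) == ""),
    n_distinct = length(unique(x_ok)),
    mode = top[1],                           # first most frequent value
    modality = length(top)                   # number of tied modes
  )
}

cat_summary_sketch(c("a", "b", "b", " ", NA))
```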

summary_num

The goal of summary_num is to summarize numeric variables.

library(dplyr) # provides %>% and glimpse()

set.seed(123)
x <- c(rnorm(10), NA, 10)

summary_num(x) %>% glimpse()
#> Rows: 1
#> Columns: 8
#> $ min  <dbl> -1.265061
#> $ p25  <dbl> -0.5030688
#> $ p50  <dbl> 0.07050839
#> $ p75  <dbl> 1.009812
#> $ max  <dbl> 10
#> $ mode <dbl> -0.2678934
#> $ mean <dbl> 0.9769324
#> $ cv   <dbl> 3.2
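A rough base-R equivalent of these statistics follows. The helper num_summary_sketch() is hypothetical. With the x above, R's default (type 7) quantiles and cv = sd / mean agree with the p25, p50, p75, mean and cv shown. Taking the mode as the peak of a kernel density estimate is an assumption, and it may not match the package's mode.

```r
# Hypothetical base-R sketch of the statistics shown by summary_num().
# Assumptions: type-7 quantiles (R's default), cv = sd / mean,
# and the mode taken as the peak of a kernel density estimate.
num_summary_sketch <- function(x) {
  x <- x[!is.na(x)]                          # ignore missing values
  q <- stats::quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
  d <- stats::density(x)                     # kernel density estimate
  data.frame(
    min = min(x),
    p25 = q[1],
    p50 = q[2],
    p75 = q[3],
    max = max(x),
    mode = d$x[which.max(d$y)],              # location of the density peak
    mean = mean(x),
    cv = stats::sd(x) / mean(x)              # coefficient of variation
  )
}

set.seed(123)
x <- c(rnorm(10), NA, 10)
num_summary_sketch(x)
```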

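Setting type = TRUE also returns counts that describe the values: the total length (n), the number of missing values (na), and the number of negative, zero, and positive values: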

summary_num(x,type = TRUE) %>% glimpse()
#> Rows: 1
#> Columns: 13
#> $ n          <int> 12
#> $ na         <int> 1
#> $ negative   <int> 5
#> $ equal_zero <int> 0
#> $ positive   <int> 6
#> $ min        <dbl> -1.265061
#> $ p25        <dbl> -0.5030688
#> $ p50        <dbl> 0.07050839
#> $ p75        <dbl> 1.009812
#> $ max        <dbl> 10
#> $ mode       <dbl> -0.2678934
#> $ mean       <dbl> 0.9769324
#> $ cv         <dbl> 3.2
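These extra counts are simple to express in base R. In this hypothetical sketch, n counts every element including NA (consistent with n = 12 above), while the sign counts skip missing values.

```r
# Hypothetical sketch of the counts added by type = TRUE.
# n includes NA; the sign counts drop missing values via na.rm = TRUE.
type_counts_sketch <- function(x) {
  data.frame(
    n = length(x),
    na = sum(is.na(x)),
    negative = sum(x < 0, na.rm = TRUE),
    equal_zero = sum(x == 0, na.rm = TRUE),
    positive = sum(x > 0, na.rm = TRUE)
  )
}

type_counts_sketch(c(-2, 0, 3, NA, 5))
```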

Setting other_means = TRUE adds the geometric and harmonic means. Because x contains negative values, a warning is also raised:

summary_num(x,other_means = TRUE) %>% glimpse()
#> Warning in warn_any_logic(x = x, operator = `<`, value = 0, warning = "Negative
#> values will be ignored."): Negative values will be ignored.
#> Rows: 1
#> Columns: 10
#> $ min            <dbl> -1.265061
#> $ p25            <dbl> -0.5030688
#> $ p50            <dbl> 0.07050839
#> $ p75            <dbl> 1.009812
#> $ max            <dbl> 10
#> $ mode           <dbl> -0.2678934
#> $ mean           <dbl> 0.9769324
#> $ cv             <dbl> 3.2
#> $ geometric_mean <dbl> 0.6946152
#> $ harmonic_mean  <dbl> 0.7436103
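Both extra means can be sketched in base R. This is an illustration, not the package's code, and other_means_sketch() is a hypothetical helper. Its choices are assumptions: the geometric mean uses only strictly positive values (in line with the warning), and the harmonic mean uses every non-missing value. With the x above, these choices give roughly 0.6946 and 0.7436.

```r
# Hypothetical sketch of the geometric and harmonic means.
other_means_sketch <- function(x) {
  x <- x[!is.na(x)]                          # ignore missing values
  pos <- x[x > 0]                            # assumption: log() only on positive values
  data.frame(
    geometric_mean = exp(mean(log(pos))),
    harmonic_mean = 1 / mean(1 / x)          # assumption: all non-missing values
  )
}

set.seed(123)
x <- c(rnorm(10), NA, 10)
other_means_sketch(x)
```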

summary_seq

The goal of summary_seq is to count runs of consecutive repeated values: each row gives a value and how many times it repeats in a row.

y <- c(1, 1, 1, 2, 2, 6, 7, 1, 1)

summary_seq(y)
#> # A tibble: 5 × 2
#>   value num_rep
#>   <dbl>   <int>
#> 1     1       3
#> 2     2       2
#> 3     6       1
#> 4     7       1
#> 5     1       2
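Base R's rle() (run-length encoding) computes the same kind of runs. This sketch simply reshapes its output into the value / num_rep layout shown above. Whether summary_seq uses rle() internally is not stated here.

```r
# Run-length encoding with base R: one entry per run of equal values.
y <- c(1, 1, 1, 2, 2, 6, 7, 1, 1)
runs <- rle(y)

# Arrange the result like the summary_seq() output.
data.frame(value = runs$values, num_rep = runs$lengths)
```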

summary_xy

The goal of summary_xy is to summarize two numeric variables by computing their covariance and the Pearson, Kendall, and Spearman correlation coefficients.

x <- rnorm(100)

y <- rnorm(100)

summary_xy(x,y)
#> # A tibble: 1 × 4
#>   covariance pearson  kendall spearman
#>        <dbl>   <dbl>    <dbl>    <dbl>
#> 1    -0.0654 -0.0724 -0.00970  -0.0144
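The four metrics correspond to base R's cov() and cor() with its method argument. The sketch below assumes summary_xy() is equivalent to these calls. It also sets a hypothetical seed, so its numbers differ from the output above.

```r
set.seed(42)  # hypothetical seed, only to make this sketch reproducible
x <- rnorm(100)
y <- rnorm(100)

# Covariance plus the three correlation coefficients.
res <- data.frame(
  covariance = cov(x, y),
  pearson = cor(x, y, method = "pearson"),
  kendall = cor(x, y, method = "kendall"),
  spearman = cor(x, y, method = "spearman")
)
res
```

Pearson's coefficient is the covariance scaled by both standard deviations, and Spearman's is Pearson's coefficient computed on ranks. This is why all four columns share the same sign pattern when the relationship is weak.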