附录 G — 其它有趣的包

G.1 janitor

整理、清洗数据的好上手的包。

G.1.1 整理列名

library(janitor)
# https://github.com/sfirke/janitor
fake_raw <- tibble::tribble(
  ~id, ~`count/num`, ~W.t, ~Case, ~`time--d`, ~`%percent`,
  1L, "china", 3L, "w", 5L, 25L,
  2L, "us", 4L, "f", 6L, 34L,
  3L, "india", 5L, "q", 8L, 78L
)
fake_raw
# A tibble: 3 × 6
     id `count/num`   W.t Case  `time--d` `%percent`
  <int> <chr>       <int> <chr>     <int>      <int>
1     1 china           3 w             5         25
2     2 us              4 f             6         34
3     3 india           5 q             8         78
fake_raw %>%
  janitor::clean_names()
# A tibble: 3 × 6
     id count_num   w_t case  time_d percent_percent
  <int> <chr>     <int> <chr>  <int>           <int>
1     1 china         3 w          5              25
2     2 us            4 f          6              34
3     3 india         5 q          8              78

G.1.2count()函数对比

# `count()`函数
mtcars %>%
  dplyr::count(cyl, sort = TRUE) %>%
  mutate(percent = 100 * n / sum(n))
  cyl  n percent
1   8 14    43.8
2   4 11    34.4
3   6  7    21.9
# 使用janitor包的tabyl函数
mtcars %>%
  janitor::tabyl(cyl)
 cyl  n percent
   4 11   0.344
   6  7   0.219
   8 14   0.438

G.2 粘贴小表格datapasta

  • 安装install.packages(“datapasta”)
  • 鼠标选中并复制网页中的表格
  • 在 Rstudio 中的Addins找到datapasta,并点击paste as tribble

G.3 搞定模型公式

equatiomatic

library(equatiomatic)
mod1 <- lm(mpg ~ cyl + disp, mtcars)
extract_eq(mod1)

\[ \operatorname{mpg} = \alpha + \beta_{1}(\operatorname{cyl}) + \beta_{2}(\operatorname{disp}) + \epsilon \]

extract_eq(mod1, use_coefs = TRUE)

\[ \operatorname{\widehat{mpg}} = 34.66 - 1.59(\operatorname{cyl}) - 0.02(\operatorname{disp}) \]

G.4 自动搞定报告

G.4.1 生成报告

report包

library(report)
model <- lm(Sepal.Length ~ Species, iris)
summary(model) %>%
  broom::tidy()
# A tibble: 3 × 5
  term              estimate std.error statistic   p.value
  <chr>                <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)           5.01    0.0728     68.8  1.13e-113
2 Speciesversicolor     0.93    0.103       9.03 8.77e- 16
3 Speciesvirginica      1.58    0.103      15.4  2.21e- 32
report(model)
We fitted a linear model (estimated using OLS) to predict Sepal.Length with
Species (formula: Sepal.Length ~ Species). The model explains a statistically
significant and substantial proportion of variance (R2 = 0.62, F(2, 147) =
119.26, p < .001, adj. R2 = 0.61). The model's intercept, corresponding to
Species = setosa, is at 5.01 (95% CI [4.86, 5.15], t(147) = 68.76, p < .001).
Within this model:

  - The effect of Species [versicolor] is statistically significant and positive
(beta = 0.93, 95% CI [0.73, 1.13], t(147) = 9.03, p < .001; Std. beta = 1.12,
95% CI [0.88, 1.37])
  - The effect of Species [virginica] is statistically significant and positive
(beta = 1.58, 95% CI [1.38, 1.79], t(147) = 15.37, p < .001; Std. beta = 1.91,
95% CI [1.66, 2.16])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald t-distribution approximation.

G.4.2 自动模型评估

library(performance)
model <- lm(mpg ~ wt * cyl + gear, data = mtcars)
performance::check_model(model)

G.4.3 统计表格

gtsummary

library(gtsummary)
gtsummary::trial %>%
  dplyr::select(trt, age, grade, response) %>%
  gtsummary::tbl_summary(
    by = trt,
    missing = "no"
  ) %>%
  gtsummary::add_p() %>%
  gtsummary::add_overall() %>%
  gtsummary::add_n() %>%
  gtsummary::bold_labels()
Characteristic N Overall
N = 2001
Drug A
N = 981
Drug B
N = 1021
p-value2
Age 189 47 (38, 57) 46 (37, 60) 48 (39, 56) 0.7
Grade 200


0.9
    I
68 (34%) 35 (36%) 33 (32%)
    II
68 (34%) 32 (33%) 36 (35%)
    III
64 (32%) 31 (32%) 33 (32%)
Tumor Response 193 61 (32%) 28 (29%) 33 (34%) 0.5
1 Median (Q1, Q3); n (%)
2 Wilcoxon rank sum test; Pearson’s Chi-squared test
trial %>%
  select(trt, age, grade) %>%
  tbl_summary(
    by = trt,
    missing = "no",
    statistic = all_continuous() ~ "{median} ({p25}, {p75})"
  ) %>%
  modify_header(
    all_stat_cols() ~ "**{level}**<br>N = {n} ({style_percent(p)}%)"
  ) %>%
  add_n() %>%
  bold_labels() %>%
  modify_spanning_header(all_stat_cols() ~ "**Chemotherapy Treatment**")
Characteristic N
Chemotherapy Treatment
Drug A
N = 98 (49%)1
Drug B
N = 102 (51%)1
Age 189 46 (37, 60) 48 (39, 56)
Grade 200

    I
35 (36%) 33 (32%)
    II
32 (33%) 36 (35%)
    III
31 (32%) 33 (32%)
1 Median (Q1, Q3); n (%)