04-author-paper.Rmd

# 实证网络探索

*进行某个实证网络度分布的幂律拟合*

下面我们将针对经济物理学家合作网的度分布进行幂律拟合。

```{r setup-author}
papers_authors <- read_tsv(
  "datasets/经济物理学家合作网络/paperID_authorID.txt",
  col_names = c("paper_id", "author_id")
) %>%
  # index of 0 is not supported in R
  mutate_all(~ . + 1)
num_authors <- n_distinct(papers_authors$author_id)
author_matrix <- matrix(0, num_authors, num_authors)
# get the number of cooperation papers for each pair of authors
papers <- unique(papers_authors$paper_id)
authors_coop <- tibble()
for (paper in papers) {
  paper_authors <- filter(papers_authors, paper_id == paper)
  if (nrow(paper_authors) <= 1)
    next
  paper_coop <- expand_grid(
    author1 = paper_authors$author_id,
    author2 = paper_authors$author_id
  ) %>%
    filter(author1 != author2) %>%
    add_column(n = 1)
  authors_coop <- bind_rows(authors_coop, paper_coop)
}
authors_coop <- authors_coop %>%
  group_by(author1, author2) %>%
  summarise(n = sum(n)) %>%
  ungroup()
# construct a non-weighted adjacency matrix
author_matrix[as.matrix(select(authors_coop, starts_with("author")))] <- 1
author_graph_raw <- graph_from_adjacency_matrix(author_matrix, "undirected")
# find the largest connected components
author_graph <- decompose(author_graph_raw) %>%
  enframe(name = "id", value = "graph") %>%
  mutate(num_vertices = map_dbl(graph, vcount)) %>%
  filter(num_vertices == max(num_vertices)) %>%
  pluck("graph", 1)
```

在创建经济物理学家合作网时，我们定义只要两个科学家有过合著一篇文章则表示两者之间有一条连边。这样构造出网络后，我们又找出了其最大连通集团。图\@ref(fig:visualize-author)给出了该最大连通集团的可视化结果（其中包含`r vcount(author_graph)`位科学家）。

```{r visualize-author, fig.width=6, fig.height=6, fig.cap="经济物理学家合作网可视化"}
par(mar = c(0, 0, 0, 0) + 0.1)
plot(
  author_graph,
  layout = layout_with_kk,
  vertex.color = "black",
  vertex.size = 1,
  vertex.label = NA,
  edge.curved = 0
)
```

图\@ref(fig:degree-distribution-author)分别给出了在原始坐标轴和双对数坐标轴下的度分布图。根据双对数情况下的分布图，可以大致看出来图形呈一条直线，因此基本适合进行幂律分布拟合。

```{r degree-distribution-author, fig.width=6, fig.height=4, fig.cap="度分布图。A：原始坐标轴；B：双对数坐标轴。"}
deg_dis <- tibble(
  freq = degree_distribution(author_graph),
  k = seq(0, by = 1, length.out = length(freq))
) %>%
  filter(k != 0)
plot_grid(
  ggplot(deg_dis, aes(k, freq)) +
    geom_point(color = "lightblue") +
    labs(y = expression(P(k))) +
    theme_few(),
  ggplot(deg_dis, aes(k, freq)) +
    geom_point(color = "lightblue") +
    labs(y = expression(paste(log[10], P(k), sep = ""))) +
    theme_few() +
    scale_x_log10() +
    scale_y_log10(),
  ncol = 1,
  labels = "AUTO"
)
```

最后，采用Clauset、Shalizi和Newman三人在2009年的文章中提到的方法对网络的度分布做幂律拟合，表\@ref(tab:fit-power-law)中给出了拟合的结果。

```{r fit-power-law}
fit_results <- fit_power_law(degree(author_graph)) %>%
  as_tibble()
fit_results %>%
  select(alpha, logLik, KS.p) %>%
  knitr::kable(
    digits = 3,
    col.names = c("$\\gamma$", "log likelihood", "$p$"),
    caption = "幂律拟合统计结果小结"
  )
```

注意到$p$值大于0.05，从统计意义上看，模型的拟合结果很好。所以，经济物理学家合作网的度分布服从$\gamma$值为`r round(fit_results$alpha, 3)`的幂律分布，即`r str_glue("$P(k)\\propto e^{{-{round(fit_results$alpha, 3)}}}$")`。