twarc-network builds a reply, quote, retweet and mention network from a file of tweets that you've collected using twarc. It will write out the network as a gexf, gml, json, csv or html file. It uses networkx for the graph model and d3 for the html presentation.
If you know CSS you can hack at the generated HTML file to modify the style to suit your needs. If you come up with a more pleasing representation please send a pull request! Exporting as gexf or gml lets you import the data into tools like Gephi, Cytoscape and GraphViz for further analysis and visualization.
To install, run:
pip3 install twarc-network
First you will need to collect some data with twarc:
twarc2 search blacklivesmatter > tweets.jsonl
Once you've got some data you can create the default D3 HTML visualization:
twarc2 network tweets.jsonl network.html
or gexf:
twarc2 network tweets.jsonl --format gexf network.gexf
or gml:
twarc2 network tweets.jsonl --format gml network.gml
or json:
twarc2 network tweets.jsonl --format json network.json
or CSV edge list:
twarc2 network tweets.jsonl --format csv network.csv
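If you would rather explore an exported file in Python than in a desktop tool, networkx can read gexf directly. The following is a minimal sketch, not part of twarc-network itself: it writes a tiny stand-in graph to a temporary network.gexf (the node names and weights are invented) so that it runs without any collected data, then loads it back and ranks nodes by incoming edges.

```python
import os
import tempfile

import networkx as nx

# Stand-in for a file written by `twarc2 network ... --format gexf`;
# the node names and weight values here are invented for illustration.
sample = nx.DiGraph()
sample.add_edge("alice", "bob", weight=2)
sample.add_edge("carol", "bob", weight=1)

path = os.path.join(tempfile.mkdtemp(), "network.gexf")
nx.write_gexf(sample, path)

# Load the exported network and take a first look at it.
graph = nx.read_gexf(path)
print(graph.number_of_nodes(), "nodes,", graph.number_of_edges(), "edges")

# Rank nodes by how many incoming edges they have.
top = sorted(graph.in_degree(), key=lambda pair: pair[1], reverse=True)
print(top[0])
```

With a real export you would skip the stand-in graph and pass the path of your own gexf file to nx.read_gexf.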
Tweets can be connected together as replies, quotes and retweets. If you would like to see the network oriented around nodes that are tweets instead of users you can:
twarc2 network tweets.jsonl --nodes tweets network.html
Hashtags can be connected when they are used together in a tweet. So you can visualize a network where nodes are hashtags:
twarc2 network tweets.jsonl --nodes hashtags > network.html
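To make the idea of a hashtag co-occurrence network concrete, here is a small sketch in networkx. It is an illustration of the concept rather than the plugin's actual code: the input list of hashtag sets is hypothetical, and the use of an undirected graph is an assumption. Each pair of hashtags appearing in the same tweet adds one to the weight of the edge between them.

```python
from itertools import combinations

import networkx as nx

# Hypothetical input: the set of hashtags used in each tweet.
tweets = [
    {"blm", "justice"},
    {"blm", "justice", "vote"},
    {"vote"},
]

graph = nx.Graph()  # assumption: co-occurrence is treated as undirected
for tags in tweets:
    # Every unordered pair of hashtags in one tweet is one co-occurrence.
    for a, b in combinations(sorted(tags), 2):
        if graph.has_edge(a, b):
            graph[a][b]["weight"] += 1
        else:
            graph.add_edge(a, b, weight=1)

print(graph["blm"]["justice"]["weight"])
```

A tweet with a single hashtag produces no pairs, so it adds no edges in this sketch.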
By default, when user and tweet graphs are built, all types of interactions are used as edges: retweet, reply or quote in the case of tweets; retweet, reply, quote or mention in the case of users. You can also limit the types considered. For example, if you only want retweet edges, you can:
twarc2 network tweets.jsonl tweets.html --edges retweet
Or if you only want replies and quotes, you can:
twarc2 network tweets.jsonl tweets.html --edges reply --edges quote
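The same kind of selection can be applied after the fact to a graph you have already loaded into networkx. The sketch below assumes a tweet graph whose edges carry a type attribute; the tweet ids and the helper keep_edge_types are hypothetical, and this is not how the --edges option is implemented.

```python
import networkx as nx

# Hypothetical tweet graph: nodes are tweet ids, edges carry a `type`.
graph = nx.DiGraph()
graph.add_edge("t2", "t1", type="retweet")
graph.add_edge("t3", "t1", type="reply")
graph.add_edge("t4", "t3", type="quote")


def keep_edge_types(g, wanted):
    """Return a copy of g containing only edges whose type is in wanted."""
    edges = [(u, v) for u, v, t in g.edges(data="type") if t in wanted]
    return g.edge_subgraph(edges).copy()


replies_and_quotes = keep_edge_types(graph, {"reply", "quote"})
print(sorted(replies_and_quotes.edges(data="type")))
```

Note that edge_subgraph also drops nodes left without any edges, which may or may not match what the command-line option does.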
Depending on the data you are analyzing, it can be helpful to remove weakly connected components in the graph that are smaller than some number. For example, if you don't want to visualize networks where two nodes are only connected to each other and not anyone else, you can:
twarc2 network tweets.jsonl tweets.html --min-component-size 3
It's less common, but you can also remove nodes that belong to components that are too large. For example, if you wanted to remove any components larger than 10 nodes:
twarc2 network tweets.jsonl tweets.html --max-component-size 10
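For readers unfamiliar with weakly connected components, the following sketch shows what size-based filtering looks like in networkx. The function filter_components and the sample graph are hypothetical and only illustrate the idea; the plugin's own implementation may differ. Components smaller than min_size or larger than max_size are dropped, and everything else is kept.

```python
import networkx as nx


def filter_components(g, min_size=None, max_size=None):
    """Keep only weakly connected components within the given size bounds."""
    keep = set()
    for component in nx.weakly_connected_components(g):
        size = len(component)
        if min_size is not None and size < min_size:
            continue
        if max_size is not None and size > max_size:
            continue
        keep.update(component)
    return g.subgraph(keep).copy()


graph = nx.DiGraph()
graph.add_edge("a", "b")                                     # size 2
graph.add_edges_from([("c", "d"), ("d", "e")])               # size 3
graph.add_edges_from([("f", "g"), ("g", "h"), ("h", "i")])   # size 4

print(sorted(filter_components(graph, min_size=3)))
print(sorted(filter_components(graph, max_size=3)))
```

Edge direction is ignored when finding weak components, so a pair of users joined by a single retweet counts as one component of size 2.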
The possible node attributes are the following:
screen_name
: When the node is a user, its username; by default, it is used as the label of the nodes. When the node is a tweet, the username of its author.
user_id
: When the node is a user, its id; if you want to use it as the label of the nodes, you can use the flag --id-as-label. When the node is a tweet, the id of its author.
start_date
: The date of the first interaction that made the node appear in the graph. For example, if the node is a retweet, it is its date of creation. Or if the node is an original tweet, it is the date of the first retweet, reply or quote. The format is dd/mm/yyyy hh:mm:ss.
The possible edge attributes are the following:
type
: When the nodes are tweets, one of the following values: retweet, reply or quote.
retweet
: When the nodes are users, the number of retweets the source has made to the target.
reply
: When the nodes are users, the number of replies the source has made to the target.
quote
: When the nodes are users, the number of quotes the source has made to the target.
mention
: When the nodes are users, the number of mentions the source has made to the target.
weight
: When the nodes are users, the sum of retweet, reply, quote and mention. When the nodes are hashtags, the number of tweets that contained both hashtags.
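As an illustration of how the user-graph edge attributes fit together, here is a short networkx sketch. The graph is built by hand with invented usernames and counts; in practice you would load an exported file instead. It checks that weight equals the sum of the four per-type counts and then totals the interactions each user received.

```python
import networkx as nx

# Hypothetical user graph using the edge attributes described above.
graph = nx.DiGraph()
graph.add_edge("alice", "bob", retweet=3, reply=1, quote=0, mention=2, weight=6)
graph.add_edge("carol", "bob", retweet=0, reply=2, quote=1, mention=0, weight=3)
graph.add_edge("bob", "alice", retweet=1, reply=0, quote=0, mention=0, weight=1)

KINDS = ("retweet", "reply", "quote", "mention")

# Check that weight is the sum of the per-type counts on every edge.
for source, target, attrs in graph.edges(data=True):
    assert attrs["weight"] == sum(attrs.get(kind, 0) for kind in KINDS)

# Weighted in-degree: total interactions each user received.
received = dict(graph.in_degree(weight="weight"))
print(received)
```

Passing weight="retweet" instead would count only retweets received, which can be a useful way to separate amplification from conversation.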