A recipe for finding associated hashtags from a co-hashtag network
This recipe illustrates a way to streamline hashtag snowballing, that is, starting from an initial selection of hashtags, identifying other associated hashtags, and building a more extensive hashtag list. The process is useful for expanding the list of hashtags used to query social media platforms (Twitter or Instagram) and collect a set of posts. It builds on the notion of “query snowballing” (Rogers, 2018). The recipe assumes that you have an initial list of hashtags (provided by domain experts or compiled otherwise) which you have used to collect a co-hashtag network (i.e., a network of hashtags used together in the same posts). We will use Gephi and Google Sheets to explore the co-hashtag network in search of other relevant hashtags.
This recipe can be used with any co-hashtag network (e.g., from Twitter or Instagram). It assumes you already have a working network file that can be opened with Gephi (.gdf, .gephi, or two separate tables of edge and node data).
The edges table describes connections between hashtags and the strength of those connections (weight), that is, how many times two hashtags have been used together. The exact layout depends on how the network dataset was generated, but most commonly the edges table identifies each node by a unique numerical Id, and the nodes table pairs each Id with the hashtag name in its Label column. In this step, we use a spreadsheet lookup function (VLOOKUP, i.e., vertical lookup) to copy hashtag names from the nodes table into the edges table, which makes the dataset easier to explore in the next step. In practice, we add two columns to the edges table, source label and target label, and populate them with hashtag names from the nodes table.
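Conceptually, this lookup is a join between the two tables on the node Id. For readers who prefer scripting, the Python sketch here is a hedged illustration rather than part of the original recipe: the column names (`Id`, `Label`, `Source`, `Target`, `Weight`) are assumed from a typical Gephi export, the function name `add_labels` is hypothetical, and the sample hashtags are made up.

```python
# Hypothetical sketch: attach hashtag labels to an edges table, mirroring the
# two VLOOKUP columns. Column names are assumed from a typical Gephi export.


def add_labels(nodes, edges):
    """Return new edge rows with 'Source Label' and 'Target Label' added."""
    # Build the Id -> Label mapping once (the role of the nodes!A:B range).
    id_to_label = {node["Id"]: node["Label"] for node in nodes}
    labelled = []
    for edge in edges:
        row = dict(edge)  # copy, so the input rows are left unchanged
        # Exact-match lookup; None where VLOOKUP would show #N/A.
        row["Source Label"] = id_to_label.get(edge["Source"])
        row["Target Label"] = id_to_label.get(edge["Target"])
        labelled.append(row)
    return labelled


# Made-up sample data, for illustration only.
nodes = [
    {"Id": 1, "Label": "climatechange"},
    {"Id": 2, "Label": "globalwarming"},
    {"Id": 3, "Label": "sustainability"},
]
edges = [
    {"Source": 1, "Target": 2, "Weight": 40},
    {"Source": 1, "Target": 3, "Weight": 12},
]

for row in add_labels(nodes, edges):
    print(row["Source Label"], row["Target Label"], row["Weight"])
# climatechange globalwarming 40
# climatechange sustainability 12
```

Building the dictionary once keeps each lookup fast even for large edge tables. If both tables are loaded from files, make sure the Ids have the same type in both (for example, both strings), since in Python `1` and `"1"` do not match.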
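In the spreadsheet, enter the following formula in row 2 of the new source label column and copy it down to the remaining rows:
=VLOOKUP(A2,nodes!A:B,2,false)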
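(This assumes that the nodes table is in a sheet named “nodes”, that its first two columns (A and B) are the Id column and the Label column, and that column A of the edges table holds the source Id. The final argument, false, requests an exact match on the Id.)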
Then enter the following formula in row 2 of the new target label column and copy it down as well:
=VLOOKUP(C2,nodes!A:B,2,false)
(This assumes that the nodes table is in a sheet named “nodes”, that its first two columns (A and B) are the Id column and the Label column, and that column C of the edges table holds the target Id, e.g., because the source label column was inserted as column B.)
Now that we have prepared a table with co-hashtag connections and their strength (i.e., weight), we can explore the hashtags and expand the list.
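As a complement to exploring the labelled table in Gephi and the spreadsheet, one round of snowballing can be sketched in Python. This is a hedged illustration, not the recipe's prescribed method: the scoring rule (summing the weights of each hashtag's edges to the seed hashtags), the function name `rank_candidates`, the `min_weight` threshold, the column names, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sketch of one snowballing heuristic: score each non-seed hashtag
# by the summed weight of its edges to the seed hashtags.


def rank_candidates(labelled_edges, seeds, min_weight=1):
    """Return (hashtag, score) pairs, highest score first."""
    seeds = set(seeds)
    scores = defaultdict(int)
    for edge in labelled_edges:
        source = edge["Source Label"]
        target = edge["Target Label"]
        weight = edge["Weight"]
        if weight < min_weight:
            continue  # ignore weak co-occurrences
        # Keep only edges linking a seed to a hashtag not yet in the list.
        if source in seeds and target not in seeds:
            scores[target] += weight
        elif target in seeds and source not in seeds:
            scores[source] += weight
    # Sort by descending score, then alphabetically to break ties.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up labelled edges and seed list, for illustration only.
edges = [
    {"Source Label": "climatechange", "Target Label": "globalwarming", "Weight": 40},
    {"Source Label": "climatechange", "Target Label": "sustainability", "Weight": 12},
    {"Source Label": "climateaction", "Target Label": "sustainability", "Weight": 9},
    {"Source Label": "sustainability", "Target Label": "recycling", "Weight": 30},
]
seeds = ["climatechange", "climateaction"]

print(rank_candidates(edges, seeds))
# [('globalwarming', 40), ('sustainability', 21)]
```

The ranking only suggests candidates; deciding which hashtags actually belong in the query list remains a qualitative judgment. Adding the accepted candidates to `seeds` and running the function again gives a further round of snowballing.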