A recipe to extract a subset of tweets based on one or more hashtags using OpenRefine
This recipe presents a method for exporting small subsets of a full Twitter dataset based on a selection of hashtags. It shows how to query the dataset with a single hashtag, or with two or more hashtags combined through basic Boolean operators (OR, AND). The resulting selection of tweets can then be analysed in spreadsheet software.
This recipe starts from a full Twitter dataset exported from the Twitter Capture and Analysis Toolset (DMI-TCAT). It can, however, be used with any Twitter data, as long as there is one column with the tweet text and one column with the hashtags (all hashtags of a tweet should be in the same cell, separated by semicolons).
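
To make the expected data format concrete, here is a minimal Python sketch of how a semicolon-separated hashtags cell could be read outside OpenRefine. The function names `split_hashtags` and `has_hashtag` and the sample values are hypothetical and only serve as illustration. Note that `has_hashtag` compares whole hashtags, which is stricter than the substring matching of the OpenRefine text filter used in the steps below.

```python
def split_hashtags(cell):
    """Split one semicolon-separated hashtags cell into a clean list."""
    return [tag.strip() for tag in cell.split(";") if tag.strip()]


def has_hashtag(cell, hashtag):
    """True if the cell holds exactly this hashtag (case-insensitive)."""
    wanted = hashtag.lower()
    return any(tag.lower() == wanted for tag in split_hashtags(cell))


# Hypothetical cell value, as it might appear in the hashtags column.
print(split_hashtags("greenpeace;climate;ExtinctionRebellion"))
# prints ['greenpeace', 'climate', 'ExtinctionRebellion']
```

With exact matching, a cell such as `greenpeaceuk;climate` would not count as containing `greenpeace`, whereas a substring filter would keep it.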

📃 Steps
Installing OpenRefine
  - Download the latest version of OpenRefine from this link
 
  - Follow the installation instructions for your operating system (Mac or Windows)
 
  - Refer to the official documentation for troubleshooting errors when installing OpenRefine
 
  - Once it is installed, double-click the OpenRefine icon
 
  - OpenRefine opens directly in your browser
 
  - If the browser does not open, you can type this URL in your browser bar
 
Opening the full dataset with OpenRefine
  - Select the file from your computer and press [next]
 
  - Click [create project] in the top right corner
 

Creating a subset based on 1 hashtag
This step can be used to select tweets containing one specific hashtag, for example to extract all the tweets with #greenpeace.
  - On the header of the column containing the hashtags, click the small arrow next to the column name, and choose [Text filter]
 
  - In the [hashtags] panel in the top left corner, type the hashtag you want to filter on (e.g. greenpeace)
 
  - You can make the filter case sensitive by checking the box at the bottom
 
  - Note that at the top of the spreadsheet you can see how many tweets match your criteria (in this case 80 tweets)
 
  - Click on [Export] in the top right corner to export a CSV or Excel file
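
The single-hashtag filter above can be approximated in Python as a substring match on the hashtags column. This is only a sketch under stated assumptions: the column names `text` and `hashtags`, the sample rows, and the function `text_filter` are hypothetical, and a real DMI-TCAT export has many more columns.

```python
import csv
import io

# Hypothetical sample data standing in for a real export.
SAMPLE = """text,hashtags
Save the whales,greenpeace;oceans
Climate march today,ExtinctionRebellion;climate
Joint action,Greenpeace;extinctionrebellion
"""


def text_filter(rows, column, query, case_sensitive=False):
    """Keep rows whose cell in `column` contains `query` as a substring."""
    kept = []
    for row in rows:
        cell = row[column]
        needle = query
        if not case_sensitive:
            cell, needle = cell.lower(), query.lower()
        if needle in cell:
            kept.append(row)
    return kept


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
subset = text_filter(rows, "hashtags", "greenpeace")
print(len(subset))  # prints 2
```

Calling `text_filter(rows, "hashtags", "greenpeace", case_sensitive=True)` keeps only the first row, which mirrors the effect of checking the case-sensitive box.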
 

Creating a subset based on 2 or more hashtags (OR)
This step can be used to filter tweets based on more than one hashtag. This query technique results in a subset of all tweets mentioning at least one of the selected hashtags, for example all the tweets with #greenpeace or #extinctionrebellion.
  - On the header of the column containing the hashtags, click the small arrow next to the column name, and choose [Text filter]
 
  - In the [hashtags] panel in the top left corner, select the [regular expression] option
 
  - In the same panel, type the hashtags separated by a pipe character (|):
greenpeace|extinctionrebellion 
  - You can make the filter case sensitive by checking the box at the bottom
 
  - Note that at the top of the spreadsheet you can see how many tweets match your criteria (in this case 478 tweets)
 
  - Click on [Export] in the top right corner to export a CSV or Excel file
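
The pipe in the regular expression above means "either side may match". A minimal Python sketch of the same OR logic follows. The function name `regex_filter` and the sample cells are hypothetical, and Python's `re` module is used here in place of OpenRefine's own regular expression engine, which behaves the same way for a simple alternation like this one.

```python
import re


def regex_filter(cells, pattern, case_sensitive=False):
    """Keep cells in which the regular expression matches anywhere."""
    flags = 0 if case_sensitive else re.IGNORECASE
    compiled = re.compile(pattern, flags)
    return [cell for cell in cells if compiled.search(cell)]


# Hypothetical hashtags cells, one per tweet.
cells = ["greenpeace;oceans", "ExtinctionRebellion;climate", "climate;oceans"]

matches = regex_filter(cells, "greenpeace|extinctionrebellion")
print(matches)
# prints ['greenpeace;oceans', 'ExtinctionRebellion;climate']
```

More hashtags can be added to the pattern by appending further alternatives, each separated by another pipe.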
 

Creating a subset based on 2 or more hashtags (AND)
This step can be used to filter tweets based on more than one hashtag. This query technique results in a subset of all tweets mentioning all of the selected hashtags, for example all the tweets with both #greenpeace and #extinctionrebellion.
  - On the header of the column containing the hashtags, click the small arrow next to the column name, and choose [Text filter]
 
  - In the [hashtags] panel in the top left corner, type the first hashtag you want to filter on (e.g. greenpeace)
 
  - On the header of the column containing the hashtags, click the small arrow next to the column name again and choose [Text filter]: a new filtering panel will appear
 
  - In the second panel, type the second hashtag you want to filter on (e.g. extinctionrebellion)
 
  - Note that at the top of the spreadsheet you can see how many tweets match your criteria (in this case only 2 tweets)
 
  - Click on [Export] in the top right corner to export a CSV or Excel file
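
Stacking two text filters, as in the steps above, keeps only the rows that pass both of them. A minimal Python sketch of this AND logic, assuming case-insensitive substring matching as in the default text filter; the function name `contains_all` and the sample cells are hypothetical.

```python
def contains_all(cell, queries):
    """True only if every query appears in the cell (case-insensitive substring)."""
    lowered = cell.lower()
    return all(query.lower() in lowered for query in queries)


# Hypothetical hashtags cells, one per tweet.
cells = [
    "greenpeace;oceans",
    "ExtinctionRebellion;climate",
    "Greenpeace;extinctionrebellion;climate",
]

both = [cell for cell in cells if contains_all(cell, ["greenpeace", "extinctionrebellion"])]
print(both)
# prints ['Greenpeace;extinctionrebellion;climate']
```

Each additional hashtag in the list narrows the subset further, in the same way that each additional filter panel does in OpenRefine.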
 
