Recipes

🌁🤖 Content similarity image clusters (with computer vision), annotated

Identifying thematic clusters inside a (rather large) image collection.

This approach clusters and visualises the images in a collection according to how machine learning algorithms classify their content. It can be used to identify thematic visual clusters inside a collection of images and to quantify them. It is similar to a co-hashtag analysis, but undertaken with visual content: computer vision generates tags for each image, and shared tags are then used to visually cluster similar images. There are four main phases. First, images are tagged with the help of a computer vision API. Second, images are downloaded and saved locally. Third, a network of images and tags is built and visualised in Gephi. Finally, images are loaded into the network and exported. The process ends with the annotation of clusters on Vectr.com.

🗄️ Examples

Mapping Parisian Urban Nature: Abstract / PDF / Composite images

🧱 Inputs from TCAT

“Media frequency”
Or “Export all tweets from selection” → column “from_user_profile_image_url”

📃 Steps

Tag images with computer vision API

Open data with Google Spreadsheet
Export csv with URLs list from Google Spreadsheet
Import URLs list in image tagging tool (be sure to get your own Clarifai API key and paste it into the tool)
Run tagging (click button: process input file)
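
The URL list used in the steps above can also be prepared with a short script instead of a spreadsheet. The following is a minimal sketch, assuming the TCAT export is a csv file with a header row that includes the “from_user_profile_image_url” column mentioned under the inputs. The function name and the file paths are hypothetical, and the image tagging tool may expect a different file layout.

```python
import csv


def extract_image_urls(export_csv, url_list_txt,
                       column="from_user_profile_image_url"):
    """Write the unique, non-empty values of one csv column to a text file.

    The output has one URL per line and keeps the order of first appearance.
    """
    seen = set()
    urls = []
    with open(export_csv, newline="", encoding="utf-8") as src:
        for row in csv.DictReader(src):
            url = (row.get(column) or "").strip()
            if url and url not in seen:
                seen.add(url)
                urls.append(url)
    with open(url_list_txt, "w", encoding="utf-8") as dst:
        for url in urls:
            dst.write(url + "\n")
    return urls
```

Dropping duplicates here is a design choice of the sketch: it avoids tagging and downloading the same image more than once.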

Download images locally and resize them

You can do this, for example, using a browser extension or using the command line.

Using browser extension

Install Tab Save Chrome extension
Copy and paste URLs list in Tab Save
Download images from URLs list with Tab Save
Go to Bulk Resize Photos
Drag images
Resize by 50%
Unzip folder

Using command line

An alternative approach to downloading images is to use the command line.

First you’ll need to install wget, a tool for downloading files using HTTP and other protocols. How you do this will depend on your operating system, your command line interface and your package manager. For example:
- If you’re on a Mac you can install a package manager such as Homebrew and then use $ brew install wget
- If you’re on a Debian-based Linux such as Ubuntu you can use $ apt-get install wget (prefix it with sudo if you are not logged in as root)
Put the image URLs into a single csv file, with one URL per line and no other columns
Create a folder where you’d like the images to go and navigate to the folder using the command line
- You can use ls to list the files at your location and cd to change to a given directory
You can download the files listed in the csv file using the command wget -i [path to the csv file] --show-progress
- For further details and other options see the wget manual
If this works you should have a folder full of images. 🎏
- If you’d like to create a text file listing the images which are in the folder (e.g. to check which have downloaded or to add file names to a dataset) you can use ls -1 >> [name of your file.csv] to generate another csv file with the names of all the files which have successfully downloaded. If you save this file in the same folder, its own name will also appear in the list.
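
The same download step can be scripted. The following is a minimal sketch using only the Python standard library, assuming a list file with one URL per line (a header line is skipped because it does not start with “http”). The function names and the fetch parameter are hypothetical. It is an alternative to wget rather than a reproduction of it: for instance, it drops any query string from the file name, which wget does not do by default.

```python
import os
import urllib.request
from urllib.parse import urlsplit


def filename_from_url(url):
    """Return the last path segment of a URL, ignoring any query string."""
    return os.path.basename(urlsplit(url).path)


def download_images(url_list_path, dest_dir, fetch=urllib.request.urlretrieve):
    """Download every URL listed in url_list_path into dest_dir.

    Returns (downloaded, failed): the file names that were saved and the
    URLs that could not be fetched, so missing images can be checked.
    """
    os.makedirs(dest_dir, exist_ok=True)
    with open(url_list_path, encoding="utf-8") as handle:
        urls = [line.strip() for line in handle
                if line.strip().startswith("http")]
    downloaded, failed = [], []
    for url in urls:
        name = filename_from_url(url)
        if not name:
            failed.append(url)
            continue
        try:
            fetch(url, os.path.join(dest_dir, name))
            downloaded.append(name)
        except OSError:  # urllib.error.URLError is a subclass of OSError
            failed.append(url)
    return downloaded, failed
```

Calling download_images("urls.csv", "images") would save the files into an “images” folder; the returned list of failed URLs plays the same role as comparing the ls -1 output with the original list.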

Prepare edges table for Gephi

Import the csv output of the image tagging tool into Google Spreadsheet
Rename headers: url → Source; concept → Target; confidence → Weight
Export edges csv from Google Spreadsheet
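
The header renaming above can also be done with a script. The following is a minimal sketch, assuming the tagging tool output is a csv file whose header contains the columns url, concept and confidence named in the step above. The function name and file paths are hypothetical, and any other columns in the input are dropped.

```python
import csv

# Mapping taken from the renaming step: tagging tool header -> Gephi header.
RENAME = {"url": "Source", "concept": "Target", "confidence": "Weight"}


def make_edges_table(tagging_csv, edges_csv):
    """Copy the url/concept/confidence columns into a Gephi edges table.

    Returns the number of edges written.
    """
    count = 0
    with open(tagging_csv, newline="", encoding="utf-8") as src, \
            open(edges_csv, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=list(RENAME.values()))
        writer.writeheader()
        for row in csv.DictReader(src):
            writer.writerow({new: row[old] for old, new in RENAME.items()})
            count += 1
    return count
```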

Prepare nodes table for Gephi

Note: you can also use Table2Net to create graph files from csv files, but here we will do this manually to update image locations.

Copy “Source” column in a new sheet
Rename column as “Id”
Make a copy of “Id” column into a new column
Rename the new column as “Image”
In the new column “Image”, transform URL strings into file names (see below for options)
Export node csv from Google Spreadsheet

Adding image file names with find and replace

Sort column “Image” alphabetically
There are different types of URLs, but all of them end with the image name. The goal is to delete every character before the image name with the Find and Replace tool. Work through the URLs one group of similar URLs at a time.
Select column “Image”
Edit → find and replace
Find a string such as https://pbs.twimg.com/media/ and replace with nothing (be sure that you are doing the search only in a “specific range”, which in this case is the column “Image”)
Repeat for each type of URL, until you only have image names and no URLs
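
Because every URL ends with the image name, the find and replace steps above amount to keeping the last path segment of each URL. The following is a minimal sketch of that idea, assuming an edges csv with a “Source” column as prepared earlier. The function names and file paths are hypothetical. Two choices here are assumptions not stated in the recipe: repeated URLs are written once, and any query string is dropped from the file name, so check the result against your downloaded files.

```python
import csv
import posixpath
from urllib.parse import urlsplit


def image_name(url):
    """Return the part of the URL path after the last slash."""
    return posixpath.basename(urlsplit(url).path)


def make_nodes_table(edges_csv, nodes_csv):
    """Write a nodes table with one row per image URL: Id (URL), Image (file name).

    Returns the number of nodes written.
    """
    seen = set()
    with open(edges_csv, newline="", encoding="utf-8") as src, \
            open(nodes_csv, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(["Id", "Image"])
        for row in csv.DictReader(src):
            url = row["Source"]
            if url not in seen:  # assumption: list each image only once
                seen.add(url)
                writer.writerow([url, image_name(url)])
    return len(seen)
```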

Adding file names with VLOOKUP

Create new sheet called “urls” and copy and paste the image URLs into it.
Create a new “Named range”: click on the “Data” menu, then “Named ranges”, then “Add a range” in the right-hand menu. Enter the name “url_list”, select the URLs in the sheet, then click “Ok” and “Done”.
Create new sheet called “images” and copy and paste the downloaded file names into it (if you used wget method above you can use ls -1 >> [name of your file.csv] to get a list of the downloaded files)
Use the VLOOKUP function to find the URLs associated with each of the images in the “images” sheet by using the following formula next to the first cell on the sheet =VLOOKUP("*"&A1,url_list, 1,FALSE). You can double click the small square in the bottom of the cell or drag down to lookup the rest of the images in the sheet.
Copy the table of images and associated urls and paste into a new sheet (e.g. “image_urls_values”) using “Edit” > “Paste special” > “Paste values only”. Create a new named range (using the same process described above) by selecting these values and naming them “image_url_table”.
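In the “nodes” sheet, create a new column called “Image” and use VLOOKUP again to find the file name associated with each URL, with the following formula: =VLOOKUP(A2,image_url_table, 2,FALSE). VLOOKUP searches the first column of the range, so “image_url_table” needs the URLs in its first column and the file names in its second; swap the columns before naming the range if they are the other way round. You can double click the small square in the bottom of the cell or drag down to look up the rest of the images in the sheet.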
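
The wildcard lookup described above pairs each downloaded file with the URL that ends in its name. The following is a minimal sketch of the same matching outside the spreadsheet, assuming the URLs and the downloaded file names are already available as two lists. The function name is hypothetical, and unlike VLOOKUP the comparison here is case-sensitive.

```python
def match_files_to_urls(urls, file_names):
    """Map each URL to the downloaded file whose name the URL ends with.

    URLs without a matching file are left out of the result, which makes it
    easy to spot images that were not downloaded.
    """
    mapping = {}
    for name in file_names:
        for url in urls:
            if url.endswith(name):
                mapping[url] = name
                break  # like VLOOKUP, keep only the first match
    return mapping
```

Each entry in the returned dictionary corresponds to one row of the “Id” and “Image” columns in the nodes table. The nested loop is slow for very large collections but keeps the logic close to the spreadsheet formula.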

Import network and visualize clusters with Gephi

Open Gephi
Download and install “Image preview” plugin
Data laboratory → import spreadsheet
Import edges table
Data laboratory → import spreadsheet
Import nodes table (be sure to have checked: “append to existing workspace”)
Resize nodes based on “out-degree”
Spatialise network with Force Atlas 2
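
In this network every edge runs from an image (Source) to a tag (Target), so an image’s out-degree is the number of tags it received, and a tag’s in-degree is the number of images sharing it. The following is a minimal sketch for checking these counts before or after importing into Gephi, assuming an edges csv with “Source” and “Target” columns as prepared earlier. The function name is hypothetical, and the counts may differ from Gephi’s if the edges table contains repeated rows.

```python
import csv
from collections import Counter


def degree_counts(edges_csv):
    """Count edges per image (out-degree) and per tag (in-degree)."""
    out_degree, in_degree = Counter(), Counter()
    with open(edges_csv, newline="", encoding="utf-8") as src:
        for row in csv.DictReader(src):
            out_degree[row["Source"]] += 1
            in_degree[row["Target"]] += 1
    return out_degree, in_degree
```

Calling in_degree.most_common(10) on the second result lists the ten most widely shared tags, which gives a rough way to quantify the thematic clusters.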

Export image from Gephi and annotate

In the Finder, find one image file
Find path (on Mac: command + i)
Copy path
Go to Gephi → Preview window
Select “Render nodes as images”
In the field “Image Path”: paste image path
Set nodes opacity to 0
Deselect “show edges”
Click “Refresh” to generate image network
Export png
Import png in Vectr.com
Annotate clusters

Note: if the images don’t show up and/or the nodes continue to appear behind the images you may have to turn opacity to 100 in “Preview Settings” of the “Preview” panel and then change the node colour to white by going to “Overview” > “Appearance” > “Nodes” > “Unique” and selecting white (#ffffff) as the node colour. Upon refreshing and exporting the network you should see just clusters of images, without nodes or edges.

🐙 Inspiration, acknowledgments and contributors

This and other visual methods recipes were originally formulated by Gabriele Colombo drawing on his doctoral work exploring the design of composite images. They were documented and refined for a module on Digital Methods for Internet Studies: Concepts, Devices and Data convened by Liliana Bounegru and Jonathan Gray at the Department of Digital Humanities, King’s College London, leading to a set of collaborative group projects with their students and the European Forest Institute. The approaches behind these recipes draw on several years of experimentation with images in the context of research and teaching at the Visual Methodologies Collective (Amsterdam University of Applied Sciences), the Digital Methods Initiative (University of Amsterdam), DensityDesign Lab (Politecnico di Milano), the médialab (Sciences Po, Paris) and beyond. You can read more about these approaches in Colombo, 2019 and Niederer & Colombo, 2019. Further readings can be found in the visual methods Zotero bibliography.