May 3, 2011

Adventures in Tagging

by brainoids

This is a “three birds with one stone” entry:

  1. I wanted a test project to learn GraphViz and its related programming syntax for generating network diagrams.
  2. I wanted to see how much semantic structure was built into my Delicious bookmarks, after two years and ~1190 entries worth of use.
  3. I wanted to do the latter before Delicious gets transferred to its new owner and potentially gets dorked up. (Yahoo’s utter apathy and neglect have at the very least been benign. Although I am optimistic about AVOS…)

More specifically on #2: has enough information found its way into my tags for related keywords to self-organize? (I’m fascinated with the concept of folksonomies, even if a folksonomy technically requires more than one contributor.)

Garden-variety Wordles are easy enough to construct from Delicious bookmark tags, but are little more than toys, failing to reveal any linkage or semantic content. (Apart from the choice, I’m sure accidental but fortuitous for me, of a Space Shuttle-shaped envelope for the “horizontal and alphabetical” tag ordering.) This makes for a nice bumper sticker, but not much else:

GraphViz (which also happens to have a very nice and convenient Mac OS X interface) helps add back connectivity between tags. Lots of connectivity:

The graphic is easier to read in its PDF version. In the diagram, I’ve connected pairs of tags that occur together in three or more bookmarks. Tag bubbles are scaled (manually) by their overall frequency of occurrence. The rest of the organization is done automatically by GraphViz as it attempts to minimize the complexity (!) of the network’s edge (connection) layout.
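The counting behind those two rules (tag frequency, and pairs sharing three or more bookmarks) could be automated along the following lines. This is only a minimal sketch: the function name `count_tags`, the `min_shared` parameter, and the sample bookmarks are all made up for illustration, and it assumes the bookmarks have already been exported as one list of tags per bookmark.

```python
from collections import Counter
from itertools import combinations

def count_tags(bookmarks, min_shared=3):
    """Return (tag_counts, linked_pairs) for a list of per-bookmark tag lists.

    tag_counts   : how many bookmarks carry each tag
    linked_pairs : tag pairs that appear together in at least min_shared bookmarks
    """
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in bookmarks:
        unique = sorted(set(tags))  # ignore a tag repeated within one bookmark
        tag_counts.update(unique)
        # Sorting first means each pair is always stored in the same order.
        pair_counts.update(combinations(unique, 2))
    linked_pairs = {pair: n for pair, n in pair_counts.items() if n >= min_shared}
    return tag_counts, linked_pairs

# Hypothetical sample data, not taken from the real bookmark set.
bookmarks = [
    ["space", "NASA", "policy"],
    ["space", "NASA"],
    ["space", "NASA", "launch"],
    ["space", "launch"],
    ["travel", "Europe"],
]

tag_counts, linked_pairs = count_tags(bookmarks)
print(tag_counts["space"])  # 4
print(linked_pairs)         # {('NASA', 'space'): 3}
```

In this toy data only "NASA" and "space" share three bookmarks, so that is the only pair that would get an edge; "space" and "launch" share two and are dropped.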

Overall I’m neither disappointed nor ecstatic with the results. There are nicely laid out “regions” on the diagram: an “aerospace” zone in the middle-left, a “government” sector at lower-left, a “business” area at center-bottom, even a “shopping” nook at far-bottom. Interestingly, the “travel” quarter got pulled up into the “aerospace and policy” regions, presumably due to multiple connections for “Europe” and “China” (both of which I have travelled to and also bookmark often from a space policy perspective).

On the other hand, there are limits to how much a complex, multidimensional semantic space can be “flattened” to 2-D. The almost-lowest tier of tags sometimes has the appearance of a catch-all, and some tags have found themselves stranded there.

For a great discussion of folksonomies, check out Moritz Stefaner’s thesis – several years old but still an excellent read.

Below is a subset of the GraphViz code used to generate the network diagram. It’s probably “kindergarten” in terms of GraphViz grammar level, and any tips on improving the layout would be welcome. Unfortunately, not much of this process was automated: scraping the Delicious bookmarks, scaling, and transcription to .gv format were all done manually (although not painfully).

digraph Delicious {
	// Graph-wide layout settings
	graph [splines=true,overlap=false,concentrate=true,style="bold"]
	// Default style for any tag without its own definition below
	node [style=filled fillcolor=red]

	// Most frequent tags; bubble sizes were scaled by hand from tag frequency
	"space" [shape=ellipse,regular=false,style=filled,fillcolor=green,height=24.5,width=73.5,fontsize=703.5];
	"visualization" [shape=ellipse,regular=false,style=filled,fillcolor=green,height=22.36,width=67.1,fontsize=670.8];
	"NASA" [shape=ellipse,regular=false,style=filled,fillcolor=green,height=20.82,width=62.45,fontsize=624.5];
	"travel" [shape=ellipse,regular=false,style=filled,fillcolor=green,height=16.832,width=50.4975,fontsize=504.975] ;

	// Least frequent tags
	"RP" [shape=ellipse,regular=false,style=filled,fillcolor=green,height=2.357,width=7.071,fontsize=70.71] ;
	"upper_stage" [shape=ellipse,regular=false,style=filled,fillcolor=green,height=2.357,width=7.071,fontsize=70.71] ;
	"vehicle" [shape=ellipse,regular=false,style=filled,fillcolor=green,height=2.357,width=7.071,fontsize=70.71] ;
	// ... (remaining node definitions omitted)

	// Tag pairs that occur together in bookmarks; dir=none hides the arrowheads
	space -> NASA [dir=none, weight=67];
	space -> launch [dir=none, weight=35];
	space -> policy [dir=none, weight=26];
	space -> science [dir=none, weight=26];

	booster -> reusable [dir=none, weight=2];
	booster -> RP [dir=none, weight=2];
	HLV -> Senate [dir=none, weight=2];
	HLV -> heavylift [dir=none, weight=2];
	// ... (remaining edges omitted)
}
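Since the transcription to .gv format was done by hand, here is a rough sketch of how that step might be scripted. Several things in it are assumptions rather than the method actually used: the helper names `node_line` and `to_dot` are invented, the sample counts are made up, and the sizing rule (width = 5·√count, height = width/3, fontsize = 10·width) is only inferred from the numbers in the listing, where 7.071 is about 5·√2. Using the shared-bookmark count directly as the edge `weight` is likewise a guess.

```python
import math

def node_line(tag, count, base=5.0):
    """Build one DOT node line, sized from the tag's bookmark count.

    The sizing rule is an assumption inferred from the listing, not a
    documented formula: width = base * sqrt(count).
    """
    width = base * math.sqrt(count)
    height = width / 3
    fontsize = 10 * width
    return (f'\t"{tag}" [shape=ellipse,style=filled,fillcolor=green,'
            f'height={height:.3f},width={width:.3f},fontsize={fontsize:.2f}];')

def to_dot(tag_counts, pair_counts, name="Delicious"):
    """Return DOT source for the tag network as a single string."""
    lines = [
        f"digraph {name} {{",
        "\tgraph [splines=true,overlap=false,concentrate=true]",
        "\tnode [style=filled fillcolor=red]",
    ]
    # Largest tags first, so the output reads like the hand-written file.
    for tag, count in sorted(tag_counts.items(), key=lambda kv: -kv[1]):
        lines.append(node_line(tag, count))
    # Assumption: the edge weight is simply the number of shared bookmarks.
    for (a, b), shared in sorted(pair_counts.items(), key=lambda kv: -kv[1]):
        lines.append(f'\t"{a}" -> "{b}" [dir=none, weight={shared}];')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical counts, for illustration only.
sample_tags = {"space": 4, "NASA": 3}
sample_pairs = {("space", "NASA"): 3}

dot_source = to_dot(sample_tags, sample_pairs)
print(dot_source)
```

Writing `dot_source` to a file such as `tags.gv` and running one of the GraphViz layout commands on it (for example `neato -Tpdf tags.gv -o tags.pdf`) should produce a diagram in the same spirit, though the layout will differ from the hand-tuned one.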

