Semantic Searching Meets Dataporn
Following up on my earlier discussion of semantic searching and scholarly research, I’ve done a little more digging into recent work in this area, with an emphasis on the visualization problem, which really should be added as another “major advance” needed to operationalize semantic searching.
Stumbling across the visualcomplexity gallery unearthed quite a few interesting leads, as well as many hours of networked browsing and a large payment to Amazon.com for some coffee table books. I confess to being one of those interdisciplinary Applefanatic aesthetic geeks who revel in the intersection of technology and industrial design, à la TED (cf. earlier rants and diatribes about user interface design). While beautiful, innovative, and impressive, the entries at visualcomplexity also underscore how far we still have to go in taking complex, networked data and taming it into a user interface that actually helps people acquire knowledge, perform research, and understand complexity. Yes, the following semantic graph of French intellectual property is a thing of aesthetic beauty, but it’s also borderline dataporn for systems junkies like me:
Actually, this might be considered “third generation dataporn”, with the first generation being the dot-matrix ASCII nudes of the ’70s/’80s, the second generation being the Mandelbrot art (cf. coffee table books) of the ’80s/’90s, and the third generation being the complexity and connectivity art of the ’00s. I have a vague hunch these could also be used to trace out the social evolution of geekdom, from the brutal, rapelike man-imposes-prurient-interests-upon-machine geekdom of ASCII porn, to the puritanical, mathematics-driven nonlinear/recursive geekdom of the Mandelbrot art era, to the philosopher-king/semantic metageeks of today. But I digress…
I latched on to the term dataporn when a colleague jokingly used it recently, but it seems to be a term already out there in the wild, with occasionally good commentary. This blog post echoes my concern: that the allure of dataporn may be an unintentional stumbling block on the road to operationalizing semantic searching.
A few tools stand out as particularly promising. The Processing language and, I think more intriguingly, the Flare visualization toolkit promise to push at least some of the drudgery of visualization into the background, and to provide interactive, fluid user interface environments that are more likely to encourage data exploration and use (check out the layouts tab of the Flare demo). The Flare dependency graph, an evolution of radial graphs, looks especially promising for both its aesthetic appeal and its visual clarity. The Eigenfactor Project has used the dependency graph beautifully to analyze (guess what?) dependencies between scholarly research journals:
While mapping out the interconnectivity of all science disciplines seems a bit too much like navel-gazing for my tastes, this visualization approach would map beautifully down to examining dependencies within a specific knowledge domain. Following the Eigenfactor.org tool, the outer radial wedges could be specific research themes of interest, the inner, finer-grained wedges could be individual authors or papers, and… voilà, a tool to visualize what I had only hinted at in my cartoon mockups. (Not to mention a tool for many other uses… I’m already itching to get my hands on the source code and feed our NASA employee-by-employee inventory of technical competencies into it, as a way to understand flexibilities for future strategic workforce planning as we phase from one development project to the next…)
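To make the wedge idea a bit more concrete, here is a minimal sketch of the geometry only: research themes get angular wedges, and each paper gets an angle inside its theme’s wedge. This is not Flare’s or Eigenfactor’s code. The `themes` data, `radial_layout`, and `to_xy` are hypothetical names, and the assumption that every paper gets an equal slice (so a wedge’s width tracks how many papers the theme holds) is my own simplification.

```python
import math

def radial_layout(themes):
    """Two-level radial layout: themes become wedges, papers become angles.

    themes: dict mapping a theme name to a list of paper ids (hypothetical data).
    Assumption: each paper gets an equal angular slice, so a theme's wedge is
    proportional to its paper count. Angles are in radians.
    Returns (theme_wedges, paper_angles).
    """
    total = sum(len(papers) for papers in themes.values())
    if total == 0:
        return {}, {}
    slice_width = 2 * math.pi / total
    theme_wedges, paper_angles = {}, {}
    start = 0.0
    for theme, papers in themes.items():  # dicts keep insertion order
        end = start + slice_width * len(papers)
        theme_wedges[theme] = (start, end)
        for i, paper in enumerate(papers):
            # Center each paper within its own slice of the theme's wedge.
            paper_angles[paper] = start + slice_width * (i + 0.5)
        start = end
    return theme_wedges, paper_angles

def to_xy(angle, radius):
    """Convert a polar position to Cartesian coordinates for drawing."""
    return radius * math.cos(angle), radius * math.sin(angle)

# Usage with made-up themes and paper ids:
themes = {"propulsion": ["p1", "p2"], "materials": ["p3"], "avionics": ["p4"]}
wedges, angles = radial_layout(themes)
```

Citation links between papers would then be drawn as curves between `to_xy(angles[a], r)` and `to_xy(angles[b], r)`; this sketch makes no attempt at the curved edge routing that gives the real dependency graph its clarity.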
Browsing the images and videos there gives a pretty good feel for the approach, even without access to the underlying algorithms. This tool seems to be screaming for an intelligent back end: automated semantic data and knowledge mining from large topical repositories. I particularly like the fluid interface and the visual metaphor of people (or institutions, or any other objects) as being “in” knowledge domains, as well as the conversion of all those wretched hairballs of network connections into density maps or clusters. (Yes, I get it, your software can track millions of connections. I don’t need to see them all to get my job done, any more than I need to see the subroutine calls of my operating system. It’s your job as a programmer to make hay with them quietly and out of sight…)
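The “hairball to clusters” move can be approximated by collapsing node-to-node edges into counts, and then densities, between clusters. A minimal sketch, assuming cluster membership is already known (a real system would have to derive it, e.g. with some community-detection step) and that the input is a simple undirected graph with no duplicate edges or self-loops. `cluster_counts`, `cluster_density`, and the sample names are hypothetical, not taken from any of the tools above.

```python
from collections import Counter

def cluster_counts(edges, membership):
    """Collapse node-to-node edges into edge counts between clusters.

    edges: iterable of (u, v) node pairs (assumed undirected, no duplicates).
    membership: dict mapping each node to a cluster label (assumed given).
    Keys are (a, b) with a <= b so each unordered cluster pair appears once.
    """
    counts = Counter()
    for u, v in edges:
        a, b = membership[u], membership[v]
        counts[(a, b) if a <= b else (b, a)] += 1
    return counts

def cluster_density(counts, membership):
    """Divide each count by the number of possible edges for that pair."""
    sizes = Counter(membership.values())
    density = {}
    for (a, b), m in counts.items():
        if a == b:
            possible = sizes[a] * (sizes[a] - 1) / 2  # pairs inside one cluster
        else:
            possible = sizes[a] * sizes[b]  # pairs across two clusters
        density[(a, b)] = m / possible if possible else 0.0
    return density

# Usage with a made-up four-person network split into two clusters:
membership = {"alice": "A", "bob": "A", "carol": "B", "dave": "B"}
edges = [("alice", "bob"), ("alice", "carol"), ("bob", "dave"), ("carol", "dave")]
counts = cluster_counts(edges, membership)
density = cluster_density(counts, membership)
```

Four individual links become three cluster-level numbers, which is the kind of reduction that lets a display shade regions instead of drawing every connection.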
I’ll stick to my guns and predict that such tools will first become operationally useful within very well-bounded knowledge domains, whether those of specific scientific research fields or corporate technical wikis; the “universe at large” is as yet too complex a beast to map down to a small enough number of dimensions (and connections) to be useful to us mere mortals.