Posted by: brainoids | 9 May 2009

A Common Crowdsourcing Platform?

Amazing how a 3-minute TED talk can stimulate a lot of thinking. The following snippet describes a cell-phone-based solution for citizen reporting of political oppression in Africa:

Note that Peter Gabriel was already working on lower-tech human rights crowdsourcing back in 2006:

Hersman’s comments about extending the range of the platform are most intriguing. The Ushahidi site already features additional deployments of crowdsourcing reports of swine flu and Indian election monitoring. As inspiring as this effort is, it also raises the question of the eventual proliferation of niche deployments, and whether a common platform or clearinghouse is possible (how many “drops” of interest must a crowdsourcing participant remember and bookmark?). (I’m reminded of my very first post here about iPhone severe weather spotting crowdsourcing. How many citizen spotter apps would I eventually load up?)

One approach Hersman discusses is leveraging a common platform such as Twitter and somehow mining the information out of the tweetstream.   This would require significant back-end filter development (as noted) as well as some scheme to assess veracity or credibility.   As optimistic as I am about the eventual maturation of semantic technologies, this puts the payoff pretty far in the future, I think.

Another approach could be a niche for the ‘next big social networking site’, a clearinghouse for crowdsourcing reports on a multitude of topics – a “big dream” version of Ushahidi.   One site, one set of APIs, one place to report (again leveraging cell phones and smartphones), but with a variety of modular plug-in interfaces for topics of interest.   A quick-and-dirty brainstorm might yield ideas like those below, spanning the gamut from developing world to developed world priorities and concerns; from the mundane to the severe:

  1. Political oppression / violence (c.f. Ushahidi)
  2. Election fraud (c.f. Ushahidi)
  3. “Garden variety” crime
  4. Disease outbreaks (c.f. avian or swine flu)
  5. Air quality
  6. Traffic / accidents
  7. Severe weather
  8. Mundane weather
  9. Climate phenology

Now, granted, that’s a pretty grim and “engaged activist citizen” centric list. People love to contribute, but I think it would require a particular and, sadly, niche frame of mind to attract widespread participation. Assuming a lot of transparent geotagging, the pot could be sweetened by adding modules which “accentuate the positive” (heads-up, there’s a beautiful sunset going on right now; rainbow on the horizon!; whatever). Reports could easily be converted in parallel to tweets, Facebook updates, whatever, but a key element is that the data collection itself could be centralized at the “hip, go-to place”.

Why would centralization be a good idea?   I think, because of the critical issue of credibility and validation.   eBay showed us how critical participant credibility is to highly-democratized commerce, and really, the concept above is just a highly democratized marketplace of information.    With a central crowdsourcing clearinghouse platform, participants could incrementally earn credibility with each “valid” report.   eBay establishes credibility by, effectively, peer review.   Crowdspotters could earn credibility by a couple of mechanisms (I’m envisioning some point scheme):

  1. Peer validation (low points earned incrementally when the same event is reported by multiple spotters in the same time / location)
  2. Independent corroboration (some back-end algorithms validate user reports against “official” databases after the fact, and then assign out higher point awards)

Credibility scores then become a badge of pride / status – an incentive for participating in improving the global, connected citizen database.   There would have to be some weighting (corroborated political oppression reports get way higher points than “it’s warm and sunny in San Diego”), as well as serious consideration to how not to inadvertently disenfranchise the developing world (the villager in Africa who can’t afford unlimited text messages or an iPhone), but those are really all implementation and calibration issues.   The key “thing” is leveraging common and mundane ways for people to contribute to establish credibility and keep people engaged for when the “rare big important” reporting opportunities come up.
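As a rough sketch of how such a point scheme might look: low points for each peer confirmation, a higher award for after-the-fact corroboration, and a topic weight applied on top. Every number and name below (the point values, the topic weights, `award_points`) is made up for illustration, not a calibrated design.

```python
from dataclasses import dataclass

# Hypothetical point values and topic weights; the only constraints taken
# from the text are that corroboration earns more than peer validation and
# that serious topics outweigh mundane ones.
PEER_POINTS = 1
CORROBORATION_POINTS = 5
TOPIC_WEIGHTS = {
    "political_oppression": 10.0,
    "severe_weather": 3.0,
    "mundane_weather": 0.5,
}


@dataclass
class Spotter:
    name: str
    credibility: float = 0.0


def award_points(spotter, topic, peer_confirmations=0, corroborated=False):
    """Add weighted points for one report and return the points awarded."""
    weight = TOPIC_WEIGHTS.get(topic, 1.0)  # unknown topics get a neutral weight
    points = PEER_POINTS * peer_confirmations
    if corroborated:
        points += CORROBORATION_POINTS
    awarded = weight * points
    spotter.credibility += awarded
    return awarded


villager = Spotter("villager")
surfer = Spotter("surfer")
award_points(villager, "political_oppression", peer_confirmations=2, corroborated=True)
award_points(surfer, "mundane_weather", peer_confirmations=2)
```

With these invented weights a single corroborated oppression report outscores dozens of sunny-day reports, which is the calibration question flagged above; the mundane reports still move the needle a little, which is what keeps people engaged.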

Technically, there’s no reason Twitter itself couldn’t spin this up, riding in parallel with its current activity. If you want to yammer into the global tweetstream-of-consciousness in an unstructured way, fine, but if you want to contribute something and have it much more likely to be picked up and used for good, well, flip over to the “crowdspotter” Twitter applet tab, or text using some #standardformat and Twitter peels off the reports to the appropriate database. I’d even have no problem with Twitter leveraging the collected data and selling it into aftermarkets (c.f. my iPhone severe weather spotting post); heck, this might even give it a sustainable, Google-esque business model. As an aside, if some topics were so critical (political oppression) that niche deployments and “editors” like Ushahidi were still warranted, there’s no reason such deployments couldn’t stand to leverage and benefit from a more structured and distributed common platform of “lower quality input” reports described above.
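To make the “#standardformat” idea concrete, here is a minimal sketch of peeling structured reports out of an unstructured stream. The message format (`#spot <topic> <lat>,<lon> <free text>`) is entirely hypothetical, as are the function names; no such convention exists on Twitter as far as I know.

```python
import re

# Hypothetical "#standardformat": "#spot <topic> <lat>,<lon> <free text>".
REPORT_PATTERN = re.compile(
    r"^#spot\s+(?P<topic>\w+)\s+"
    r"(?P<lat>-?\d+(?:\.\d+)?),(?P<lon>-?\d+(?:\.\d+)?)\s*(?P<text>.*)$"
)


def parse_report(message):
    """Return a structured report dict, or None for unstructured chatter."""
    match = REPORT_PATTERN.match(message.strip())
    if match is None:
        return None
    return {
        "topic": match.group("topic").lower(),
        "lat": float(match.group("lat")),
        "lon": float(match.group("lon")),
        "text": match.group("text"),
    }


def route(messages):
    """Group parsed reports by topic, standing in for per-topic databases."""
    databases = {}
    for message in messages:
        report = parse_report(message)
        if report is not None:
            databases.setdefault(report["topic"], []).append(report)
    return databases


stream = [
    "just had a great sandwich",
    "#spot hail 34.73,-86.59 golf ball sized hail",
    "#spot traffic 32.72,-117.16 accident on I-5",
]
routed = route(stream)
```

Anything that doesn’t match simply stays in the ordinary stream, so the structured channel costs casual users nothing.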

The global proliferation of cell phones – and, eventually, smartphones – opens up far too many possibilities for crowdspotting and democratization and empowerment to let it languish.   Niche deployments are a good thing, but will face an uphill battle if participants have to find their way to the niches; the critical factor for crowdsourcing and crowdspotting is the huge volume of potential participants.   This seems like a huge disruptive technology opportunity waiting for someone to pick up and run with …

Posted by: brainoids | 18 April 2009

Semantic Searching Meets Dataporn

Following up on my earlier discussion of semantic searching and scholarly research, I’ve done a little more digging on recent work done in this area, emphasizing the visualization problem, which should really be added as another “major advance” needed to operationalize semantic searching.

Stumbling across the visualcomplexity gallery unearthed quite a few interesting leads, as well as many hours of networked browsing and a large payment to Amazon.com for some coffee table books; I confess to being one of those interdisciplinary Applefanatic aesthetic geeks who revel at the intersection of technology and industrial design, à la TED (c.f. earlier rants and diatribes about user interface design). While beautiful, innovative, and impressive, the entries at visualcomplexity also underscore how far we yet have to go in taking complex, networked data and taming it into a user interface that actually helps people acquire knowledge, perform research and understand complexity. Yes, the following semantic graph of French intellectual property is a thing of aesthetic beauty, but it’s also borderline dataporn for systems junkies like me:

Semantic graph of French intellectual property

Actually, this might be considered “third generation dataporn”, with the first generation being dot-matrix ASCII nudes of the 70’s/80’s, the second generation being Mandelbrot art (c.f. coffee table books) of the 80’s/90’s, and the third generation being complexity and connectivity art of the ’00s.    I have a vague hunch these could be used to also trace out the social evolution of geekdom, from the sort of brutal, rapelike man-imposes-prurient-interests-upon-machine geekdom of ASCII porn, to the puritanical mathematics-driven nonlinear/recursive geekdom of the Mandelbrot art era, to philosopher-king/semantic metageeks of today.   But I digress…

I latched on to the term dataporn when a colleague jokingly used it recently, but it seems to be a term already out there in the wild, with occasionally good commentary.   This blog post echoes my concern: that the allure of dataporn may be an unintentional stumbling block on the road to operationalizing semantic searching.

Hardcore dataporn: The internet circa 2003

A few tools stand out as particularly promising.    The Processing and, I think more intriguingly, Flare visualization languages promise to put the worst of visualization drudgery at least partially in the background, and provide interactive and fluid user interface environments more likely to encourage data exploration and utilization (check out the layouts tab of the Flare demo).   The Flare dependency graph, an evolution of radial graphs, looks particularly promising both for its aesthetic appeal and visual clarity.   The dependency graph has been used beautifully by the Eigenfactor Project, analyzing – guess what? – dependencies between scholarly research journals:

Dependency graphs between scientific disciplines and journals

Subsetted research journal dependencies

While mapping out the interconnectivity of all science disciplines seems a bit too much like navel-gazing for my tastes, this visualization approach would map beautifully down to examining dependencies within a specific knowledge domain; following the Eigenfactor.org tool, the outer radial wedges could be specific research themes of interest, the inner, finer-grained wedges could be individual authors or papers, and … voila, a tool to visualize what I had only hinted at in my cartoon mockups. (Not to mention a tool for many other uses … I’m already itchy to get my hands on the source code and feed our NASA employee-by-employee inventory of all technical competencies into this tool, as a way to understand flexibilities in future strategic workforce planning as we phase from one development project to the next…)

Another promising visualization approach by someone who appears to have the “user interface get-it gene” is the Knowledge Cartography project:

Visualization from the Knowledge Cartography project

Browsing the images and videos there will give a pretty good feel for the approach, absent the underlying algorithms.   This tool seems to be screaming to be fed some intelligent back-end, automated semantic data / knowledge mining from large topical repositories.   I particularly like the fluid interface and the visual metaphor of people (or institutions, or any objects) as being “in” knowledge domains, as well as the conversion of all those wretched hairballs of network connections to density maps or clusters.  (Yes, I get it, your software can track millions of connections.  I don’t need to see them all to get my job done any more than I need to see the subroutine calls of my operating system.   It’s your job as programmer to make hay with them quietly and out of sight…)

I’ll stick to my guns and predict that such tools will first become operationally useful within very well bounded knowledge domains, whether those of specific scientific research fields or corporate technical wikis; the “universe at large” is as yet too complex a beast to map down to a small enough number of dimensions (and connections) to be useful to us mere mortals.

Posted by: brainoids | 16 April 2009

From Plankton to Hailstones

Another drive-by brainoid: while learning about the Processing visualization language, I stumbled across this beautiful visualization of a system dynamics model of marine ecosystems, which captures the evolution of three discrete categories: nutrients, phytoplankton, zooplankton (the latter two across a range of size bins):

Dynamic visualization of a multi-category nonlinear system

This would be a perfect and beautiful visualization tool to apply to cloud physics models, which feature complex and sometimes nonlinear interactions between various categories of hydrometeors (supercooled water droplets, ice crystals, snow, graupel and hail, etc).   The tunable parameters could include aerosol or ice nuclei concentration, cloud updraft rate, ambient temperature, etc.   To run in real-time this would be more of a cloud physics parameterization visualization rather than a full-out cloud model or thunderstorm model, but could be an incredibly useful learning tool.
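For a feel of what sits underneath such a visualization, here is a toy sketch of a closed nutrient / phytoplankton / zooplankton (NPZ) box model stepped forward with simple Euler integration. It is a generic textbook-style formulation with a single size bin per category, not the model behind the visualization above, and every parameter value is an arbitrary assumption. A cloud physics analogue would swap the three pools for hydrometeor categories and the rate constants for the tunable parameters mentioned above.

```python
def npz_step(n, p, z, dt, mu=1.0, k=0.5, g=1.0, a=0.3, m_p=0.05, m_z=0.1):
    """Advance the three pools by one Euler step of length dt.

    mu, k : max growth rate and half-saturation constant for nutrient uptake
    g, a  : grazing rate and the fraction of grazed mass zooplankton retain
    m_p, m_z : loss rates returning plankton mass to the nutrient pool
    (all values are illustrative assumptions)
    """
    uptake = mu * n / (k + n) * p   # saturating nutrient uptake by phytoplankton
    grazing = g * p * z             # nonlinear predator-prey interaction
    p_loss = m_p * p
    z_loss = m_z * z
    dn = -uptake + (1.0 - a) * grazing + p_loss + z_loss
    dp = uptake - grazing - p_loss
    dz = a * grazing - z_loss
    return n + dt * dn, p + dt * dp, z + dt * dz


def simulate(n, p, z, dt=0.01, steps=1000):
    """Return the full (n, p, z) history, including the initial state."""
    history = [(n, p, z)]
    for _ in range(steps):
        n, p, z = npz_step(n, p, z, dt)
        history.append((n, p, z))
    return history


history = simulate(1.0, 0.1, 0.05)
```

Because the three tendencies sum to zero, total mass is conserved at every step, which makes a handy sanity check when fiddling with parameters.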

I love dataporn.

Posted by: brainoids | 15 April 2009

Mobile Weather Spotting (Avian Redux)

Harkening back to my March post on iPhone-enabled severe weather spotting/reporting, here’s a similar concept applied to birding.

Courtesy birdpost.com (and no, I’m not a birder – but the site ended up as a nominee for the Peoples’ Choice Webby Awards and looked intriguing).

Posted by: brainoids | 12 April 2009

Social Networking, Semantic Searching and Science

The premise.

One “feature request for the online universe” that I still carry from my previous science career can be loosely characterized by the following questions:

  1. “I’m researching topic X. What are the seminal papers in this area? Who are the primary researchers whose work I should read?”

    Truly semantic literature searches

  2. “Ten years ago I published a paper on topic Y, then got distracted by grant funding in another area. I’d like to understand the full ‘intellectual lineage’ of this paper, now that others have had a chance to chew on it. What has it led to? Have questions been answered? Have new ones emerged?”

    The 'lineage' of a scholarly paper, automatically detected

  3. “Who are the currently active researchers working on topics most closely related to my own research? Are there any bright new stars whose work I should keep an eye on?”

    Self-discovered scientific social networks, from publications. Green scientists are already "in" a user's network, grey scientists have been discovered and recommended for addition. Node size denotes similarity of research interests to one's own.

The third mockup is heavily inspired by the TouchGraph network mapping utility applied to Facebook, as shown below.

TouchGraph network map of relationships and network connections. A combination of node size, spatial layout, color, and connection thickness are used to denote relationships (here, mutual connectivity and relationship cluster analysis, number of photos in which multiple people appear, membership in meta-networks).

In TouchGraph/Facebook, individuals manually attach themselves to regional or institutional networks, and the graph organizes people clusters to be proximate to their networks.   Swap “manual regional network” for “automatically identified research theme” in the mockup above, but the same basic concept holds.

Scientists (or, I suppose, any scholarly researchers) know that the traditional solutions to these questions are social, and/or “manual”. Researchers typically get the answer to (1) in their formal graduate education (which may lead to “frozen in time” syndrome), by word of mouth, or by roll-up-the-sleeves painstaking literature searches. These approaches work reasonably well, although they tend to marginalize those without access to a rich social network (graduate students in a small department, researchers with limited conference travel resources, those in developing countries, those looking to bridge disciplines, etc). Conventional online social networks may offer some relief there, but as this commentary notes, social networking does not seem to be making rapid inroads into scientific communities, because the ROI isn’t readily apparent. (I’ll second this observation with anecdotal evidence of the puzzled, head-scratching virtual looks I get when I invite old colleagues to join LinkedIn. In fairness, LinkedIn perhaps has yet to demonstrate its value beyond contact management and general nosiness appeal).

Questions (2) and (3) might be waved off with the “expectation of currency”: active researchers in a field are expected to regularly review all new publications and keep abreast of those related to their interests. This is a legitimate expectation, albeit with a few caveats. First, this expectation tends to reinforce trends towards increased stovepiping and niche expertise, trends which I would suggest encourage the further “industrialization” of science, and work against scientific innovation. Second, it has limited sustainability, since overall scientific research output continues to increase (data from 1991-1998 indicate 2-10% growth per year, depending on region, and it’s probably safe to say that trend has not decreased with the emergence of e-publication efficiencies after 1998):

Growth in scientific publications 1991-1998

“But wait, there’s Google Scholar…” Yes, indeed there is, and it’s a good start. But I maintain that as wonderful as Google is, its output is still a painfully inefficient drop into actual research workflows. A checklist is still a checklist, and Google “finds” are still granular items stripped of their semantic context. I argue that for scholarly publications, that semantic context has two components: the history of (and followup from) individual papers, and the continuity in thinking and theory of specific authors. On a long time scale, scholarly research and publication is a knowledge process, and existing search tools cannot look forwards or backwards into the links in the knowledge chain. That currently happens only in the wetware of the individual researcher/scientist. I find this frustrating, since we are rapidly approaching the point when all of the necessary raw data will be accessible online, but the “back end” linkages, tools and standards to extract knowledge from that data are still lagging.

A pause for context, and a “prize dataset”

Why am I fired up about this topic? I’ll share a little history which might illustrate my frustration on the gap between what’s possible and what we have, as well as a “prize dataset” to possibly use going forward. From 1995-2005 I was fortunate enough to participate in the “Information Systems Committee” of the American Meteorological Society. The committee name is somewhat misleading: this group served as a strategic planning body for the long-term stewardship of AMS’ scholarly publications, and their migration from print to online journal archival and distribution. AMS staff are the best I’ve ever worked with, and the Society was incredibly forward looking in its approach to the migration. Some of the group’s finest accomplishments:

  1. Wrestled with online copyright issues far ahead of other groups, and helped define copyright policy standards later adopted by other professional societies.
  2. Successfully migrated the journal business model from print to online without traumatizing or cannibalizing other AMS business areas – again, far ahead of the curve.
  3. In this migration, converted the complete history of AMS publications to digital format, retaining its content. I.e., not just “digital photocopies”. This is incredibly important, and a practice which many other societies did not follow in the mad late-90’s rush to convert legacy materials.
  4. Migrated content not directly to HTML, but to a core SGML format which included basic semantic metadata and tagging. If your eyes are glazing over at this point, here’s the impact: all papers are/can be rendered on the fly not only in user interface standards of today (PDF/HTML) but into whatever standards emerge in the future. (“It’s the content, stupid.”)
  5. Identified a controlled vocabulary for the discipline of atmospheric science (the AMS Glossary). This will be important later on when it comes to semantic analysis.
  6. Fully embraced the “persistent URL” DOI standard employed by CrossRef, ensuring survivability and accessibility of all AMS content, and allowing built-in reverse and forward indexing of citations (BINGO!).
  7. Achieved all of this with a business model that ensured historical content could be made freely available to the public, after a reasonable period from its initial publication.

Here’s an example of many of these features in action (an old paper of mine). Note the persistent URLs, reverse/forward citations, as well as the full content rendered “on-the-fly” from a parent dataset into multiple formats (HTML, PDF). This is all now Standard Operating Procedure at AMS.

Why the brag session? Because all of these choices were made with an eye towards maximizing what could be done with the scientific knowledge that AMS stewards, based not on the capabilities of today but on the capabilities we probably will have 10-20 years out, given advances in online protocols and standards, computing horsepower, lexical and semantic analysis, etc. AMS thus has a perfect “prize dataset” to experiment with some of those far-future capabilities today. Even if the rest of the world is bogged down in proprietary solutions, limited data, and jostling standards, the AMS repository could be used to prototype what “should” be possible five to ten years out.

A few key advances are needed to truly achieve knowledge-based processing of the wealth of online scholarly publications now (and in future) available:

Gaps: Universal Connectivity

Complete embracing of the CrossRef DOI standard (or something very similar to it) is a must. CrossRef is the social network “core asset” that binds scholarly publications together, through time. While its basic unit is the publication itself, the connectivity and continuity of individual authors’ thinking is something of a derived product. (Indeed, I’m tempted to think of scientists as secondary “processing” nodes in a greater flow in which the publications are the truly meaningful nodes.)
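To illustrate why reverse and forward citation links matter, here is a small sketch that inverts a set of reference lists into a “cited by” index and then walks it to recover a paper’s downstream lineage (question 2 at the top of this post). The identifiers are invented placeholders rather than real DOIs, and nothing here calls CrossRef; it only shows the kind of traversal that persistent, bidirectional links make possible.

```python
from collections import deque

# Hypothetical reference lists keyed by made-up identifiers (not real DOIs):
# each paper maps to the papers it cites.
references = {
    "paper-A": [],
    "paper-B": ["paper-A"],
    "paper-C": ["paper-A"],
    "paper-D": ["paper-B", "paper-C"],
    "paper-E": ["paper-D"],
}


def build_forward_index(references):
    """Invert reference lists: paper -> papers that cite it."""
    cited_by = {paper: [] for paper in references}
    for paper, cited in references.items():
        for target in cited:
            cited_by.setdefault(target, []).append(paper)
    return cited_by


def lineage(paper, cited_by):
    """Breadth-first walk of everything downstream of `paper`.

    Returns a dict mapping each descendant to its citation distance.
    """
    depth = {paper: 0}
    queue = deque([paper])
    while queue:
        current = queue.popleft()
        for citing in cited_by.get(current, []):
            if citing not in depth:  # first visit is the shortest path
                depth[citing] = depth[current] + 1
                queue.append(citing)
    del depth[paper]
    return depth


cited_by = build_forward_index(references)
descendants = lineage("paper-A", cited_by)
```

On a real corpus the raw lineage balloons quickly, which is exactly why the semantic pruning discussed further on is needed.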

Gaps: Operationalized Semantic Analysis

Currently exploratory techniques in semantic analysis must begin to find their way into the mainstream. “Keyword frequency count” type searching is simply too rudimentary for the types of knowledge mining needed for scholarly publications. A key issue here is scientific (or indeed, any scholarly) jargon. Key words and phrases primarily have meaning only within a specific context, and typically hide a much deeper set of implied meanings and contexts. For automated tools to learn these contexts and connections, much more sophisticated approaches are needed.

Self-discovered network maps could be one approach, but a potentially more lucrative tack would be to treat it as a generic problem of nonlinear dimensional reduction, i.e., collapsing a large number of words and phrases (a scientific journal article, or corpus of journal articles … i.e., a very high dimensional dataset) to a much smaller number of contextual dimensions; a “coordinate system for meaning”. (I believe there’s some overlap with the concept of ontologies here. I tend to think we’re likely to make most progress with “supervised self-discovered”, rather than community-developed, ontologies, although I wouldn’t go quite so far as the metacrap diatribe level of cynicism.)

That’s pretty abstract, but consider the examples of self-discovered keyword organization by Roweis et al. using Locally Linear Embedding:

Keyword self-organization using Locally Linear Embedding, from Roweis

A low dimensional “coordinate vector” for a set of keywords, concepts or themes would allow true “quantitative” proximity detection searching between publications or documents. Just for fun, here are a couple of other examples of LLE in action, because it’s so cool.

A self-organized two-dimensional coordinate system for lip shape/status, from Roweis

A self-organized one-dimensional coordinate system for OCR character slant, from Roweis

Something about developing quantitative coordinate systems for complex meanings presses all my “rife with opportunity” buttons. I’d strongly recommend checking out Saul & Roweis and Seung and Lee’s work, starting with “The Manifold Ways of Perception”.
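Once documents have coordinates in a “coordinate system for meaning”, proximity searching reduces to a nearest-neighbour lookup. The sketch below assumes the embedding step has already happened: the two-dimensional coordinates are hand-invented for illustration, LLE itself is not implemented, and plain Euclidean distance is just one plausible choice of metric.

```python
import math

# Hypothetical 2-D "coordinates for meaning"; in practice these would come
# from an embedding step such as LLE, which is not implemented here.
coordinates = {
    "lightning climatology": (0.9, 0.1),
    "thunderstorm electrification": (0.8, 0.2),
    "hail growth": (0.5, 0.6),
    "ocean plankton dynamics": (-0.7, 0.9),
}


def nearest(query, coordinates, k=2):
    """Return the k documents closest to `query` by Euclidean distance."""
    origin = coordinates[query]
    others = [
        (math.dist(origin, point), name)
        for name, point in coordinates.items()
        if name != query
    ]
    return [name for _, name in sorted(others)[:k]]


neighbours = nearest("lightning climatology", coordinates)
```

The point is that “how related are these two papers?” becomes a number rather than a keyword match, so it can be ranked, thresholded, or used as an edge weight in a network.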

Gaps: New Intellectual Property Tools

Assuming that the development of semantic analysis schemes is probably beyond the capabilities or resources of most professional societies or commercial scientific journal publishers, some other organization (whether public or private) will probably end up doing it. For this, we need copyright license schemes which allow access to full text content, but limited only to the applications of indexing, semantic analysis and knowledge mining. Users of these licenses would be prohibited from redistribution of anything other than semantic extracts of source documents.  (Actually, perhaps new tools are not even needed.   What I am proposing as an algorithm is precisely what occurs within every scientist’s head using the subscriptions and license permissions they already have.   Somehow I suspect this nuance will be lost the first time a commercial vendor seeks knowledge-mining access to a for-profit professional society’s journal assets…)

If this issue ends up getting really thorny, one potential solution might be to involve some trusted and neutral third party, such as the U.S. Library of Congress, as the “knowledge mining broker”.

Gaps: Bringing it all together

The final step would be the marriage of semantic analysis and the “journal social network” provided by CrossRef to enable truly value-added scientific social networking. The strengths of network connections would be determined by semantic matching, thus allowing the very large reverse and forward networks to be pruned to the most relevant “threads” of thought, which is precisely the quantum leap needed to move beyond Google-type searching. This convergence of capabilities would, in theory, allow all the examples at the start of this article to become the norm, rather than pipe dreams.
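A minimal sketch of that pruning step, assuming each citation link already carries a semantic similarity score: links below a threshold are dropped, and the surviving links define the “thread”. The paper labels, the scores and the 0.5 threshold are all invented for illustration; a real system would compute the scores rather than look them up.

```python
# Hypothetical citation edges (citing, cited) and made-up semantic
# similarity scores in [0, 1]; a real system would compute these.
citations = [("B", "A"), ("C", "A"), ("D", "B"), ("D", "C")]
similarity = {("B", "A"): 0.9, ("C", "A"): 0.2, ("D", "B"): 0.7, ("D", "C"): 0.8}


def prune(citations, similarity, threshold=0.5):
    """Keep only citation links whose semantic match meets the threshold."""
    return [edge for edge in citations if similarity.get(edge, 0.0) >= threshold]


def thread(start, edges):
    """Collect every paper that builds on `start` through surviving links."""
    reached, frontier = set(), [start]
    while frontier:
        current = frontier.pop()
        for citing, cited in edges:
            if cited == current and citing not in reached:
                reached.add(citing)
                frontier.append(citing)
    return reached


strong_edges = prune(citations, similarity)
relevant_thread = thread("A", strong_edges)
```

Here paper C cites A only in passing (score 0.2), so it falls out of A’s thread even though it remains in the raw citation network.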

Extensibility (or Bootstrappability)

As for piloting such capabilities, I suggested above that the AMS repository would make an excellent pilot database. However, because the problem is general (see below), there are many possible test data sets. In theory, smaller / private wiki databases could also be used to test a pilot, if there was preservation of author participation in the wiki system (I honestly don’t know enough about wiki under-the-hood capabilities to know if this could work – if it could, then perhaps partnership with Wikipedia itself would provide a launching platform to take such capabilities to national visibility).

My motivations here are largely centered on scientific research, but this is really just a special case of the problem of “getting Knowledge Management Systems to start delivering some payoff”. Knowledge management systems have been on the corporate scene for quite some time, yet consistently languish when it comes to user satisfaction (see the Bain Management Tool Survey 2009, slide 15) and, from my personal experience, rarely go much beyond the “common data repository” stage, at best. I suspect the immaturity of back-end, knowledge extraction capability is the stumbling block: without this capability, it becomes difficult to justify the very large startup and maintenance costs of knowledge management systems efforts (not to mention the significant additional overhead on individual workers and contributors to populate and maintain the content). This article on knowledge management system challenges is a bit old, but has some very good perspectives.

Precisely because of this coupling to knowledge management and the business sector, I worry that the potential payoff of these semantic search capabilities is so high that their first operational emergence will be proprietary and patented, and that this in turn will price the capabilities beyond the reach of nonprofit or public sector research organizations. This would be very bad for our national “competitive edge”. In a world where our information (knowledge) assets are a key source of competitive advantage, a national (public sector) investment in our ability to access and utilize this knowledge seems about as fundamental and foundational an investment as there could possibly be. At the risk of sounding hyperbolic, I’d liken this to DARPA pioneering of the internet. The national investment helped lead to global capabilities, but because of our relative global position in intellectual capital, the U.S. economy was at the forefront of profiting from the infrastructure investment.

Posted by: brainoids | 10 April 2009

Virtually Mirroring The Physical World

A recent conversation about upgrading some of our NASA public exhibits to “self-narrated tour” capabilities has sent me down a speculative rabbit trail of thinking more broadly about virtual interactivity with physical world objects. Museums have, in some ways, paved the way here: from self-guided Walkman tours to self-guided iPod Shuffle tours to museum cell phone tours, allowing portable end-user interactivity, but I think this interactivity is typically decoupled (absent user intervention) from real world objects. I think (and I suspect many others have thought) that it’s not too far downstream that we’ll have sufficient standards, automation and capabilities in place to allow key (if not many) real world objects to have accessible, virtual-world analogues (yep – full circle to the Coffee Pot Webcam).

RFID seems to be another piece of this puzzle, but relies on custom hardware not (yet) accessible to the general public.   The world where, for example, key machinery in a plant has embedded RFID which, when scanned with a custom reader and back-end software, allows a laptop-enabled user to access its technical specs, service instructions, service history, etc probably already exists.   The world where a company (or individual) doesn’t have to invest in custom reader interface hardware/software probably doesn’t yet exist, and it’s that world I’m interested in.   The world where this model extends even further and allows average Joes-on-the-street to directly interact with (learn about, bookmark, GPS-locate, inventory, photograph, comment on) objects beyond industrial machinery is even more intriguing.   PDA-plugin reader cards are getting us close to it, but still require a dedicated investment on the user end.

I think two key advances are needed for this to go viral and spread beyond quaint “old economy” expensive and proprietary niche implementation:

  1. Miniaturization / embedding of RFID (or other next-gen ID tagging) antennae (does Bluetooth have a role here?). Yes, I want my 8th generation iPhone to be able to probe the world around it, know what’s nearby, and quickly take me to the virtual ‘home’ of an object and let me learn or interact with it. This could be a museum exhibit (narration and background info), book at the library or bookstore (reviews), piece of machinery I’m working on (specs / instructions / inventory / geolocation / maintenance history), used car I’m buying (history of insurance claims against it), you name it. Sure, I could fire up the iPhone and Google the generic item, but (a) I’m lazy and that’s a comparatively clunky user interface, and (b) sometimes the specific item is of more interest than the generic one. (Nota bene, it’s entirely possible that cloud-based image recognition / Googling could outpace the need for items “broadcasting” their existence. The MIT Media Lab is already “going there” with Sixth Sense. If my PDA camera can upload to a smart visual identification Google-system in the sky, RF broadcasting may end up being very quaint and 20th century. If I were Google, I’d be thinking very hard about automated “semantic” image recognition / classification, since far, far more smartphones will have cameras than RFID readers in the foreseeable future. “Oh, that’s a bar code; oh, that’s a VIN; oh, that’s a Toyota Corolla 2004; oh, that’s an Ares V rocket model” … you get it.)
  2. A protocol for handling arbitrary RFID (or other) object identification and routing to live web sites.   Proprietary RFID-based asset management systems probably provide this as customized sandbox software; we need an open and global version for the rest of the world.   Something like persistent URLs or CrossRef’s DOI system for scholarly articles could be models.
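
To make the second item a little more concrete, here is a minimal sketch of what a DOI-style resolver for object tags might look like. Everything in it is an assumption for illustration: the `TagResolver` class, the `prefix/suffix` identifier shape (borrowed loosely from how DOIs are written), and the `example.org` URLs are all hypothetical, and a real open protocol would need registration, authentication and persistence that this toy ignores.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.parse import quote


@dataclass
class TagResolver:
    """Toy registry that routes opaque tag IDs to live web pages.

    Hypothetical design: a tag ID looks like "<prefix>/<suffix>", where the
    prefix identifies whoever registered the object and the suffix is
    whatever local ID that owner assigned to it.
    """

    # Maps a registered prefix to the base URL of its owner's site.
    prefixes: Dict[str, str] = field(default_factory=dict)

    def register_prefix(self, prefix: str, base_url: str) -> None:
        """Record which site is responsible for a given prefix."""
        self.prefixes[prefix] = base_url.rstrip("/")

    def resolve(self, tag_id: str) -> Optional[str]:
        """Return the URL for a scanned tag, or None if it cannot be routed."""
        prefix, sep, suffix = tag_id.partition("/")
        if not sep or not suffix:
            return None  # malformed ID: no "prefix/suffix" structure
        base_url = self.prefixes.get(prefix)
        if base_url is None:
            return None  # nobody has registered this prefix
        # Percent-encode the suffix so arbitrary local IDs are URL-safe.
        return f"{base_url}/{quote(suffix, safe='')}"


resolver = TagResolver()
resolver.register_prefix("10.9999", "https://objects.example.org/items")
print(resolver.resolve("10.9999/pump-0042"))
print(resolver.resolve("99.0000/unknown"))
```

The point of the indirection is the same as with persistent URLs: the tag on the object never changes, while the owner is free to move the page it points to.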

What’s really intriguing about pairing this capability with cell phones is (à la iPhone) the pairing of identification with geolocation.   This truly opens the door to a virtual “mirror world” somewhere down the road; a geotagged/georeferenced “SecondLife” that’s accessible portably and seamlessly.   I doubt this is novel thinking, but it is very exciting…

Posted by: brainoids | 29 March 2009

Sustainable Earth Observations and Space Exploration

I don’t plan to use this space often to write about work-related topics, and think it would be inappropriate to do so for anything within NASA’s current congressionally authorized mission. There are, however, a couple of “far downstream” topics I do feel strongly about, and occasionally will hit them here. So long as no one on the Hill is (yet) paying us to work them, I believe they’re fair game for a private blog.

A Game-Changing Capability

Politics-willing, within the next decade the United States will regain a strategic capability it lost nearly 40 years ago when we ended the Apollo (and Saturn) programs, namely truly heavy-lift launch capabilities. The Shuttle is an incredible launch system, but can only deliver a fraction of the mass to (low earth) orbit that the old Saturn V’s could. NASA’s Ares V “big heavy”, a key element of our Constellation Program to return to the moon, and establish a human-tended outpost there, will restore that heavy lift capability and then some.

Ares V Heavy Lift

The lunar exploration program is exciting enough in its own right, but the prospect of once again having heavy-lift capability has many forward-looking scientists fairly dialed up about the possibilities as well.

Astrophysicists are picturing, in the 2020’s-2030’s time frame, truly mammoth new observatories which literally dwarf the Hubble Space Telescope. (For those who have seen the engineering model of Hubble at the National Air and Space Museum, which towers in one exhibit hall, think that size scaled up by a factor of 3-4). For astrophysicists, it’s size (width) that matters – wider optics unlock the ability to peer deeper into the universe (and hence earlier into its evolution). Wider is heavier, and a “big heavy” Ares V launcher allows optics never-before dreamed of.

Planetary scientists see the new vehicle as bringing the outer solar system a large step closer to home. For them, the game-changing feature of Ares V is mass: the “heavy” in heavy lift allows much more complex robotic probes reaching their destinations more quickly, with more fuel/power, longer missions, and greater telemetry, not only visiting Jupiter and Saturn and their moons, but someday even bringing bits of them home to Earth to study.

Whither Earth Science?

Amidst this excitement, we might expect earth scientists to also be scrambling to invent new concepts to tap the game-changing capabilities of an Ares V. But the climate and earth science community has responded with a big yawn, barely even aware that the launch world will soon radically change. Part of this is due to the way our current science “machinery” works: the National Research Council has just finished its first Earth Science “Decadal Survey”, which lays out the new instrument and mission priorities for the 2010-2020 time frame. These include development of a long list of exciting and important – but short-duration – new missions to probe different aspects of the earth system from space. In short, the numbers have been handed out at the deli counter, and everyone is figuring out what to do while waiting their turn. The world beyond 2020 as yet holds little interest.

This disinterest is not surprising; the earth science community is challenged enough in developing instruments to explore the earth and climate system in new ways and with ever-higher precision. Cost overruns in earth science missions are, perplexingly, systematically higher than in planetary or astrophysical science missions (my own suspicion is that we push the envelope of earth science instrument technology advances so aggressively that what should be the simplest, close-to-home space missions to develop end up being some of the highest risk). In such a world, heavy lift doesn’t sound very exciting, since we can barely afford the systems we launch with “conventional” rockets.

A Broken Business Model?

This, however, is precisely the problem. The current national “architecture” – or, perhaps, “business model” – for earth science and climate observations is fundamentally broken. To oversimplify: NASA develops a number of “neat new toys” which fly short-duration missions to peel back different corners of understanding of the earth system. (Make no mistake, these toys do enable critical basic science research, but ultimately they are “lab experiments”, not long-term campaigns). In an ideal world, these get “operationalized” and handed over to NOAA to develop into long term, stable, systematic measurements – the baseline against which we can measure earth system change. Continuity, stability, and quality (precision) are the most fundamental requirements for orbital observations of the earth system, if what we care about is understanding its natural and human-driven change. These requirements are precisely what we do not achieve with today’s business model – beyond the same few “core” observations we’ve measured for decades – because the “handoff” to operationalization doesn’t happen. The business model that has self-evolved (and I say “business model” intentionally, since fundamentally it is a model that “feeds” funding to the earth science industry of scientists and government labs) has placed novelty (new capability development) at the top of the requirements list, trumping continuity, stability and quality. The simple truth is there is not – and will never be – enough funding to meet all of those requirements at once in the same architecture, and while the current approach has significant forward momentum, it provides little in the way of sustainability.

The current plan for continuous, systematic measurements underscores this breakage. NASA’s core “EOS” low earth orbiting satellites (Terra, Aqua, Aura) were emplaced around 2000 and were not designed for long term missions. The “replacement” system, NPOESS, being designed by NASA, NOAA and DoD will not be ready until the middle of the next decade, again features individual satellites with fairly short lifetimes, and has development costs which are already spinning out of control. Relatively modest changes in budgets have also led to de-manifestation of critical sensors, threatening significant holes in our long-term climate baseline record. Once again: this is not a sustainable business model. Only a few brave souls are thinking about “what comes next after NPOESS”, and radical architecture changes are not currently in the trade space. The time for such thinking is now: flagship development efforts are generational, and as the “next big thing” nears its development phase, we already need to be thinking about the “next next big thing”.

Which gets, finally, to the punch-line: What could (or should) earth scientists do with a new national game-changing strategic capability (heavy lift launch)? I think the top-level answer is straightforward: tackle the greatest challenge, namely, the broken business model. The answer isn’t as sexy as huge new astrophysical telescopes, or wonderful new complexity in planetary exploration missions, it’s much more mundane: design for continuity, because continuity of earth system measurements is “what it’s all about”.

Sustainable Observations

Rather than using the huge new mass capability of an Ares V to develop wider apertures or antennas (higher resolutions, new bands), we can apply it towards longevity, and build Earth science observatory platforms and instruments which:

  1. Are “fueled up” to maintain their orbits for very long missions (apply mass towards fuel).
  2. Are refuel-able, whether through human or robotic servicing missions (apply mass towards grappling or docking systems).
  3. Are highly rad-hardened and shielded, to protect against degradation and failure of systems by the harsh space radiation environment (apply mass towards shielding).
  4. Are highly redundant, with few or no “single string” systems (apply mass towards redundancy).
  5. Are, like Hubble, upgradeable, by human or robotic servicing missions. One behemoth of a mothership would feature standardized interfaces for upgrades to core measurements, as well as “trial” measurements.
  6. It would also be very interesting to scrub the current and planned sets of Earth observations and determine how many could be achieved with “shared front end optics” followed by split back-end detectors.

To demonstrate this has value from a business/architecture perspective, we’d have to document how much of the total long term national investment in earth/climate observations using the “many short standalone missions” model goes into building what may be redundant buses (platforms), redundant optics, intercalibration and validation efforts, etc.

A side-benefit of the mothership / observatory approach is true co-location of measurements, which any earth scientist knows provides a nonlinear return on investment for the same observation. The TRMM mission is a perfect example.

The elephant in the room, of course, is “how do we avoid the NPOESS debacle” in such a model (massive cost growth and de-scoping of capabilities). I think the answer lies in holding to the postulated top-level requirements: continuity, stability and quality. This would not be a Christmas tree off of which we hang the most bleeding edge, riskiest instruments; but a generational national investment in climate measurement. Anything hung off this platform would be required to follow best practices of mission cost containment: disciplined and restrained requirements development, extended Phase A’s and Phase B’s before development, stringent requirements on component technology heritage and maturity (TRL), extended testing, etc.  The associated costs of this more ‘measured’ approach are borne by weaning ourselves from the current rush-to-fly-the-next-toy-and-maximize-the-number-of-toys architecture, with all of its redundancies and cost inefficiencies.

“Building a tank” is by no means the most inspirational pitch for leveraging a new heavy-lift launch capability towards benefits here on Earth. But perhaps there’s a highly relevant message in the subtext: as the country (and hopefully world) slowly revectors towards awareness that our Earthbound practices must begin to embrace sustainability as a fundamental design principle, we should be applying that same thinking to our space-based monitoring of the Earth itself. “Ares V for Sustainability” – sustainability of the architecture, and business, of earth observation – is a slogan I can get behind, and even get fired up about.

Posted by: brainoids | 20 March 2009

Personality type redux

As a “repeat-tester” Myers-Briggs Type Indicator INFJ, I’m amused to find many online descriptions which include the phrase “loves to take personality tests” – because I do.   INFJs are supposed to be “people systems” people, so I suppose that makes sense.

Narcissistic aside and blog-self-justification: For INFJs the dominant quality in their lives is their attention to the inner world of possibilities, ideas, and symbols. Knowing by way of insight is paramount for INFJs, and they often manifest a deep concern for people and relationships as well. INFJs often have deep interests in creative expression as well as issues of spirituality and human development. While the energy and attention of INFJs are naturally drawn to the inner world of ideas and insights, what people often first encounter with INFJs is their drive for closure and for the application of their ideas to people’s concerns.

One of the aspects of MBTI that has always mildly bothered me has been that 4 data vectors (introvert/extrovert, intuitive/sensate, thinking/feeling, perceiving/judging) is just a wee bit too much for the average Joe to keep loaded in memory.   Something is needed to boil the system down to its essence for practical use.

The folks at 4-D leadership coaching may have (inadvertently) done just that.   While 4-D does much more than just personality typing (and I’d highly recommend their services for coaching leadership teams), I’ve been most interested in their quick-and-dirty typing system.  As I’ll show in a moment, I like to think of it as “rudimentary MBTI for executives”.

The 4-D scheme boils down to two axes:   whether individuals tend to focus on ideas vs people (or, rational/logical vs personal), and whether individuals tend to focus on the present vs the possible:

4-D model primary "axes" of personal focus

As with any inventory, people answer a series of questions and based on the answers find how much time they spend in each of the four “quadrants” above (note: it’s not all-or-nothing; typically people will have a dominant quadrant and a “blind spot” quadrant).    This yields 4 types of preferred behaviors:

4 types of behaviors in the 4-D model

I scored a strong “green” (valuing) when I took the test, although my “blue” (visioning) was pretty high too.    “Yellow” and “orange” (relating and directing) were my blind spots, which is fairly consistent with experience – I have to spend extra energy and attention at work towards not overlooking those areas.   They’re learned and self-reminded behaviors rather than innate ones.   Under the 4-D model, a strong individual (and team) is well-balanced among the four quadrants.

Quick readers might see where this is going.   The 4-D axes are remarkably similar to the MBTI “inner” axes of Sensate vs iNtuitive (present vs possible) and Thinking vs Feeling (ideas vs people).  These axes loosely correspond to how individuals prefer to gather, and process, information.  MBTI purists will complain that all four ‘coordinates’ are necessary, but remember, this is shorthand for everyday use.

Mapping the 16 MBTI personality types into this space yields:

Myers-Briggs personality types mapped to the simplified 4-D space

I’ve also taken the extra liberty of arranging the introverts and perceivers more “inside” the box than the extroverts and judgers.   This yields a pretty coherent arrangement: the Myers-Briggs types do tend to correspond meaningfully to the 4-D behaviors (Architects, Inventors, Masterminds and Field Marshals are indeed “Visioning”; Teachers, Counselors, Champions and Healers are indeed “Valuing”, a quadrant 4-D also describes as “Cultivating”).   This also helps overcome a frustration with MBTI, which is that while my core type is INFJ, in the various areas of my life, I know I have to operate in different modes, which is what 4-D is all about.
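
For anyone who wants the shorthand in executable form, the mapping can be sketched in a few lines of Python. This is my own rough reading, not anything published by 4-D: the function name `four_d_quadrant` is made up, the NT → Visioning and NF → Valuing assignments follow the type names listed above, and the SF → Relating and ST → Directing assignments are an assumption inferred from the two axes.

```python
# Quadrant is picked from the two "inner" MBTI letters only:
# S/N (present vs possible) and T/F (ideas vs people).
QUADRANTS = {
    ("N", "T"): "Visioning",  # blue
    ("N", "F"): "Valuing",    # green (also called "Cultivating")
    ("S", "F"): "Relating",   # yellow -- assumed from the axes
    ("S", "T"): "Directing",  # orange -- assumed from the axes
}


def four_d_quadrant(mbti_type: str) -> str:
    """Collapse a four-letter MBTI code to its (assumed) 4-D quadrant."""
    code = mbti_type.strip().upper()
    valid_letters = ("IE", "SN", "TF", "JP")
    if len(code) != 4 or any(
        letter not in allowed for letter, allowed in zip(code, valid_letters)
    ):
        raise ValueError(f"not an MBTI type code: {mbti_type!r}")
    # The I/E and J/P letters are deliberately ignored by this shorthand.
    return QUADRANTS[(code[1], code[2])]


print(four_d_quadrant("INFJ"))  # Valuing
print(four_d_quadrant("ENTJ"))  # Visioning
```

Dropping the outer two letters is exactly the simplification the purists will object to; the sketch just makes explicit how little information the shorthand keeps.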

Posted by: brainoids | 19 March 2009

Federal spending sand chart

Not a bright idea per se, but a handy data visualization to share:

This is a sand chart of Federal budget outlays, normalized to FY09 dollars, going back to 1962.   You can drag the stack bars at top to get different top-level views, as well as drill down using the menu at left.  Note also the “percentage” button at lower right, which allows for viewing in a ‘percent of portfolio’ stack; looking at each major category at left while in this mode helps show how our national spending priorities have evolved over time.

Federal budget outlays, 1962-2009, in 2009 dollars

I’ll update it with current and forward-looking estimates once the full FY10 President’s Budget Request comes out in April, and probably backfill some key categories back to the 40’s as well.

Some interesting slices:

  • Drill down Human Resources -> Health -> Health Research to see the “NIH Doubling” (now a tripling).  The parent Health category is also eye-opening.
  • National Defense -> National Defense -> Department of Defense breaks out the 20-year defense cyclicals, with swings of ~$150B-$200B.
  • Other Functions -> Administration of Justice is also fairly amazing (remember, these are already corrected for inflation).
  • Digging around will also reveal some key national crises, like the 70’s energy crisis, the savings and loan crisis, and Katrina.

Posted by: brainoids | 18 March 2009

iPhone Color-Blindness Correction

It has always amazed me how poorly-served the color-blind population is when it comes to automated, seamless solutions and adaptations to their condition in the digital and print worlds.   I first realized that color-blindness was nowhere near as rare as I had thought it to be while working as a scientist and making very heavy use of color-encoding in my data visualizations; this was an effective way to learn that a full 2% of the population are dichromats, and around 5% have some form of color blindness.

One of my visualizations in full color

The same image with simulated deuteranope (red/green) color deficit.

The details of the above visualization aren’t terribly important; suffice it to say that what the viewer needs to “get” is that different colors (in this case, types of cloud/storm vertical profiles) live coherently in different areas of the data space (here’s the full writeup).   Deuteranopes are significantly disadvantaged when I show them this chart.

What is particularly perplexing is that automated solutions should be fairly trivial.   Mathematically, this is just a problem of finding an optimal transformation (mapping) from a three-dimensional data space to a two-dimensional data-space (for dichromacy), allowing for some adjustable parameters based on a particular individual’s type or degree of color blindness.   Not hard, and easily achievable in real-time given current computing power.
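
A rough sketch of the “three dimensions down to two” idea, in Python: convert RGB to cone responses (LMS), then rebuild the missing medium-wavelength channel from the other two, which is what a deuteranope effectively sees. This is only the forward simulation that any correction scheme would start from, not a correction itself. The matrix coefficients are the ones I have seen circulated in open-source Daltonize-style code and should be treated as assumptions, as should the simplification of working directly on 0–255 RGB values with no gamma handling; the function names are my own.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

# RGB -> LMS cone-response matrix (coefficients assumed, see note above).
RGB_TO_LMS: Matrix = [
    [17.8824, 43.5161, 4.11935],
    [3.45565, 27.1554, 3.86714],
    [0.0299566, 0.184309, 1.46709],
]


def mat_vec(m: Matrix, v: Sequence[float]) -> Tuple[float, ...]:
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(row[k] * v[k] for k in range(3)) for row in m)


def inverse_3x3(m: Matrix) -> Matrix:
    """Invert a 3x3 matrix via its adjugate (avoids hard-coding the inverse)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


LMS_TO_RGB: Matrix = inverse_3x3(RGB_TO_LMS)


def simulate_deuteranopia_lms(l: float, m: float, s: float) -> Tuple[float, float, float]:
    """Replace the M response with a blend of L and S; the input m is discarded."""
    return (l, 0.494207 * l + 1.24827 * s, s)


def simulate_deuteranopia_rgb(rgb: Sequence[float]) -> Tuple[float, ...]:
    """Approximate how a deuteranope perceives an RGB color (values 0-255)."""
    lms = mat_vec(RGB_TO_LMS, rgb)
    simulated = simulate_deuteranopia_lms(*lms)
    out = mat_vec(LMS_TO_RGB, simulated)
    # Clamp back into the displayable range.
    return tuple(min(255.0, max(0.0, channel)) for channel in out)


print(simulate_deuteranopia_rgb((255, 0, 0)))  # pure red
print(simulate_deuteranopia_rgb((0, 255, 0)))  # pure green
```

Because the simulated M channel is computed entirely from L and S, any two colors that differ only in their M response land on the same output; that lost dimension is what a correction algorithm has to re-encode somewhere the viewer can still see it.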

These folks have some rudimentary “Daltonize” algorithms assembled together on a web page, as well as a Photoshop filter, which illustrate what the results could be (there’s also a Firefox extension to test/simulate).  Interestingly, the simple algorithm above doesn’t do terribly well on my test cases above; I think this is because they, by design, arrange “related” colors next to each other, so the Daltonize algorithm’s ability to boost contrast is overridden by the brain’s desire to overlay coherence.

Great opportunity for:

  1. A real-time, iPhone-based color correction “color loupe”: simply a tap into the iPhone’s onboard camera viewfinder, with some back-end real-time color correction.   At minimum, this would help color-blind individuals with what (I assume) are all those frustrating instances (not just scientific data visualization) where data are encoded into colors.   Based on near-term iPhone usage (17m sold through Dec 08), and the stats above, this is something like a 250,000 – 1,000,000 person market.   Nice return for a $0.99-$1.99 price point app.
  2. Apple to get in the game, and get an edge with this 5% of the Windows user base, with native, real-time Mac color correction (including support for print correction via ColorSync).   Actually, given how forward-looking they have been with Universal Access, it’s just plain mystifying that they haven’t hit this niche by now.  (Note: there appears to be a Linux/GNOME applet which provides some type of filtering.)
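
The back-of-envelope market estimate in the first item can be reproduced in a few lines. The only inputs are the figures already quoted in this post (17m iPhones, 2% dichromats, roughly 5% with some form of color blindness); the variable names are mine, and the calculation assumes iPhone owners are color-blind at the same rate as the population at large, which is a guess.

```python
# Figures quoted in the post.
IPHONES_SOLD = 17_000_000    # sold through Dec 08
DICHROMAT_PERCENT = 2        # share of population who are dichromats
ANY_COLOR_BLIND_PERCENT = 5  # share with some form of color blindness

# Integer arithmetic keeps the results exact.
dichromat_users = IPHONES_SOLD * DICHROMAT_PERCENT // 100
color_blind_users = IPHONES_SOLD * ANY_COLOR_BLIND_PERCENT // 100

print(f"Dichromat iPhone owners:   {dichromat_users:,}")    # 340,000
print(f"Color-blind iPhone owners: {color_blind_users:,}")  # 850,000
```

Those two numbers sit inside the rounded 250,000 – 1,000,000 bracket quoted above.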

What am I missing?   Has someone snapped up all the patents and squirreled them away, blocking any kind of systematic solutions?   This is way overdue…
