Consumer-First to Build the Semantic Web
September 1st, 2009 by Peter Sweeney (@petersweeney)
How Do We Roll Out The Semantic Web? Paradoxically, the fast track may involve getting help from billions of people who know nothing about the Semantic Web and have no interest in it.
Challenges with current approaches
Most of the current approaches to building the Semantic Web focus on content. We create semantic representations of existing assets such as databases, documents, and social media. Machines “read” this knowledge and execute tasks on behalf of consumers. In the Semantic Web world, the approach is content first, consumers second.
Unfortunately, semantifying content is proving to be an extraordinarily daunting task. When we expand the scope of the problem beyond our existing content assets to include knowledge generally, in all its subjective and boundless glory, the challenges of a content-first approach become clear. We need alternative strategies and, more importantly, many hands on the problem. (Note 1: Problems)
The consumer-first approach
One such alternative approach reverses the order of operations. It puts the focus on consumers before content. On the surface, this may seem outlandish but, in many ways, it provides a more semantic and scalable roadmap.
Under a consumer-first approach, we build systems that let consumers create the semantics that reflect how they approach, interact with, and experience content. Rather than describing the content, we represent aspects of the consumers’ thoughts, perspectives, and intentions. These consumer knowledge models can then impose a structure on the mass of unstructured content on the Web.
The major difference here is that the semantics and structure are discovered through the expectations that consumers have for the content, rather than being imposed by knowledge engineers in advance.
Note also that consumer-first is not Web 2.0. While Web 2.0 collaborative processes are obviously consumer driven, they are often framed within the task of annotating content as opposed to annotating the mental models of the consumers themselves. Further, the complexity of semantics and knowledge representation demands a Web 3.0 industrial approach to simplify things for consumers.
Example of consumer-first
As an example, let’s take a real estate application. Under a content-first approach, developers would provide some structure to organize the real estate properties in the database. To be effective, they would attempt to anticipate the needs of the consumers, and semantically annotate the property listings accordingly. Finally, consumers are asked to search through these structured listings to find a property that meets their needs.
Within a consumer-first approach, the system would begin by creating a mental model of the consumer’s requirements, a representation of their “dream home” if you like. Once the consumer articulates their “dream home”, it is used as a lens through which the property listings can be evaluated.
Given a formal structure of the consumer’s input, the property listings can remain in a largely unstructured form. However, in the process of filtering the listings through these mental models, each property could be annotated with the semantics identified by consumers.
In other words, semantic webs evolve as a by-product of this consumer-first process, instead of a bottleneck at the start. Technically, the problem shifts from the knowledge representation of existing content to the knowledge representation of abstract thought.
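To make the real estate example concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any actual system: the `Listing` class, the `apply_dream_home` function, the sample concepts, and the naive substring matching are all assumptions introduced for this example. The point it shows is the order of operations: the consumer's model is structured, the listings stay as free text, and annotations accumulate on the listings as a side effect of filtering.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Listing:
    """A property listing kept as unstructured text (hypothetical type)."""
    text: str
    # Filled in as a by-product of consumers filtering the listings.
    annotations: set[str] = field(default_factory=set)


def apply_dream_home(
    dream_home: dict[str, list[str]], listings: list[Listing]
) -> list[tuple[int, Listing]]:
    """Use the consumer's model as a lens over the listings.

    `dream_home` maps each concept the consumer cares about to phrases
    that signal it. Matching is naive substring search, an assumption
    made only to keep the sketch short.
    """
    ranked = []
    for listing in listings:
        lowered = listing.text.lower()
        matched = {
            concept
            for concept, phrases in dream_home.items()
            if any(phrase in lowered for phrase in phrases)
        }
        # The listing gains semantics it never had in the database.
        listing.annotations |= matched
        if matched:
            ranked.append((len(matched), listing))
    # Listings matching more of the consumer's concepts come first.
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


# The consumer's "dream home", expressed as concepts and signal phrases.
dream_home = {
    "garden": ["garden", "backyard"],
    "near transit": ["subway", "bus line"],
    "home office": ["office", "study"],
}

listings = [
    Listing("Bright condo steps from the subway, with a study."),
    Listing("Detached house with a large backyard."),
    Listing("Downtown loft, exposed brick."),
]

ranked = apply_dream_home(dream_home, listings)
for score, listing in ranked:
    print(score, sorted(listing.annotations), listing.text)
```

Nothing in the listings was structured in advance. After one consumer's pass, two of the three listings carry annotations that a later consumer, or a software agent, could reuse.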
Technical feasibility and testing
On the question of technical feasibility, there are many existing technologies that would support this consumer-first approach: NLP to translate natural language; Web 2.0-style workflows such as semantic wikis; ontology editors; and many others. But critically, it requires a break with the past, putting the focus of the knowledge modeling unambiguously on consumers’ wants instead of producers’ wares. (Note 2: Technical comparisons)
At Primal Fusion, we’re testing the productivity of this consumer-first approach through our thought networking service. You can begin exploring this approach through our current alpha release (demo video, product walk-through, free registration).
Thought networks are semantic data structures that capture the mental models that people create as they work with content to complete their tasks.
Consumers brainstorm using the Primal Fusion service to create thought networks of the ideas they want to express. A semantic synthesis technology builds data structures in response to the consumers’ interactions. Once these mental models are created, software agents interact with Web services such as search engines to accomplish tasks such as searching, filtering, harvesting, and organizing information. (Note 3: User interactions)
As software agents interact with the semantic data and the Web, formerly unstructured sources are annotated with the semantics provided by consumers. In our alpha, a text filtering process categorizes and describes the content within expansive knowledge models generated by the consumers.
The result is a significant production of semantic data with minimal effort from consumers; the software provides the heavy lifting in both the construction of the thought networks and the tasks of the software agents.
Consumers create this valuable semantic data as a by-product of their activities, costing them little in terms of their time or the skills required. At the moment, we are working to express the output in open vocabularies and plan to publish these semantic representations as Linked Data.
Scalability
In terms of scalability, this approach may seem counterintuitive. Our content assets are very tangible, while our thoughts and intentions are very abstract and seemingly boundless.
However, this focus on the consumer reduces the unbounded nature of knowledge to two very tractable dimensions: How many consumers do you need to serve and how many interactions (or tasks) do you need to support at any given time?
On closer inspection, both aspects are quite finite, and intuitively so. It is far easier for a consumer to provide a snapshot of their knowledge directed to a specific task than for a producer to try to anticipate all the possible perspectives on their content.
Obviously, on a global scale, the Semantic Web remains an extremely large undertaking, but consumer-first offers an important scalability benefit. The factors of production—the consumers driving this process—move in lockstep with their consumption.
As consumers begin to capture their mental models and tasks in this formal way, there is a decreasing incremental cost due to the potential for individual reuse. We can reapply aspects of the same mental models to many tasks.
Billions of semantic generators
There is so much focus on the complexity and intricacies of making our content semantic and machine-readable that a simple observation is often overlooked: All of this effort is aimed at meeting the needs of people.
Creating knowledge models of content is valuable, but only one side of the coin. We also need to create knowledge models for people as the consumers of content, to give a voice to their perspectives. But most importantly, we need to recognize that every person is a semantic generator; collectively, we are the ultimate meaning-making machines. How can we better devise solutions to put those billions of semantic generators to work on building the Semantic Web?
Notes
- Problems with content-first: The problems of scale in a production-centric model are well documented. Creating semantic representations for existing digital content is prohibitively expensive due to the amount of online content and the compounding effect in the volume of data needed to create machine-readable semantics. Further, personal semantics must be incorporated to represent the interpretations and viewpoints of the consumers of the content, which complicates the problem even more. Lastly, compared with the models, standards, and protocols that preceded it, the Semantic Web has placed a much higher barrier to entry on producers, a stark change from the very accessible WWW.
- Technical comparisons: While various technologies have an affinity with a consumer-first approach, they should not be mistaken for implementations of a consumer-first approach. For example, NLP may be used to transform a natural language statement into a formal, machine-readable structure. Under a content-first model, it may be used as a gateway into the exploration of existing knowledge models, while under a consumer-first model, it may be used in the creation of knowledge models. Similarly, tools such as ontology editors may be used by knowledge engineers or simplified for use by laypeople, depending on the overarching implementation strategy.
- User interactions: One of the key challenges of this consumer-first approach in the field of knowledge representation is that tactics are needed to manage the complexity of the activity. This is a key differentiator from existing Web 2.0-style approaches, in which consumers are collaborating directly and with full knowledge of the activity. Here, we must leverage implicit and indirect means to relate consumer activity to knowledge representation.
Tags: consumers, Data Web, personal semantics, Semantic Web, Semantics, software agents, synthesis, Thought networking, Web 2.0, Web 3.0
2 Responses to “Consumer-First to Build the Semantic Web”
November 2nd, 2009 at 6:59 pm
Thanks for this fantastic post! It is a breath of fresh air to read about an honest to goodness new idea in this space. I explored the idea of user-generated semantics for building the semantic web a little bit in grad school, but never considered this particular approach. I expect you will have many many imitators – starting with me. One question. Do you have any plans for building/using aggregates of these consumer-created models? It seems that, once you get going, these are going to form a really interesting resource all by themselves. Thanks again, I’m excited.
November 4th, 2009 at 11:39 am
Benjamin, thanks for the feedback. We do have plans for building/using aggregates of these consumer-created models. I’ll try to provide more details here in a follow-on post.