Information Overload

Tagging to Increase Information Accessibility

Pori (FI), November 2009 - Harri Ketamo, PhD, is a Principal Lecturer at Satakunta University of Applied Sciences and Adjunct Professor at Tampere University of Technology. His research focuses on conceptual learning, complex adaptive systems, machine learning, user modeling, and game AIs. Currently, Harri is developing teachable agents that can be used to help people to deal with information overload. At ONLINE EDUCA, he will let the cat out of the bag as to how this is done.




Who suffers most from information overload: teachers, students, employees, or other groups?

Harri Ketamo: Nowadays, there is an essentially infinite amount of content on the web. The challenge today is not to find information or content; the challenge is to find the most relevant pieces.

I think the largest group of "sufferers" includes all of us who seriously use social media, especially user-generated content, in our work. In my studies, I have focused on teachers and educators. However, this does not mean that the solutions are not valid for students and other information workers.

How can a media agent solve the problem?

Harri Ketamo: A common method to increase information accessibility in social media is tagging. However, when tags are used only as single keywords, we easily end up with information overload. Furthermore, in social media, we do not have a standardized way to tag content. In fact, tagging the content in an optimal way is a difficult task for several reasons. These include cultural background, educational background, community and its social behavior.


Another problem area is context: the setting in which tags are created affects the selection of tags enormously. Tagging is very subjective, and more research is needed in order to improve user experience and information retrieval in social media. Unclear, or in the worst case misleading, tagging leads to information loss in social media.

The key to a solution lies in the explanative power of networked tags: If a tag is very frequent, its explanative power is low. Conversely, when a tag is used rarely, it is not useful for searches. By constructing a proximity-based network between the tags, which can be called semantics, we get a more generalized view of tagging. By using this complex semantics, we can improve the usability of the whole tagging system. In fact, media agents focus on general tag semantics; they are not specialized for particular social media systems.
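
The idea can be illustrated with a minimal Python sketch. It is not the media agents' actual implementation, which the interview does not detail. It assumes that explanative power behaves like an inverse-frequency weight that drops to zero for tags too rare to search on, and that proximity between tags can be approximated by how often they appear on the same item. The sample data and the names `explanative_power` and `proximity_network` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical sample: each item is the set of tags attached to one piece of content.
ITEMS = [
    {"video", "math", "fractions"},
    {"video", "math", "geometry"},
    {"video", "music"},
    {"video", "math", "fractions", "game"},
]


def explanative_power(items, min_count=2):
    """Score tags: low for very frequent tags, zero for tags too rare to search on.

    Assumption: an inverse-frequency weight stands in for "explanative power".
    """
    total = len(items)
    counts = Counter(tag for item in items for tag in item)
    scores = {}
    for tag, count in counts.items():
        if count < min_count:
            scores[tag] = 0.0  # rare tag: not useful for searches
        else:
            scores[tag] = math.log(total / count)  # frequent tag: explains little
    return scores


def proximity_network(items):
    """Count how often each pair of tags occurs together on the same item."""
    edges = Counter()
    for item in items:
        for a, b in combinations(sorted(item), 2):
            edges[(a, b)] += 1
    return edges


scores = explanative_power(ITEMS)
network = proximity_network(ITEMS)
```

In this toy data, "video" appears on every item and so scores zero, while "fractions" scores higher than "math"; the edge weights in `network` are what a proximity-based view of the tags would build on.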

Media agents learn conceptual structures by means of conceptual learning. Technologically and computationally, the agents are based on semantic neural networks. At the beginning, end users get their own agents, with which they interact. The semantics learned by a personal media agent evolves through all assigned tasks: The learned semantic context is always the background for new tasks.

Why does the media agent have to be teachable?

Harri Ketamo: We all have our own individual goals for our information seeking. In traditional search systems, users can define their search in several ways, and in the most advanced search engines, learning methods are also applied.

However, if users cannot evaluate the results, the parameters behind the search remain the same. In this case, users have to figure out a new kind of search. In the teachable approach, the parameters - semantic networks in this case - are adjusted and fine-tuned through every evaluation. By teaching the agents, users can define their goals in a computational way.

Teaching is not the only method to overcome the challenges in personalization and adaptation, but we have had really good experiences with applying computational learning and adaptation in other educational contexts: namely, games.

Media agents, their brains in fact, are based on our previous work, AnimalClass, which was introduced at Online Educa Berlin 2006. In AnimalClass, learners can teach conceptual structures about mathematics, sciences, languages, and arts to their virtual pets (teachable agents). The main difference between teachable media agents and AnimalClass is in their respective philosophy of learning. Whereas teachable agents in AnimalClass were taught by inductive learning, the teachable media agents are taught by both deductive and inductive means.

Is there any experience regarding how much time I have to spend before the media agent works, and how much time I can save by using a media agent?

Harri Ketamo: First, I have to say that teachable media agents consist of computationally very heavy processes. Currently, the agents are running on a reasonably powerful, but single, server. Therefore, there are relatively long delays between user interactions, several minutes in the worst cases. After the research and development phase has been finished, the agents will be moved to a more powerful computing environment, with the aim of reducing the delays to a tolerable few seconds.

But to answer the question: After the first search, the agent has learned a general semantics of one theme. This takes several minutes. The semantics is relatively sharp at that point, but in most cases, the rank-ordered list of contents requires some adjustments. Adjustments to the semantics and the rank-ordered list are computed according to the teaching (evaluations) the user provides. Depending on the complexity of the theme, the rank-ordered list gets more relevant with every evaluation.


Normally, after tens of evaluations, the system has reached a sort of balance, and new evaluations do not affect results. Evaluation as an interaction takes only seconds, but viewing, reading, or listening to the content (before evaluation) may take more time. After evaluations, the semantics and rank-ordered list will be re-computed. As mentioned, the computing related to semantics is a relatively heavy process. Therefore, the semantics is not necessarily re-computed after every evaluation, though this takes place as often as possible.
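
The teaching loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the real agents are based on semantic neural networks, whereas the sketch uses a plain additive update of per-tag weights as a stand-in, and it recomputes the list after the evaluations rather than modeling the heavy semantics computation. The functions `rank` and `teach`, the learning rate, and the sample clips are all hypothetical.

```python
def rank(items, weights):
    """Return item ids ordered by the summed weight of their tags, best first."""
    def score(item_id):
        return sum(weights.get(tag, 0.0) for tag in items[item_id])
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(items, key=lambda item_id: (-score(item_id), item_id))


def teach(weights, tags, liked, rate=0.5):
    """Return new weights after one evaluation of an item carrying `tags`.

    Assumption: a positive evaluation raises the weight of every tag on the
    item by `rate`, a negative one lowers it by the same amount.
    """
    step = rate if liked else -rate
    updated = dict(weights)
    for tag in tags:
        updated[tag] = updated.get(tag, 0.0) + step
    return updated


# Hypothetical content items and their tags.
ITEMS = {
    "clip_a": {"fractions", "game"},
    "clip_b": {"fractions", "lecture"},
    "clip_c": {"music", "game"},
}

weights = {"fractions": 1.0}   # what the agent "knows" after the first search
before = rank(ITEMS, weights)

weights = teach(weights, ITEMS["clip_b"], liked=True)
weights = teach(weights, ITEMS["clip_c"], liked=False)
after = rank(ITEMS, weights)   # clip_b now leads the list
```

Each call to `teach` costs almost nothing here, which mirrors the point that the evaluation itself takes seconds; the expensive part in the real system is recomputing the semantics behind the ranking.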

How much time can be saved? If I'm looking for only a few clips to show in the classroom, I may actually be able to find them much faster by using traditional searches. However, if I'm looking for content about some well-defined theme and I want to find the most relevant pieces of content, I may save several hours.

Are there advantages other than saving time?

Harri Ketamo: My favorite additional advantage is the maps of the search contexts: The most relevant tags and their relations are visualized as a tag cloud. Unlike a traditional tag cloud, this one is also computed by applying cluster analysis to determine the positions and neighborhoods of the tags. This ensures that neighboring tags are strongly related to each other. Furthermore, the tags that have strong explanative power are placed against a red background. In traditional tag clouds, the tags are in random order, and you cannot figure out the real explanative power of a certain tag.
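
How neighborhoods of tags might be derived can be shown with a small Python sketch. The interview does not say which cluster analysis is used, so the sketch assumes the simplest option: tags are grouped whenever they are linked by a co-occurrence edge of at least a given weight (single-linkage grouping via union-find). The function `cluster_tags`, the threshold, and the edge data are hypothetical.

```python
def cluster_tags(edges, min_weight=2):
    """Group tags that are connected by edges of at least `min_weight`."""
    parent = {}

    def find(tag):
        # Follow parent links to the representative of the tag's group.
        parent.setdefault(tag, tag)
        while parent[tag] != tag:
            parent[tag] = parent[parent[tag]]  # path halving
            tag = parent[tag]
        return tag

    for (a, b), weight in edges.items():
        root_a = find(a)  # also registers tags that only have weak edges
        root_b = find(b)
        if weight >= min_weight:
            parent[root_a] = root_b

    groups = {}
    for tag in list(parent):
        groups.setdefault(find(tag), set()).add(tag)
    # Largest clusters first; tags inside a cluster in alphabetical order.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Hypothetical co-occurrence counts between pairs of tags.
EDGES = {
    ("fractions", "math"): 3,
    ("geometry", "math"): 2,
    ("guitar", "music"): 2,
    ("math", "music"): 1,   # weak link: does not merge the two groups
}

clusters = cluster_tags(EDGES)
```

Tags that end up in the same cluster would be drawn next to each other on the map, and a separate score for explanative power could decide which of them get the highlighted background.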

By reading these context maps, users can quickly understand the semantics of the search context. From the maps, users can see the most relevant tags and their neighbors. Furthermore, users can immediately find the tags that are not relevant to their aims, but which can lead to unexpected results. Simply by reading the maps, users can make more precise searches with, e.g., YouTube's search engine. In fact, the maps can be considered analysis reports of the search context.

More details in the session "Artificial Intelligence for eLearning" on Friday, 04 December 2009, 16:30 - 18:00, room "Köpenick I"