This piece was written for IS 260: Information Structures, taught by Professor Gregory Leazer as a requirement for the Moving Image Archive Studies Program at UCLA during the Fall of 2011. Professor Leazer’s course examined a variety of models of information structures and how they have been at play, historically and currently, in the information science world. Within this class, we looked into all manner of cataloging, descriptive work and the development of the information systems that currently inform our digital and analogue worlds. I became intent on the manner in which we, as humans, are able to interact with the machinations of each system and work together in order to create more interactive systems such as folksonomies, and on the usefulness of controlled or uncontrolled vocabularies in certain informational structures. This work was one of my final pieces for the class. The research for and writing of this piece really solidified my interest in and dedication to ideas of user access and content and their direct relationship with technological systems.
Folksonomy, folksonomy, what is a folksonomy? Aside from a funny-sounding word, it is a specialized kind of classification system that gives primary labeling control to the user. Technically, this tagging system is a method of categorizing all kinds of data into a format that individuals find usable and findable. Is it cataloguing? In a sense, it could be seen as a modern kind of cataloguing. Does it maintain the same kinds of principles that traditional cataloguers use to synthesize and organize information? Not exactly, but it has created an entirely new way of looking at libraries, literacy and the user; one that cannot be ignored. Within this study, we will look at the way that folksonomies have affected the very concept of information retrieval, as well as what that means to traditional, formalized classification systems. We will gauge not only their effectiveness, strengths and limitations, but also their ability to evolve. Finally, through looking at various live examples such as LastFM, Flickr, and LibraryThing, we will be able to understand the plethora of ways in which folksonomies operate, appreciate the dynamic nature of this modern tagging system, and see that, while still problematic, it provides real use for real people on a daily basis.
Although tagging itself has been around for far longer, the word “folksonomy” came into being on July 24, 2004, thanks to one Thomas Vander Wal. According to the blogs of both Gene Smith and Vander Wal himself, a question came up during an online conversation on the closed list of the Asilomar Institute for Information Architecture (now simply called the Information Architecture Institute). Smith had asked what the members thought about “the social classification happening at Furl, Flickr and Del.icio.us. In each of these systems people classify their pictures/bookmarks/web pages with tags (e.g. wedding), and then the most popular tags float to the top (e.g. Flickr’s tags or Del.icio.us on the right).”(Smith) After Eric Scheid from Australian-based Ironclad Information Architecture suggested the term “folk classification,” Vander Wal’s immediate response was to suggest the term “folksonomy.”(Vander Wal)
Vander Wal’s concept for the term was simple. It was based on the idea that he would enmesh the word for the people doing the tagging (folks) with the Greek-derived suffix for the rules and laws of a particular topic (onomy). Much academic work states that the term “folksonomy” was created from the word “taxonomy.” While this is not a false account, it is important to look at the structural linguistics involved, seeing as folksonomy is quintessentially based in a careful and meticulously posed use of language. Vander Wal notes,
I am a fan of the word folk when talking about regular people…I was also thinking that if you took “tax” (the work portion) of taxonomy and replaced it with something anybody could do you would get a folksonomy. I knew the etymology of this word was pulling two parts from different core sources (Germanic and Greek), but that seemed fitting looking at the early Flickr and del.icio.us. (Vander Wal)
The very methodology that was used to create the term “folksonomy” could be read as the philosophy behind tagging itself. Instead of relying on familiar vocabulary, Vander Wal decided that it would be more advantageous to reflect the hybridic classification system by creating an entirely new term. While folksonomic systems have now grown from those early infant days into strong adolescent information structures (containing just as many problems as your average teen), the linguistic engagement Vander Wal invented has also grown into itself, gaining more meaning as more “folks” use it.
The Non-Traditional Tradition: Formal Classification and User Tagging
While concepts of classification have been in use since information studies began and Paul Otlet grabbed his first index card, the unconventional history of the term “folksonomy” is heavily reflected in the non-traditional structure of tagging itself. As a rule, the systematic labeling of data has been left to professionally trained individuals, and the normative state of affairs has been very orderly and disciplined. Information has been strictly organized in terms of controlled vocabularies like subject heading lists, thesauri, or ontologies and by categories and classifications. All in all, this is not a bad way of conducting business. As has been stated, “this orderly approach to cataloguing allows for both the validation and quality control of known terms to be registered within an information system.”(Hammond, Hannay and Lund) However, with the introduction of non-traditional technological structures such as the internet came non-traditional methods of inputting information. Instead of the same top-down ways of developing metadata, technology opened up the space and turned it into a more intimate bottom-up way of doing things. Tagging and the world of the folksonomic were placed within the hands of the people.
In their 2006 article, Marieke Guy and Emma Tonkin argue that folksonomies could never replace traditional forms of information classification. But they see this very fact as “the core quality that makes folksonomy tagging so useful.”(Guy and Tonkin) Truly, it is not that tagging has come to replace formal standards of classification so much as that it has grown organically out of what is already here and will now add to our previous concepts and definitions. One has to ask: with the exponential boom of the internet, could we really have kept up with the flood of information that it brought? The sheer amount of information on the web is so vast that it dwarfs our ability to keep up with it in a traditional sense. Tagging and community/shared classification systems are quite possibly one of the only ways that we may be able to keep up with technology that changes, literally, every minute. The idea of “social sharing” may be the thing that saves us and preserves our information, as problematic as tagging is.
Formal, trained classification is not to be left behind. It is necessary. But it has its own problems, especially when looked at within the spectrum of tagging. The main drawback (and what should always be considered when looking at the traditional methods of handling data classification) is the very thing that is its greatest asset: the trained eye. What is troublesome about the top-down approach is that information is being handled by individuals who have been schooled in how to handle information. While that may not sound like an issue, it can be very problematic when it comes to the internet and the world of pop-culture and media. The controlled vocabularies that are used within an academic context or traditional cataloguing world do not always apply to the latest Lady Gaga video or Scorsese trailer. Even if they were to be used or adjusted to fit, the information retrieval rate on their search terms may not produce the same kind of results that folksonomic search terms would produce. Trained professional classification is necessary, but it cannot cover all the bases at this point in our high-tech media-saturated world.
The ball then lands back in the court of the user. While the user has not always been at the center of the informational discourse, he/she certainly is so at this stage in the game, and for good reason. While the documentalists of the 19th century may have thought that they were going to take care of organizing all the information for everyone, and then it could be accessed, today’s computerized world is a bit different. Amongst other technological things, the actualization of hypertext changed everyone’s relationship to informational elements. As W.B. Rayward writes,
To speak only a little hyperbolically, documents can be considered to be, as it were, free-floating text reordered and altered at the will of the user or the system designer or both. The text has no final, pre-determined shape but is endlessly re-created and changed by the user as he or she interacts with the system. Moreover, comments and criticisms of a text may be linked directly to it and any one text may be linked in all sorts of predictable and unpredictable ways to any other in the system. (Rayward)
While Rayward might be specifically referring to the hypertext element of technological life, he may just as well be discussing tagging. It is, after all, the user’s involvement and interactivity, the nature of the participation, which causes the jump from order to chaos.
Gauging the System: The Folksonomic Spectrum
The advantages of a folksonomic system are palpable. In fact, one of its main benefits is exactly that: visual immediacy. Instead of having to fool around on the back-end of a database, your input is direct. There is no controlled vocabulary, nothing to draw from. When you tag an element (photo, video, song, etc.) the only thing that you are drawing from is you. If you want to label that Replacements song as bestsongever, that label will stick and it is right there for you and everyone else to see. Tags serve as instant virtual post-it notes that you can stick on items and share with the world or keep as personal reminders. However, that is also the deficit. Everyone has a different idea of what something should be labeled or noted as.
The concept of tagging is to gather information together in order to be able to better access it. So, it’s a database, right? Yes, but for whom? The fine distinction between user-based folksonomies and all other kinds of information bundling is that tagging in this context generally functions for personal reasons or for the sake of a social community, both of which end up nourishing the online world at large. To put it bluntly, folksonomies serve “two masters at once; the personal collection, and the collective collection.”(Guy and Tonkin) Folksonomic classification could be seen in some ways as a kind of “workingman’s database collective.” Not only is it being created by the everyman, but it is being accessed by the everyman. Unlike the OCLC or the Library of Congress, sites like LastFM, Flickr and LibraryThing are being used on an hourly basis by people of every age range, technology skill set and ability. The input of these individuals is invaluable for looking at the way modern communication and language are used and created.
On the other hand, the democracy of folksonomic classification is also the fly in the ointment. The biggest criticisms of tagging are based upon its personal and casual nature. Folksonomic classification has many benefits; however, the defects are just as prevalent. Linguistically, there are massive problems with giving the WORLD wide web permission to tag and edit elements. Classification can be a delicate thing, and it has been shown to be so as folksonomies have developed over the years. While the freedom to tag-and-run is nice, studies have shown that it can be messy. As Guy and Tonkin note, “the major flaw of current folksonomy systems – and the number one gripe for those happier with more formal classification systems – is that the tagging terms used in those systems are imprecise. It is the users of a folksonomy system who add the tags, which means that the tags are often ambiguous, overly personalized and inexact.”(Guy and Tonkin) Users are adding the tags and are drawing from what they know, which is a natural language vocabulary. It only stands to reason that there will be some difficulties due to this very basic issue.
According to Golder and Huberman, tagging systems “are beset by many problems that exist as a result of the necessarily imperfect, yet natural and evolving process of creating semantic relations between words and their referents. Three of these problems are polysemy, synonymy, and basic level variation.”(Golder and Huberman) Beyond these three concerns, there is one other: term composition, which can vary from tag spelling to whether the tag is in plural or singular form. While folksonomies are theoretically fantastic, it is this uncontrolled vocabulary component that creates the most waves. As we look at these four linguistic dilemmas, it should be noted that while language currently serves as a kind of threshold to the ways in which folksonomies can grow, database-wise, the democratic frame and ideologies surrounding them give them extensive possibility for future options.
Polysemy In Action
In film theory, a polysemic text is a piece of cinema that has a variety of meanings. For example, there have been a plethora of discussions as to the “true meaning” of David Lynch’s Lost Highway, but Lynch’s response was quite simple. It was up to the viewer, not him. “The beauty of a film…is everybody has a different take. Nobody agrees on anything in the world today. When you are spoon-fed a film, more people instantly know what it is. I love things that leave room to dream and are open to various interpretations…Film is what it means.”(Biodrowski) Cinema has much in common with folksonomies in that each audience member has the freedom to establish their own dataset based upon what they see up on the screen; each individual moviegoer “tags” the film with their own bundle of meaning centered on the images that they have just observed. However, like the Lost Highway example, they are likely to be very different.
On the LastFM site, which engages specifically with music files, you can see the tag “Hardcore.” If you were to search under that term, you would come up with a slew of options that have quite different meanings. You can listen to a station that consists of music from the music genre that is simply known as “hardcore.”
You can listen to the album by artist Lil Kim entitled Hardcore.
And if the differences between those two weren’t disparate enough, then select a station based on everything that listeners, worldwide, have tagged as “hardcore.” What you will get is a station that plays everything from straight-edge punk to heavy-duty techno to high-intensity metal. The tag “hardcore” is a polysemic tag within the music world. It is highly doubtful that the Lil Kim fans were aware of the hardcore punk music genre that has since come to be known as “hardcore.” Thus, tagging the album as such was not complicating anything in their minds. Aside from that, it is the proper name of her album. Additionally, there is a very good chance that many of the fans of hardcore-techno are unaware of Lil Kim’s album, thus simply tagging their favorite techno song as “hardcore” works, as, within their musical vocabulary, it is altogether appropriate. Do these things all end up becoming somewhat confusing? Possibly, if you were to be searching for the band Pulp’s album This is Hardcore and came up with the Lil Kim one instead. The two artists are quite dissimilar, so if you were to enjoy one, you may not enjoy the other.
One of the other problems raised within the world of the folksonomy in relation to polysemy is that of homonymy. By definition, it is clear that the two terms do not refer to the same kind of referent, as a polysemic word is one word with multiple meanings that have some relation to each other and a homonym is one word whose multiple definitions do not share connectivity. In the LastFM example, the polysemic term hardcore refers to something with extreme or intense energies put behind it. However, that ended up being applied to a large quantity of items that, while sharing some referent, were most certainly not within the same group. On the other hand, a homonym proves to be less of an issue when it comes to information retrieval. As Golder and Huberman note, “homonymy is less a problem because homonyms can be largely ruled out in a tag-based search through the addition of a related term with which the unwanted homonym would not appear.”(Golder and Huberman)
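Golder and Huberman's remark about ruling out homonyms can be illustrated with a minimal sketch. The item names and tags below are invented for illustration and are not drawn from LastFM or any other real site; the search function is an assumption about how a simple tag-based search might behave, keeping only the items that carry every requested tag.

```python
# Hypothetical tagged items; all names and tags are invented for illustration.
ITEMS = {
    "photo_of_river_bank": {"bank", "river", "water"},
    "photo_of_bank_lobby": {"bank", "money", "building"},
    "photo_of_bridge": {"river", "bridge"},
}


def search(items, *tags):
    """Return the sorted names of items that carry every one of the given tags."""
    wanted = set(tags)
    return sorted(name for name, item_tags in items.items() if wanted <= item_tags)


# A single ambiguous tag returns both senses of the homonym.
print(search(ITEMS, "bank"))
# Adding a related term rules out the unwanted sense.
print(search(ITEMS, "bank", "river"))
```

The same trick is less helpful for a polysemic tag such as "hardcore," because the related senses tend to attract overlapping companion tags.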
Looking At Synonymy
Synonymy is another concern for folksonomic classification. If you have a subject that all users know by the same name, and yet it gets tagged in a plethora of different manners, things can get complicated quickly. Tag clouds contain all the tags created and make it clear which ones are more popular. They also show the tags that have been created by users that, according to some people, may not have needed to be created. For example, if one visits the Flickr site and selects the “most popular tags” area, there will be a word cloud that depicts all the tags that users have given to the data that they have uploaded to the site. The tags “blackandwhite” and “bw” are descriptors of the same kind of data, and thus would be considered synonyms. Yet they are both amongst the most popular tags.
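To make the synonym problem concrete, here is a small sketch of how a tag cloud's popularity counts might be tallied. The photo data is invented, and the counting function is an assumption rather than a description of how Flickr actually computes its cloud. Because tags are compared as literal strings, "blackandwhite" and "bw" are counted as two separate tags.

```python
from collections import Counter

# Hypothetical tag assignments, one list per uploaded photo (invented data).
PHOTO_TAGS = [
    ["blackandwhite", "portrait"],
    ["bw", "street"],
    ["blackandwhite", "street"],
    ["bw", "portrait"],
    ["sunset"],
]


def tag_counts(photo_tags):
    """Count how many photos carry each tag, as a simple tag cloud might."""
    return Counter(tag for tags in photo_tags for tag in set(tags))


counts = tag_counts(PHOTO_TAGS)
for tag, n in counts.most_common():
    print(tag, n)
```

Four of the five photos are black-and-white images, yet neither synonym on its own reflects that; each tag is credited with only two.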
There is another way of looking at the synonym issue, however. Sometimes, synonyms carry different cultural baggage and, while they might technically mean the same thing, the users that tagged those pieces of data might have had very specific ideas when placing that terminology upon those pictures. For some tagging individuals, there might be a very big difference between “NYC” and “newyorkcity.” Clay Shirky talks about this when he’s discussing controlled vocabularies. He uses the example of the words “cinema” and “movies” and refers to how Livejournal made no effort to condense the two words into one unified term. He writes,
The cataloguers’ first reaction to that is, “Oh my god, that means you won’t be introducing the movies people to the cinema people!” To which the obvious answer is “Good. The movie people don’t want to hang out with the cinema people.” Those terms actually encode different things, and the assertion that restricting vocabularies improves signal assumes that that there’s no signal in the difference itself, and no value in protecting the user from too many matches… You can’t do it. You can’t collapse these categorizations without some signal loss. The problem is, because the cataloguers assume their classification should have force on the world, they underestimate the difficulty of understanding what users are thinking, and they overestimate the amount to which users will agree, either with one another or with the catalogers, about the best way to categorize. They also underestimate the loss from erasing difference of expression, and they overestimate loss from the lack of a thesaurus. (Shirky)
This loss of signal that Shirky mentions is something that needs to be addressed. While it may seem problematic, it really isn’t, seeing as folksonomies are, at their center, based on bottom-up logistical schematics. The intimate nature of the tagging system allows for these informational “hiccups” to occur, so long as the main target is still being worked towards. In a situation such as this, where all terms tagged end up in the word cloud and thus all information is still gathered, there is advancement. His point is a good one. Instead of being concerned about differences between the two tags and which one is the “right” one, our concern should perhaps focus on tag cloud visibility and user familiarity. As Shirky states, “With a multiplicity of points of view the question isn’t “Is everyone tagging any given link ‘correctly'”, but rather “Is anyone tagging it the way I do?” As long as at least one other person tags something they way you would, you’ll find it — using a thesaurus to force everyone’s tags into tighter synchrony would actually worsen the noise you’ll get with your signal.”(Shirky)
Let’s Get Specific: Basic Level Variation
Basic level variation is a concept that deals almost directly with ideologies surrounding information retrieval and hierarchies of information categorization. Per the definition by Golder and Huberman, “the ‘basic level’ problem is that related terms that describe an item vary along a continuum of specificity ranging from very general to very specific.”(Golder and Huberman) Looking at this dilemma, it is also one of vocabulary. While some people choose to label things as carefully as they can, still others go for the more general approach, leaving the tag cloud filled with varying descriptions of a given element, some of which may be referents to the same object.
In their work on improving tag clouds, Yusef Hassan-Montero and Victor Herrero-Solana note that within most folksonomies, the tags are of the broad variety, not the narrow. They also remark that a good portion of the reason behind this is that tagging itself takes very little time and effort, but the more specific a user gets with the tagging, the longer and more labor-intensive it becomes, making it a less attractive task.(Hassan-Montero and Herrero-Solana) While this may be true, it may also be site-dependent and object-dependent. On a site like Flickr, where you are primarily tagging photos, broad tags are likely. On other sites, where the files themselves are narrower and more specific, the specificity of the tags may increase.
For instance, a site like LibraryThing has just as many narrow tags for many of its items as broad tags. If you searched for Hellboy: The Right Hand of Doom, you would find tags as specific as “BPRD” and as broad as “graphic novel.” The specificity of the tags here is quite high. On Flickr, you might find “flower” to describe a picture and that may be the most popular tag. Then again, the most popular tag on that picture may also be “rose.” However, on LibraryThing, due to the kind of files that you are handling, simply tagging something with “graphic novel” or “crime” is generally not enough for the users of this site. For the most part, the elements here tend towards much narrower tags, which, upon information retrieval in the search function, come up with a more specific set of data.
No matter how broad or narrow the tag might be, the item still has to be found. The tagged element still must be located within a search. The problem here is that for some users, tagging an item with a term that is too broad will not turn up the very specific item they are looking for (they are looking for “rose” not “flower”) while tagging an item with a term that is too narrow will result in nothing under the broader term, as everything will be under a narrow term. It’s a dangerous road to be traversed both by the users and those trying to access the elements that have been tagged.
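The retrieval gap described above can be sketched in a few lines. The photo identifiers and tags are invented, and the exact-match lookup is an assumption about the simplest possible tag search, not a description of any particular site.

```python
# Hypothetical photos with the tags their uploaders chose (invented data).
PHOTOS = {
    "img_001": {"flower"},          # a rose, but tagged only broadly
    "img_002": {"rose"},            # a rose, tagged only narrowly
    "img_003": {"flower", "rose"},  # a rose, tagged at both levels
}


def find(photos, tag):
    """Return the sorted ids of photos carrying exactly this tag string."""
    return sorted(pid for pid, tags in photos.items() if tag in tags)


print(find(PHOTOS, "rose"))    # misses img_001
print(find(PHOTOS, "flower"))  # misses img_002
```

Only the photo tagged at both levels of specificity is found by both searches, since nothing in the system knows that a rose is a flower.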
Specificity runs the full gamut, and at bottom it is about organizing information. In order to do this, users make sense (be it for personal or social gain) of the objects that they put into the larger system of objects, a process that is directly tied to tagging. It is the locale of categorization and labeling for the individual and the masses, whether we like it or not. Applying the work of Weick, Sutcliffe and Obstfeld to folksonomies, tagging is really just a way of making sense of things. As they state, “sensemaking occurs when a flow of organizational circumstances is turned into words and salient categories.”(Weick, Sutcliffe and Obstfeld) The intention is to make sense of things, whatever way we can. If the level of specificity prevents us from doing so, it can become problematic.
Term Composition: Getting it Right
The final problematic area of folksonomies is term composition. As mentioned earlier, one of the strengths of this method of information organization is that it is a quick and dirty way to categorize personal collections of data and share them within a larger context. However, due to this process, there is a higher likelihood of error. Misspellings run rampant within the world of tagging, and if a tag is misspelled, it loses its veracity or worthiness. Unless multiple people come along and also tag the element with the same misspelled term (a possibility, but not a likely one), the element will exist within a kind of “tag limbo” and will not be able to be located. In addition, due to the unstable nature of natural language and thus the linguistic choice of terms, there are problems such as plurals versus singulars, adjectival terminologies, intimate terms and “poorly encoded” terms. All of these things complicate the metadata that makes up tagging to the point that traditional cataloguers want very little to do with the folksonomic arena, due to its seemingly disorganized layout.
The plural/singular argument is a simple grammatical issue: some users will tag an item as “monster” while others will tag it as “monsters.” Clearly, the difference between the two will lead to quite different results within searches. The other problems are a bit more complex and are based upon more personal linguistics, but they also have to do with the way that the words are composed. The adjectival terminologies are tags such as “happy,” “sad” or “scary,” which are generated solely from each individual user’s opinion. These are a specific section of tags which, as Alireza Noruzi writes, “do not seem to stand alone and, rather than establish categories themselves, refine or qualify existing categories.”(Noruzi)
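The plural/singular split can be illustrated with a deliberately crude sketch. The tags are invented, and the grouping rule (lowercase the tag and drop one trailing "s") is a hypothetical heuristic chosen only to show the idea; it is not how any of the sites discussed here handle their tags.

```python
from collections import defaultdict

# Hypothetical raw tags as different users might type them (invented data).
RAW_TAGS = ["monster", "monsters", "Monster", "scary", "mycat"]


def naive_key(tag):
    """Crude grouping key: lowercase the tag and drop one trailing 's'.

    Illustrative heuristic only; it would mangle words such as "glass"
    or "series" and does nothing at all for misspellings.
    """
    key = tag.lower()
    if key.endswith("s") and len(key) > 3:
        key = key[:-1]
    return key


def group_variants(tags):
    """Group raw tags under their crude key, preserving input order."""
    groups = defaultdict(list)
    for tag in tags:
        groups[naive_key(tag)].append(tag)
    return dict(groups)


groups = group_variants(RAW_TAGS)
print(groups)
```

Even this toy rule shows the trade-off: it reunites "monster" and "monsters," but any such merging is a decision made by the system rather than by the users who chose the tags.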
Finally, the last area is intimate terms and “poorly encoded” terms. Intimate terms are the kinds of tags that, not unlike the opinion-based adjectival terms, are entirely predicated upon the user’s intimate relationship to the tagged object. Tags like “mycat” or “ourhouse” would be fine for a private dataset. However, on a larger level, these tags do not work very well. Possessives (once again, heading into the grammatical, not dissimilar to the plural/singular argument) drive a big wedge into the ability to retrieve information. They are related to the other subcategory here, which is terms that have been “poorly encoded.” Why have they been poorly encoded? Like the “mycat” issue, many of them have compound structures that just do not work as far as information retrieval is concerned. Where “mycat” doesn’t work because it is a personal/intimate tag and the only reason an individual might search for that tag would be if they were, indeed, searching for data about their cat, the badly coded items don’t work because the words just don’t work together. They are the best example of term composition gone wrong. Tags such as DinosaurIceCreamNewYork might make sense to the user tagging the element, but the likelihood that another individual will be searching that out is little to none. What the intimacy of these terms winds up doing is creating an incredibly narrow term, even narrower than what was previously discussed within the concept of specificity. The more intimate a user’s tags, the less likely their work is to contribute to the larger folksonomic world on the whole.
Conclusion: Chaos Reigns
Folksonomies and tagging systems never promised us a rose garden. What they did promise was an additional way to create metadata based upon a vast array of different users and natural language, not a controlled vocabulary. What this provides is something that professional cataloguers never could: a constant flow of cataloging, 24 hours a day, 7 days a week, all over the world. Is this method perfect? No, but as a supplement to the informational systems we already have in place, it is invaluable. Not only does tagging create a more democratic system by which the common person has a say in the ways in which worldly information is being processed, but it creates a mutual discourse by which the professionals are forced to consider the linguistic philosophies by which users operate. Clay Shirky stated, “It comes down ultimately to a question of philosophy….Critically, the semantics here are in the users, not in the system. This is not a way to get computers to understand things… It’s all dependent on human context.”(Shirky) When traditional ways of thinking about classification systems are being challenged on a philosophical level, based not solely upon technological advances but upon the operators of that technology, then it is not hyperbole to say that this is an essential element of current and future informational data structures.
There are clearly limitations to tagging in its current form. Folksonomies are not perfect. Yes, folksonomies look messy and problematic, and they have an intimidating number of linguistic issues to be resolved within their very backbone. But they are still quite worthwhile. After all, if you look at the internet itself, it is actually “a somewhat messy affair (almost its defining characteristic). Rather, this is an altogether different – and, we would argue, complementary – form of classification. Compared to the traditional top-down approach, folksonomy data is much noisier but also more flexible, more abundant and far cheaper. Bear in mind also that the terms used are, by definition, the very terms that real users might be expected to use in future when searching for this information.”(Hammond, Hannay and Lund) But the effectiveness of tagging cannot yet be gauged on the whole because, as a system, it is still in flux. While people continue to tag photos on Flickr and books on LibraryThing, there are people who are trying to figure out ways to make it better. There are sites such as http://meta.cooking.stackexchange.com, a cooking site that has an entire section devoted to how to make its tags better. WordPress.com, a blogging site, has an informational area that helps differentiate between good and bad tags to use when asking questions. Indeed, beyond the actual sites, there are professionals who have ideas based on statistical data on how to address the problems discussed earlier, such as improving tag literacy through user education and allowing the systems to implement “better” tags.(Guy and Tonkin) While it would be much cleaner and simpler to be able to say that one information classification system is more beneficial than the other, it is virtually impossible. There are benefits to controlled vocabularies and there are distinct advantages to natural language, and what each contributes to the world of classification the other could not.
Where there is strength in traditional indexing processes, we can find dynamic new methodologies and philosophies in the chaos of folksonomies, something that can only help create a better system overall.
Biodrowski, Steve. “Lost Highway: The Solution.” HollywoodGothique.com. http://www.hollywoodgothique.com/losthighwaysolution.html. 2004/2005.
Golder, Scott A. and Bernardo A. Huberman. “The Structure of Collaborative Tagging Systems.” http://arxiv.org/ftp/cs/papers/0508/0508082.pdf. 18 August 2005.
Guy, Marieke and Emma Tonkin. “Folksonomies: Tidying Up Tags?” D-Lib Magazine January 2006.
Hammond, Tony, Timo Hannay, Ben Lund and Joanna Scott. “Social Bookmarking Tools.” D-Lib Magazine April 2005.
Hassan-Montero, Yusef and Victor Herrero-Solana. “Improving Tag-Clouds as Visual Information Retrieval Interfaces.” International Conference on Multidisciplinary Information Sciences and Technologies. Merida, 2006. 1-6.
Noruzi, Alireza. “Folksonomies: (Un)Controlled Vocabulary?” Knowledge Organization 33 (2006): 199-203.
O’Reilly, Tim. What Is Web 2.0: Design Patterns and Business Models for the Next Generation of Software. 9 September 2005.
Rayward, W.B. “Visions of Xanadu: Paul Otlet (1868-1944) and Hypertext.” Journal of the American Society for Information Science (1994): 235-250.
Shirky, Clay. “Ontology is Overrated: Categories, Links, and Tags.” http://www.shirky.com/writings/ontology_overrated.html. March, April 2005.
Smith, Gene. Folksonomy: Social Classification. Edmonton, 24 July 2004.
Vander Wal, Thomas. Folksonomy. 2 February 2007.
Weick, Karl E, Kathleen M Sutcliffe and David Obstfeld. “Organizing and the Process of Sensemaking.” Organization Science 16.4 (2005): 409-421.