This article is Chapter 5 of the O’Reilly book Information Architecture for the World Wide Web (2nd Edition).
In recent years, increasing attention has been focused on the challenge of organizing information. Yet this challenge is not new. People have struggled with the difficulties of information organization for centuries. The field of librarianship has been largely devoted to the task of organizing and providing access to information. So why all the fuss now?
Believe it or not, we're all becoming librarians. This quiet yet powerful revolution is driven by the decentralizing force of the global Internet. Not long ago, the responsibility for labeling, organizing, and providing access to information fell squarely in the laps of librarians. These librarians spoke in strange languages about Dewey Decimal Classification and the Anglo-American Cataloguing Rules. They classified, cataloged, and helped you find the information you needed.
As it grows, the Internet is forcing the responsibility for organizing information on more of us each day. How many corporate web sites exist today? How many personal home pages? What about tomorrow? As the Internet provides users with the freedom to publish information, it quietly burdens them with the responsibility to organize that information. New information technologies open the floodgates for exponential content growth, which creates a need for innovation in content organization.
And if you're not convinced that we're facing severe information-overload challenges, take a look at an excellent study conducted at Berkeley. This study finds that the world produces between 1 and 2 exabytes of unique information per year. Given that an exabyte is a billion gigabytes (we're talking 18 zeros), this growing mountain of information should keep us all busy for a while.
As we struggle to meet these challenges, we unknowingly adopt the language of librarians. How should we label that content? Is there an existing classification scheme we can borrow? Who's going to catalog all of that information?
We're moving toward a world in which tremendous numbers of people publish and organize their own information. As we do so, the challenges inherent in organizing that information become more recognized and more important. Let's explore some of the reasons why organizing information in useful ways is so difficult.
Classification systems are built upon the foundation of language, and language is ambiguous: words are capable of being understood more than one way. Think about the word pitch. When I say pitch, what do you hear? There are more than 15 definitions, including:
A throw, fling, or toss.
A black, sticky substance used for waterproofing.
The rising and falling of the bow and stern of a ship in a rough sea.
A salesman's persuasive line of talk.
An element of sound determined by the frequency of vibration.
This ambiguity results in a shaky foundation for our classification systems. When we use words as labels for our categories, we run the risk that users will miss our meaning. This is a serious problem. (See Chapter 6 to learn more about labeling.)
It gets worse. Not only do we need to agree on the labels and their definitions, we also need to agree on which documents to place in which categories. Consider the common tomato. According to Webster's dictionary, a tomato is "a red or yellowish fruit with a juicy pulp, used as a vegetable: botanically it is a berry." Now I'm confused. Is it a fruit or a vegetable or a berry?
If we have such problems classifying the common tomato, consider the challenges involved in classifying web site content. Classification is particularly difficult when you're organizing abstract concepts such as subjects, topics, or functions. For example, what is meant by "alternative healing," and should it be cataloged under "philosophy" or "religion" or "health and medicine" or all of the above? The organization of words and phrases, taking into account their inherent ambiguity, presents a very real and substantial challenge.
Heterogeneity is the quality of being composed of unrelated or unlike parts. You might refer to grandma's homemade broth, with its assortment of vegetables, meats, and other mysterious leftovers, as heterogeneous. At the other end of the scale, homogeneous refers to something composed of similar or identical elements. For example, Ritz crackers are homogeneous. Every cracker looks and tastes the same.
An old-fashioned library card catalog is relatively homogeneous. It organizes and provides access to books. It does not provide access to chapters in books or collections of books. It may not provide access to magazines or videos. This homogeneity allows for a structured classification system. Each book has a record in the catalog. Each record contains the same fields: author, title, and subject. It is a high-level, single-medium system that works fairly well.
Most web sites, on the other hand, are highly heterogeneous in many respects. For example, web sites often provide access to documents and their components at varying levels of granularity. A web site might present articles and journals and journal databases side by side. Links might lead to pages, sections of pages, or other web sites. And web sites typically provide access to documents in multiple formats. You might find financial news, product descriptions, employee home pages, image archives, and software files. Dynamic news content shares space with static human-resources information. Textual information shares space with video, audio, and interactive applications. The web site is a great multimedia melting pot, where you are challenged to reconcile the cataloging of the broad and the detailed across many media.
The heterogeneous nature of web sites makes it difficult to impose any single structured organization system on the content. It usually doesn't make sense to classify documents at varying levels of granularity side by side. An article and a magazine should be treated differently. Similarly, it may not make sense to handle varying formats the same way. Each format will have uniquely important characteristics. For example, we need to know certain things about images, such as file format (GIF, TIFF, etc.) and resolution (640x480, 1024x768, etc.). It is difficult and often misguided to attempt a one-size-fits-all approach to the organization of heterogeneous web site content. This is a fundamental flaw of many enterprise taxonomy initiatives.
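To make the point about format-specific characteristics concrete, here is a minimal Python sketch. It assumes a collection that holds just two hypothetical content types, `Article` and `Image`. The two types share nothing but a title, so each keeps its own fields instead of being forced into a single one-size-fits-all record. The class names, sample items, and the `describe` helper are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Text content: described by author and subject."""
    title: str
    author: str
    subject: str


@dataclass
class Image:
    """Image content: described by file format and resolution."""
    title: str
    file_format: str  # e.g. "GIF", "TIFF"
    width: int        # resolution in pixels
    height: int


def describe(item) -> str:
    """Summarize an item using the fields that matter for its format."""
    if isinstance(item, Image):
        return f"{item.title} ({item.file_format}, {item.width}x{item.height})"
    if isinstance(item, Article):
        return f"{item.title} by {item.author} [{item.subject}]"
    raise TypeError(f"no description rule for {type(item).__name__}")


# Hypothetical mixed collection, as on a heterogeneous web site.
collection = [
    Article("Quarterly outlook", "J. Smith", "Finance"),
    Image("Product photo", "TIFF", 1024, 768),
]
for item in collection:
    print(describe(item))
```

Adding a third format, such as video, would mean adding a third record type with its own fields. It would not mean stretching one shared schema to cover everything.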
Have you ever tried to find a file on a coworker's desktop computer? Perhaps you had permission. Perhaps you were engaged in low-grade corporate espionage. In either case, you needed that file. In some instances, you may have found the file immediately. In others, you may have searched for hours. The ways people organize and name files and directories on their computers can be maddeningly illogical. When questioned, they will often claim that their organization system makes perfect sense. "But it's obvious! I put current proposals in the folder labeled /office/clients/green and old proposals in /office/clients/red. I don't understand why you couldn't find them!"
The fact is that labeling and organization systems are intensely affected by their creators' perspectives. We see this at the corporate level with web sites organized according to internal divisions or org charts, with groupings such as marketing, sales, customer support, human resources, and information systems. How does a customer visiting this web site know where to go for technical information about a product they just purchased? To design usable organization systems, we need to escape from our own mental models of content labeling and organization.
We employ a mix of user research and analysis methods to gain real insight. How do users group the information? What types of labels do they use? How do they navigate? This challenge is complicated by the fact that web sites are designed for multiple users, and different users will understand the information in different ways. Their levels of familiarity with your company and your content will vary. For these reasons, even with a massive barrage of user tests, it is impossible to create a perfect organization system. One site does not fit all! However, by recognizing the importance of perspective, by striving to understand the intended audiences through user research and testing, and by providing multiple navigation pathways, you can do a better job of organizing information for public consumption than your coworker does on his or her desktop computer.
Politics exist in every organization. Individuals and departments constantly position for influence or respect. Because of the inherent power of information organization in forming understanding and opinion, the process of designing information architectures for web sites and intranets can involve a strong undercurrent of politics. The choice of organization and labeling systems can have a big impact on how users of the site perceive the company, its departments, and its products. For example, should we include a link to the library site on the main page of the corporate intranet? Should we call it The Library or Information Services or Knowledge Management? Should information resources provided by other departments be included in this area? If the library gets a link on the main page, then why not corporate communications? What about daily news?
As an information architect, you must be sensitive to your organization's political environment. In certain cases, you must remind your colleagues to focus on creating an architecture that works for the user. In others, you may need to make compromises to avoid serious political conflict. Politics raise the complexity and difficulty of creating usable information architectures. However, if you are sensitive to the political issues at hand, you can manage their impact upon the architecture.
The organization of information in web sites and intranets is a major factor in determining success, and yet many web development teams lack the understanding necessary to do the job well. Our goal in this chapter is to provide a foundation for tackling even the most challenging information organization projects.
Organization systems are composed of organization schemes and organization structures. An organization scheme defines the shared characteristics of content items and influences the logical grouping of those items. An organization structure defines the types of relationships between content items and groups.
Before diving in, it's important to understand information organization in the context of web site development. Organization is closely related to navigation, labeling, and indexing. The hierarchical organization structures of web sites often play the part of primary navigation system. The labels of categories play a significant role in defining the contents of those categories. Manual indexing or metadata tagging is ultimately a tool for organizing content items into groups at a very detailed level. Despite these closely knit relationships, it is both possible and useful to isolate the design of organization systems, which will form the foundation for navigation and labeling systems. By focusing solely on the logical grouping of information, you avoid the distractions of implementation details and can design a better web site.
We navigate through organization schemes every day. Telephone books, supermarkets, and television programming guides all use organization schemes to facilitate access. Some schemes are easy to use. We rarely have difficulty finding a friend's phone number in the alphabetical organization scheme of the white pages. Some schemes are intensely frustrating. Trying to find marshmallows or popcorn in a large and unfamiliar supermarket can drive us crazy. Are marshmallows in the snack aisle, the baking ingredients section, both, or neither?
In fact, the organization schemes of the phone book and the supermarket are fundamentally different. The alphabetical organization scheme of the phone book's white pages is exact. The hybrid topical/task-oriented organization scheme of the supermarket is ambiguous.
Let's start with the easy ones. Exact organization schemes divide information into well-defined and mutually exclusive sections. The alphabetical organization of the phone book's white pages is a perfect example. If you know the last name of the person you are looking for, navigating the scheme is easy. "Porter" is in the P's, which come after the O's but before the Q's. This is called known-item searching. You know what you're looking for, and it's obvious where to find it. No ambiguity is involved. The problem with exact organization schemes is that they require the user to know the specific name of the resource they are looking for. The white pages don't work very well if you're looking for a plumber.
Exact organization schemes are relatively easy to design and maintain because there is little intellectual work involved in assigning items to categories. They are also easy to use. The following sections explore three frequently used exact organization schemes.
An alphabetical organization scheme is the primary organization scheme for encyclopedias and dictionaries. Almost all nonfiction books, including this one, provide an alphabetical index. Phone books, department store directories, bookstores, and libraries all make use of our 26-letter alphabet for organizing their contents. Alphabetical organization often serves as an umbrella for other organization schemes. We see information organized alphabetically by last name, by product or service, by department, and by format. Figure 5-2 provides an example of a departmental directory organized alphabetically by last name.
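The following minimal Python sketch shows why exact schemes take so little intellectual work to maintain. An item's section follows mechanically from its name, and a known-item lookup succeeds only when the user already knows that name. The directory entries are hypothetical, and `alphabetical_index` and `known_item_lookup` are illustrative helpers, not part of any particular system.

```python
from collections import defaultdict


def alphabetical_index(names):
    """Exact scheme: each name falls into exactly one section, chosen by its first letter."""
    sections = defaultdict(list)
    for name in sorted(names, key=str.casefold):
        sections[name[0].upper()].append(name)
    return dict(sections)


def known_item_lookup(index, name):
    """Known-item search: knowing the name tells us which single section to open."""
    return name in index.get(name[0].upper(), [])


directory = ["Porter", "Oliver", "Quinn", "Patel", "Adams"]  # hypothetical entries
index = alphabetical_index(directory)
print(index["P"])                           # ['Patel', 'Porter']
print(known_item_lookup(index, "Porter"))   # True
print(known_item_lookup(index, "Plumber"))  # False: a need, not a known name
```

The last line mirrors the white-pages problem. The scheme has no way to answer "who fixes pipes?" because nothing in it groups entries by meaning.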
Certain types of information lend themselves to chronological organization. Press release archives, for example, are obvious candidates for a chronological organization scheme (see Figure 5-3), because the date of announcement provides important context for each release. However, keep in mind that users may also want to browse the releases by title, product category, or geography, or to search by keyword. A complementary combination of organization schemes is often necessary. History books, magazine archives, diaries, and television guides tend to be organized chronologically. As long as there is agreement on when a particular event occurred, chronological schemes are easy to design and use.
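As a rough illustration of a complementary combination of schemes, this Python sketch exposes the same press releases two ways. The first view is chronological (newest first) and the second is grouped by product category. The releases, titles, and categories are hypothetical placeholders.

```python
from collections import defaultdict
from datetime import date

# Hypothetical press releases: (release date, title, product category).
releases = [
    (date(2002, 3, 14), "New server line announced", "Servers"),
    (date(2001, 11, 2), "Quarterly results", "Corporate"),
    (date(2002, 1, 21), "Notebook price cut", "Notebooks"),
]


def by_date(items, newest_first=True):
    """Chronological scheme: order releases by date of announcement."""
    return sorted(items, key=lambda r: r[0], reverse=newest_first)


def by_category(items):
    """Complementary topical scheme over the same releases."""
    groups = defaultdict(list)
    for _, title, category in items:
        groups[category].append(title)
    return dict(groups)


for released, title, _ in by_date(releases):
    print(released.isoformat(), title)
print(by_category(releases)["Notebooks"])  # ['Notebook price cut']
```

Both views read from one underlying list, so adding a release updates both schemes without any extra classification work.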
Place is often an important characteristic of information. We travel from one place to another. We care about the news and weather that affect us in our location. Political, social, and economic issues are frequently location-dependent. With the exception of border disputes, geographical organization schemes are fairly straightforward to design and use. Figure 5-4 shows an example of a geographical organization scheme. Users can select a location from the map using their mouse.
Now for the tough ones. Ambiguous organization schemes divide information into categories that defy exact definition. They are mired in the ambiguity of language and organization, not to mention human subjectivity. They are difficult to design and maintain. They can be difficult to use. Remember the tomato? Do we put it under fruit, berry, or vegetable?
However, they are often more important and useful than exact organization schemes. Consider the typical library catalog. There are three primary organization schemes: you can search for books by author, by title, or by subject. The author and title organization schemes are exact and thereby easier to create, maintain, and use. However, extensive research shows that library patrons use ambiguous subject-based schemes such as the Dewey Decimal and Library of Congress classification systems much more frequently.
There's a simple reason why people find ambiguous organization schemes so useful: we don't always know what we're looking for. In some cases, you simply don't know the correct label. In others, you may have only a vague information need that you can't quite articulate. For these reasons, information seeking is often iterative and interactive. What you find at the beginning of your search may influence what you look for and find later in your search. This information seeking process can involve a wonderful element of associative learning. Seek and ye shall find, but if the system is well designed, you also might learn along the way. This is web surfing at its best.
Ambiguous organization supports this serendipitous mode of information seeking by grouping items in intellectually meaningful ways. In an alphabetical scheme, closely grouped items may have nothing in common beyond the fact that their names begin with the same letter. In an ambiguous organization scheme, someone other than the user has made an intellectual decision to group items together. This grouping of related items supports an associative learning process that may enable the user to make new connections and reach better conclusions. While ambiguous organization schemes require more work and introduce a messy element of subjectivity, they often prove more valuable to the user than exact schemes.
The success of ambiguous organization schemes depends upon the quality of the scheme and the careful placement of individual items within that scheme. Rigorous user testing is essential. In most situations, there is an ongoing need for classifying new items and for modifying the organization scheme to reflect changes in the industry. Maintaining these schemes may require dedicated staff with subject matter expertise. Let's review a few of the most common and valuable ambiguous organization schemes.
Organizing information by subject or topic is one of the most useful and challenging approaches. Phone book yellow pages are organized topically, so that's the place to look when you need a plumber. Academic courses and departments, newspapers, and the chapters of most nonfiction books are all organized along topical lines.
While few web sites are organized solely by topic, most should provide some sort of topical access to content. In designing a topical organization scheme, it is important to define the breadth of coverage. Some schemes, such as those found in an encyclopedia, cover the entire breadth of human knowledge. Research-oriented web sites such as About.com (shown in Figure 5-5) rely heavily on their topical organization scheme. Others, such as corporate web sites, are limited in breadth, covering only those topics directly related to that company's products and services. In designing a topical organization scheme, keep in mind that you are defining the universe of content (both present and future) that users will expect to find within that area of the web site.
Task-oriented schemes organize content and applications into a collection of processes, functions, or tasks. These schemes are appropriate when it's possible to anticipate a limited number of high-priority tasks that users will want to perform. Desktop software applications such as word processors and spreadsheets provide familiar examples. Collections of individual actions are organized under task-oriented menus such as Edit, Insert, and Format.
On the Web, task-oriented organization schemes are most common in the context of e-commerce web sites where customer interaction takes center stage. Intranets and extranets also lend themselves well to a task orientation, since they tend to integrate powerful applications or "e-services" as well as content.
You will rarely find a web site organized solely by task. Instead, task-oriented schemes are usually embedded within specific subsites or integrated into hybrid task/topic navigation systems, as we see in Figure 5-6.
In cases where there are two or more clearly definable audiences for a web site or intranet, an audience-specific organization scheme may make sense. This type of scheme works best when the site is frequented by repeat visitors who can bookmark their particular section of the site. It also works well if there is value in customizing the content for each audience. Audience-oriented schemes break a site into smaller, audience-specific mini-sites, thereby allowing for clutter-free pages that present only the options of interest to that particular audience. The main page of dell.com, shown in Figure 5-7, features an audience-oriented organization scheme (on the right) that invites customers to self-identify.
Organizing by audience brings all the promise and peril associated with any form of personalization. For example, Dell understands a great deal about its audience segments and brings this knowledge to bear on its web site. If I visit the site and identify myself as a member of the "Home & Home Office" audience, Dell will present me with a set of options and sample system configurations designed to meet my needs. In this instance, Dell makes the educated guess that I probably need a modem to connect to the Internet from my home. However, this guess is wrong, since I now have affordable broadband access in my community. I need an Ethernet card instead. All ambiguous schemes require the information architect to make these educated guesses and revisit them over time.
Audience-specific schemes can be open or closed. An open scheme will allow members of one audience to access the content intended for other audiences. A closed scheme will prevent members from moving between audience-specific sections. This may be appropriate if subscription fees or security issues are involved.
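The difference between open and closed schemes comes down to a single access rule. The Python sketch below assumes three audience sections. Only "Home & Home Office" comes from the Dell example above, and the other labels and the `visible_sections` helper are hypothetical.

```python
# Hypothetical audience-specific sections; only the first label comes from the Dell example.
SECTIONS = ["Home & Home Office", "Small Business", "Government"]


def visible_sections(visitor_audience, scheme_is_open):
    """Sections a visitor may enter under an open or a closed audience scheme."""
    if scheme_is_open:
        # Open: members of one audience can still browse content meant for others.
        return list(SECTIONS)
    # Closed: visitors are confined to their own section (e.g. for subscriptions or security).
    return [s for s in SECTIONS if s == visitor_audience]


print(visible_sections("Small Business", scheme_is_open=True))
print(visible_sections("Small Business", scheme_is_open=False))  # ['Small Business']
```

In practice, a closed scheme also needs some way to establish which audience a visitor belongs to, such as a login or subscription. This sketch leaves that step out.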
Metaphors are commonly used to help users understand the new by relating it to the familiar. You need not look further than your desktop computer with its folders, files, and trash can or recycle bin for an example. Applied to an interface in this way, metaphors can help users understand content and function intuitively. In addition, the process of exploring possible metaphor-driven organization schemes can generate new and exciting ideas about the design, organization, and function of the web site.
While metaphor exploration can be useful while brainstorming, you should use caution when considering a metaphor-driven global organization scheme. First, metaphors, if they are to succeed, must be familiar to users. Organizing the web site of a computer hardware vendor according to the internal architecture of a computer will not help users who don't understand the layout of a motherboard.
Second, metaphors can introduce unwanted baggage or be limiting. For example, users might expect a digital library to be staffed by a librarian who will answer reference questions. Most digital libraries do not provide this service. Additionally, you may wish to provide services in your digital library that have no clear counterpart in the real world. Creating your own customized version of the library is one such example. This will force you to break out of the metaphor, introducing inconsistency into your organization scheme.
In the offbeat example in Figure 5-8, Bianca has organized the contents of her web site according to the metaphor of a physical shack with rooms. While this metaphor-driven approach is fun and conveys a sense of place, it is not particularly intuitive. Can you guess what you'll find in the pantry? Also, note that features such as Find Your Friend don't fit neatly into the metaphor. While these metaphor-driven "sitemaps" were popular in the early days of the Web, they have become a dying breed, as coolness loses ground to usability.
The power of a pure organization scheme derives from its ability to suggest a simple mental model that users can quickly understand. Users easily recognize an audience-specific or topical organization. And fairly small, pure organization schemes can be applied to large amounts of content without sacrificing their integrity or diminishing their usability.
However, when you start blending elements of multiple schemes, confusion often follows, and solutions are rarely scalable. Consider the example in Figure 5-9. This hybrid scheme includes elements of audience-specific, topical, metaphor-based, task-oriented, and alphabetical organization schemes. Because they are all mixed together, we can't form a mental model. Instead, we need to skim through each menu item to find the option we're looking for.
The exception to these cautions against hybrid schemes exists within the surface layer of navigation. As illustrated by eBay (see Figure 5-6), many web sites successfully combine topics and tasks on their main page and within their global navigation. This reflects the reality that typically both the organization and its users identify finding content and completing key tasks at the top of their priority lists. Because this includes only the highest priority tasks, the solution does not need to be scalable. It's only when such schemes are used to organize a large volume of content and tasks that the problems arise. In other words, shallow hybrid schemes are fine, but deep hybrid schemes are not.
Unfortunately, deep hybrid schemes are still fairly common. This is because it is often difficult to agree upon any one scheme, so people throw the elements of multiple schemes together in a confusing mix. There is a better alternative. In cases where multiple schemes must be presented on one page, you should communicate to designers the importance of preserving the integrity of each scheme. As long as the schemes are presented separately on the page, they will retain the powerful ability to suggest a mental model for users. For example, a broader look at the Dell home page in Figure 5-10 reveals a geographical scheme, an audience-oriented scheme, and a topical scheme. By presenting them separately, Dell provides flexibility without causing confusion.
Organization structure plays an intangible yet very important role in the design of web sites. While we interact with organization structures every day, we rarely think about them. Movies are linear in their physical structure. We experience them frame by frame from beginning to end. However, the plots themselves may be nonlinear, employing flashbacks and parallel subplots. Maps have a spatial structure. Items are placed according to physical proximity, although the most useful maps cheat, sacrificing accuracy for clarity.
The structure of information defines the primary ways in which users can navigate. Major organization structures that apply to web site and intranet architectures include the hierarchy, the database-oriented model, and hypertext. Each organization structure possesses unique strengths and weaknesses. In some cases, it makes sense to use only one of them. In many cases, it makes sense to use all three in a complementary manner.
The foundation of almost all good information architectures is a well-designed hierarchy or taxonomy. In this hypertextual world of nets and webs, such a statement may seem blasphemous, but it's true. The mutually exclusive subdivisions and parent-child relationships of hierarchies are simple and familiar. We have organized information into hierarchies since the beginning of time. Family trees are hierarchical. Our division of life on earth into kingdoms and classes and species is hierarchical. Organization charts are usually hierarchical. We divide books into chapters into sections into paragraphs into sentences into words into letters. Hierarchy is ubiquitous in our lives and informs our understanding of the world in a profound and meaningful way. Because of this pervasiveness of hierarchy, users can easily and quickly understand web sites that use hierarchical organization models. They are able to develop a mental model of the site's structure and their location within that structure. This provides context that helps users feel comfortable. Figure 5-11 shows an example of a simple hierarchical model.
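The sense of location that a hierarchy gives users follows directly from its parent-child structure. In a strict hierarchy each category has exactly one parent, so every page has exactly one path back to the top. The minimal Python sketch below shows this. The `Node` class and the category labels are hypothetical.

```python
class Node:
    """One category in a strict hierarchy: exactly one parent (except the root)."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []

    def add(self, label):
        """Create a child category beneath this one."""
        child = Node(label, parent=self)
        self.children.append(child)
        return child

    def path(self):
        """Labels from the root down to this node: the user's 'you are here' context."""
        labels = []
        node = self
        while node is not None:
            labels.append(node.label)
            node = node.parent
        return labels[::-1]


home = Node("Home")
products = home.add("Products")
laptops = products.add("Laptops")
print(" > ".join(laptops.path()))  # Home > Products > Laptops
```

Because each node has a single parent, the path is unambiguous. Cross-listing, discussed next, gives up some of that simplicity.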
Because hierarchies provide a simple and familiar way to organize information, they are usually a good place to start the information architecture process. The top-down approach allows you to quickly get a handle on the scope of the web site without going through an extensive content-inventory process. You can begin identifying the major content areas and exploring possible organization schemes that will provide access to that content.
When designing taxonomies on the Web, you should remember a few rules of thumb. First, you should be aware of, but not bound by, the idea that hierarchical categories should be mutually exclusive. Within a single organization scheme, you will need to balance the tension between exclusivity and inclusivity. Taxonomies that allow cross-listing are known as polyhierarchical. Ambiguous organization schemes in particular make it challenging to divide content into mutually exclusive categories. Do tomatoes belong in the fruit, vegetable, or berry category? In many cases, you might place the more ambiguous items into two or more categories so that users are sure to find them. However, if too many items are cross-listed, the hierarchy loses its value. This tension between exclusivity and inclusivity does not exist across different organization schemes. You would expect a listing of products organized by format to include the same items as a companion listing of products organized by topic. Topic and format are simply two different ways of looking at the same information. Or to use a technical term, they're two independent facets. See Chapter 9 for more about facets and polyhierarchy.
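The contrast between cross-listing within one scheme and independent facets can be sketched in a few lines of Python. In the first structure, the tomato is cross-listed under three categories of a single polyhierarchy. The `cross_listed_share` helper gives a rough, hypothetical measure of how far cross-listing has spread. In the second structure, topic and format are independent facets of the same products. The product names and all helper names are illustrative assumptions.

```python
from collections import Counter

# Polyhierarchy: a single taxonomy in which an ambiguous item is cross-listed.
produce = {
    "Fruit": {"apple", "tomato"},
    "Vegetable": {"carrot", "tomato"},
    "Berry": {"strawberry", "tomato"},
}


def categories_of(item, taxonomy):
    """Every category under which an item is listed."""
    return sorted(c for c, items in taxonomy.items() if item in items)


def cross_listed_share(taxonomy):
    """Fraction of distinct items that appear in more than one category."""
    counts = Counter(item for items in taxonomy.values() for item in items)
    return sum(1 for n in counts.values() if n > 1) / len(counts)


# Facets: independent ways of describing the same hypothetical products.
catalog = {
    "Site Design Handbook": {"topic": "Design", "format": "Book"},
    "Navigation Workshop": {"topic": "Design", "format": "Video"},
    "Search Primer": {"topic": "Search", "format": "Book"},
}


def filter_by(catalog, **facet_values):
    """Products matching every requested facet value, e.g. topic='Design'."""
    return sorted(
        name for name, facets in catalog.items()
        if all(facets.get(k) == v for k, v in facet_values.items())
    )


print(categories_of("tomato", produce))   # ['Berry', 'Fruit', 'Vegetable']
print(cross_listed_share(produce))        # 0.25
print(filter_by(catalog, format="Book"))  # ['Search Primer', 'Site Design Handbook']
```

If `cross_listed_share` crept toward 1.0, the categories would stop distinguishing anything, which is the loss of value described above. Every product carries both a topic and a format, so listing the same item under each facet causes no such tension.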
Second, it is important to consider the balance between breadth and depth in your taxonomy. Breadth refers to the number of options at each level of the hierarchy. Depth refers to the number of levels in the hierarchy. If a hierarchy is too narrow and deep, users have to click through an inordinate number of levels to find what they are looking for. The top of Figure 5-12 illustrates a narrow-and-deep hierarchy in which users are faced with six clicks to reach the deepest content. If a hierarchy is too broad and shallow, users are faced with too many options on the main menu and are unpleasantly surprised by the lack of content once they select an option. In the (relatively) broad-and-shallow hierarchy shown in the bottom part of Figure 5-12, users must choose from ten categories to reach ten content items.
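The trade-off can be roughed out with simple arithmetic. The sketch assumes an idealized uniform tree, in which every level offers the same number of options. A tree with breadth b and depth d then holds b^d items at the bottom, so shrinking breadth forces more clicks. Real hierarchies are rarely this regular, and the 64-item collection is hypothetical.

```python
def leaf_count(breadth, depth):
    """Content items at the bottom of a uniform tree: breadth ** depth."""
    return breadth ** depth


def clicks_needed(breadth, items):
    """Fewest levels (clicks from the main page) a uniform tree of this breadth needs to hold `items`."""
    if breadth < 2:
        raise ValueError("breadth must be at least 2")
    depth, capacity = 0, 1
    while capacity < items:
        capacity *= breadth
        depth += 1
    return depth


# A hypothetical collection of 64 content items.
for breadth in (2, 8, 64):
    print(breadth, "options per level ->", clicks_needed(breadth, 64), "clicks")
```

For 64 items, two options per level means six clicks, and 64 options on one page means a single click but an overwhelming menu. Eight options across two levels sits between those extremes.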
In considering breadth, you should be sensitive to people's visual scanning abilities and to the cognitive limits of the human mind. Now, we're not going to tell you to follow the infamous seven-plus-or-minus-two rule. There is general consensus that the number of links you can safely include is constrained by users' abilities to visually scan the page rather than by their short-term memories.
Instead, we suggest that you:
Recognize the danger of overloading users with too many options.
Group and structure information at the page level.
Subject your designs to rigorous user testing.
Consider Microsoft's main page, shown in Figure 5-13. It's one of the most visited (and tested) pages on the Web, and the portal into a fairly large information system. Presenting information hierarchically at the page level, as Microsoft has done, can make a major positive impact on usability.
There are roughly 50 links on Microsoft's main page, and they're organized into several key groupings:
| Grouping | Links |
|---|---|
| Global Navigation | The top-right global navigation bar (e.g., All Products, Support, Search) has only 4 links. |
| Local Navigation | The local navigation bar (e.g., Home, Training/Events, Subscribe) has 8 links. |
| Primary Taxonomies | There are 3 primary taxonomies (Product Families, Resources, Information For), each with 6-8 links. |
| Marketing | The central marketing panel has 5 links. |
| Downloads | Under Downloads, there are 5 links. |
| News | Under News, there are 4 links. |
These 50 links are subdivided into 8 discrete categories, with 4-8 links per category.
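Tallying the groupings shows how the numbers work out. This short Python sketch simply restates the counts from the table, treating each primary taxonomy's "6-8 links" as a range since the exact figure per taxonomy is not given:

```python
# Link counts per grouping on Microsoft's main page, as listed in the table.
# Each primary taxonomy is recorded as a (min, max) range of 6-8 links.
groups = {
    "Global Navigation": (4, 4),
    "Local Navigation": (8, 8),
    "Product Families": (6, 8),
    "Resources": (6, 8),
    "Information For": (6, 8),
    "Marketing": (5, 5),
    "Downloads": (5, 5),
    "News": (4, 4),
}

low = sum(lo for lo, _ in groups.values())
high = sum(hi for _, hi in groups.values())
largest = max(hi for _, hi in groups.values())
print(f"{len(groups)} groups, {low}-{high} links in total, at most {largest} per group")
```

No single group asks the user to scan more than eight links, even though the page as a whole carries roughly 50.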
In considering depth, you should be even more conservative. If users are forced to click through more than two or three levels, they may simply give up and leave your web site. At the very least, they'll become frustrated.
An excellent study conducted by Microsoft Research suggests that a medium balance of breadth and depth may provide the best results.
For new web sites and intranets that are expected to grow, you should lean towards a broad-and-shallow rather than a narrow-and-deep hierarchy. This allows content to be added without major restructuring. It is less problematic to add items to secondary levels of the hierarchy than to the main page, for two reasons. First, the main page serves as the most prominent and important navigation interface for users, and changes to it can disrupt the mental model users have formed of the web site over time. Second, because of its prominence and importance, companies tend to lavish care (and money) on the graphic design and layout of the main page, so changes to it can be more time-consuming and expensive than changes to secondary pages.
Finally, when designing organization structures, you should not become trapped by the hierarchical model. Certain content areas will invite a database or hypertext-based approach. The hierarchy is a good place to begin, but is only one component in a cohesive organization system.
A database is defined as "a collection of data arranged for ease and speed of search and retrieval." A Rolodex provides a simple example of a flat file database (see Figure 5-14). Each card represents an individual contact and constitutes a record. Each record contains several fields, such as name, address, and telephone number. Each field may contain data specific to that contact. The collection of records is a database.
In an old-fashioned Rolodex, users are limited to searching for a particular individual by last name. In a more contemporary, computer-based contact management system, we can also search and sort using other fields. For example, we can ask for a list of all contacts who live in Connecticut, sorted alphabetically by city.
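The difference between the paper Rolodex and a contact management system can be sketched in a few lines of Python. The `Contact` record type, its fields (the address is split into city and state here), and the sample cards are hypothetical, chosen only for illustration:

```python
from dataclasses import dataclass


@dataclass
class Contact:
    """One Rolodex card: a record whose attributes are its fields."""
    name: str
    city: str
    state: str
    phone: str


# Hypothetical sample records.
rolodex = [
    Contact("Ada Lopez", "Stamford", "CT", "555-0101"),
    Contact("Ben Ortiz", "Ann Arbor", "MI", "555-0102"),
    Contact("Cara Singh", "Hartford", "CT", "555-0103"),
    Contact("Dan Wu", "New Haven", "CT", "555-0104"),
]


def by_last_name(cards, last_name):
    """The only lookup a paper Rolodex really supports."""
    return [c for c in cards if c.name.split()[-1] == last_name]


# A computer-based system can filter and sort on any field:
# all Connecticut contacts, sorted alphabetically by city.
ct_by_city = sorted((c for c in rolodex if c.state == "CT"), key=lambda c: c.city)
for c in ct_by_city:
    print(c.city, "-", c.name)
```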
Most of the heavy-duty databases we use are built upon the relational database model. In relational database structures, data is stored within a set of relations or tables. Rows in the tables represent records, and columns represent fields. Data in different tables may be linked through a series of keys. For example, in Figure 5-15, the au_id and title_id fields within the Author_Title table act as keys linking the data stored separately in the Author and Title tables.
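A minimal sketch of the Author/Title example using Python's built-in `sqlite3` module follows. Only the three table names and the `au_id` and `title_id` keys come from the text; the `au_name` and `title` columns and all of the rows are placeholder assumptions. The query follows the keys through `Author_Title` to reassemble which author wrote which title:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Author (au_id TEXT PRIMARY KEY, au_name TEXT);
CREATE TABLE Title  (title_id TEXT PRIMARY KEY, title TEXT);
-- Linking table: each row pairs one author with one title via their keys.
CREATE TABLE Author_Title (
    au_id    TEXT REFERENCES Author(au_id),
    title_id TEXT REFERENCES Title(title_id),
    PRIMARY KEY (au_id, title_id)
);
INSERT INTO Author VALUES ('A1', 'Author One'), ('A2', 'Author Two');
INSERT INTO Title  VALUES ('T1', 'First Title'), ('T2', 'Second Title');
INSERT INTO Author_Title VALUES ('A1', 'T1'), ('A2', 'T1'), ('A2', 'T2');
""")

# Join through the keys to pair each author with their titles.
rows = conn.execute("""
    SELECT a.au_name, t.title
    FROM Author a
    JOIN Author_Title atl ON atl.au_id = a.au_id
    JOIN Title t ON t.title_id = atl.title_id
    ORDER BY a.au_name, t.title
""").fetchall()
for name, title in rows:
    print(name, "->", title)
```

Because the linking table holds only keys, a title with several authors (such as `T1` here) is stored once in `Title` and referenced as many times as needed.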
So why are database structures important to information architects? After all, we made a fuss earlier in the book about our focus on information access rather than data retrieval. Where is this discussion heading?
In a word, metadata. Metadata is the primary key that links information architecture to the design of database schemas. Metadata allows us to apply the structure and power of relational databases to the heterogeneous, unstructured environments of web sites and intranets. By tagging documents and other information objects with controlled vocabulary metadata, we enable powerful searching and browsing. This is a bottom-up solution that works well in large, distributed environments.
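To illustrate how controlled vocabulary metadata supports browsing across heterogeneous content, here is a hedged Python sketch. The vocabulary, the document names, and the `tag` helper are hypothetical; the point is that tags are checked against a fixed list of terms, and the resulting term-to-document index lets users browse HTML pages, PDFs, and spreadsheets side by side:

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: the only terms allowed as subject tags.
VOCABULARY = {"benefits", "payroll", "security", "travel"}


def tag(doc_id, terms, index):
    """Attach subject terms to a document, rejecting anything off-vocabulary."""
    unknown = set(terms) - VOCABULARY
    if unknown:
        raise ValueError(f"{doc_id}: not in controlled vocabulary: {sorted(unknown)}")
    for term in terms:
        index[term].add(doc_id)


index = defaultdict(set)
tag("expense-policy.html", ["travel", "payroll"], index)
tag("badge-request.pdf", ["security"], index)
tag("per-diem-rates.xls", ["travel"], index)

# Browsing: every document filed under a term, whatever its format.
print(sorted(index["travel"]))
```

Rejecting off-vocabulary terms at tagging time is what keeps the index consistent when many distributed authors contribute content.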
Figure 5-15: A relational database schema (this example is drawn from an overview of the relational database model at the University of Texas at Austin; see http://www.utexas.edu/cc/database/datamodeling/rm/)
The relationships between metadata elements can become quite complex. Defining and mapping these formal relationships requires significant skill and technical understanding. For example, the entity relationship diagram (ERD) in Figure 5-16 illustrates a structured approach to defining a metadata schema. Each entity (e.g., Resource) has attributes (e.g., Name, URL). In the finished database, entities become tables whose rows are records, and attributes become fields. The ERD is used to visualize and refine the data model before the database is designed and populated.
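One way to see how an ERD feeds database design is to generate table definitions directly from an entity-attribute model. The Python sketch below assumes a simplified model in which each entity becomes a table and each attribute becomes a column. The `Resource` entity and its `Name` and `URL` attributes come from the text; the `id` key, the column types, the `to_ddl` helper, and the sample row are assumptions:

```python
import sqlite3

# A tiny entity-attribute model. Resource, Name, and URL come from the ERD
# example; the SQL types chosen here are assumptions.
model = {
    "Resource": {"Name": "TEXT NOT NULL", "URL": "TEXT UNIQUE"},
}


def to_ddl(entities):
    """Turn each entity into a table and each attribute into a column."""
    statements = []
    for entity, attributes in entities.items():
        columns = ["id INTEGER PRIMARY KEY"]  # assumed surrogate key
        columns += [f"{attr} {sqltype}" for attr, sqltype in attributes.items()]
        statements.append(f"CREATE TABLE {entity} ({', '.join(columns)});")
    return "\n".join(statements)


ddl = to_ddl(model)
print(ddl)

conn = sqlite3.connect(":memory:")
conn.executescript(ddl)
# Each row inserted into the table is one record of the Resource entity.
conn.execute("INSERT INTO Resource (Name, URL) VALUES (?, ?)",
             ("Example resource", "http://example.com/"))
records = conn.execute("SELECT Name, URL FROM Resource").fetchall()
print(records)
```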
We're not suggesting that all information architects must become experts in SQL, XML schema definition, the creation of entity relationship diagrams, and the design of relational databases, though these are all extremely valuable skills. In many cases, you'll be better off working with a professional programmer or database designer who really knows how to do this stuff. And for large web sites, you will hopefully be able to rely on Content Management System (CMS) software to manage your metadata and controlled vocabularies.
Instead, information architects need to understand how metadata, controlled vocabularies, and database structures can be used to enable:
Automatic generation of alphabetical indexes (e.g., product index)
Dynamic presentation of associative "see also" links
Advanced filtering and sorting of search results
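The first two capabilities can be sketched in Python from nothing more than items tagged with controlled vocabulary terms. The product names, terms, and helper functions below are hypothetical; `see_also` ranks other items by how many metadata terms they share with the current one, which is only one of many possible ways to choose associative links:

```python
from collections import defaultdict

# Hypothetical product records, each tagged with controlled vocabulary terms.
products = {
    "Zoom Lens 50": {"cameras", "lenses", "video"},
    "Action Cam": {"cameras", "video"},
    "Tripod Pro": {"accessories", "cameras"},
    "Mic Kit": {"audio", "video"},
    "Desk Lamp": {"lighting"},
}


def alphabetical_index(names):
    """Group item names under their first letter, as in an A-Z product index."""
    index = defaultdict(list)
    for name in sorted(names, key=str.lower):
        index[name[0].upper()].append(name)
    return dict(index)


def see_also(item, catalog):
    """Other items sharing at least one term, most shared terms first, ties by name."""
    terms = catalog[item]
    scored = [(len(terms & other_terms), other)
              for other, other_terms in catalog.items() if other != item]
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [other for score, other in ranked if score > 0]


print(alphabetical_index(products))
print(see_also("Action Cam", products))
```

The same term sets could also drive the third capability, filtering search results down to items that carry a given term.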
The database model is particularly useful when applied within relatively homogeneous subsites such as product catalogs and staff directories. However, enterprise controlled vocabularies can often provide a thin horizontal layer of structure across the full breadth of a site. Deeper vertical vocabularies can then be created for particular departments, subjects, or audiences.
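A minimal sketch of this layered arrangement, assuming hypothetical term lists, appears below. Every department shares the thin enterprise layer, so browsing by those terms works across the whole site, while each department's deeper vertical vocabulary is available only to its own content:

```python
# Hypothetical vocabularies: a thin enterprise-wide layer of terms, plus
# deeper, department-specific vocabularies that extend it.
ENTERPRISE = {"policies", "products", "people", "events"}
DEPARTMENT = {
    "engineering": {"build systems", "code review"},
    "hr": {"benefits", "payroll"},
}


def allowed_terms(department=None):
    """Terms a document may be tagged with: the shared layer plus one vertical."""
    return ENTERPRISE | DEPARTMENT.get(department, set())


print(sorted(allowed_terms("hr")))
print("payroll" in allowed_terms("engineering"))
```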
Hypertext is a relatively recent and highly nonlinear way of structuring information. A hypertext system involves two primary types of components: the items or chunks of information that will be linked, and the links between those chunks. These components can form hypermedia systems that connect text, data, image, video, and audio chunks. Hypertext chunks can be connected hierarchically, non-hierarchically, or both, as shown in Figure 5-17. In hypertext systems, content chunks are connected via links in a loose web of relationships.
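The two kinds of components can be modeled as a small directed graph. In this Python sketch the chunk names, media types, and links are hypothetical. In a strict hierarchy every chunk except the root has exactly one incoming link, so counting incoming links shows where the non-hierarchical links have turned the tree into a web:

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A unit of content (text, image, video, audio, or data) plus its outgoing links."""
    name: str
    media: str
    links: set = field(default_factory=set)  # names of the chunks this one links to


def link(chunks, source, target):
    """Add a directed link from one chunk to another."""
    chunks[source].links.add(target)


def incoming(chunks):
    """Count how many links point at each chunk."""
    counts = {name: 0 for name in chunks}
    for chunk in chunks.values():
        for target in chunk.links:
            counts[target] += 1
    return counts


# Hypothetical chunks of mixed media.
chunks = {name: Chunk(name, media) for name, media in [
    ("home", "text"), ("history", "text"), ("map", "image"),
    ("tour", "video"), ("oral-history", "audio"),
]}

# Hierarchical links: each chunk below "home" gets one parent.
link(chunks, "home", "history")
link(chunks, "home", "tour")
link(chunks, "history", "oral-history")
link(chunks, "tour", "map")

# Non-hierarchical links cut across the tree in any direction.
link(chunks, "oral-history", "map")
link(chunks, "map", "history")

multi_parent = sorted(name for name, n in incoming(chunks).items() if n > 1)
print(multi_parent)
```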
Although this organization structure provides you with great flexibility, it presents substantial potential for complexity and user confusion. Why? As users navigate through highly hypertextual web sites, it is easy for them to get lost. It's as if they are thrown into a forest and are bouncing from tree to tree, trying to understand the lay of the land. They simply can't create a mental model of the site organization. Without context, users can quickly become overwhelmed and frustrated. In addition, hypertext links often reflect highly personal associations: the relationships that one person sees between content items may not be apparent to others.
For these reasons, hypertext is rarely a good candidate for the primary organization structure. Rather, it can be used to complement structures based upon the hierarchical or database models.
Hypertext allows for useful and creative relationships between items and areas in the hierarchy. It usually makes sense to first design the information hierarchy and then identify ways in which hypertext can complement the hierarchy.
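One reason to design the hierarchy first is that it can restore the context that hypertext takes away. In this hypothetical Python sketch, the site tree is stored as a child-to-parent map and hypertext is layered on afterwards as cross-links; a breadcrumb computed from the tree tells users where a cross-link has taken them:

```python
# Hypothetical site hierarchy, designed first: child -> parent.
PARENT = {
    "products": "home",
    "cameras": "products",
    "action-cam": "cameras",
    "support": "home",
    "warranty": "support",
}

# Hypertext added afterwards, as cross-links that complement the tree.
SEE_ALSO = {"action-cam": ["warranty"]}


def breadcrumb(page):
    """Walk up the hierarchy so any page, however it was reached, shows its context."""
    trail = [page]
    while trail[-1] in PARENT:
        trail.append(PARENT[trail[-1]])
    return " > ".join(reversed(trail))


# A user who follows the cross-link still sees where they landed.
for target in SEE_ALSO["action-cam"]:
    print(breadcrumb(target))
```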
Experience designer Nathan Shedroff suggests that the first step in transforming data into information is exploring its organization. As you've seen in this chapter, organization systems are fairly complex. You need to consider a variety of exact and ambiguous organization schemes. Should you organize by topic, by task, or by audience? How about a chronological or geographical scheme? What about using multiple organization schemes?
You also need to think about the organization structures that influence how users can navigate through these schemes. Should you use a hierarchy, or would a more structured database model work best? Perhaps a loose hypertextual web would allow the most flexibility? Taken together in the context of a large web site development project, these questions can be overwhelming. That's why it's important to break down the site into its components, so you can tackle one question at a time. Also, keep in mind that all information retrieval systems work best when applied to narrow domains of homogeneous content. By decomposing the content collection into these narrow domains, you can identify opportunities for highly effective organization systems.
However, it's also important not to lose sight of the big picture. As with cooking, you need to mix the right ingredients in the right way to get the desired results. Just because you like mushrooms and pancakes doesn't mean they will go well together. The recipe for cohesive organization systems varies from site to site. However, there are a few guidelines to keep in mind.
In considering which organization schemes to use, remember the distinction between exact and ambiguous schemes. Exact schemes are best for known-item searching, when users know precisely what they are looking for. Ambiguous schemes are best for browsing and associative learning, when users have a vaguely defined information need. Whenever possible, use both types of schemes. Also, be aware of the challenges of organizing information on the Web. Language is ambiguous, content is heterogeneous, people have different perspectives, and politics can rear its ugly head. Providing multiple ways to access the same information can help to deal with all of these challenges.
When thinking about which organization structures to use, keep in mind that large web sites and intranets typically require all three types of structure. The top-level, umbrella architecture for the site will almost certainly be hierarchical. As you are designing this hierarchy, keep a lookout for collections of structured, homogeneous information. These potential subsites are excellent candidates for the database model. Finally, remember that less structured, more creative relationships between content items can be handled through hypertext. In this way, all three organization structures together can create a cohesive organization system.
1. "How Much Information?" is a study produced by the faculty and students at the School of Information Management and Systems at the University of California at Berkeley. See http://www.sims.berkeley.edu/research/projects/how-much-info/index.html.
2. The tomato is technically a berry and thus a fruit, despite an 1893 U.S. Supreme Court decision that declared it a vegetable. (John Nix, an importer of West Indies tomatoes, had brought suit to lift a 10 percent tariff, mandated by Congress, on imported vegetables. Nix argued that the tomato is a fruit. The Court held that since a tomato was consumed as a vegetable rather than as a dessert like fruit, it was a vegetable.) "Best Bite of Summer," by Denise Grady, Self, July 1997.
3. It actually gets even more complicated because an individual's needs, perspectives, and behaviors change over time. A significant body of research within the field of library and information science explores the complex nature of information models. For an example, see "Anomalous States of Knowledge as a basis for information retrieval" by N. Belkin. Canadian Journal of Information Science, 5 (1980).
4. For a fascinating study on the idiosyncratic methods people use to organize their physical desktops and office spaces, see "How Do People Organize their Desks? Implications for the Design of Office Information Systems" by T.W. Malone. ACM Transactions on Office Information Systems 1 (1983).
1. In recent years, the business world has fallen in love with the term "taxonomies." Many biologists and librarians are frustrated with the exploding abuse of this term. We use it specifically to refer to a hierarchical arrangement of categories within the user interface of a web site or intranet. If you can't beat them, join them.
3. "Web Page Design: Implications of Memory, Structure and Scent for Information Retrieval," by Kevin Larson and Mary Czerwinski, Microsoft Research. See http://research.microsoft.com/users/marycz/chi981.htm.