Classes of Information in Structured Cyberspace
by Celeste Newbrough


International Conference on Internet Technology
August 15, 1998
Seattle, Washington, November 19, 1998.

Keywords: information access theory, internet technology, information science, indexing, knowledge-based systems, search engine design, metatext or metadata

Extended Abstract

Efforts to render information accessible over the Internet will be increasingly challenged as the totality of Internet information indexed by search facilities increases geometrically, producing a parallel increase in information glut.

In addition to the geometrically expanding information base, advertising strategies of search engines providing preferential-access clients and web interfaces distort the structure of information in a manner that increasingly occludes the trajectory of information access points from publisher to user, and vice versa, through the search process.

Given the amount of information glut and distortion, even a minor operational gain in the structuring of information can provide an important edge in the efficacy of a search facility or the effectiveness of a web publisher. Current information access programming needs to be more informed by the full spectrum of classic techniques of textual analysis and classification that can serve as the foundation of a theory of structured-access referencing (SAR) for the Internet. Programmers are not information specialists of the MLS variety; if they were, information glut would not have reached its current state of unmanageability.

Articulate theories of Internet information (I) are needed to assist in the design of more effective search and retrieval.

Several useful theories of Internet search and retrieval have been proposed. The strengths of these theoretical models will be tested by their adequacy in practical information access design issues, specifically in dealing with information glut (IG) and access distortion (AD). The theory proposed here most resembles the Information Relevance Model, though the methods proposed by and derived from the two models, and therefore their underlying premises, are quite different.

SAR
My own theory of structured-access referencing (SAR) is proposed and discussed in the current study.

  • Structured-access referencing (SAR) assumes that development has preceded articulated theory, a sure sign of the applicability of this strategy. The solution to information glut will be forged by the creation of meta-indexing search facilities that will act as screens for both user and publisher. Recessed realms of metadata will enhance metacodes and the use of thesauri, indexes, and references. Web sources as well as information must be coded into non-valued (e.g., nonmoralized) typologies, and search strategies should be codeveloped to type the kind of information needed. For example, a scholar who is interested in strategies of sexual selection is not interested in accessing porn or dating sites, and vice versa.
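
The idea of typing both sources and queries can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the typology labels, the `Source` record, the `typed_search` function, and the sample URLs are hypothetical and do not come from the paper, which does not enumerate a typology.

```python
from dataclasses import dataclass
from enum import Enum


class SourceType(Enum):
    # Hypothetical, non-valued typology labels (assumed for this sketch).
    SCHOLARLY = "scholarly"
    SOCIAL = "social"
    ADULT = "adult"


@dataclass(frozen=True)
class Source:
    url: str
    source_type: SourceType
    text: str


def typed_search(sources, term, wanted_type):
    """Return URLs of sources that mention `term` and carry the wanted type."""
    term = term.lower()
    return [
        s.url
        for s in sources
        if s.source_type is wanted_type and term in s.text.lower()
    ]


# Invented sample sources; the URLs are placeholders.
SOURCES = [
    Source("example.org/journal", SourceType.SCHOLARLY, "Sexual selection in finches"),
    Source("example.com/dating", SourceType.SOCIAL, "Selection of sexual partners near you"),
    Source("example.net/adult", SourceType.ADULT, "Sexual content"),
]

# A content-only match returns every source containing the term.
untyped_hits = [s.url for s in SOURCES if "sexual" in s.text.lower()]

# A typed query screens out the source types the scholar does not want.
scholar_hits = typed_search(SOURCES, "sexual", SourceType.SCHOLARLY)
```

Here the typing acts as the screen described above: the same term matches all three sources on content alone, but only the scholarly source once the query states the kind of information needed.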

    The differences seem obvious between a traditional back-of-book index and a search and retrieve operation over the Internet. What is not so obvious is the similarity of the underlying principles and mental processes involved. For example, a rich field of theory exists on textual information structure based on distinguishing scarce information (foreground) from information glut (background) in terms of potential user needs (4). A particular word, phrase, or name may be mentioned on numerous occasions in a text, yet on some occasions such mention is pertinent to the user's search for information while on other occasions the term or phrase imparts no information of use within the context of the user's search. Further, a term most likely to be used in searching for an information field may not exist at all in the text. Information classification practice has thus incorporated a set of principles and mental processes by which foreground is distinguished from background, involving analysis of the structure as well as the content of the information. Such processes can well inform Internet software developers as well as publishers.

    The more complex the material, the more structured-access referencing (SAR) is required as a guide to the information contained. Structure is indeed the attribute differentiating an index from a concordance (a list of words and the pages they appear on). The latter may be useful for technical materials of limited scope but is cumbersome and distracting as a guide to information-rich materials. Thus far, search engine design has been largely based on content analysis rather than structural analysis. It is precisely this orientation that produces so many nonsensical or irrelevant retrievals in response to a user's search query. The typical Internet information or document retrieval system consists of an evolving archive (neoarchive) of documents; a population of users, each of whom makes use of the system to satisfy their information needs; and a set of instructions that compare the representation of each user's query with the representations of all the documents in the neoarchive, retrieving "relevant" documents.
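
The contrast between a concordance and a structured index can be made concrete with a small sketch. The page texts, the hand-built `INDEX`, and the `lookup` helper are all invented for illustration; the index entries stand in for an indexer's judgment and are not produced by any algorithm described in the paper.

```python
import re
from collections import defaultdict

# Invented three-page text.
PAGES = {
    1: "Darwin proposed sexual selection. Selection shapes plumage.",
    2: "The printer selection menu is shown here.",
    3: "Mate choice drives plumage evolution.",
}


def build_concordance(pages):
    """Map every word to the sorted list of pages on which it appears."""
    found = defaultdict(set)
    for page, text in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            found[word].add(page)
    return {word: sorted(page_set) for word, page_set in found.items()}


# Hand-compiled structured index (an assumption standing in for human judgment):
# headings point only to pertinent pages, and a "see" reference redirects a
# likely search term to the heading under which the material is gathered.
INDEX = {
    "sexual selection": {"pages": [1, 3]},
    "mate choice": {"see": "sexual selection"},
    "plumage": {"pages": [1, 3]},
}


def lookup(index, heading):
    """Follow "see" references until an entry with page locators is reached."""
    entry = index[heading]
    while "see" in entry:
        entry = index[entry["see"]]
    return entry["pages"]


concordance = build_concordance(PAGES)
selection_pages = concordance["selection"]
indexed_pages = lookup(INDEX, "sexual selection")
```

In this toy case the concordance entry for "selection" includes page 2, where the word is irrelevant to the topic, and omits page 3, where the topic is discussed without the word. The structured index drops the first and supplies the second, which is the foreground and background distinction that content matching alone cannot make.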

    Current Internet information designs are mostly based upon the conceptualization of non-coded or content-related information as data, i.e., as equally valued contiguous points in a monolithic realm. Though some theorists mention the idea of "valued" or "relevant" information sought by the user, few efforts have been made to distinguish the characteristics of such information from the general background of Internet information-as-data. An exception is the information relevance model of Mathe and Chen (5).

    We attempt a schematic map of the structure of cyberspace. Cyberspace is conceptualized as a realm consisting of various classes of information. The key to effective search facility design is identifying the class to which a given information access point belongs in terms of a given user. In other words, we will attempt to structure Internet information in light of classical indexing principles. The current effort is clearly only a beginning in an evolving cathexis of theory and practicum.

    We begin with an evaluation of the current state of information access over the Internet, including a typology of Internet Actors (IAs). We will review some of the most important theories informing current Internet information retrieval technologies. We will move on to propose an abstract and intuitive classification of Internet information, identifying several functional classes, among them: Coded Information (CI); Information other than code (I); Scarce Information (SI); Infoglut (IG). We will further identify operational subsets of these classes. For example, an important potential subset of coded information is Metatext (MT), which is a dual-membership subset of CI and I. A rich field for analysis concerns developing strategies for defining metatext when it is not stipulated, and for validating alleged metatext. Access distortion (AD) can be considered a subset of IG.
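
The subset relations named above can be written down as plain set operations. This is only a sketch under stated assumptions: the access-point labels are invented, "dual membership" is read here as MT lying inside both CI and I, and the disjointness of SI and IG for a given user is an assumption of the sketch rather than a claim made in the abstract.

```python
# Invented access points, grouped by the class abbreviations used in the text.
CLASSES = {
    "CI": {"html-markup", "meta-keywords", "meta-description"},
    "I": {"article-body", "banner-ad", "paid-placement",
          "meta-keywords", "meta-description"},
    "MT": {"meta-keywords", "meta-description"},
    "IG": {"banner-ad", "paid-placement"},
    "AD": {"paid-placement"},
    "SI": {"article-body"},
}


def check_classification(classes):
    """Report whether the assumed subset relations hold for a classification."""
    return {
        # Dual membership: every metatext item is in both CI and I.
        "MT within CI and I": classes["MT"] <= (classes["CI"] & classes["I"]),
        # Access distortion is treated as a subset of infoglut.
        "AD within IG": classes["AD"] <= classes["IG"],
        # Assumption of this sketch: for a given user, scarce information
        # and infoglut do not overlap.
        "SI disjoint from IG": classes["SI"].isdisjoint(classes["IG"]),
    }


def classes_of(point, classes):
    """Return the sorted class labels to which one access point belongs."""
    return sorted(name for name, members in classes.items() if point in members)


report = check_classification(CLASSES)
metatext_labels = classes_of("meta-keywords", CLASSES)
```

A check of this kind could serve as a first test of alleged metatext: an item claimed as MT that fails to appear in both CI and I would violate the dual-membership reading assumed here.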

    We will go on to analyze the needs and conflicting informational interests of classes of Internet players (IP). These include at least the classic seeker of information, the user (U), and publishers, who can be assigned bisectoral classes based on their goals (e.g., commercial vs. noncommercial, instrumental vs. informative) and their identity (e.g., adult vs. child). Much of infoglut involves inadequate systems of identification of IPs in a given informational interaction.

    Finally, an effort will be made to propose functional and operational methods by which such structural classification can be incorporated by both information publishers and search facilities.

    To summarize, we propose a structured-access reference theory of Internet information based on a structural analysis of the Internet and guided by classical principles of indexing. SAR identifies several major classes of information residing on the Internet and analyzes how, in the realm of cyberspace, the user's search for scarce information (SI) intersects with various strategies of other Internet players (IP), including information publishers and search facilities. Finally, our desire is to construct a model of cyberspace and an associated classificatory schema that will guide the creation and evaluation of strategies of information access design.

    Copyright Celeste Newbrough, August 15, 1998; revised July 14, 1999

    Notes:
    (1) Pirolli, Peter and Stuart Card, "Information Foraging in Information Access Environments" (1995).
    (2) Pirolli, Peter, Patricia Schank, Marti Hearst, Christine Diehl; and Cutting, D. R., Karger, D. R., Pedersen, J.O., and Tukey, J. W. (1992).
    (3) Tonta, Yasar, "Analysis of Search Failures..." (1992).
    (4) Newbrough, Celeste, "The User Friendly Index" (1990).
    (5) Mathe, Nathalie and James Chen, "A User-Centered Approach to Adaptive Hypertext based on an Information Relevance Model."

    *Classes of Information in Structured Cyberspace is the property of Academic Indexing Service.

  • For more information contact Celeste Newbrough (510) 601-6039

