Chapter 1. Introduction
The overall aim of this project is to develop a framework for the evaluation of Internet Search Engines with an emphasis on a user-centered perspective. Towards this end we adopt the perspective that user satisfaction is a complex and multidimensional construct which is determined by the user's task requirement. Measures based on the resulting criteria provide a conceptual framework for system evaluation in which user satisfaction is characterised as a function of system-task fit, expressed in a moderating context of the user requirement. The evaluation framework was developed from a theoretical understanding of previous approaches to evaluation, and some empirical work was undertaken to test its feasibility. The main objective of this feasibility study was thus to understand how users are satisfied and on what criteria. By focusing the measures for each criterion on the features of the system designed to support users in retrieving information, use of the evaluation framework may provide system designers with further insight into areas for development. In addition, the incorporation of a moderating context of user and task in the framework, as a possible influence on user satisfaction with system performance, is intended to provide a better understanding of why a system can receive varying evaluations across differing contexts.
The structure of the report is as follows. Chapter 1 provides a general introduction to search engines and their evaluation and provides the rationale for our stated aims and objectives. Chapter 2 charts the development of search engines to highlight the major factors which may impact on their performance. General observations on search engine usage lead us to focus on the more novel features which concentrate on helping the user phrase more effective queries and navigate through the results displayed; those which, in other words, may help the inexperienced searcher get to the information requested. This categorisation of features, especially those with which the user interacts during the retrieval process, provides us with important clues towards an evaluation methodology. Chapter 3 reviews methodologies for IR system evaluation, ranging from a system perspective of evaluating performance to a user perspective of gauging satisfaction. The resulting broad definitions of the criteria on which evaluations are based structure our review of methodologies for the evaluation of search engines. The intention is not to provide a comprehensive review but to set the goal of evaluation. That is, we explore the why and what of evaluation against the criteria on which the evaluation is to be based, which in turn define the system and context parameters in the design. In Chapter 4 we construct a framework for evaluation based on the dimensions of the information retrieval task and define the criteria on which we develop a set of user satisfaction measures with system-task support. Testing of the framework is described in a small-scale implementation and initial results are presented. We conclude the report with recommendations for the refinement of the multidimensional framework and its use as a methodology for the evaluation of search engines from a user perspective.
1.1 Rationale for a user-evaluation of search engines
1.1.1 Internet Search Engines
Internet Search Engines have proliferated with the growth of the Internet itself. There is a growing number of major general purpose search engines, both from established commercial firms in the industry and from new up-and-coming technology firms, often emerging from university departments. A small number of these may ultimately become the winners, but prior to reaching this status the current state of play is one of development and competition (advances in search engine technology are reviewed annually at the Infonortics Search Engine Conferences: Wiggins and Matthews, 1998; Wiley, 1998; Feldman, 1999; Sullivan, 2000).
Search engines are often categorised as either robot-driven systems, which respond to a user query, or directory-based systems, which guide users through classified lists. Whilst this distinction is increasingly blurred, with catalogues and full text indexes coming together in a single service, the popularity of the query-based approach is evident. A recent survey commissioned by RealNames (Sullivan, 2000a) revealed that 75% of frequent Internet users use query-based search engines and 70% of those surveyed said they know specifically what they are searching for when they use a search engine. Given their popularity, in this preliminary investigation of a framework for evaluation we focus on query-based search engines. This decision does not preclude development of the framework for the evaluation of subject-based (or combined) services. However, some adaptation would be required to re-define the criteria for evaluation within the framework to reflect the general browsing task which subject-based services support, with associated features such as visualisations of the information space. Focusing on query-based search engines alone allows us to delimit a range of particular key features in defining the criteria for their evaluation.
In general, Internet search services are built from three components: 'spiders', software programs that create and maintain a proprietary index of web documents; a search engine, the underlying technology for retrieval; and the interface for users' search specification. Search Engines exhibit a number of key characteristics which have enabled them to develop rapidly and gain popularity for accessing global networked information. They are fast, robust, scalable, sustainable and use a variety of techniques derived from 30 years of research in IR to achieve their performance levels. However, considerable variation exists between the engines - in the techniques used for indexing, ranking, the search features, and the display of retrieved results - all of which can affect performance. Indeed, in such a context, it is not surprising that each engine is developing characteristics which may allow it to stand out from the others. Excite touts its concept search capability, targeting the consumer market; Infoseek, with its emphasis on company information, targets the business user; NorthernLight offers serious information searching and has received much attention in the literature with its custom folders offering a visual overview of a search (Feldman, 1998). Suggestions are still being made for the next generation of search engines, and it is in this context that we can state that it will be some time before the technology reaches its users' expectations for finding precise information.
However, as Evans (in Wiggins and Matthews, 1998) and others (such as Larsen, 1997; Berghel, 1997) have noted, search engine developers may be approaching a fundamental limit in terms of the capabilities of their systems. Evans describes an uncertainty principle, holding that IR systems cannot automatically accommodate all idiosyncratic viewpoints, saying "the best we can expect is for systems to be tuned to the expectations of the masses, with rapid adaptability to a given individual's viewpoint" (p. 16). Feldman (1998) warns that the problem facing developers is more fundamental, stating that the "Web searching market is extremely fluid and undefined. Hard as it is to design a television set or a car that everyone will want, at least manufacturers of reasonably standard products know why people will want them and what they will do with them. The situation is much less certain in the online world, in fact it is downright murky" (p. 3).
The uncertainty which surrounds users' expectations and usage of search engines gives rise to the question as to how we can evaluate the impact of their developments on performance. More specifically, it is critical that we have some means to measure the impact system features have on users' satisfaction with respect to what they want to do or achieve with these systems.
1.1.2 Evaluation
Evaluation is a process by which the effectiveness of a service or system is assessed, in particular to establish the degree to which the goals and objectives are accomplished (Harter & Hert, 1997). The general objective of an IR system is to retrieve relevant documents for a given query, whilst at the same time minimising the user effort in locating needed information. Thus the evaluation of a retrieval system can be seen to encompass many different viewpoints, from the mechanical (does it retrieve relevant documents for a given query, including the impact of design such as the use of natural language or controlled language indexing?); to the human (does it provide useful, usable tools, and how should the interface be designed to simplify user-system interaction?); through to the utility perspective for a given group of clients (does it deliver the information in a convenient form, in a timely fashion?) (Large et al., 1999). Evaluation from a user perspective is so broad, however, that it must embrace all these viewpoints. Further, given the situation described above, in which systems are designed to meet a spectrum of users, information needs, and search behaviours, the impact of these evaluation views on user satisfaction is likely to vary considerably across different contexts.
A broad comparison of the criteria and measures of user satisfaction proposed for the evaluation of retrieval systems set against the possible system and external (or contextual) parameters illustrates the potential for a highly complex evaluation situation. This is done in Table 1 which compares the following researchers' criteria in the corresponding typeface:
- the six criteria for the evaluation of a retrieval system, as identified in Cleverdon (1978)
- the criteria for the evaluation of interactive retrieval systems from a user perspective, as identified in Su (1992)
- the criteria for an evaluation methodology of web search engines, as identified in Chu and Rosenthal (1996)
- the recent recommendations of criteria for the evaluation of search engines in Oppenheim et al. (2000).
Whilst this is a broad comparison, it highlights a number of important points with respect to the aim of this project. First, these criteria and their measures have been consistently used in evaluations spanning four decades. Second, in proposing the criteria for the evaluation of search engines, certain adjustments or indeed alternative measures are recommended. These are discussed in more detail in the review, where we focus specifically on the difficulties which arise in validating the use of traditional recall and precision measures when computed from an Internet retrieval situation as distinct from the test conditions of their origin. Third, while relevance-based measures dominate, other factors such as the utility of the retrieved results and the user interface may affect user satisfaction and thus have an important role to play in users' selection of systems. Further, the table sets a range of system components and user/context parameters against each criterion to attempt to show the role of each. For example, the technology comprising indexing techniques and retrieval algorithms could impact on retrieval performance.
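The traditional recall and precision measures referred to above can be illustrated with a minimal sketch. It uses the standard set-based definitions (precision as the share of retrieved items that are relevant, recall as the share of relevant items that are retrieved); the function name `precision_recall` and the document identifiers are hypothetical and serve only as an illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: iterable of document IDs returned by the system
    relevant:  iterable of document IDs judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items that were actually retrieved
    # Guard against empty sets so the ratios are always defined.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 items retrieved, 5 judged relevant, 2 in common.
p, r = precision_recall({"d1", "d2", "d3", "d4"},
                        {"d2", "d4", "d7", "d8", "d9"})
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.40
```

The sketch assumes the full set of relevant documents is known, as it is under test-collection conditions. For an Internet search engine that set is generally not available, which is one source of the validation difficulties noted above.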
These parameters become increasingly complex as the measures become more user-oriented, since they not only define what is evaluated but equally impact on the measures for the criteria. It is obvious, for example, that the content of the database or index searched will partly determine the items retrieved, and thus impact on a user's perception of the usefulness of the service in meeting the objective to retrieve useful items. However, the user judgement of utility, based on the value of the retrieved items, is distinguished from the criteria of aboutness used in the relevance measures of recall and precision. A user's judgement of system success based on utility may be influenced by a number of user factors, such as the context of the query and the psychological state of the user. Such a judgement could also be partially determined by a range of system factors, such as the speed of operation, the quality assurance of results or the presentation of results. The evaluation of the usability and functionality of search engines likewise must involve the user in some investigation of the search process the system supports and the impact the system features have on search behaviour as well as the retrieval outcome. The effectiveness of retrieval is partially dependent on the searcher's use of the search features to formulate a query statement facilitating its intended interpretation. For example, a lack of precision may be caused by a searcher's reluctance to expend effort in narrowing a search. Indeed, the interface (and non-retrieval devices) may affect the whole mode of interaction for the user and hence influence the demands the user indirectly puts on the back end search technology. A further indication of the layer of complexity added as we move from the more abstract performance measures to those which involve the user lies in the consideration that the characteristics of the users' tasks may also influence their search behaviour.
Table 1 Comparison of evaluation criteria and system/context parameters
| Evaluation criteria | System parameters | User context |
|---|---|---|
| 1. Coverage (proportion of literature on a topic) | Composition of the index will affect the performance of the search engine | |
| 2. Recall (retrieve relevant items); 3. Precision (hold back non-relevant items) | The indexing language, exhaustivity and specificity, and retrieval mechanism will affect performance | Query formulation, and search strategy |
| 4. Response time (from request to results) | As above, and organisation of stored documents, size of collection, file format will affect response time | As above, and type of query |
| 5. Utility (worth of search results, and value of search results as a whole) | As above | As above, and user/information need context |
| 6. Format (presentation of the search results); USER SATISFACTION with output format | Type of display of output will affect performance in an interactive system | As above, and specifically user ability to judge document relevancy |
| 7. User effort (expended to achieve a satisfactory response); USER SATISFACTION with search interface and online documentation | Interface, facilities for interaction with system and guidance | As above, and specifically the usage of interactive functions and user search behaviour |
A possible consequence of the complexity of such interrelations among system and contextual parameters is the use of satisfaction as an evaluation concept. The construct of user satisfaction used in system evaluation aims to achieve a summary expression of users' perceptions based on the usefulness of a system. Its appeal lies in its use as a surrogate measure of system effectiveness, where a system is deemed to be successful if users' evaluations along various scales of satisfaction are at a maximum. Research into user satisfaction and its relationship (as a dependent or independent variable) to system acceptance and actual use and behaviour is extensively covered in the information systems management literature (Gatian, 1994; Parasuraman et al., 1985, 1988; Goodhue, 1995). Yet relatively little work has come from the IR community on the definition of a satisfaction construct and the validation of user satisfaction scales and surveys (Harter and Hert, 1997, p. 38). A possible reason is that, in the context of user information searching, users themselves may evaluate system performance on multiple dimensions. Thus an expression of satisfaction on which system evaluation from a user perspective is based is a complex construct, determined not only by a range of system influences (both the performance output and mode of interaction) but also by a range of user contexts and requirements. In 1977 Tessier et al. made the assumption that users' satisfaction will be a function of how well the product fits their requirement and is experienced within the framework of their expectations. While this implies how satisfaction should be measured, our aim is to develop these assumptions into maxims on which to base the development of a framework for the evaluation of search engines.
1.1.3 Development of a framework for user-centered evaluation of search engines
A conceptual framework for the evaluation of search engines from a user perspective is thus proposed, based on the notion that user satisfaction with a retrieval system can be characterised as a function of system-task fit. To this end, we identify a general task model of the retrieval process which it is assumed all retrieval systems will aim to support. Each of the steps in the process model provides some statement of user requirement, that is, what the goal-directed user is trying to do with the system, and suggests the dimensions of user satisfaction criteria. By linking the evaluation criteria of system effectiveness, efficiency, utility and interaction to the task dimensions, the measures devised for each were identified from the system components or features which supported the user in their task. Finally, user-context parameters were identified for the analysis of their impact as a moderating context for the evaluation framework.
To test the framework the following measures were used:
- User satisfaction measures of system effectiveness, efficiency, utility, and interaction. Users were asked to rate the system based on the degree to which the system supported them in the associated task dimensions.
- Users and their information tasks were characterised based on questions which captured the nature of the information query and users' intent, amount of prior knowledge, and expectations.
The data collected was analysed with a view to understanding user evaluations of system satisfaction and thus addressed the following propositions:
- User satisfaction is expressed as a multidimensional construct based on user requirement (what the user is trying to do). Correlations of user assigned system ratings on various scales were analysed to find which dimensions and measures appear to be the most important in defining user satisfaction.
- User satisfaction is a meaningful evaluation of system characteristics. Within the constraints of a feasibility study we analysed satisfaction ratings across search engines to speculate on a possible link between user satisfaction and the system features which support users in their tasks.
- User satisfaction is expressed in the context of an individual's information need. The impact of the contextual characterisation of user and information query was explored to find the extent to which the importance of given system features is determined by user requirement and thus leads to a difference in the system evaluations obtained.
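The correlation analysis behind the first proposition can be sketched as follows. This is an illustration only: the choice of the Pearson coefficient is an assumption (the coefficient actually used is not stated here), the ratings are invented, and the names `pearson`, `rank_dimensions`, `ratings` and `overall` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length rating lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / (sd_x * sd_y)


def rank_dimensions(ratings, overall):
    """Order the dimensions by strength of correlation with overall satisfaction."""
    scores = {dim: pearson(vals, overall) for dim, vals in ratings.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Invented ratings on a 1-5 scale from six hypothetical users.
ratings = {
    "effectiveness": [4, 5, 2, 3, 4, 1],
    "efficiency":    [3, 4, 3, 3, 4, 2],
    "utility":       [4, 5, 1, 3, 4, 2],
    "interaction":   [2, 3, 4, 3, 2, 4],
}
overall = [4, 5, 2, 3, 5, 1]  # invented overall satisfaction ratings

for dimension, r in rank_dimensions(ratings, overall):
    print(f"{dimension:<14} r = {r:+.2f}")
```

Dimensions that correlate most strongly with the overall rating would be candidates for those that matter most in defining user satisfaction. With a sample as small as that of a feasibility study, such coefficients are indicative rather than conclusive.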
