Full-text search capability not without problems, scholars say

While full-text resources have become ubiquitous in the seemingly infinite library of book digitization projects, the ability to search full-text sources is not the magical tool some scholars, laypersons and even some librarians would have us believe, say two University of Illinois experts in information science.

The perception that plugging a few keywords into Google will yield a universe of relevant information is somewhat mistaken, say Kathryn LaBarre and Carol L. Tilley, professors of library and information science at Illinois.

"There's a lot of excitement about the availability of full text," LaBarre said. "But the perception often is, 'If you have full text, why do you need to do anything else other than providing good search capabilities?' Well, it turns out that full-text search isn't always king, especially for something iterative like folklore literature."

According to the scholars, the result is that too often the value of providing systematic, reliable and meaningful access to the contents of the texts is negated. Libraries and archives themselves play a role in this negation when, in an effort to save time and money, they are forced to rely on "good enough" records to provide access to the resources in their collections.

Unfortunately, those records of convenience are frequently too minimal to provide adequate access, and "typically don't reflect the complexity of the resources," Tilley said.

"For example, descriptive metadata for Google Books is harvested from many different sources, and thus reflects various inconsistencies both from original sources and from the process of aggregation," LaBarre said. "This can make it difficult to find specific things in an easily searchable way."

Some of these difficulties stem from relying on a primarily computational approach to creating full metadata for digital resources, LaBarre said.

"There are advantages to having a human involved in such a process, although people certainly can't do everything," she said. "I know Google is hard at work to address this, and our project, Folktales and Facets" – a project to develop an enhanced bibliographic record format for folktale resources – "is working on this as well."

In order to provide better access to materials like folktales, LaBarre and Tilley argue that alternative models for access and discovery systems need to be developed.

"There are better ways to help people find what they're looking for in those collections," Tilley said. "In folktales, the classic example is where you seldom get any acknowledgment of the geographic distribution of the tales. You might think that there's a trickster story associated with a certain region, but you have no way of knowing without an extra layer of classification. So our goal is to make things more findable in ways that people want to be able to find it."

According to the scholars, understanding the users' information-seeking goals and tasks can strengthen the design of search and discovery systems and enhance access for these resources.

"For the practitioners, the storytellers and librarians, they would like to be able to not just search for the words in a story that might appear, but they would also be able to judge whether a certain text is appropriate to use with 5-year-olds," Tilley said. "They would be able to tell whether a certain story would be useful if they were putting together a program about, for example, compassion. If not, they would be able to find examples of those types of tales."

The iterative nature of folktales – that is, the way the narrative grows with each telling of the tale – can also present unique challenges to scholars and librarians.

"They'll try Google searches for phrases or terms that they know are particular to a particular region or story type," Tilley said. "So not just 'Once upon a time …' but all the variations on 'Once upon a time …' that might be indicative of a regional approach. And that makes it that much more difficult because it requires a much higher level of knowledge and expertise to even know how to look for those things."

Folktales are but one example of resources now increasingly available in full-text format. But because the most salient aspects of these resources are often not well supported by available search mechanisms, which must then interact with "good enough" records, the net effect may well be the opposite – decreased availability, the researchers say. Oral histories, archival materials, museum artifacts, musical scores and other types of texts are similarly difficult for users to locate.

"The number of available digitized texts is steadily increasing along with the pressure to digitize more and more content," LaBarre said. "So the question becomes, 'How do you create an entry to this material that is approachable for folklorists, archivists and scholars?' Those are different constituents with very different needs but with some fundamentally similar approaches to searching for material."

"Even though we're focused on print materials right now, more scholarly users will have an interest in field recordings, images and multimedia resources," Tilley said. "There are certainly non-textual media that will also be interesting to consider."

Source: University of Illinois