Project Description
Cline Library’s Special Collections and Archives (SCA)
Cline Library’s Special Collections and Archives (SCA) is home to unique collections that help tell the stories of the Colorado Plateau. These images, manuscripts, diaries, maps, and other items provide a valuable resource to researchers, students, and others with an interest in the many topics contained in their physical and digital collections.
Whether your interests are in river guides of the Colorado River (before the dam) or stories about trading posts across the Navajo Reservation, Cline Library can serve as a unique source to aid your research.
Challenges of Navigating SCA's Digital Archives
SCA's archives contain a vast array of digital images that serve as a valuable resource for research and other educational purposes. Over 100,000 items have been digitized and made available to the Cline Library, supporting the continued growth and preservation of its digital collections. Currently, CONTENTdm manages all of this content and provides the library with traditional filtered search capabilities. The Cline Library's current search interface can be found here.
While the traditional search provided by CONTENTdm is a valuable navigation tool, there are still many cases where a more intuitive search method would better retrieve the desired information from such an extensive collection of historical images.
It's not uncommon to be unsure of the best keywords to enter, or even of the correct categorization for your query. As a researcher, it often makes the most sense to build on an image you already have in mind, or even one you've just taken, to quickly retrieve images with a similar categorization or feel.
Proposed Solution: Image Similarity Search Application
This project proposes the development of an image similarity search application that enables users to upload an image and find similar images within the SCA digital archives. For example, snapping a quick photo of Old Main and uploading it to the application would return other pictures of Old Main in the collection, along with any buildings that share similar characteristics. We hope this application will serve as a valuable tool that can be used in conjunction with filtered searching to enhance the user experience for faculty, researchers, and students alike.
Technical Approach
Recent strategies to accomplish this often involve creating “image embeddings,” or a “vectorization” of the images, which are then stored in a database. These vectors can be created from the image metadata, the images themselves, or both.
We envision a responsive, web-based application that would let users upload images as the basis for their queries within the SCA archives. Search results would be the images most similar to the uploaded image and could be presented in a grid. Similarity would be determined by comparing the calculated embedding of the uploaded image to the embeddings of all the other images in the archive, using a similarity measure such as cosine similarity. Clicking on an image would automatically reorder the grid so that the “seed” image is in the top-left position, with the next most similar image beside it, and so on, in left-to-right reading order. As an extension, the inverse of this type of interface could be used to remove groups of related “unwanted” images. We would be able to provide a dataset of images from which embeddings could be created.
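The ranking and reordering steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's implementation: the function names (cosine_similarity, rank_by_similarity, reseed), the three-dimensional toy vectors, and the image identifiers are all assumptions made for the example. Real embeddings would come from an embedding model and would typically have hundreds of dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float],
    archive: Dict[str, Sequence[float]],
    top_k: int = 12,
) -> List[Tuple[str, float]]:
    """Return up to top_k (image_id, score) pairs, most similar first."""
    scored = [
        (image_id, cosine_similarity(query, embedding))
        for image_id, embedding in archive.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def reseed(
    seed_id: str,
    archive: Dict[str, Sequence[float]],
    top_k: int = 12,
) -> List[Tuple[str, float]]:
    """Re-rank the archive around a clicked image (the new "seed")."""
    return rank_by_similarity(archive[seed_id], archive, top_k)


# Hypothetical toy archive: image ids and vectors are made up for illustration.
archive = {
    "old_main_1": [0.9, 0.1, 0.0],
    "old_main_2": [0.8, 0.2, 0.1],
    "river_trip": [0.0, 0.2, 0.9],
}
uploaded = [1.0, 0.1, 0.0]  # stand-in for the uploaded photo's embedding

results = rank_by_similarity(uploaded, archive, top_k=3)
for image_id, score in results:
    print(f"{image_id}: {score:.3f}")
```

Because the returned list is already ordered from most to least similar, filling a grid row by row from this list places the best match in the top-left position. Calling reseed with the id of a clicked image produces the reordered grid, with the clicked image itself ranked first. Comparing the query against every archive embedding, as done here, is the simplest approach; a collection of over 100,000 items might call for a vector index, which is outside the scope of this sketch.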
The initial concept for this project was provided by our sponsor, in the form of a Capstone project proposal.