Our Solution
Team INSIGHT is building an Image Similarity Search Tool to transform how users explore Northern Arizona University's Cline Library Special Collections and Archives (SCA). The tool provides a modern alternative to traditional keyword-based search by enabling users to submit an image and retrieve visually similar images from the archive.
How It Works
- Users upload an image directly on our web interface.
- The image is processed using a machine learning model to generate a vector embedding, a mathematical representation of its visual features.
- This vector is then compared against precomputed embeddings of over 100,000 archived images.
- The system returns the most similar images, up to the number the user selects, and displays them in a clean, interactive grid along with their metadata from CONTENTdm.
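The comparison and ranking steps above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: it assumes the embeddings are plain lists of floats, uses cosine similarity as the comparison metric, and scans the archive linearly where the real system would query a vector database. The function names, image IDs, and three-dimensional vectors are all hypothetical.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_similar(query, archive, k):
    """Return the k (image_id, score) pairs with the highest similarity."""
    scored = (
        (image_id, cosine_similarity(query, embedding))
        for image_id, embedding in archive.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical precomputed embeddings; real ones would come from the model
# and have far more dimensions.
archive = {
    "canyon_001": [0.9, 0.1, 0.0],
    "canyon_002": [0.8, 0.2, 0.1],
    "portrait_001": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of the uploaded image.
query = [1.0, 0.0, 0.0]

# k plays the role of the user-selected result count.
results = top_k_similar(query, archive, k=2)
for image_id, score in results:
    print(f"{image_id}: {score:.3f}")
```

A linear scan like this is fine for a small demo. At the scale of 100,000+ images, a vector database typically replaces the loop with an indexed nearest-neighbor lookup.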
Screenshot of the Tool
Below is a preview of our functional web interface in action:

As seen above, users upload a photo (e.g., the Grand Canyon) and receive a grid of visually similar historic images from the SCA archives. Users can choose to tag the search as Similar or Non-Similar to refine results. A toggleable Explore Mode and an adjustable result-count slider offer advanced search customization.
Demo Walkthrough
Watch this walkthrough video to see the tool in action, from uploading an image to exploring similar results:
Key Features
- Intuitive UI: A clean, easy-to-use React-based frontend that lets users interact naturally with the tool.
- Powerful Backend: Flask- and Express-based services handle image embedding, search processing, and database interaction.
- Embedding Comparison: Uses vector databases and cosine similarity metrics for fast, accurate search results.
- Scalable Infrastructure: Cloud-native design using AWS (S3, EC2, RDS) ensures performance at scale.
Why It Matters
Current search systems in the SCA rely heavily on text metadata and keywords, which are often incomplete or inconsistently applied. This restricts discovery and slows down research. Our tool enables users to visually explore the archive, even if they don’t know what keywords to use, making the rich history of the Colorado Plateau more accessible to students, researchers, and the public.
Impact
This project not only modernizes the archive search experience but also lays a foundation for future tools built around visual data exploration. By combining machine learning, vector databases, and modern web technologies, we’re helping bridge the gap between NAU's historical assets and the digital future.