This project was completed during the first semester of my graduate HCI program at Georgia Tech. I worked on a team of four, acting as a researcher and designer. We explored the retail space and developed a set of high-impact user needs for purchasing produce. The user needs informed the design of a system to help shoppers select produce with their desired ripeness. We tested our design with users and found it to be successful on several key metrics. All research was conducted under the Georgia Tech IRB.
Our solution solves several key problems with produce selection. We believe that if the Ripenow system were implemented, shoppers would be more confident in their produce selections and would be empowered to try new produce. We also suspect that by giving shoppers another method to judge ripeness, food waste from produce would be mitigated. Further testing is needed to determine true effects.
In the beginning our only direction was to solve a problem in the “retail space”. My team and I became interested in the grocery store context during a brainstorming session. We identified self-checkout, store navigation (and how information is displayed), and produce selection as potential directions for the project. We were also cognizant of the other ways people shop for retail and groceries. The market is saturated with apps and similar services that provide people alternatives to physical stores (or enhance the physical store experience). We included farmers’ markets in this “other” group, which may have interesting problems of its own.
The goal of this phase was to narrow the scope. We achieved this by speaking to a diverse set of people about how they purchase groceries, reading everything we could about modern retail, and observing all of the grocery options in our area (this includes apps like Instacart).
We first developed a script based on the major questions we had about grocery shopping behavior. This covered topics like pre-shopping planning, frequency, experience, location, cost, the food selection process, new information, and checkout. We recruited eight participants in our area (and attempted to make the group as diverse as possible in terms of gender, age, race, and experience).
We analyzed the qualitative interview data with an affinity diagramming session.
Participants were aligned on some things, but varied in several areas. Most participants liked self-checkout but noted its difficulty with large grocery purchases. A significant number of people purchase "staples" each week. By and large, people purchased their groceries in physical stores (as opposed to online). There was a divide in pre-shopping behaviors, with some doing no preparation and others taking stock and making a list. There was an interesting lack of common methods for selecting produce: all participants had their own ideas about whether a fruit was ripe or not. We also observed frustration with the produce purchasing process, with lack of knowledge being the primary source.
Physical stores were our most observed space, with this being the primary source of produce for all shoppers interviewed. We visited stores to better understand the store layout and observe how produce is presented. By taking notes, pictures, and video, we pieced together the common themes while also noting interesting differences between grocery stores. We observed Kroger, Target, Walmart, Whole Foods, Publix, Sprouts, Aldi, Hmart, Trader Joe’s, and the Dekalb Farmer’s Market.
Large grocery stores tend to dedicate significant floor space to produce sections. The produce sections are typically found in the front corner of the store (as you walk in), but are sometimes in the back corner. Produce is organized in angled rectangular containers to allow customers to select from dozens of options. At Publix, the produce bins were stacked to save floor space.
The signage in each store varied from prices and names to educational information about how to use or store the produce. One sign at Publix read "SELECTING: Select firm Strawberries with bright, even coloring and green stem caps." The use of the word firm is interesting because it may not be socially acceptable (or sanitary) to open strawberry packaging and feel the fruit. The shopper is then left with only visuals, which can sometimes be misleading or may not be the best way to determine ripeness.
After reviewing articles and relevant literature, we found that retailers are investing heavily in the checkout experience. Technology trends opened the door to checkout innovation, but continued investment may be driven by a desire to cut labor costs. While checkout is a popular problem and receives most of the attention, our data indicates that produce selection can be equally frustrating. The lack of attention produce selection receives made it a more interesting problem to our team. Just as technology opened the door to checkout innovation, it may do the same for produce selection (and it may be driven by improved customer experience and a reduction in food waste).
This is not to say that nothing has been done on produce selection. Researchers C. Lang and T. Hubert developed a sticker with a color-changing chemical coating that reacts to ethylene gas, which apples emit at different stages of the ripening process. Paired with a color recognition sensor, the sticker enables a sensory test that estimates an apple’s level of ripeness. Other sticker-based ripeness indicators have been used to provide a visual indication of ripeness, as shown below.
A 2012 article titled The Everyday Information-Seeking Behavior of Grocery Shoppers provided insight into shopping behavior with regards to information seeking, information gathering, and information acquisition. It was noted that shoppers seek out information, regardless of what information is provided.
We identified our problem area (produce selection) and setting (physical grocery stores), but lacked information about the mental process shoppers use to select produce. Our goal became to understand this process better. We also sought to categorize shoppers (to narrow the scope further) and ultimately identify user needs and fill those gaps. To do this we conducted a survey and contextual inquiry interviews.
We conducted an online survey using the platform Qualtrics. Our goal with the survey was to better understand our users' produce selection habits. Initial interviews gave us rich qualitative data about a few specific individuals, but we didn't feel we could make generalizations about who our users were. We used the interview data to build a more specific survey, narrowing our scope to selection habits. The survey also allowed us to reach a broad audience. The artifacts we hoped would come out of these surveys were personas for our various user types and a greater understanding of the factors shoppers consider when selecting what to purchase.
While crafting questions, we found ourselves saying "If they select this, I want to know more," which is one downfall of an online survey. We were cognizant of the number of times we asked a follow-up "why?" question, because we wanted to ensure a high completion rate. One way we circumvented this problem was by asking participants to rank their opinions.
We distributed our survey via social media and through a subreddit on the platform Reddit. Our choice of distribution platform did bias our data, as younger people were more likely to find our survey. We received 161 complete responses from people across the United States. The analysis of the survey was conducted with Excel, Qualtrics, and Tableau.
We received responses from all age ranges from 18-24 to 65-74, with individuals 18-34 making up the majority of our feedback. When asked, 140 out of 161 shoppers said they frequently shop at the grocery store for produce. Alongside demographics, we collected a large amount of data on shopping habits, methods, and preferences.
Possessing a large amount of nominal and ordinal data, we turned to visualization in order to parse out useful insights. Once the data was in a workable form, it was imported into Tableau for visualization. In Tableau, we created a variety of tables and charts in order to see possible correlations and trends in the data.
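The core of that analysis was cross-tabulating pairs of nominal variables to look for trends. As a minimal sketch of the idea (the response values below are hypothetical, not our actual Qualtrics data), the same kind of table can be built with plain Python:

```python
from collections import Counter

# Hypothetical (age range, shopping frequency) pairs; our real data
# came from a Qualtrics export of 161 complete responses.
responses = [
    ("18-24", "Weekly"), ("25-34", "Weekly"), ("25-34", "Monthly"),
    ("35-44", "Weekly"), ("18-24", "Rarely"), ("25-34", "Weekly"),
]

# Cross-tabulate the two nominal variables, mirroring the tables
# we built in Tableau to spot possible correlations.
table = Counter(responses)
for (age, freq), count in sorted(table.items()):
    print(f"{age:6} {freq:8} {count}")
```

Tableau builds these tables interactively; the sketch just shows the underlying counting that drives each chart.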
There are many decisions that are made on-the-fly that cannot be captured in surveys or interviews. Contextual inquiries allow us the valuable opportunity to ask questions about selection decisions in real time. We can obtain the who, what, where, why, and how from one method in a short amount of time. The drawbacks are that the subject's behavior is modified because they are being recorded and asked questions. We told participants it was important to strictly follow what they would normally do.
We provided the participants with tasks and closely observed behavior. Two of the tasks given were shopping lists based on recipes one might find online. Some items on the lists were uncommon, and some items called for specific amounts (ex. cups, ounces, tablespoons). We were interested to see if participants would locate the items, determine the amount they should buy, and select the item they felt was best. In a separate task, we asked participants to shop as they would normally. In total we completed four "recipe" based and three "normal shopping behavior" based contextual inquiries for a total of seven.
Raw data was primarily in the form of pages of semi-organized notes. Immediately after each of our contextual inquiries, we converted our notes into "I" statements on yellow sticky notes. We followed the standard affinity mapping technique by placing the yellow sticky notes on a wall and grouping similar ones. We quickly obtained higher level ideas from these groups and added these ideas to the map via sticky notes with a different color (green).
Problem statements were another important analysis technique we employed. Our goal was to obtain user needs, so we put our problem statements in the perspective of personas (Aware Shoppers and Experienced Shoppers). The created statements are shown below on the multi-colored sticky notes (above the green notes).
The sixteen problem statements with respect to personas are shown below.
These statements were later used in the 2x2 matrix activity. We placed the previously mentioned problem statements on a grid where the y-axis represents user need and the x-axis is frustration level. Ultimately this activity allowed us to determine the user needs that simultaneously have high impact and that cause a large amount of frustration.
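The quadrant logic of that activity can be sketched in a few lines. The statement names and 1-10 scores below are illustrative placeholders, not our actual sticky notes:

```python
# Hypothetical (user need, frustration) scores for problem statements,
# each rated 1-10 as in our 2x2 matrix activity.
statements = {
    "Can't judge ripeness by look alone": (9, 8),
    "Unsure how much produce to buy": (7, 4),
    "Signage is hard to find": (3, 6),
    "Checkout lines are slow": (2, 2),
}

MID = 5  # midpoint that splits each axis into low/high

def quadrant(need, frustration):
    v = "high-need" if need > MID else "low-need"
    h = "high-frustration" if frustration > MID else "low-frustration"
    return f"{v}/{h}"

# Statements in the top-right quadrant are the high-impact,
# high-frustration user needs we chose to pursue.
top_right = [s for s, (n, f) in statements.items()
             if quadrant(n, f) == "high-need/high-frustration"]
print(top_right)
```

In practice we did this on a whiteboard with sticky notes, but the bucketing rule is the same.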
Although our focus was on the top right quadrant, we still had takeaways from the rest of the matrix. For example, we do not want our solution to disrupt the upper left quadrant (i.e. high user need but low frustration).
With consumers reporting their use of search engines to supplement in-store information, we chose to investigate the online query landscape using Google Trends.
Seen above are the results for various queries over the past five years, as rated on Google’s proprietary “popularity” rating system. We chose a vague search statement, along with queries regarding two of the most popular (as provided by Google) products to search for online. These items were paired with the search situations most commonly provided by our interview respondents. The line chart shows a common trend for individual search items: “how to tell a (blank) ripe” peaks yearly in the US during July, mid-summer. The “pineapple ripe” search peaks from the end of April to the beginning of May, near the midpoint of pineapple season (March-July). “Avocado recipe” showed a unique double hump, with peaks in both late January and July. The unexpected January peak could be explained by healthy eating habits spurred by New Year’s resolutions and the avocado's popularity as a Super Bowl party food. This information provides useful trend data regarding consumer interests and consistent annual patterns that could be useful during the design process.
Moving into design and prototyping, our problem statement became: To improve the current produce selection process for young produce shoppers aged 18-34 who shop in-store, by providing necessary information and reducing negative affectivity. After a brainstorming session, we settled on four different concepts to explore further. We sketched the concepts, conducted quick usability testing, and settled on one concept. After increasing the fidelity to something just above a wireframe, we did more usability testing. Findings and procedures are described below.
Finally, we iterated once more and conducted another round of testing. This time, we chose a combination of methods including A/B testing, reaction cards, cognitive walkthroughs, and the System Usability Scale (SUS) questionnaire.
After brainstorming, we chose a selection of our favorite ideas and each ideated on the topic. We marked parts of the designs we liked with sticky notes and talked about the strengths of each design. Clear favorites emerged from our discussion, so we decided to focus our efforts there. We moved into the initial testing phase with four possible solutions that could solve our prioritized pain points for our two user types. We refer to these designs as “SpookyCam”, “App”, “Kiosk”, and “Physical Models” for easy identification and communication within the group (SpookyCam is named after the spooky-looking images hyperspectral imaging creates).
In this write up, I only include findings from the SpookyCam sessions, because it is the concept we ended up choosing. SpookyCam is designed to be a tablet-based produce selection tool using University of Washington and Microsoft Research’s HyperCam Technology. HyperCam has the ability to capture near-infrared wavelengths and uses this ability to assess the density, and therefore the composition, of produce in order to assess freshness. This system would ideally be implemented in grocery stores with the touch screen directly in front of produce bins and the live feed camera placed directly above the produce bin. Users are able to indicate their desired time of use for a produce item and then, using the HyperCam technology and a little computer vision, the SpookyCam system highlights the most applicable produce for the user based on their given information.
In our prototype, we did not use actual hyperspectral imaging or computer vision, but "wizard of oz'd" it with a tablet and fake produce rack.
We recruited 5 participants for this feedback session. During this session we had each participant test all 4 designs (for a total of 20 sessions), with a cycle of individual tests lasting 5-7 minutes. Participants provided feedback based on given tasks, the individual designs, and a comparison of the designs overall. Between each user session we randomized the order in which we presented the concepts in order to avoid fatigue, primacy, recency, and learning biases. All five of our participants were young adults within our defined age demographic (18-34), each with varying domain knowledge regarding the produce buying process. While one team member led each session, the rest of the group took notes and pictures.
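The order randomization above can be sketched in a few lines. This is only an illustration of the counterbalancing idea, not the exact procedure we followed (we drew orders by hand):

```python
import random

# The four concepts we presented; a freshly shuffled order per
# participant reduces primacy, recency, and learning effects.
CONCEPTS = ["SpookyCam", "App", "Kiosk", "Physical Models"]

def session_order(seed=None):
    """Return a random presentation order for one participant."""
    rng = random.Random(seed)   # seedable for reproducible examples
    order = CONCEPTS[:]         # copy so the master list is untouched
    rng.shuffle(order)
    return order

for participant in range(1, 6):
    print(participant, session_order())
```

With only five participants, a full Latin square would give more even coverage of positions; simple shuffling is the lighter-weight version we describe here.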
Here are some results: 4 out of 5 users stated that there was not enough information provided. Some offered information they would like to know regarding the apples: “I might want to know where the apples came from”.
The provided information was disregarded by 3 of 5 users. This may be skewed by the task-based nature of the session; further testing would be needed to verify that hypothesis. “If I was in a rush I wouldn’t really read the information… I would grab and go.” Information needs to be salient enough in the display to be easily accessed, but must not impede task completion; it will be avoided by users looking for the quickest process possible. Accessibility issues regarding readability of text need to be considered, along with text length.
In this round of testing only single produce items are highlighted, which raises the question: how will quantity indication be handled? One participant questioned whether they would need to go through multiple rounds with the system if they were interested in apples for both now and later. Produce such as apples is not necessarily a single-use item; you can eat some now and some later. Our system needs to take this into account when prompting the user.
One user raised concerns about relying so heavily on color as a marker and how that would work for color-blind individuals.
5 out of 5 users reported little to no difficulty locating the single apple amongst the bin of 25 apples. Across the 8 trials (2 per user) there was a 100% success rate of properly identifying the individual apple on the first try.
Further testing is needed to determine if ease of use is maintained when scale is increased and the produce selection environment is more accurately replicated.
For this phase, we updated the previous mock-ups to increase the fidelity. We implemented some of the changes recommended from the sketches, such as reordering the items on the side panel (about, buttons, price, etc.), adding some color to make it look cleaner, and changing some of the wording. We also changed the produce item to kiwi, which we thought participants would be familiar with, but not so familiar that everyone would know how to select it. Kiwi was also useful for its even color and shape, allowing us to further test how easy or difficult it was for participants to locate items in the bin.
Participants found the system exciting and surprising. Even if at first it was unclear what the system would do, users explored the screen, made a selection and were happily surprised to see their fruit selected on screen. Participants also found the system easy and simple. The system requires single-tap interactions to obtain a result, so users don’t have to go through lengthy menus to get to what they want.
We updated some of the wording for this iteration, but it could still be improved. Using the plural form of the product is not common in stores; one participant noted being confused by the use of “kiwi(es)” in the question. The wording of the “Now” option for kiwi was also questioned by several participants.
It was not obvious what the value and/or purpose of the system was. Some participants did not know what the point of using the screen was. They understood it, but couldn’t tell what it would do. It was also unclear what the highlighted item on the screen meant. Some participants saw the highlighted item and automatically knew that it was correct. Others took a minute to think and said it wasn’t clear enough. Even though the participants did not fully understand how the system calculates ripeness, they said they would trust the system more than they would trust themselves (unless the system pointed out a bad looking product).
In the previous phase of evaluation we created a produce rack prototype in order to test the difficulty of the visual mapping between the system and the physical rack. During this phase we created an additional rack modified with pegs in the center. We theorized that pegs would improve the mapping abilities of users based on their self-reporting and observed produce locating techniques. Testing this was important to us because it relates to some of the core issues with our system: scalability, the mapping between the screen and the fruit, and the mental load related to selection. We also attempted to improve the word choice, onboarding, and other small details in this iteration.
We chose A/B testing to determine whether the addition of pegs changed mapping techniques or had any noticeable effects, positive or negative, on a user’s confidence and difficulty levels when selecting pre-identified items. A/B testing allowed a direct comparison between the two physical models we created.
Our A/B testing consisted of two elements along with follow up questions. This process occurred once for each of the two prototypes, while using the same tablet system. First the user was instructed to choose a piece of produce (symbolized by plastic balls). The interaction begins with the tablet interface and concludes with the participant pointing to or selecting the indicated item. Once complete, we asked follow up questions regarding the difficulty of finding the item, method of locating the item, and ease of use of the system.
For this evaluation we were interested in self-reported qualitative data regarding confidence, cognitive process, and self-perceived difficulty of the task. Participants spoke aloud during the process and were asked follow-up questions about the task if further information was necessary.
Our participants met the same criteria as the previous set of usability tests. All were frequent produce purchasers between the ages of 18 and 34. They came from a wide range of cultural backgrounds and had varying confidence levels in selecting a range of produce from different stores.
In order to retrieve useful insights from the A/B testing evaluation, the team took the notes from the user tests and coded all recorded comments on usefulness, and those that involved affect, as positive or negative. We also pulled out all questions posed during the evaluation process. Charting this information allowed us to more easily evaluate and understand the opinions presented, which can inform our design process moving forward. One drawback of this method is the lack of a full transcript: the coding is based on notes and therefore may not capture every comment and question voiced by the participants.
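Once comments are coded, the charting step reduces to a simple tally. The comments and labels below are made-up examples of the kind of coded notes we worked from:

```python
from collections import Counter

# Illustrative coded comments from session notes; each is tagged with
# the affect code we assigned during analysis (examples are invented).
coded_comments = [
    ("Satisfying to see the fruit highlighted", "positive"),
    ("Pegs blended in with the balls", "negative"),
    ("Second attempt felt much faster", "positive"),
    ("Wasn't sure why that kiwi was chosen", "negative"),
    ("Extra product info was interesting", "positive"),
]

# Tally the affect codes to chart the balance of positive and
# negative opinion across the A/B testing notes.
tally = Counter(affect for _, affect in coded_comments)
print(tally)
```

A bar chart of these counts per prototype (pegs vs. no pegs) is what we actually compared.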
Regarding the pegs, we received two primary types of feedback. About half of the users reported that the pegs assisted them in locating the preselected items, while the other half noted that the pegs were not a factor: they were hard to see on camera and were disregarded during the item selection process. Other opinions concerned the system as a whole. Users gave both positive and negative comments on the additional on-screen information about the product. Users also offered useful feedback on the learnability of the system: having completed the tasks multiple times, they were quick to note their improvement across attempts and the high learnability of the system.
To supplement the A/B testing, we facilitated a reaction card activity with our users. The activity is based on the 118 product reaction words developed by Microsoft as part of their Desirability Toolkit. According to an article written by Kate Moran at the Nielsen Norman Group, the activity provides a controlled vocabulary for users when judging the aesthetics of an experience. While 118 words are on the original list, Moran suggests narrowing down the number of cards so as not to overwhelm the user, while keeping a mix of positive, neutral, and negative words.
For both of our testing environments, with pegs and without, positive words were chosen more often by our users. In fact, some crossover words appeared high in the rankings for both tests, such as “Satisfying”. From a high-level perspective, this indicates the concept of having a camera select fruit was generally received positively.
We picked the SUS in particular because it is a proven and valid method to measure these aspects even with small samples (we planned to have between 5 and 7 participants). We knew that although the confidence intervals are larger for small-scale tests, the SUS yields results that are enough to give us an idea of how our system performed. We also wanted to keep the evaluation session short so as not to tire the users, and because the SUS is quick and easy to fill out, it seemed like a good choice. We are aware that the SUS cannot determine why a user selected a particular response or what specifically in the system caused that reaction. To make up for this, we complemented the session with debriefing questions and allowed the user to expand on any of their responses and voice any outstanding opinions or observations.
To analyze the results of the SUS, we input the data into a spreadsheet and calculated the average score. We obtained a score of 82 (above 80.3 is in the top 10%).
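The spreadsheet arithmetic follows the standard SUS scoring rule: odd-numbered items contribute (response − 1), even-numbered items contribute (5 − response), and the sum is scaled by 2.5 onto a 0-100 range. A minimal sketch (the example responses are hypothetical, not our participants' data):

```python
def sus_score(responses):
    """Score one participant's 10 SUS responses (Likert 1-5).

    Odd items are positively worded, even items negatively worded,
    which is why their contributions are computed differently.
    """
    assert len(responses) == 10
    total = 0
    for item, r in enumerate(responses, start=1):
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5  # scale the 0-40 raw sum to 0-100

# A participant who strongly agrees with every positive item and
# strongly disagrees with every negative one scores the maximum:
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # → 100.0
```

Our reported 82 is the mean of these per-participant scores across the study.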
The cognitive walkthrough is an expert evaluation method based on theories of learning by exploration (Nielsen et al., 1994). This method closely fit with our goal of evaluating learnability of the interface. With a cognitive walkthrough, we are able to move through common user tasks step-by-step and tell a story about how the user may succeed or fail a given task. The story focuses on the critical aspects of the design including those that link the user's expectations to the actual actions to be performed, and the feedback given.
Our cognitive walkthroughs provided us with a binary pass-fail result for each step based on the four questions we asked. For example, we found that users will try to achieve the right effect on our onboarding screen, and so this step passed that criterion. The goal is to fix those steps which fail one of the questions and produce a failure outcome for a hypothetical user. Our interface has only a small number of steps, and generally performed well, but there were some issues. For example, a potential user may not know what effect to try to achieve with the interface. In other words, upon first use, the value of the interface is not clear. Another issue noted in our cognitive walkthroughs concerned the feedback given when fruit has been selected. Users may not understand why the particular fruit is being selected for them - or what the criteria for selection were.
During the A/B testing trials, only half of the participants used the pegs to locate the preselected items. This was reportedly due to the visual homogeneity of the rack, with the tan pegs blending in among the yellow and red balls. In further iterations of testing we can use actual fruit to strengthen the fidelity of the prototype, as well as distinctly colored pegs to increase their on-screen visual salience. Real fruit is necessary in future testing to avoid any mapping cues that come from the patterns created by multiple colors of plastic balls in a single display. Some individuals who were not interested in the information about the fruit skipped over it after seeing the heading; this is positive, as it does not hinder their ability to use the system. The information received positive feedback from those who were interested in learning more and read the content. Some confusion remained about how the system worked; providing non-technical information about the workings of the system could help build user confidence.
From the SUS, we can tell that overall, the system is fairly easy to use and users will learn through using it. Keeping the interactions short and the screens simple and clean could be contributing to this. Most users agreed that they would like to use the interface frequently and that they did not have to learn much or need assistance in order to use it. Some users scored the system low on “feeling confident” and how well the system is integrated. Through follow-up questions we found that they would like to know more about how the system makes a decision in order to increase their trust in it and that some of the information provided seemed conflicting. Also, a score in the SUS above 68 is considered above average, and our system scored 82.
The team! (L:R, Ethan Graves, Maria Wong, Cooper Colglazier, Morgan Ott)