What I learned conducting user research for an AI service
You may have seen that Caution Your Blast Ltd (CYB) recently launched the first ever public-facing Large Language Model (LLM) service for the Foreign, Commonwealth and Development Office (FCDO) - a triage tool that uses an LLM to answer questions from British nationals with pre-approved FCDO content.
It has been a huge team effort between CYB and FCDO, and one with plenty of lessons learned along the way. Building AI-enabled services is new for everyone - there is not yet a “best practice” way of doing things like there is with traditional online services. At the moment, we are learning as we go along, particularly when it comes to my job in user research. I am going to write about my journey with this service - my initial assumptions, the challenges and how I overcame them - to help anyone who might be interested in how user research can be conducted in the new world of AI.
Starting out - were my initial assumptions correct?
Reflecting on this work, I remember how apprehensive I was about using AI. I had a lot of questions, and an assumption that users didn’t want anything to do with AI - but was I right?
Using our CYB research philosophy to guide my work, I started with some desk research to see what others had been doing in this space. I hit a challenge here - I couldn’t find any examples grounded in real user problems. Everything was centred on “cool new features” for commercial products, designed to increase subscriptions.
This meant I didn’t have any ideas to springboard off, baselines to compare against, or lessons learnt from others. We were one of the first to do this, which meant we had a lot of experimentation and learning from mistakes ahead of us.
As a user-centred design team we were left with the question: “how do you de-risk a service when you don’t have the bones of it?” I needed to learn from users about their expectations and perceptions, and what would make the service most usable, without having any functionality to test. That is not an easy thing to do!
Users expect AI to be everywhere
We weren’t going to have working functionality anytime soon, so we had to be innovative and make the most of the time and research participants we had. We decided to use a technique called Wizard of Oz testing, where you replicate how the AI-enabled service would behave using predefined scenarios. By designing a couple of screens that made it seem like the service was intelligently responding to enquiries, we were able to gain a lot of insight. This method is quick and low cost. Personally, my favourite insights were users’ knee-jerk reactions to using the service - it turned out that users expect AI to be everywhere, which was most surprising to me.
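To make the method concrete, here is a minimal sketch of how a Wizard of Oz prototype can work behind the scenes: a script that returns canned answers for predefined scenarios, so participants experience an apparently intelligent service with no AI behind it. The scenarios, wording and function names below are invented for illustration - this is not our actual prototype.

```python
# Minimal Wizard of Oz sketch: canned responses for predefined scenarios,
# so participants experience an "intelligent" service with no AI behind it.
# Scenarios and answers are invented for illustration only.

CANNED_RESPONSES = {
    "lost passport": "If you've lost your passport abroad, you can apply "
                     "for an emergency travel document...",
    "visa": "For visa enquiries, check the entry requirements for the "
            "country you're travelling to...",
}

FALLBACK = "Thanks for your enquiry. We'll point you to more information shortly."

def fake_ai_reply(enquiry: str) -> str:
    """Pretend to 'understand' the enquiry by matching simple keywords."""
    text = enquiry.lower()
    for keyword, answer in CANNED_RESPONSES.items():
        if keyword in text:
            return answer
    return FALLBACK

if __name__ == "__main__":
    # The researcher (the "wizard") lets the participant type freely; the
    # script replies, creating the illusion of a responsive AI service.
    print(fake_ai_reply("Help, I have lost my passport in Spain"))
```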
Hearing things like “I thought everything had AI in it these days?” made my initial apprehension melt away. Users expect AI to be ingrained already, and I began to realise I was fretting over the wrong thing. We didn’t need to gently introduce AI to our users, or worry about telling them or not telling them we were using AI, we had something much more complex to untangle…
While users expect AI to be everywhere, they don’t necessarily trust it
Our insights surfaced that users do not trust AI to get them what they need, with some users seeing AI as a blocker to contact with a human - something you have to “fight with... until it passes you to a human”. This was our biggest hurdle to overcome. We had to play with design and content to give users confidence that they would get a useful answer to their enquiry, and get it instantly.
We iterated content and page layouts to walk the line between something that looks like a search, so the upcoming answer feels instant, and something that looks like a webform, so the upcoming answer feels detailed. On reflection, this was the start of our new design challenges: creating design patterns that do not yet exist.
My second most fascinating insight was that users write to AI as if it were a human being. Even once our users realised the service used AI, they still included pleasantries such as “please” and “thank you”. I think this gives us a lovely insight into how our users think AI works - like a human brain - which was reinforced by one user who told us: “I better be polite, because when AI takes over the world I’ll want it to remember I was nice”. Anecdotal, but informative too…
Underrepresented groups are worried about bias
My final insight confirmed some of my initial worries. We started to notice diverging insights when we did research with users from underrepresented groups - we found that they were significantly more apprehensive about using AI. Digging into this further, we learnt that this is heavily linked to the bias AI might have, with these users being negatively affected by bias in their daily lives. If AI models have been trained on existing biased data, there is the potential to reinforce the bias these groups already suffer from.
While I don’t have a sparkly solution that we came up with, I can share the mechanisms we put in place to mitigate this risk and increase confidence for users. First, it’s important to remember our service doesn’t generate text back to our users - we use the “brain” of the LLM only to understand users’ complex, intertwined questions. In our service, the LLM is used to retrieve pre-approved FCDO content, and doesn’t search the wider internet for answers. This eliminates the chance of it answering the user with “hallucinated” content (answers that could be biased and harmful).
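For readers who want a feel for this pattern, here is a hedged sketch of retrieval over pre-approved content: a model is used only to match an enquiry against a fixed set of articles, never to generate text. The article titles, embedding model and similarity threshold are illustrative assumptions, not the FCDO service’s actual implementation.

```python
# Sketch of retrieval over pre-approved content (assumed design, not the
# actual FCDO implementation). The model only *matches* enquiries to
# existing articles; no text is generated, so nothing can be hallucinated.
from sentence_transformers import SentenceTransformer, util

# Hypothetical pre-approved knowledge articles.
ARTICLES = [
    "How to replace a lost or stolen passport while abroad",
    "Getting consular help if you are arrested overseas",
    "Travel advice and entry requirements by country",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
article_embeddings = model.encode(ARTICLES, convert_to_tensor=True)

def retrieve(enquiry: str, threshold: float = 0.3) -> str | None:
    """Return the best-matching pre-approved article, or None if no match
    is good enough (the threshold here is an arbitrary illustration)."""
    query_embedding = model.encode(enquiry, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, article_embeddings)[0]
    best = int(scores.argmax())
    return ARTICLES[best] if float(scores[best]) >= threshold else None
```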
We also monitor the AI to validate that it’s making the right decisions about which knowledge articles to show users. Wherever it makes an incorrect decision, we’ll know, and we can improve it.
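One simple way to support that kind of human validation - sketched below under assumed names and file paths - is to log every enquiry alongside the article the model selected, leaving a column for a reviewer to mark the decision correct or incorrect.

```python
# Sketch of decision logging for human review (assumed design).
# Each row records what the model chose, so a reviewer can later mark it
# correct or incorrect and the team can measure accuracy over time.
import csv
from datetime import datetime, timezone

LOG_PATH = "retrieval_decisions.csv"  # hypothetical location

def log_decision(enquiry: str, chosen_article: str) -> None:
    with open(LOG_PATH, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now(timezone.utc).isoformat(),
            enquiry,
            chosen_article,
            "",  # left blank for the reviewer's correct/incorrect verdict
        ])
```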
There are unintended consequences
We have considered the impact of our work from the start. We ran multiple consequence scanning workshops, where we gathered as a team with wider stakeholders to consider the intended and unintended consequences of what we are doing. An example of one of our unintended consequences is a decrease in job satisfaction for the FCDO member of staff who needs to spend an hour per week validating the decisions a computer has made, which, of course, can be very unsatisfying. However, we feel this is hugely outweighed by the increase in job satisfaction from not having to spend an entire shift manually selecting templates to send back to British nationals.
Overall, conducting user research for an AI-enabled service has shown me that we’re at the start of a new digital era: we’re having to do things for the first time again, and of course that means learning through mistakes. We need to conduct research that builds on what we know, but also experiments with what comes next. We’re starting to learn what the new AI era could look like, and most importantly what it needs to be like for our users - incremental in its approach, and trustworthy.