Is that you, dear? Why seniors may be worse at detecting voice-generated AI — and the scams that use it
Scientists suggest seniors may be less attuned to the emotion-free speak of AI. But what if it’s tied to our confusion over how younger humans speak?
Here’s my selected history of voice-generated AI scams in the last six months. If you’re a three-quarter lifer, perhaps pay attention, since you’re the audio scamster’s ideal target. Not because you’re confused by new technology. Not because your open and generous nature makes you fork over thousands of dollars to strangers. And not even because — although this seems like the obvious answer — you don’t hear as well as you once did.
No. The problem with people over 60 listening to AI seems to be not what we hear, but how we hear, which changes as we age, according to scientists at Toronto’s Baycrest Centre. We concentrate to understand the words in a sentence, and miss the emotion attached to them. It may make us less attuned to the emotion-free speak of AI, goes the theory.
My own entirely unscientific theory is a bit different and has to do with our confusion not around how AI speaks but how younger humans speak. Before we get to explanations, here’s my selected history of voice-generated AI scams.
Back on March 8, centuries ago in Artificial Intelligence time, I watched CNN correspondent Donie O’Sullivan use an AI imitation of his voice to trick his parents in Ireland into thinking it was their real son. AI Donie flubbed O’Sullivan’s Irish accent, but Mom just thought he sounded a bit down and Dad chatted with AI Donie about the local football team. It was funny TV. But it boded darker things to come.
Later that same month, CBC News reported that seniors in St. John’s, N.L., were scammed out of $200,000 over three days by AI-generated voices imitating their grandchildren. A new grandparent, I suffered with them. (Although a scam call from AI Baby Stefan would go something like, “Car! Fish! Wow wow!”) The distress calls from the Newfoundland fake grandchildren, on the other hand, followed frightening and urgent scripts: they’d hit a pregnant woman, police found drugs in their car, they were in jail and needed money for a lawyer. Sometimes the AI grandchild implored their grandparent not to tell their parents.
As a woman who lost $58,000 said, “you’ll do anything for your grandchild.” Similar voice scams targeting seniors across Canada followed the same pattern. The Star reported an 81-year-old grandfather losing $100,000, one of more than 50 victims of a Newmarket grandparent AI fraud with a collective loss of almost $1 million. Seeing any of that money again is as likely as recovering a stolen bike in Toronto.
Jump to April in my selected voice-generated AI scam timeline, when Toronto’s Baycrest Hospital got in touch with me about new research that showed people 60 and older were less able to distinguish AI speech than people 30 and under. In Baycrest’s study, which screened for hearing issues and awareness of AI, researchers played sentences spoken by male and female humans — all the humans on the voice recordings were under 30, it bears noting — mixed in with male and female AI voices.
Participants were quizzed on which recordings were computer-generated and which were real. Eighty per cent of the younger people distinguished AI, compared to 20 per cent of the older people.
“That’s a big difference in perceptual sensitivity,” said Björn Herrmann, Baycrest’s research chair on auditory aging and the scientist leading the study. I was surprised older adults did so poorly, especially since back in April AI was still pretty robotic-sounding, like weirdly monotone AI Donie.
“We know that as you age you speak and listen more slowly,” Herrmann explained on our call in April. I wasn’t aware of that particular infirmity, but added it to the list, between my clicking thumb and buckling knee. We also have a diminished ability to recognize emotions in speech, concentrating on the meaning of the words being spoken more than the way they’re spoken.
“So, one theory is if the older adult doesn’t hear variations in the prosodic structure or melody of the sentence,” Herrmann said — its rhythm, inflection and intonation — it makes them more likely to miss that the speaker is happy, sad or angry. Or, in the case of AI, none of those things, because it’s a robot and has no feelings.
After my call with Herrmann, I took the test. I did well recognizing AI: “THESE DAYS a CHICK-EN LEG is a RARE DISH,” emphasized almost every word. What’s important here, AI? The day, the juicy leg, its elusive rareness? I couldn’t tell, and neither could AI.
Where I stumbled was when I wasn’t sure it was a human talking. “See the cat glaring at the scared mouse” has built-in drama as a sentence but was spoken with the emotion of a housefly, the only emphasis a low downward slope on the last word, mouuuuuse. My finger paused over Human or AI until I incorrectly chose the latter.
But it wasn’t until a visit to my hairdresser last month that I began to develop my own theory about why older adults did poorly on the test.
My regular stylist was off sick, so Arden subbed in. As he dried my newly coloured hair, a very old man with grey hair and a beard almost to his waist came into the salon.
“I want to dye my hair navy blue,” said the very old man.
“We can do that,” said Arden. “But it will look black.”
The very old man asked to see colour swatches and how much it would cost, and after Arden quoted $120 (a steal for all that hair) the man left, saying, “I’ll think about it and let you know.” Arden turned his attention back to me.
“Your hair has come out so blond, Cath-rinnn,” he said, the low timbre of his voice descending into a vocal fry or voice creak on the last syllable. (Think of every sentence Jeff Bridges, an early fryer, has ever uttered, and you’ll get the sound.) I’d booked a touch-up on my dark brown hair, so I thought, but didn’t say, that only Arden might know why I seemed to be blond. But he had a different explanation.
Vocal fry, the way Arden spoke, started to get attention in 2011 and in the next few years became an everywhere story, as radio and TV listeners were suddenly driven crazy by the way younger people spoke on air. Their ire was mostly directed at women — what else is new — and their “unbearable,” “excruciating,” “annoyingly adolescent” and “very unprofessional” voices.
Those slurs are from letters to NPR about the station’s female staff. They were quoted on air by “This American Life” host Ira Glass, who said the problem was not with how the reporters spoke but with how the audience listened. A linguist from Stanford University subsequently tested NPR voices to show that listeners under 40 heard authoritative professionals talking. Listeners over 40 heard — see above. “So if people are having a problem with these reporters on the radio, what it means is they’re old,” said Glass.
Language change is generational. The quasi-British accent dubbed “Canadian Dainty” that ruled CBC airwaves in the 1950s — those deep, plummy voices of Lorne Greene and Christopher Plummer — was seen as educated, authoritative and elite. A voice designed to exclude rather than include.
“We’re moving away from the white, male, upper-class voice that’s dominated culture for decades,” said Carrie Gillon, a linguist working for the Squamish Nation in Vancouver and co-host of Vocal Fries Pod, a Canadian/American podcast that offers to teach you “how not to be an a–hole about language.” Many well-meaning people are racist and sexist around language, she said, but if you “told them they linguistically discriminate, they’d be shocked.” “My best advice is that if you’re having a reaction to someone’s voice, it is probably a you thing and not a them thing.”
Studies show language change is driven by the young, “and more specifically young women,” Gillon said. “We’re not sure why this is. Perhaps it’s because they’re the ones who spend time with children.”
Compelling evidence for female voice power is the newest vocal trend, dubbed “TikTalk.” It describes a way of talking on the TikTok app, the most democratic forum for voices of every description ever created. Instead of emphasizing the vocal fry at the end of a sentence, TikTalk puts EM-PHA-SIS on AL-most EV-ery WORD, in an upbeat, unpausing and usually female voice. It can sound so much like AI that Kat Callaghan, a TikTok voice-over professional and radio host from Kitchener, Ont., recently outed herself as a human. “It’s me, and I’m not AI!”
I called Herrmann back about my theory that older subjects did poorly on his test because they’re less attuned to the generational shift in the way people speak. “You’re thinking like a scientist,” he said. “The humans in the study were all under 30, so yes, it’s possible. Keep up the good work!”
Herrmann hoped his research could help Canadians recognize scam calls, although he noted the AI voices that have come out since are next-level. Cheap synthetic voice generators — there are thousands to choose from — can now precisely replicate our kids and grandkids using just a sentence or two found on Instagram, FaceTime and TikTok.
Or can they? I tried the Baycrest test again after I got off the phone with Herrmann. This time I noticed how the humans used both a downbeat vocal fry and an upbeat TikTalk that had made me question if they were AI the first time I took the test. But one sentence, spoken unmistakably by a human male, stood out.
“The boss ran the show with a watchful eye.” The man spoke the sentence with a touch of vocal fry, and also of irony; aren’t bosses always trying to run the show? He sounded sad about this boss’s watchfulness. Why are they always on us with their watchful eyes? Can’t bosses just trust us to do the best we can? I felt the pathos of my whole life of work in the mildly sorrowful sentence. It seemed to me to be a sentence where the words and the emotions behind the words were perfectly aligned. Something AI could never figure out because it’s a robot and has no feelings.
And if your grandchild calls begging for money like their life depends on it? Hang up and call back. Or better yet, get off the devices and social media — they’re content grabs, not connections — and whenever possible talk to each other in person. That way we’ll always know who the human is. So far.
Source: Toronto Star
Article originally appeared at: https://www.thestar.com/opinion/contributors/is-that-you-dear-why-seniors-may-be-worse-at-detecting-voice-generated-ai-and/article_3fa76c2d-63fd-50db-a395-e7c9b0731491.html
Resources:
Learn more about the grandparent scam here: https://eapon.ca/the-grandparent-scam/
Watch the webinar recordings on how to identify and protect yourself from scams:
Grandparent Scam: Education is your best defense
Romance Scam: Signs and Protection
Avoiding the TRAPS of Frauds and Scams