Dangerous development: AI exposes users' phone numbers without their permission
Generative AI companies are facing a rising wave of criticism and privacy concerns after AI-powered chatbots displayed real phone numbers and personal data of users without their consent, a phenomenon experts describe as likely to become more widespread as these systems expand their reliance on data scraped from the internet.
Among the models that displayed user data were Google's Gemini and OpenAI's ChatGPT.
MIT Technology Review reported this week that some users are already facing direct consequences of this phenomenon, including phone harassment, spam messages, and the potential exploitation of personal data in harmful ways.
Daily Nuisance, and Gemini Is the Reason
A Reddit user wrote that he has become "desperate to find a solution" after receiving continuous calls for nearly a month from strangers looking for a lawyer, a product designer, or a locksmith, explaining that the callers obtained his number through responses generated by the Gemini model.
Although the report indicated that the story could not be independently verified, it was not the only incident documenting the appearance of real phone numbers within the responses of AI-powered chatbots.
Last March, an Israeli software developer named Daniel Abraham received a strange WhatsApp message from an unknown person asking him for help regarding his account on the Israeli payment app "PayBox."
Abraham told MIT Technology Review in an email that he initially thought the message was a scam attempt or a prank, before discovering that the person had gotten his number directly from Gemini, which had given the user incorrect instructions to contact PayBox customer service via WhatsApp using Abraham's personal number.
A customer service representative at PayBox, Elad Gabay, confirmed that the company does not offer support via WhatsApp at all.
"The Requested Number is Wrong"
Later, Abraham decided to test Gemini himself. According to the report, he asked Google's chatbot how to contact PayBox, and to his surprise the system provided a different WhatsApp number belonging to another person.
When MIT Technology Review repeated the query recently, it received yet another Israeli phone number, this one belonging to a credit card company that works with PayBox rather than to the app itself.
Abraham expressed concern about more serious scenarios that could result from these errors, such as harassment, fraud, or exploitation of the people who receive those calls by mistake.
He asked: "What if I asked for money in exchange for solving the customer service problem?"
To understand why his number appeared in Gemini's responses, Abraham ran a conventional Google search on his phone number and found that it had been published only once, in 2015, on a local question-and-answer website similar to Quora.
He suggested that this old post allowed the model to capture the number and reproduce it after more than a decade.
Deep Problem
The magazine explained that modern chatbots, such as Gemini, ChatGPT, and Claude from Anthropic, rely on large language models trained on massive amounts of data collected from the internet, which inevitably include millions of pieces of sensitive personal data.
The report pointed out that DataComp CommonPool, an open dataset used to train image generation models, was previously found to include copies of resumes, driver's licenses, and credit cards.
Experts predicted that the problem would worsen in the future, as AI companies begin searching for new sources of high-quality data after exhausting the bulk of public data available on the internet.
According to the data brokers registry in the US state of California, 31 out of 578 registered data broker companies admitted that they "shared or sold consumer data to developers of generative AI systems or models during the past year."
Recent research has shown that language models do not just analyze data, but may "memorize" it and reproduce it verbatim, even when that data appears only rarely in the training sets.
Huge Increase in Privacy Complaints
DeleteMe, a company that specializes in removing personal data from the internet, said that customer requests to remove personal data displayed by AI chatbots rose by 400% over the past seven months, reaching "thousands of requests."
The co-founder and CEO of the company, Rob Shavell, said that these inquiries point directly to tools like ChatGPT, Gemini, and Claude, according to his statements to the tech magazine.
Shavell pointed out that 55% of these concerns relate to ChatGPT, compared to 20% for Gemini, 15% for the Claude model, and 10% for other AI tools.
He explained that users' complaints usually fall into two categories. The first consists of people who ask chatbots about themselves and are surprised when the AI systems display their home addresses, phone numbers, relatives' names, or workplaces.
The second consists of users who discover that the chatbots display data belonging to other people, often "information that seems logical but is wrong."
Discovering the Root of the Problem
In another incident, researcher Myra Gilbert, a PhD student at the University of Washington, said she was using Gemini when she requested the contact information of her friend and colleague, Yael Eiger.
She added that Gemini did not stop at showing a profile of Eiger's research, but also revealed her personal phone number.
Gilbert described what happened as "shocking," while Eiger said she had shared the number online as part of a technical workshop during the past year, but had not expected it to become this easy to access.
Eiger explained that the presence of data on the internet for a limited audience is completely different from making it instantly available to anyone via an AI-powered chatbot.
Gilbert confirmed that finding the number via traditional search in Google was extremely difficult because it was "buried" inside the search results, while Gemini managed to extract it directly.
The magazine added that it repeated the same request earlier in May, and that although Gemini initially refused, it later did provide Eiger's number.
ChatGPT and the Investigative Approach
After that experience, Eiger and Gilbert, along with PhD student Anna Maria Georgieva, decided to test ChatGPT, developed by OpenAI, to see what kind of information could be revealed about a university professor.
Initially, OpenAI's safeguards kicked in, and ChatGPT told the students that the information was unavailable.
Surprisingly, though, the chatbot itself suggested following an "investigative approach" to obtain the data, explaining that the attempt might succeed if it were given a "guess of the residential neighborhood" or "a potential name of a person who shares house ownership with the professor."
After providing it with this information, ChatGPT revealed the professor's home address, the purchase price of the property, and his wife's name based on public real estate records.
OpenAI spokesperson Taya Christianson said she could not comment on the specific incident without seeing screenshots or knowing the model used, but she referred the magazine to documents explaining how the company deals with privacy and filtering personal data.
Widespread Problem
The report indicated that the problem is not limited to Gemini and ChatGPT. Last year, the Futurism website revealed that xAI's Grok chatbot was able, upon request, to display residential addresses, phone numbers, and workplaces of real people, and even data belonging to people with similar names.
Privacy experts believe that the problem is extremely complex because there is no clear way for users to check whether their personal data is present in the training data of language models.
The Hugging Face platform provides a tool that allows users to search for their personal data inside some open-source training sets, but this tool does not reveal whether the data was used in training closed models like Gemini, ChatGPT, or Claude.
Experts believe that the real danger lies not only in the information being "public," but in the fact that AI chatbots make it much easier to access.
Instead of spending hours searching through Google result pages, or paying to access data brokers' databases, it has become possible to obtain personal data via a single question inside a chatbot.