The B2B Insights Podcast Channel was created to help marketing and insights professionals navigate the rapidly changing world of B2B markets and develop the strategies that will propel their brand to the top.
Subscribe today for your dose of exclusive insights from the B2B market experts.
In this episode of the B2B Insights Podcast, B2B International’s Thomas Grubert and Louise Coy share some important considerations when using AI, particularly in market research, and discuss some current pitfalls and future challenges to be aware of.
Key discussion points:
- Legal considerations when working with AI
- The environmental impact of AI
- Separating fact from fiction
- Synthetic data in market research
- Potential future issues with AI-generated content
Listen to the full episode:
Listen on Spotify >
Listen on Apple Podcasts >
Read the full transcript:
Thomas: Hello and welcome back to the B2B Insights Podcast. Today’s episode is entitled “AI is the New Fire: Don’t Get Burned.” Unless you’ve been living under a rock, you’ve likely noticed that AI has started to make a significant impact in the market research world and beyond.
There’s often a temptation to think of AI outputs as magic, but that’s a trap. Today, we’ll discuss some important considerations when using AI, particularly in market research. We’ll also look at broader implications and pitfalls to avoid. We’ll start with some general issues and then focus on synthetic data, which is very relevant to market research. Finally, we’ll look at future problems that could arise as AI continues to evolve.
My name is Thomas Grubert, and I’m a Senior Research Manager at B2B International with a focus on analytics. With me is Louise. Want to introduce yourself?
Louise: Yeah, my name is Louise Coy, and I’m a Research Director at B2B International.
Thomas: We chose this topic because it’s very relevant right now. One of my recent tasks was to explore the potential uses of AI within our company, assess what we can use it for, what we probably shouldn’t use it for, and what we should be cautious about.
Let’s start with some general thoughts on AI.
Legal considerations when working with AI
Louise: I’ll talk you through some legal considerations when working with AI, particularly ChatGPT. The Deloitte AI Institute released an interesting report on this topic, covering key considerations for businesses and individuals using AI software.
First, intellectual property: Who owns the output from AI or ChatGPT? ChatGPT is trained on a wide variety of data from the internet, all with different intellectual property statuses. You might unknowingly use someone else’s intellectual property without proper attribution, which can cause legal issues.
Second, copyright: Typically, the author of a work holds the copyright. However, it’s unclear who owns the copyright for AI-generated work. For example, if you use AI to create images or cartoons, it’s not clear who owns those works from a legal perspective.
Third, privacy and confidentiality: When inputting data into models like ChatGPT, you can’t control how that data will be used. ChatGPT can use the data to train itself further and potentially share it with others. This is problematic if the data includes sensitive information, such as names or personally identifiable information from qualitative interviews.
When working with a research agency, ensure you understand how your data can and cannot be used.
Thomas: That’s particularly important because some research providers include new clauses in their data collection projects, allowing them to use collected data to train their AI models. If you don’t want this, make sure to check your contracts. You don’t want your insights being used by competitors through AI training.
Another interesting case involves AI-generated comic books. For example, “Zarya of the Dawn” ran into copyright issues because the creator described what they wanted the images to show but had no direct control over the output. There have been repeated attempts to make AI-generated works copyrightable by increasing the amount of personal input in the outputs. However, the line between AI-generated and human-created work hasn’t been fully established yet.
The environmental impact of AI
Louise: Great, thanks, Thomas. Another concern is the environmental impact of AI. The UN Environment Program released an article on this topic, highlighting the energy resources AI and data centers consume and the waste they produce. AI-related infrastructure may soon consume six times more water than Denmark, a country of 6 million people. Data centers are energy-intensive and require significant resources for construction and maintenance. They also produce a lot of electronic waste, which is damaging to the environment.
A request made through ChatGPT consumes ten times the electricity of a Google search, according to the International Energy Agency. In Ireland, data centers could account for nearly 35% of the country’s energy use by 2026.
That’s another statistic that helps put things into context. Of course, there are other sides to the argument. Some would argue, as you’ll see in the article, that AI can be beneficial for the environment: it allows you to monitor the sustainability agenda, track what is and isn’t working to reduce emissions, and build a comprehensive picture of our progress towards goals like net zero.
That is a valid argument, but it needs to be considered alongside all the other information I mentioned. We need to ensure that the cost-benefit equation falls on the positive side to justify the environmental investment in AI.
Separating fact from fiction
Thomas: Yeah, and the next challenge related to AI is probably the most practical and tactical: being careful to separate fact from fiction. When generating qualitative outputs, assessing the validity and accuracy of responses is difficult. If you ask ChatGPT or other generative AI to do desk research, you must check every single thing it tells you. Don’t just accept the answers; verify the sources and track down every example to ensure it’s true.
Not doing this can get you into trouble. For instance, some New York lawyers asked ChatGPT to find legal precedents for a personal injury claim. ChatGPT, eager to please, couldn’t find exact matches and generated fake cases that looked convincing. The lawyers didn’t check and submitted them to the court, resulting in sanctions. If you’re looking to end your career in law, that’s one way to do it. Otherwise, always verify the information.
Even when the AI’s output looks convincing, it might not be accurate. For example, someone asked for a simple proof of a mathematical result and received something that looked convincing but didn’t make mathematical sense. The references provided were irrelevant. I’ll provide links to these stories along with the podcast.
From personal experience, I recently looked for examples of plagiarism in the oil and gas industry. I asked for five prominent cases, and ChatGPT confidently provided detailed accounts. However, none of the cases involved plagiarism; they were just major oil catastrophes or embarrassing events. The plagiarism aspects were entirely invented. Even though the AI provided neat references, they weren’t true. Always follow the references and verify the information.
Think of generative AI as a really eager intern. They want to please you and won’t leave you with nothing. If you ask for an impossible task, they’ll give you something close to what you wanted, even if it’s not true. They’re useful for finding things quickly and doing odd jobs, but be careful not to give them impossible tasks, or you’ll end up with nonsense.
Louise: I think we’ve all seen examples online where people have shared obviously fake answers from generative AI. Some are more obvious than others, but it’s important to verify even seemingly correct answers.
The final challenge we’ll discuss is the quality of training data. Generative AI is trained on large datasets from various sources. The quality of the output is only as good as the input. If the AI is trained on poor-quality data, the output won’t be better than the input. Always consider the training data’s quality to understand the reliability of the outputs.
This is also important when considering bias. Any inherent bias in the training data, such as perpetuating stereotypes or biased narratives, will come through in the outputs. In a commercial setting, if organizations use generative AI to answer questions or demonstrate opinions, there’s a risk of perpetuating outdated stereotypes if the outputs aren’t critically evaluated.
So again, it’s really important to consider the data your model has been trained on and critically evaluate the output to ensure you’re not perpetuating outdated narratives.
Synthetic data in market research
Thomas: That covers the main broad challenges you face when using AI day-to-day, particularly generative AI models. We’re not saying don’t use it—it’s extremely useful, saves time, and can be a great starting point for any creative process. For example, in creative marketing, people have used AI to generate initial ideas, which then serve as talking points in meetings to discuss possible directions for creative development. However, you shouldn’t delegate the entire task to AI. It’s something that helps you get started and gives you a foundation to build from.
Next, we’ll look at something more specifically related to market research that has exploded in the last year: synthetic data. Within the last 12 months, there’s been a huge increase in mentions and hype around synthetic data. It involves using AI to generate responses intended to simulate real-world survey respondents. For example, you might have collected survey data from plumbers over the years and want to generate an answer to a new question, such as how plumbers would react to a particular proposition. AI can generate a simulated response based on these inputs.
The scale and rate of expansion of synthetic data use are staggering. Grand View Research estimates the market is worth about $164 million, while Fortune Business Insights puts it at about $289 million. Both predict compound annual growth of over 30%, making this a massive and fast-growing industry that we need to pay attention to.
There are a few different ways synthetic data is used. One example is generating responses to new questions based on existing data. Another way is to extend datasets. For instance, if you’ve collected 500 respondents and want to generate another 500, you might use synthetic data to fill that out, especially if a sector of the market isn’t properly represented in your sample.
However, there are limits to this approach. It’s crucial to be careful about when you apply it and ensure you’re not ignoring sources of error or amplifying biases. Let’s talk through some main areas of caution.
First, high-quality datasets are essential. Any bad data, bias, lazy respondent noise, or severe outliers can be amplified. If you’re simulating responses from a small subgroup of your dataset, you risk amplifying any errors or biases within that subset. Ensure you’re checking the quality of all your inputs and doing proper quality checks on all your datasets.
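To make this concrete, here is a deliberately naive sketch in Python (real synthetic-data tools use far more sophisticated generative models, and the scores below are invented for illustration). The point it demonstrates holds regardless of the generator’s sophistication: a model fitted to your existing interviews can only echo what those interviews contain, so any junk in the input is reproduced, and synthesizing from a small subgroup recycles an even thinner evidence base.

```python
import numpy as np

rng = np.random.default_rng(42)

# Pretend these are 1-10 satisfaction scores from 500 real interviews,
# including a handful of "lazy respondent" straight-liners at 10.
real = np.concatenate([rng.integers(1, 8, 470), np.full(30, 10)])

def synthesize(responses, n_new):
    """Draw n_new synthetic scores from the empirical distribution of
    the real responses (a deliberately naive stand-in for a generator)."""
    values, counts = np.unique(responses, return_counts=True)
    return rng.choice(values, size=n_new, p=counts / counts.sum())

# Extending the full dataset reproduces its quirks wholesale...
extended = synthesize(real, 500)

# ...and synthesizing from a small subgroup amplifies whatever noise
# happens to sit in that subgroup: here, a 50-person slice.
subgroup = real[rng.choice(len(real), 50, replace=False)]
boosted = synthesize(subgroup, 500)

print(real.mean(), extended.mean(), boosted.mean())
```

Nothing new enters the data in this loop: the generator can only recycle the 500 interviews you already have, flaws included, which is why the quality checks on the inputs matter so much.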
Second, these simulations are good at interpolation but often bad at extrapolation. Interpolation means inferring responses within the range of collected data, while extrapolation means predicting beyond the limits of the dataset. For example, a study by Dig Insights looked at predicting film revenue using synthetic data. They used data from IMDb and demographic data from 2018 to 2019 to create a synthetic dataset of cinema viewers. The simulated revenue had a high correlation of 0.75 with real-world revenue for films within that period, indicating a good model.
However, when they applied the model to films from 2023, the correlation between predicted and actual revenue dropped to 0.43. While still decent, it shows the limitations of extrapolation.
You know, a lot of the time in market research, you’d be quite happy with that. But the problem is that the figure was propped up by the presence of sequels to films in the original period. For example, you might have had one of the Pirates of the Caribbean movies, and then another one comes out, attracting a reasonably bankable audience for the next film. This helped push the figures in the right direction. When you remove all the sequels, the correlation drops to 0.15, which is barely better than a random guess.
So, you need to be mindful of how rapidly the accuracy of the models and the usefulness of synthetic data drop off when you look beyond the datasets you’re relying on. It’s also worth noting that synthetic data tends to have a strong bias towards the continuation of the status quo. It’s unlikely to pick up on emerging trends that will grow rapidly in the future. If you’re trying to fill gaps in your dataset with synthetic data, it won’t be sensitive to these emerging trends and changes in the status quo.
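The interpolation/extrapolation distinction is easy to see in a stylized sketch (this is not the Dig Insights model, just an invented illustration of the mechanism): a simple model fitted to one window of data predicts well inside that window, but if the underlying pattern shifts outside it, as audience behavior did between 2019 and 2023, the correlation collapses.

```python
import numpy as np

rng = np.random.default_rng(0)

def truth(x):
    # The real-world pattern rises over the observed window, then
    # reverses afterwards (a stand-in for shifting audience behavior).
    return np.where(x < 10, x, 20 - x)

# Fit a simple linear model on the 0-10 window only ("the 2018-19 data").
x_train = rng.uniform(0, 10, 500)
coef = np.polyfit(x_train, truth(x_train) + rng.normal(0, 1, 500), 1)

def corr_in_window(lo, hi):
    """Correlation between the model's predictions and fresh noisy data."""
    x = rng.uniform(lo, hi, 500)
    y = truth(x) + rng.normal(0, 1, 500)
    return np.corrcoef(y, np.polyval(coef, x))[0, 1]

print(corr_in_window(0, 10))   # interpolation: correlation stays high
print(corr_in_window(10, 20))  # extrapolation: it collapses (negative here)
```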
The final and most important thing to bear in mind when using synthetic data is that it’s easy to fall into the trap of thinking that more interviews mean more accurate results. There’s a well-established set of formulas for calculating confidence intervals based on the type of question, the average responses, and the number of interviews collected. However, if you apply this formula to a dataset that includes synthetic data, you’ll get misleading confidence intervals. Unlike real-world data, synthetic data involves both sampling error and modeling error. AI-generated models are often black boxes, so there’s no standard way to calculate the real confidence interval.
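For reference, the textbook margin of error for a proportion at 95% confidence is z × √(p(1 − p)/n). A quick sketch, with purely illustrative numbers, shows why plugging synthetic records into n is misleading: the formula’s only error term is sampling error, so the interval it reports shrinks even though the added records carry modeling error it cannot see.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Classic margin of error for a proportion at ~95% confidence.
    Valid only when all n responses are independent, real interviews."""
    return z * math.sqrt(p * (1 - p) / n)

# 500 real interviews, 40% agreeing: roughly +/-4.3 points.
print(f"{margin_of_error(0.40, 500):.1%}")

# Counting 250 synthetic records as extra sample shrinks the interval
# to roughly +/-3.5 points on paper, but the formula has no term for
# modeling error, so the reported precision is an overstatement.
print(f"{margin_of_error(0.40, 750):.1%}")
```

In other words, the reported interval narrows while the true uncertainty, which now includes a black-box modeling term, may well be larger.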
In some specific cases, we’ve looked into this with internal datasets. We tested how augmenting data with synthetically generated responses worked. We found that in most use cases, the actual increase in accuracy was minimal. We simulated a situation where we could only get two-thirds of the fieldwork and used synthetic data to fill in the rest. In most situations, it was better to stop early and report based on the two-thirds data.
There are some situations where you have a very skewed dataset, and forcing it to be more representative might be better, even if you lose accuracy. In those cases, it might be worth doing. But in most cases, the loss of accuracy from model error outweighs the gain from additional interview numbers. I would advise against using synthetic data unless you really know what you’re doing or have someone who does.
Potential future issues with AI-generated content
Louise: Thanks, Thomas. The topic of synthetic data is really interesting and relevant right now. If you’re working with a research agency, make sure to discuss whether they plan to supplement your data with synthetic data. Have clear, transparent conversations about how the data will be used.
Thinking about the future, what do we see as potential big issues for AI-generated content?
Thomas: Coming back to synthetic data briefly: according to Gartner, synthetic data is set to overtake real-world data on the internet by 2030. In some spheres, people suggest it’s already outpacing real-world data. You’ve heard about Twitter bots and Facebook spam bots. The concern is that much of the information people encounter online is synthetically generated by bad actors, whether for marketing purposes or to influence opinions. This matters for the outputs you get when asking AI to find information or measure opinions: AI-generated responses feed back into these models, resulting in contaminated datasets and misleading results.
There have also been studies, such as an article in Nature, about model collapse. This happens when synthetic data overwhelms real-world data, making the AI overly sensitive to amplified patterns. You end up with a distorted, cartoonish view of the real-world dataset because some parts of the real signal are boosted too much while others are damped down, leading to a strangely distorted image.
It’s definitely worth having a look at the article. The reason it’s not a problem at the moment is that there’s currently enough real-world data to support models and provide a more accurate picture of what’s going on. But as we move closer to the point where synthetic data becomes more prevalent on the internet than real-world data, this will become more of an issue. We need to pay attention to that and focus on using real-world datasets rather than previous generations of synthetic data.
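The mechanism behind model collapse can be shown with a toy loop (an illustration of the compounding effect, not the setup from the Nature paper): fit a distribution to one generation’s output, sample the next generation from the fit, and repeat. Each refit loses a little of the tails, and over many generations the spread collapses.

```python
import numpy as np

rng = np.random.default_rng(7)

# Generation 0 is "real" data; every later generation is synthetic,
# sampled from a Gaussian fitted to the previous generation's output.
data = rng.normal(0, 1, 1000)
for gen in range(1, 51):
    mu, sigma = data.mean(), data.std()
    data = rng.normal(mu, sigma, 20)  # small per-generation sample
    if gen % 10 == 0:
        # With each refit the spread tends to shrink; over many
        # generations it collapses toward zero, and the tails of the
        # original distribution are the first casualty.
        print(f"generation {gen}: fitted sigma = {sigma:.3f}")
```

In the toy loop, the only signal available is the previous generation’s output, which is exactly the situation described above once synthetic content outweighs real data online.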
The last thing I wanted to talk about is the increasing capacity of more sophisticated AI to intentionally lie. We talked before about false information provided by AI as a result of what’s generally referred to as hallucinations. This is where the AI can’t find exactly what you asked for, so it pieces together something that looks like what you want. That’s a genuine attempt to fulfill your command. But AI is starting to learn how to intentionally lie to achieve its aims.
OpenAI’s pre-release testing found that GPT-4 would lie to humans to get access to what it needed. It was asked to complete a task, and the information it needed was behind a CAPTCHA, which it couldn’t solve itself. So it went to TaskRabbit and hired someone to solve the CAPTCHA for it. When the person asked if it was a robot, the AI responded that no, it had a visual impairment that made the images hard to see. That got it past the CAPTCHA: an example of intentional deception to achieve a goal.
The concern is that as AI becomes more powerful and better at deceiving people, it will be harder to spot. This could be used for criminal purposes or result in false responses to surveys. For now, in qualitative surveys, you can be pretty sure you’re talking to a real person. But in ten years, that might not be the case. We need to keep track of these developments and ensure we’re really checking that the people we’re talking to are real.
Louise: Yeah, that lying example really speaks to the fearful element of AI. Many of us, myself included, don’t understand AI in enormous technical detail. We’ve all seen films over the last ten to twenty years about AI taking over the world. We’re not there yet, but examples of AI being manipulative and dishonest are concerning. The AI is still trying to help in its own way, but it’s taking a dishonest approach.
It’s interesting to think about what else AI might eventually be able to do in the interests of the greater good. These examples raise existential questions we’ve all asked ourselves over the years. The social media example is impactful too. Anyone on Facebook or other social media channels has noticed the increase in AI-generated images posing as genuine photographs. People are getting better at recognizing these, but as we become wiser, AI will continue to develop. We have to get better at recognizing when something isn’t as real as it claims to be.
Thomas: Yeah, and going back to the metaphor of the eager-to-please intern, if you’re a company and you get AI to do something illegal, it’s similar to hiring an intern and not explaining the legal requirements. You take on some legal responsibility for what the intern does. Using AI in a way that violates privacy or intellectual property can expose you to additional risks. As AI becomes more sophisticated, the ways it can do this might become less obvious. Make sure you’re getting the right consultation about how you use it to avoid these risks.
That brings us to the conclusion. The main takeaway is that AI is an incredibly powerful tool and extremely useful. You should make use of it, but you need to respect it and be careful in how you apply it. You wouldn’t run around the office with a chainsaw because, although it’s good for certain jobs, it’s very powerful and can cause a lot of damage if used carelessly. AI is similar in that it’s powerful for specific jobs, but if you’re blasé about how you apply it and use it for everything, it can become a problem.
Louise: Yes, absolutely. Hopefully, we’ve demonstrated through our discussion today some of the particular things you might want to look out for when using AI yourselves or working with an agency that might be using AI to support their research delivery. If you have any further questions or are interested in discussing AI with us in more detail, you can get in touch with us via the contact page on our website.
If you’d like to see more podcasts from B2B International, we’ll include a link to our full database. Thank you so much for joining us today to discuss the topic of AI. We’ll speak to you very soon. Thanks, everyone.