Artificial intelligence (AI) is already here, and it's used in business and cyber security every day: in spam filtering, malware detection, and intrusion detection and prevention systems.
The recent flurry of activity centres on generative AI, in particular ChatGPT and GPT-4, as these tools become publicly available and easy to use.
Even in these relatively early stages of availability, generative AI is very likely to affect you and your business, so it is best to be prepared.
What is generative AI?
Generative AI processes large amounts of information to generate responses to requests and questions. While earlier versions of ChatGPT were text-based, GPT-4 can process images as well as text; it is also more sophisticated and accurate, and has stronger security controls.
This kind of tool can be very useful in business for data analysis, creating draft documents, or completing automated and routine tasks, such as answering frequently asked questions, managing inventory, and generating leads. However, there are limitations and risks to bear in mind.
How does generative AI work?
Generative AI such as ChatGPT (which we will use as an example) is very complex software. Huge amounts of text data have been fed into it, training the software to predict the next word in a sequence, based on the input prompt it receives. With enough training, it can produce something that sounds as though a human might have written it.
There are three important components here:
- the training data,
- the algorithm (the prediction engine),
- and the input prompt.
An initial algorithm is set up; training data is fed in; a test prompt/request is made and the software tries to predict the next word. The trainer evaluates the predictions, makes adjustments to the algorithm and repeats the process. And just like a human learning a language, it takes many, many iterations before the tool ‘learns’ how to come up with good results. As the model has improved over time, the results sound more and more human.
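To make 'predicting the next word' concrete, here is a deliberately tiny, hypothetical sketch in Python. It simply counts which word follows which in a scrap of sample text; this is nothing like GPT-4's neural network internally, but the loop of training on text and then predicting one word at a time is the same idea.

```python
from collections import Counter, defaultdict

# Toy illustration of "predict the next word". Real models such as GPT-4
# use neural networks with billions of parameters rather than simple word
# counts, but the task is the same: given the words so far, score the
# candidates for the next word.

training_text = (
    "the cat sat on the mat "
    "the dog sat on the rug "
    "the cat chased the dog"
)

# "Training": count which word follows each word in the sample text.
follows = defaultdict(Counter)
words = training_text.split()
for current_word, next_word in zip(words, words[1:]):
    follows[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Return the next word most often seen after `word` during training."""
    candidates = follows.get(word)
    if not candidates:
        return "<unknown>"
    return candidates.most_common(1)[0][0]

# "Prompt": start from a word and generate a continuation, one predicted
# word at a time.
word = "the"
output = [word]
for _ in range(4):
    word = predict_next(word)
    output.append(word)

print(" ".join(output))  # prints: the cat sat on the
```

The output here is driven entirely by patterns in the sample text, which is exactly why biased or inaccurate training data leads to biased or inaccurate output.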
What are the issues around generative AI?
There are a number of general issues and potential problems to bear in mind when using generative AI, as well as security issues.
If any of the training data, the algorithm, or the prompt is biased or wrong in any way, the output will also be biased, inaccurate or wrong.
The prompts you write need to be well structured, too, or the prediction engine can get confused and wander off on a tangent. For example, 'Summarise this report in three bullet points for a non-technical reader' will get better results than 'Tell me about this report'. Remember that ChatGPT is just software trying to predict the next word.
The training information it holds is not up to date; it’s not a search engine, so it can’t be relied on to provide current information. However, this kind of tool is now being integrated into search engines, and the currency of information held is likely to improve rapidly.
Within any given chat session, ChatGPT remembers what you’ve already told it, making the interaction more conversational. It also stores your email address, your IP address, the questions you ask, and the content that it generates. This data is used to improve the way it works.
As a result, there may be privacy, copyright and intellectual property issues around the use of generative AI such as ChatGPT. Remember that it creates new content based on the content it has received, so if the input was subject to copyright, then the output (images or code, for example) may not belong to you.
And if an employee uses sensitive information as part of the input prompt, there is a risk that this data could leak. In May, Bloomberg reported that Samsung had restricted the use of ChatGPT by employees following a leak of sensitive source code. Samsung are not alone in this, although other companies are instead embracing this kind of tool to enhance their workflows.
Reducing the risk
New options and tools are appearing so fast that this post will be out of date by the time you read it. However, if you want to use generative AI securely, options for consideration include using:
- An offline, ChatGPT-style tool – GPT-X, for example – which keeps all chat data on your device. If you've secured your device, your data should remain confidential. This option would still rely on training data similar to ChatGPT's, so issues of copyright and IP may still be present.
- A locally hosted, GPT-style tool – PrivateGPT, for example – would allow you to run a model against your own data, offline. Again, keeping your data on your own secured device reduces the risk of data leakage while giving you better control over the training data; a rough sketch of this approach follows this list. For some internal applications, particularly where there is a great deal of data to be interrogated, this option might be ideal.
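As a rough illustration of the offline approach, the sketch below sends a prompt to a model hosted on your own machine instead of a cloud service. Everything specific in it (the endpoint URL, port, and JSON fields) is an assumption for illustration; locally hosted tools each expose their own interface, so consult the documentation for the one you pick.

```python
import json
from urllib import request

# Hypothetical sketch of the offline approach: send a prompt to a model
# served on your own machine, so the text never leaves the device. The
# endpoint URL and JSON fields below are illustrative placeholders, not
# the actual API of PrivateGPT or any other tool; check the documentation
# of whichever tool you choose.

LOCAL_ENDPOINT = "http://localhost:8000/v1/completions"  # assumed local server

def ask_local_model(prompt: str) -> str:
    """Send a prompt to the locally hosted model and return its reply."""
    payload = json.dumps({"prompt": prompt, "max_tokens": 200}).encode("utf-8")
    req = request.Request(
        LOCAL_ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["text"]  # response field name depends on the server you run

if __name__ == "__main__":
    # Because the model runs locally, this question (which might contain
    # sensitive company data) is never sent to a third-party service.
    print(ask_local_model("Summarise our Q3 inventory report."))
```

The security benefit is simply locality: the prompt and the response travel no further than your own device, so the usual advice about securing that device (disk encryption, access controls, patching) applies.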
If this kind of tool is something you might use in your business, we recommend planning for it carefully, and updating your policies accordingly. For example:
- If using an online service such as an AI chatbot, ensure that no private customer data or confidential company data is entered.
- If using an offline version, ensure that the device on which the data is stored and processed is secured.
- Think about user access controls and account management for such tools.
- Develop policy and training on appropriate uses for your business, along with awareness campaigns to communicate the risks and limitations.
- Consider guidelines for avoiding discrimination and bias.
- Think about vendor management if a third-party vendor is using such AI on your behalf.
- Consider consulting a lawyer on potential issues that might arise in your particular circumstances.
Did we use ChatGPT to write this post? No, we did not, though using generative AI to create content will inevitably become commonplace. Of course, there are new tools that will check for AI-generated content, just as tools to identify plagiarised text have existed for a while. The next few years are going to be interesting.
If you’d like help with writing an AI policy—or with other cyber security tasks—give the Click and Protect team a call on 0113 733 6230 and let us know how we can help.