Risks of Using Speech Recognition for Transcription of Confidential Data

Automated speech recognition (ASR) is a technology that converts spoken words into text. However, ASR poses significant risks to the privacy and security of the data it processes, especially when that data is confidential or sensitive.

Data Breaches by Major Tech Companies

In recent years, several major tech companies have faced public backlash and legal consequences for violating the privacy of their users through their automated speech recognition services.

The Guardian reported that “Google workers can listen to what is said on its AI home devices”. The practice came to light after some recordings were leaked.

The Guardian also examined whether Amazon could be invading its users’ privacy in its article ‘Alexa, are you invading my privacy?’. In the US, Amazon is being sued over recordings of children.

Further press reports have highlighted similar issues:

Google was fined €50 million by France’s data protection watchdog, the CNIL, for failing to give its users adequate information about, and obtain valid consent for, how their personal data was collected and used.
Apple suspended its practice of using human contractors to listen to and grade Siri recordings after a whistleblower revealed that they had access to private conversations, such as medical information, drug deals, and sexual encounters.
Amazon was hit with a €746 million fine (reported as roughly $886 million) by Luxembourg’s data protection authority for allegedly breaching the EU’s General Data Protection Regulation (GDPR) by processing personal data without proper consent.

These incidents show that ASR services are not always transparent about how they handle the voice data they collect from their users. They also expose the users to potential data breaches, identity theft, fraud, or blackmail.

Why Speech Recognition Should Not Be Used by the Legal or Medical Sectors

The legal and medical sectors handle highly confidential and sensitive data, which requires strict protection and compliance with various laws and regulations. Using automated speech recognition for transcription in these sectors can pose serious risks.

For instance, automated speech recognition can introduce errors or inaccuracies into a transcript, which can affect the quality and validity of the resulting documents. A study by Hodgson and Coiera found that ASR produced an average of 1.3 errors per document, 15% of which were clinically significant. Another study by Topaz et al. found that physician-created notes produced with ASR contained four times as many errors as notes produced without it.
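As a rough worked example, assuming the 15% figure refers to the share of the 1.3 errors per document, the expected number of clinically significant errors per document is:

```latex
\[
1.3 \times 0.15 = 0.195 \approx 0.2 \ \text{clinically significant errors per document}
\]
```

On average, that is roughly one clinically significant error for every five transcribed documents.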

Moreover, automated speech recognition can compromise the confidentiality and security of the data, for example when recordings are stored on cloud servers or shared with third parties without proper consent or encryption. This can violate the privacy rights of clients or patients and expose the professionals involved to legal liability. For example, a lawyer who uses automated speech recognition to transcribe a client’s testimony may inadvertently disclose privileged information to an unauthorised party or a malicious actor. Similarly, a doctor who uses ASR to transcribe a patient’s medical history may unintentionally reveal personal health information to an unauthorised third party.

The GDPR Issues with Using Automated Speech Recognition for Transcription

The GDPR is a regulation that aims to protect the personal data of individuals in the EU by giving them more control over how their data is collected, processed, stored and shared. Under the GDPR, voice recordings count as personal data: they can identify a speaker and can reveal information such as gender, ethnic origin or potential diseases. Therefore, any entity that uses automated speech recognition for transcription must comply with the GDPR’s requirements, such as:

Obtaining explicit and informed consent from the users before collecting, processing, or storing their voice data
Providing clear and accessible information about how their voice data is used, who has access to it, and how long it is kept
Implementing appropriate technical and organisational measures to ensure the security and integrity of their voice data
Respecting the rights of the users to access, rectify, erase, restrict, or object to the processing of their voice data
Reporting any data breach involving their voice data to the relevant supervisory authority within 72 hours, and informing affected users without undue delay where the breach poses a high risk to them
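The 72-hour window (GDPR Article 33) runs from the moment the organisation becomes aware of the breach, not from when the breach happened. The minimal Python sketch below computes that deadline; the function name `breach_notification_deadline` is hypothetical and only illustrates the arithmetic, not any official tool.

```python
from datetime import datetime, timedelta, timezone

# GDPR Art. 33: notify the supervisory authority within 72 hours of becoming aware.
NOTIFICATION_WINDOW = timedelta(hours=72)


def breach_notification_deadline(became_aware_at: datetime) -> datetime:
    """Return the latest time to notify the supervisory authority (hypothetical helper)."""
    # Require a timezone-aware timestamp so the deadline is unambiguous.
    if became_aware_at.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return became_aware_at + NOTIFICATION_WINDOW


aware = datetime(2024, 3, 1, 9, 30, tzinfo=timezone.utc)
deadline = breach_notification_deadline(aware)
print(deadline.isoformat())  # 2024-03-04T09:30:00+00:00
```

Using timezone-aware timestamps avoids confusion when the team handling the breach works across time zones.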

Failing to comply with the GDPR can result in fines of up to €20 million or 4% of annual global turnover, whichever is higher. Non-compliance can also damage the entity’s reputation and the trust of its users and stakeholders.
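The “whichever is higher” rule means the ceiling is the maximum of two figures. The short Python sketch below computes only this legal maximum for the upper tier of fines; regulators set the actual amount case by case, and the function name `max_gdpr_fine_eur` is hypothetical.

```python
def max_gdpr_fine_eur(annual_global_turnover_eur: int) -> float:
    """Upper-tier GDPR fine ceiling: the higher of 4% of turnover or EUR 20 million.

    This is the legal maximum only, not a prediction of an actual fine.
    """
    four_percent = annual_global_turnover_eur * 4 / 100
    return max(four_percent, 20_000_000.0)


for turnover in (100_000_000, 500_000_000, 1_000_000_000):
    print(f"Turnover EUR {turnover:,}: maximum fine EUR {max_gdpr_fine_eur(turnover):,.0f}")
```

For any organisation with an annual global turnover below €500 million, the €20 million figure is the higher of the two; above that, the 4% figure takes over.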

Why Human Transcription is Better than Using ASR for Transcription Purposes

Human transcription is the process of converting speech into text by a human transcriber, who listens to an audio file and types what they hear. Human transcription has several advantages over automated speech recognition for transcription purposes, such as:

Higher accuracy:

Human transcribers can capture context, homophones, accents, jargon, and nuances better than ASR systems. They can also mark inaudible or unclear speech as such rather than guessing or making errors.

More flexibility:

Human transcribers can adapt to different formats, styles, standards, and preferences according to the needs and specifications of the clients. They can also handle complex or specialised content that may be beyond the scope or capability of automated speech recognition systems.

More privacy:

Human transcribers can ensure the confidentiality and security of the data they transcribe by following strict protocols and policies, such as signing non-disclosure agreements, deleting audio files after transcription, and using encrypted platforms or devices.

Therefore, human transcription is a more reliable, customisable and secure option than ASR for transcription purposes, especially when confidential or sensitive data is involved.

Conclusion

Speech recognition is a powerful and convenient technology that can speed up the transcription of speech into text. However, it also poses significant risks to the privacy and security of the data it processes, especially when that data is confidential or sensitive. Users of automated speech recognition should therefore be aware of these risks and take appropriate measures to protect their data and comply with the relevant laws and regulations. Alternatively, they can opt for human transcription, which is a more accurate, flexible and secure option than automated speech recognition for transcription purposes.

About OutSec

OutSec is the UK’s leading online transcription company whose business has grown substantially since its inception in 2002. We are now one of the most successful transcription companies in the United Kingdom.

OutSec provides secure outsourced transcription services to the medical, legal, property and surveying, university, media and interview, advisory board, conference and seminar, inventory, financial, corporate, HR, recruitment and executive search sectors.

Accounts are free: you pay per minute (rounded to the nearest minute) on a pay-as-you-go basis, with no contracts or minimum spend. So why not open an account today?

We also provide a boutique Virtual Personal Assistant Service, Crystal Clara, for those who require a more personal and tailored service.

Why is Dictation More Efficient than Typing?

Interestingly, it is because we can all speak faster than we can type:

“The average person types between 38 and 40 words per minute”.

A “good rate of speech ranges between 140-160 words per minute.”

In other words, dictation can be up to four times faster than typing. Simply dictating a document is therefore more cost-efficient, leaving you more time to focus on other areas of your business.
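To make the comparison concrete, the Python sketch below works through the quoted rates for a hypothetical 1,000-word document, taking the upper end of each range (40 words per minute typing, 160 words per minute speaking). It only compares the author’s own time and ignores the time needed to transcribe the dictation.

```python
TYPING_WPM = 40     # upper end of the quoted 38-40 wpm typing range
SPEAKING_WPM = 160  # upper end of the quoted 140-160 wpm speaking range


def minutes_to_produce(words: int, words_per_minute: float) -> float:
    """Minutes needed to produce a given number of words at a given rate."""
    return words / words_per_minute


doc_words = 1000  # hypothetical document length
typing_minutes = minutes_to_produce(doc_words, TYPING_WPM)       # 25.0
speaking_minutes = minutes_to_produce(doc_words, SPEAKING_WPM)   # 6.25
speedup = typing_minutes / speaking_minutes                       # 4.0

print(f"Typing: {typing_minutes:.2f} min, dictating: {speaking_minutes:.2f} min, "
      f"speed-up: {speedup:.1f}x")
```

At the lower end of the speaking range (140 words per minute against 40), the speed-up is 3.5 times, which is why “up to” four times is the fair way to put it.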