Internationalizing closed captions and subtitles

Simon Mulquin
5 min read · Aug 22, 2024


Without causing harm to people with attention or auditory impairment

Photo by Jake Hills on Unsplash

Nowadays, most people have experienced closed captions (CC) or subtitles, but you might also have noticed a decline in their quality recently.

Closed captions are a support for people with attention or auditory impairment. They are issued by machines or humans in the original language of a video.

Subtitles are a support for people who don’t understand the language of the video. Basically, they are translated captions.

I should also mention video description, which enhances the previous two by translating visual elements into a narrative, text-based form. This text can be read to people with visual impairment or blindness by voice actors or a machine-generated voice.

Supporting people with hearing impairment

If you want to make audio content accessible to people who have difficulty focusing on sounds or hearing them, it’s a common practice, and I believe a good one, to add closed captions.

The issue is that nowadays, you’ll find more and more videos where closed captions or subtitles overlap open captions, the latter being text contained in the images of the video and therefore impossible to erase without extensive processing of the video file (which would be a serious legal and content quality mistake).

That is something you should consider before you activate a captions and subtitles feature, or use a platform that provides one.

If the content you create already embeds captions, not only will no one be able to read them, the overlap will also prevent viewers from reading the captions issued by the platform.

Now, I am not saying you shouldn’t do it at all. You would actually rather make sure the captions are accurate than trust the approximate quality of machine-generated ones, and embedding captions in the source image is a good way to deliver to multiple platforms if they don’t support metadata specifically designed to carry captions.

While it is obviously something to improve on the platforms’ side, not taking this into account usually translates into a big block of text that is impossible to read and takes up a lot of space at the bottom of the screen.

Supporting multilingualism

One implementation of subtitles I see a lot is an option to set a single language.

That language is then used to translate anything that is not recognized or configured as that very same language.

The issue with this design is that a lot of people nowadays speak at least two languages, and they might actually prefer to get the content translated into a second language rather than their native one. It is very common for people who speak multiple languages daily to set up English as their first language, in order to ensure better translation quality or to keep their mind used to their professional language.

This reality is just not supported by the design above: we can watch a video in another language we speak and still get translated text we don’t need. This actually harms our focus on the words we do understand.
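To make the flaw concrete, here is a minimal sketch of that single-language design; the function name and signature are hypothetical, not any platform’s real API.

```typescript
// A minimal sketch of the single-language subtitle design described above.
// Anything not in the ONE configured language gets translated,
// even languages the viewer actually speaks.

function needsSubtitles(contentLanguage: string, configuredLanguage: string): boolean {
  return contentLanguage !== configuredLanguage;
}

// A French/English bilingual who configured "en" still gets English subtitles
// on French videos: text they do not need, covering speech they already understand.
console.log(needsSubtitles("fr", "en")); // true: subtitles are forced
```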

This design goes against web internationalization standards but has been widely adopted.

While it improves internationalization in a lot of ways (most importantly by enabling people to use it to learn new languages while scrolling!), it has enshittified the experience of watching a video for a very wide audience.

Metadata and smart positioning

The last issue has already been touched on above, but let’s look into it as a specific problem.

It was never smart in the first place to put both closed captions and subtitles at the bottom of the screen if we cannot make sure the video doesn’t use open captions.

While this makes complete sense as an isolated design decision, because that’s where people are used to finding text and video production can be standardized by always leaving space for it at the bottom, it didn’t acknowledge that this very familiarity with the design would eventually work against it.

First, some content creators want to provide their own captions and will indeed use this area. They might do it to ensure human-issued text, which, at the current state of these technologies, will always beat auto-generated captions.

Second, they might use tools to publish on multiple platforms: the same asset may face different capabilities and rules on each platform but still be published just the same, without any context, and in our case end up overlapping another text.

Third, they might need to locate the text where it makes more sense. You have probably all experienced looking at text at the bottom of the screen while no one speaks, only to understand later that it was meant to translate something written elsewhere on the screen. This is just nonsense: it breaks focus and makes it really hard to understand what is going on in the video.

Holistic design

Now that we have analyzed these issues, we can set a goal to improve captions and subtitles and come up with a design more aware of the environment it is meant to be implemented in.

Here are the requirements (a minimal sketch of the selection logic follows the list):

  • A user can set a list of preferred languages (that’s a web standard)
  • Subtitles are displayed in the first language of the list
  • For content in any language on the list, no subtitles are displayed
  • If a user activates captions, they see captions for these languages
  • They can see both subtitles and captions if they want to
  • A content creator can set metadata to help position the captions and subtitles, or to locate open captions
  • Machine-generated captions or subtitles use this metadata to avoid overlapping
  • The metadata is meant for interoperability with content creation and content delivery platforms

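The sketch below covers the language-selection part of these requirements. All names (UserPrefs, selectTextTracks, and so on) are hypothetical and only illustrate the rules; the metadata and positioning part is left out here.

```typescript
// Sketch of the subtitle/caption selection logic described by the requirements above.
// Hypothetical names and shapes, not a real platform API.

interface UserPrefs {
  preferredLanguages: string[]; // ordered list, e.g. ["fr", "en"] (web standard style)
  captionsEnabled: boolean;     // closed captions in the content's own language
}

interface TextTrackDecision {
  subtitles?: string; // language to translate into, if any
  captions?: string;  // language of same-language captions, if any
}

function selectTextTracks(contentLanguage: string, prefs: UserPrefs): TextTrackDecision {
  const decision: TextTrackDecision = {};

  // Captions: same language as the content, shown whenever the user asked for them.
  if (prefs.captionsEnabled) {
    decision.captions = contentLanguage;
  }

  // Subtitles: only when the content language is NOT one the user already understands.
  const understood = prefs.preferredLanguages.includes(contentLanguage);
  if (!understood && prefs.preferredLanguages.length > 0) {
    decision.subtitles = prefs.preferredLanguages[0]; // first language of the list
  }

  return decision;
}

// Example: a French/English speaker watching a Spanish video gets French subtitles,
// but watching an English video gets no subtitles (and captions only if enabled).
console.log(selectTextTracks("es", { preferredLanguages: ["fr", "en"], captionsEnabled: false }));
console.log(selectTextTracks("en", { preferredLanguages: ["fr", "en"], captionsEnabled: true }));
```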
While this is quite simple on paper, many design decisions can support these requirements and lead to diverse experiences.

The metadata can come in many different forms and support different levels of complexity. As an example, do you simply locate the text in one of eight positions on the screen, or do you also locate it on a time basis?
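As an illustration of what such metadata could look like (an assumed shape, not an existing standard), here is one way to express both spatial anchors and an optional time range:

```typescript
// A hypothetical metadata schema for positioning text layers in a video.
// It covers the two levels of complexity mentioned above: spatial anchors and time ranges.

type ScreenAnchor =
  | "top-left" | "top-center" | "top-right"
  | "middle-left" | "middle-right"
  | "bottom-left" | "bottom-center" | "bottom-right";

interface OpenCaptionRegion {
  anchor: ScreenAnchor; // where text is burned into the image
  startMs?: number;     // optional time range, if positioning is time-based
  endMs?: number;
}

interface VideoTextMetadata {
  openCaptions: OpenCaptionRegion[];      // regions the player must not cover
  preferredSubtitleAnchor?: ScreenAnchor; // where the creator wants generated text placed
}

// A player could read this metadata and move machine-generated subtitles away
// from any region occupied by open captions at the current playback time.
const example: VideoTextMetadata = {
  openCaptions: [{ anchor: "bottom-center", startMs: 12_000, endMs: 18_000 }],
  preferredSubtitleAnchor: "top-center",
};
```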

You might also consider it weird to display both captions and subtitles if there is no strong emphasis on language learning in the product being designed.

Or you might decide to simply add several checkboxes to toggle the features, or even more checkboxes for a per-language setting, which would be very annoying to use but would still let someone see either captions or subtitles for most languages, and both for the one they are learning.
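A per-language setting could look like the following map (again a hypothetical shape, just to show how verbose yet flexible this option is):

```typescript
// Per-language preferences, as suggested above: verbose to configure,
// but it lets a viewer mix modes per language.

type LanguageSetting = { captions: boolean; subtitles: boolean };

const perLanguage: Record<string, LanguageSetting> = {
  en: { captions: true, subtitles: false },  // understood: captions only
  fr: { captions: true, subtitles: false },  // understood: captions only
  de: { captions: true, subtitles: true },   // currently learning: show both
};
```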

The interoperability part might be tricky: it requires either sub-processes or many organizations partnering to produce an internationalization-friendly specification for text layers in video content.

I can come up with a design, but in the end it falls into your hands. Depending on the product or service you are building, these requirements support both minimalist and extensive designs, as long as there is a motivation to make them accessible with regard to internationalization or auditory and attention impairments.

I am a European freelancer in data protection, internationalization and the social uses of process-driven technologies. I write on subjects at the intersection of information systems and the general interest.

Follow me for more in-depth content you won’t find on Udemy 🙂



Simon Mulquin

Freelance in the public interest; passionate about human sciences, territory development, internationalization and privacy; I write in French or English 🙂