Transcription Format

Overview

This guide provides detailed instructions on the proper formatting of transcriptions. It covers timestamping, text formatting, handling multiple speakers, segmentation, and redaction of sensitive information.

Format Guidelines

Start and Stop Timestamps

Each transcript should be segmented into intervals of no more than 15 seconds.
Include timestamps marking the start and stop times in H:MM:SS.m format (for example, 0:00:09.7).
Ensure timestamps correspond accurately to the audio file.
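
The timestamp rules above can be checked mechanically. The following is a minimal sketch, not part of the official tooling. It assumes that the final "m" in H:MM:SS.m is a single digit for tenths of a second, as the examples in this guide suggest. The function names are hypothetical.

```python
import re

# Assumption: H:MM:SS.m with "m" read as one digit of tenths of a second.
TIMESTAMP_RE = re.compile(r"^(\d+):([0-5]\d):([0-5]\d)\.(\d)$")

MAX_SEGMENT_TENTHS = 150  # 15 seconds, expressed in tenths of a second


def parse_timestamp(value: str) -> int:
    """Return the timestamp as a whole number of tenths of a second."""
    match = TIMESTAMP_RE.match(value)
    if match is None:
        raise ValueError(f"not in H:MM:SS.m format: {value!r}")
    hours, minutes, seconds, tenths = (int(part) for part in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 10 + tenths


def format_timestamp(tenths: int) -> str:
    """Format a count of tenths of a second as H:MM:SS.m."""
    if tenths < 0:
        raise ValueError("timestamp cannot be negative")
    total_seconds, tenth = divmod(tenths, 10)
    total_minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}.{tenth}"


def check_interval(start: str, stop: str) -> None:
    """Raise ValueError if stop is not after start or the segment exceeds 15 seconds."""
    length = parse_timestamp(stop) - parse_timestamp(start)
    if length <= 0:
        raise ValueError("stop must be later than start")
    if length > MAX_SEGMENT_TENTHS:
        raise ValueError(f"segment is {length / 10:.1f} s long, limit is 15 s")
```

Working in whole tenths of a second rather than floating-point seconds keeps comparisons exact. Note that this check cannot confirm that a timestamp matches the audio; that still requires listening.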

Text Formatting

Transcribe text in easily readable, properly capitalized, and punctuated format.
Avoid unnecessary line breaks.
Follow capitalization and punctuation standards as outlined in the respective sections.

Audio File Linking

Each transcript segment should be linked to the respective source audio file.
Multiple segments may be associated with the same audio file.

Speaker Identification (Optional)

If multiple speakers are present, include a speaker field.
If only one speaker is present, this field may be omitted.

Role Annotation (Optional)

Annotate speaker roles for clarity (e.g., M for medical professional, C for caller).
Use abbreviations or full role names consistently.

Example JSON Format

[
    {
        "start": "0:00:00.4",
        "stop": "0:00:09.7",
        "text": "The doctor's office, John speaking. How can I help you?",
        "audio_file": "file_name.wav",
        "speaker": 1,
        "role": "M"
    },
    {
        "start": "0:00:11.1",
        "stop": "0:00:17.3",
        "text": "Hi, my name is Jane, I would like to make an appointment.",
        "audio_file": "file_name.wav",
        "speaker": 2,
        "role": "C"
    }
]
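
A transcript in this shape can be sanity-checked before delivery. The sketch below is illustrative only. It treats start, stop, text, and audio_file as required and speaker and role as optional, following the guidelines above. The function name and the rule that the speaker field should appear on either all segments or none are assumptions, not stated requirements.

```python
import json

REQUIRED_FIELDS = ("start", "stop", "text", "audio_file")
OPTIONAL_FIELDS = ("speaker", "role")


def validate_segments(raw: str) -> list[str]:
    """Parse a JSON transcript and return a list of problems (empty if none)."""
    segments = json.loads(raw)
    if not isinstance(segments, list):
        return ["top-level value must be a JSON array of segments"]

    problems = []
    for index, segment in enumerate(segments):
        if not isinstance(segment, dict):
            problems.append(f"segment {index}: not a JSON object")
            continue
        for field in REQUIRED_FIELDS:
            if field not in segment:
                problems.append(f"segment {index}: missing '{field}'")
        for field in segment:
            if field not in REQUIRED_FIELDS + OPTIONAL_FIELDS:
                problems.append(f"segment {index}: unexpected field '{field}'")

    # Assumption: if any segment names a speaker, every segment should.
    has_speaker = [isinstance(s, dict) and "speaker" in s for s in segments]
    if any(has_speaker) and not all(has_speaker):
        problems.append("speaker field is present on some segments but not all")
    return problems
```

Returning a list of messages instead of raising on the first problem lets a transcriber fix everything in one pass.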

What to Transcribe and What to Leave Out

General Transcription Rules

Aim for coherent and grammatically correct sentences.
Use Corti’s seed transcript as guidance.
Correct mispronunciations and minor grammatical errors to enhance readability.
Exclude non-word fillers (um, uh, hmm) unless essential for understanding.

Examples

What was said:

can i get a pa-palacetamol thanks you

What should be transcribed:

Can I get a paracetamol? Thank you.

What was said:

uh i don't know hmm so i mean he just left

What could be transcribed:

I don't know, I mean, he just left.

Capitalization and Punctuation

Rules

Follow standard capitalization and punctuation rules.
Use common punctuation marks such as periods, commas, question marks, and quotation marks.
Avoid complex punctuation such as semicolons. Split the text into shorter sentences instead.

Examples

Correct:

I asked him, "How are you doing?" and he answered, "I'm doing good."

Correct list formatting:

Please buy the following: eggs, milk, flour, and yeast.

Numbers

When to Use Numerals

Addresses: I live at Station Road 3.
Measurements, ages, amounts: He is 9 years old.
Proper names and identifiers: I’m going to the 7-Eleven store.

When to Spell Out Numbers

Outside the cases listed under When to Use Numerals, spell out numbers nine and below.
Use numerals for 10 and above.
Ordinal numbers follow the same rule: I came in second place. vs. I came in 39th place.
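
The numeral rules can be summarized in a few lines of code. This is a rough sketch under stated assumptions: the function name and the keep_numeral flag (for addresses, measurements, ages, amounts, and identifiers) are hypothetical, and treating zero like one through nine is an assumption the guide does not address.

```python
CARDINALS = ["zero", "one", "two", "three", "four",
             "five", "six", "seven", "eight", "nine"]
ORDINALS = ["zeroth", "first", "second", "third", "fourth",
            "fifth", "sixth", "seventh", "eighth", "ninth"]


def ordinal_suffix(number: int) -> str:
    """Return the English ordinal suffix for a number (st, nd, rd, or th)."""
    if number % 100 in (11, 12, 13):
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(number % 10, "th")


def format_number(number: int, *, ordinal: bool = False,
                  keep_numeral: bool = False) -> str:
    """Spell out 0-9 unless keep_numeral is set; use numerals for 10 and above."""
    if 0 <= number <= 9 and not keep_numeral:
        return ORDINALS[number] if ordinal else CARDINALS[number]
    if ordinal:
        return f"{number}{ordinal_suffix(number)}"
    return str(number)
```

For example, format_number(9, keep_numeral=True) gives the numeral used in "He is 9 years old", while format_number(2, ordinal=True) gives "second".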

Example Formatting for Numbers

What was said:

his number is five hundred five hundred one two three four

What should be transcribed:

His number is (500) 500-1234.

Redaction of Sensitive Information

Replace sensitive data with curly bracket notation (e.g., {phone}).
Keep tags short and consistent to minimize errors.

Example

What was said:

her social security number is one two three four five six one two three four

What should be transcribed:

Her social security number is {CPR}.
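
Tag consistency is easy to lose across a large batch of transcripts. As a minimal sketch, the helper below collects curly bracket tags and reports spellings that differ only in letter case or surrounding spaces. The function names are hypothetical, and the check does not decide which spelling is correct; it only flags the variation.

```python
import re
from collections import defaultdict

# Matches a curly bracket tag such as {phone} and captures its contents.
TAG_RE = re.compile(r"\{([^{}]+)\}")


def find_tags(text: str) -> list[str]:
    """Return the contents of every curly bracket tag in the text, in order."""
    return TAG_RE.findall(text)


def inconsistent_tags(texts: list[str]) -> dict[str, list[str]]:
    """Group tag spellings that differ only in case or surrounding spaces."""
    variants = defaultdict(set)
    for text in texts:
        for tag in find_tags(text):
            variants[tag.strip().lower()].add(tag)
    return {key: sorted(forms) for key, forms in variants.items() if len(forms) > 1}
```

Running inconsistent_tags over the text fields of all segments in a project gives a short list of tags to normalize by hand.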

Segmentation Guidelines

Best Practices

Segment by full sentences whenever possible.
If a sentence is too long, break it into smaller segments.
End segments at natural pauses to avoid sentence truncation.
Include slight padding (up to 0.5 seconds) to prevent audio cropping.
If multiple speakers overlap, create separate overlapping segments.
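
The padding rule can be sketched as follows, assuming times are held as whole tenths of a second and that the length of the audio file is known. The function name and parameters are hypothetical.

```python
MAX_PADDING_TENTHS = 5  # 0.5 seconds, expressed in tenths of a second


def pad_segment(start: int, stop: int, audio_length: int,
                padding: int = MAX_PADDING_TENTHS) -> tuple[int, int]:
    """Widen a segment by the padding on both sides, clamped to the audio bounds.

    All values are whole tenths of a second.
    """
    if not 0 <= padding <= MAX_PADDING_TENTHS:
        raise ValueError("padding must be between 0 and 0.5 seconds")
    return max(0, start - padding), min(audio_length, stop + padding)
```

Padding lengthens a segment, so a segment that is already close to 15 seconds may need a smaller padding value to stay within the limit.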

Example of Overlapping Speakers

...

    {
        "start": "0:01:02.3",
        "stop": "0:01:04.5",
        "text": "Well, I told him to...",
        "audio_file": "file_name.wav",
        "speaker": 2,
        "role": "C"
    },
    {
        "start": "0:01:04.1",
        "stop": "0:01:08.2",
        "text": "Hold on a minute! Who are you talking about?",
        "audio_file": "file_name.wav",
        "speaker": 1,
        "role": "M"
    },
...
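
To review where speakers overlap, segments in the same audio file can be compared by time range. The sketch below is one possible approach, not a required tool. It assumes segments are dictionaries shaped like the examples in this guide, and the function names are hypothetical.

```python
def to_tenths(timestamp: str) -> int:
    """Convert an H:MM:SS.m timestamp to whole tenths of a second."""
    hours, minutes, seconds = timestamp.split(":")
    whole, _, tenth = seconds.partition(".")
    total_seconds = (int(hours) * 60 + int(minutes)) * 60 + int(whole)
    return total_seconds * 10 + int(tenth or 0)


def overlapping_pairs(segments: list[dict]) -> list[tuple[int, int]]:
    """Return index pairs (i, j) of segments in the same file whose times overlap."""
    # Visit segments in order of start time so the inner loop can stop early.
    ordered = sorted(range(len(segments)),
                     key=lambda i: to_tenths(segments[i]["start"]))
    pairs = []
    for position, i in enumerate(ordered):
        stop_i = to_tenths(segments[i]["stop"])
        for j in ordered[position + 1:]:
            if to_tenths(segments[j]["start"]) >= stop_i:
                break  # later segments start even later, so none can overlap
            if segments[j]["audio_file"] == segments[i]["audio_file"]:
                pairs.append((i, j))
    return pairs
```

Applied to the two segments in the example above, the function reports one overlapping pair, because the second segment starts at 0:01:04.1 while the first is still running until 0:01:04.5.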

By following these guidelines, transcribers can ensure high-quality, accurate, and readable transcripts for Intercom use.

Tuning Automatic Speech Recognition Models