Skip to content

Stream-based library for parsing and manipulating subtitle files

License

Notifications You must be signed in to change notification settings

gsantiago/subtitle.js

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

subtitle

Build Status Code Climate Coverage Status downloads npm

Stream-based library for parsing and manipulating subtitle files.

"Thanks for this rad package!" John-David Dalton, creator of Lodash

โœ… Stream API
โœ… Written in TypeScript
โœ… SRT support
โœ… Partial support for WebVTT (full support comming soon)
โœ… 100% code coverage
โœ… Actively maintained since 2015

Installation

npm

npm install subtitle

yarn

yarn add subtitle

Usage

This library provides some stream-based functions to work with subtitles. The following example parses a SRT file, resyncs it and outputs a VTT file:

import fs from 'fs'
import { parse, resync, stringify } from 'subtitle'

fs.createReadStream('./my-subtitles.srt')
  .pipe(parse())
  .pipe(resync(-100))
  .pipe(stringify({ format: 'WebVTT' }))
  .pipe(fs.createWriteStream('./my-subtitles.vtt'))

It also provides functions like map and filter:

import { parse, map, filter, stringify } from 'subtitle'

inputStream
  .pipe(parse())
  .pipe(
    filter(
      // strips all cues that contains "๐… "
      node => !(node.type === 'cue' && node.data.text.includes('๐… '))
    )
  )
  .pipe(
    map(node => {
      if (node.type === 'cue') {
        // convert all cues to uppercase
        node.data.text = node.data.text.toUpperCase()
      }

      return node
    })
  )
  .pipe(stringify({ format: 'WebVTT' }))
  .pipe(outputStream)

Besides the stream functions, this module also provides synchronous functions like parseSync and stringifySync. However, you should avoid them and use the stream-based functions for better performance:

import { parseSync, stringifySync } from 'subtitle'

const nodes = parseSync(srtContent)

// do something with your subtitles
// ...

const output = stringify(nodes, { format: 'WebVTT' })

API

The module exports the following functions:

parse

  • parse(): DuplexStream

It returns a Duplex stream for parsing subtitle contents (SRT or WebVTT).

import { parse } from 'subtitle'

inputStream
  .pipe(parse())
  .on('data', node => {
    console.log('parsed node:', node)
  })
  .on('error', console.error)
  .on('finish', () => console.log('parser has finished'))

Check out the Examples section for more use cases.

parseSync

  • parseSync(input: string): Node[]

NOTE: For better perfomance, consider using the stream-based parse function

It receives a string containing a SRT or VTT content and returns an array of nodes:

import { parseSync } from 'subtitle'
import fs from 'fs'

const input = fs.readFileSync('awesome-movie.srt', 'utf8')

parseSync(input)

// returns an array like this:
[
  {
    type: 'cue',
    data: {
      start: 20000, // milliseconds
      end: 24400,
      text: 'Bla Bla Bla Bla'
    }
  },
  {
    type: 'cue',
    data: {
      start: 24600,
      end: 27800,
      text: 'Bla Bla Bla Bla',
      settings: 'align:middle line:90%'
    }
  },
  // ...
]

stringify

  • stringify({ format: 'SRT' | 'vtt' }): DuplexStream

It returns a Duplex that receives parsed nodes and transmits the node formatted in SRT or WebVTT:

import { parse, stringify } from 'subtitle'

inputStream.pipe(parse()).pipe(stringify({ format: 'WebVTT' }))

Check out the Examples section for more use cases.

stringifySync

  • stringify(nodes: Node[], options: { format: 'SRT' | 'vtt }): string

NOTE: For better perfomance, consider using the stream-based stringify function

It receives an array of captions and returns a string in SRT (default), but it also supports VTT format through the options.

import { stringifySync } from 'subtitle'

stringifySync(nodes, { format: 'SRT' })
// returns a string in SRT format

stringifySync(nodes, { format: 'WebVTT' })
// returns a string in VTT format

map

  • map(callback: function): DuplexStream

A useful Duplex for manipulating parsed nodes. It works similar to the Array.map function, but for streams:

import { parse, map, stringify } from 'subtitle'

inputStream
  .pipe(parse())
  .pipe(
    map((node, index) => {
      if (node.type === 'cue') {
        node.data.text = node.data.text.toUpperCase()
      }

      return node
    })
  )
  .pipe(stringify({ format: 'SRT' }))
  .pipe(outputStream)

filter

  • filter(callback: function): DuplexStream

A useful Duplex for filtering parsed nodes. It works similar to the Array.filter function, but for streams:

import { parse, filter, stringify } from 'subtitle'

inputStream
  .pipe(parse())
  .pipe(
    filter((node, index) => {
      return !(node.type === 'cue' && node.data.text.includes('๐… '))
    })
  )
  .pipe(stringify({ format: 'SRT' }))
  .pipe(outputStream)

resync

  • resync(time: number): DuplexStream

Resync all cues from the stream:

import { parse, resync, stringify } from 'subtitle'

// Advance subtitles by 1s
readableStream
  .pipe(parse())
  .pipe(resync(1000))
  .pipe(outputStream)

// Delay 250ms
stream.pipe(resync(captions, -250))

parseTimestamp

  • parseTimestamp(timestamp: string): number

Receives a timestamp (SRT or VTT) and returns its value in milliseconds:

import { parseTimestamp } from 'subtitle'

parseTimestamp('00:00:24,400')
// => 24400

parseTimestamp('00:24.400')
// => 24400

parseTimestamps

  • parseTimestamps(timestamps: string): Timestamp

It receives a timestamps string, like 00:01:00,500 --> 00:01:10,800. It also supports VTT formats like 12:34:56,789 --> 98:76:54,321 align:middle line:90%.

import { parseTimestamps } from 'subtitle'

parseTimestamps('00:01:00,500 --> 00:01:10,800')
// => { start: 60500, end: 70800 }

parseTimestamps('12:34:56,789 --> 98:76:54,321 align:middle line:90%')
// => { start: 45296789, end: 357414321, settings: 'align:middle line:90%' }

formatTimestamp

  • formatTimestamp(timestamp: number, options?: { format: 'SRT' | 'vtt' }): string

It receives a timestamp in milliseconds and returns it formatted as SRT or VTT:

import { formatTimestamp } from 'subtitle'

formatTimestamp(142542)
// => '00:02:22,542'

formatTimestamp(142542, { format: 'WebVTT' })
// => '00:02:22.542'

Examples

Nodes

This is what a list of nodes looks like:

[
  {
    type: 'header',
    data: 'WEBVTT - Header content'
  },
  {
    type: 'cue',
    data: {
      start: 150066, // timestamp in milliseconds,
      end: 158952,
      text: 'With great power comes great responsibility'
    }
  },
  ...
]

For now, it only supports two types of node: header and cue. Soon, it will support more types like comment.

Convert SRT file to WebVTT

import fs from 'fs'
import { parse, stringify } from 'subtitle'

fs.createReadStream('./source.srt')
  .pipe(parse())
  .pipe(stringify({ format: 'WebVTT' }))
  .pipe(fs.createWriteStream('./dest.vtt'))

Extract subtitles from a video

The following example uses the rip-subtitles for extracting subtitles from a mkv video and save it as WebVTT.

import extract from 'rip-subtitles'
import { parse, stringify } from 'subtitle'

extract('video.mkv')
  .pipe(parse())
  .pipe(stringify({ format: 'WebVTT' }))
  .pipe(fs.createWriteStream('./video.vtt'))

Create subtitles

import { stringifySync } from 'subtitle'

const list = []

list.push({
  type: 'cue',
  data: {
    start: 1200,
    end: 1300,
    text: 'Something'
  }
})

stringifySync(list)

License

MIT