Introduction
We introduce TVB-HKSL-News, a large-vocabulary signer-dependent HKSL dataset for continuous sign language recognition (SLR) and sign language translation (SLT). The dataset is collected from TVB News Report with Sign Language TV program. It consists of 16.07 hours of sign videos of two signers with a vocabulary of 7,160 glosses for SLR and 2,850 Chinese characters (or 18K Chinese words) for SLT.
Dataset Statistics
name | Duration (h) | Samples | SLR vocab | SLR tokens | SLR #singletons | SLT vocab | SLT tokens | SLT #singletons | Duration (%) | Samples (%) | SLR Tokens (%) | SLT Tokens (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Overall | 16.07 | 7,160 | 6,515 | 121,817 | 2,820 | 2,850 | 232,310 | 462 | 100 | 100 | 100 | 100 |
- Signer1 | 11.66 | 5,350 | 5,476 | 88,788 | 2,362 | 2,695 | 173,027 | 450 | 72.59 | 74.72 | 72.89 | 74.48 |
- Signer2 | 4.41 | 1,810 | 3,280 | 33,029 | 1,479 | 2,100 | 59,283 | 465 | 27.41 | 25.28 | 27.11 | 25.52 |
Train | 14.71 | 6,516 | 6,515 | 111,204 | 2,925 | 2,816 | 212,108 | 466 | 91.53 | 91.01 | 91.29 | 91.3 |
- Signer1 | 10.66 | 4,863 | 5,449 | 80,972 | 2,430 | 2,666 | 157,800 | 463 | 66.34 | 67.92 | 66.47 | 67.93 |
- Signer2 | 4.05 | 1,653 | 3,234 | 30,232 | 1,520 | 2,069 | 54,308 | 468 | 25.18 | 23.09 | 24.82 | 23.38 |
Dev | 0.67 | 322 | 1,091 | 5,222 | 471 | 1,279 | 10,003 | 395 | 4.18 | 4.5 | 4.29 | 4.31 |
- Signer1 | 0.49 | 241 | 946 | 3,790 | 430 | 1,178 | 7,508 | 382 | 3.03 | 3.37 | 3.11 | 3.23 |
- Signer2 | 0.18 | 81 | 503 | 1,432 | 264 | 710 | 2,495 | 286 | 1.15 | 1.13 | 1.18 | 1.07 |
Test | 0.69 | 322 | 1,130 | 5,391 | 518 | 1,276 | 10,199 | 399 | 4.29 | 4.5 | 4.43 | 4.39 |
- Signer1 | 0.52 | 246 | 1,001 | 4,026 | 503 | 1,195 | 7,719 | 410 | 3.21 | 3.44 | 3.3 | 3.32 |
- Signer2 | 0.17 | 76 | 476 | 1,365 | 240 | 711 | 2,480 | 299 | 1.08 | 1.06 | 1.12 | 1.07 |
Example Data
Signer 1
Video | SLR Glosses | SLT Characters |
---|---|---|
肺 發炎 學校 分 一 房 最 多 四 學生 (Lung Inflamed School One Room Maximum Four Students) |
因應疫情將學生分流每個課室只安排最多四個學生 (In response to the epidemic situation, students will be divided into groups and only a maximum of four students will be arranged in each classroom) |
|
消息 說 他 香港 身份證 沒 但是 美國 護照 (News Say His Hong Kong ID Not But US Passport) |
消息指他無香港身份證但持有美國護照 (It is reported that he does not have a Hong Kong identity card but holds a US passport) |
|
兩 日 內 發生 一 二 三 有關 爆炸 爆炸 爆炸 情況 (Two Days In Happen One Two Three Related Bomb Bomb Bomb Situation) |
兩日內發生三宗涉及炸彈的案件 (Three cases involving bombs in two days) |
Signer 2
Video | SLR Glosses | SLT Characters |
---|---|---|
大埔 路 沙 田 坡 泥 傾瀉 因為 某些 地方 下雨 下雨 多 (Tai Po Highway Sha Tin Slope Mud Pour Because Some Place Rain Rain Much) |
大埔公路沙田段有護土牆塌下多處好似瀑布一樣 (On the Sha Tin section of Tai Po Highway, the retaining wall has collapsed in many places like a waterfall) |
|
國 衞生 健康 委員 說 湖 北 加 七 二 現在 全部 二 百 七 十 廣 東 一 四
仍然 北京 五 上海 二 (National Health Health Committee Say Hubei Add Seven Two Now Overall Two Hundred Seven Ten Guangdong One Four Still Beijing Five Shanghai Two) |
國家衛健委公布湖北省新增 72 宗個案現時當地總數 270 宗廣東省維持 14
宗首都北京 5 宗上海 2 宗 (The National Health and Medical Commission announced that there were 72 new cases in Hubei Province. The current local total is 270. Guangdong Province maintains 14 cases in the capital Beijing, 5 cases in Shanghai and 2 cases.) |
|
文 考試 日期 延後 學生 說 讀書 計劃 影響 混亂 某些 機構 說 語言 考試 這
取消 變 不公平 (Diploma Exam Date Postpone Student Say Study Plan Affect Disrupt Some Institution Say Language Exam Cancel Unfair) |
文憑試延期有考生稱打亂溫習計劃有升學機構擔心一旦取消語文科口試會影響公平性
(HKDSE Postponement. Some candidates said it disrupted study plans. Some education institutions worried that cancellation of language oral examinations would affect fairness) |
Baseline Results
We have obtained baseline results on the TVB-HKSL-News dataset with several popular SOTA SLR and SLT methods.
Continuous Sign Language Recognition
Method | Modality | WER (%) Dev | WER (%) Test |
---|---|---|---|
S3D [1,3] | Keypoints | 45.73 | 44.56 |
S3D [1,3] | Video | 39.59 | 38.63 |
VLT [2] | Video | 35.89 | 36.18 |
C2SLR [2] | Video+Keypoints | 35.43 | 35.78 |
TwoStream-SLR [3] | Video+Keypoints | 34.52 | 34.08 |
Sign Language Translation
Method | Rouge (Dev) | BLEU-1 (Dev) | BLEU-2 (Dev) | BLEU-3 (Dev) | BLEU-4 (Dev) | Rouge (Test) | BLEU-1 (Test) | BLEU-2 (Test) | BLEU-3 (Test) | BLEU-4 (Test) |
---|---|---|---|---|---|---|---|---|---|---|
S3D (video) [1,3] | 18.64 | 21.98 | 15.17 | 11.18 | 8.79 | 21.61 | 25.39 | 18.59 | 14.57 | 12.10 |
S3D (keypoints) [1,3] | 15.65 | 18.18 | 11.58 | 8.09 | 6.22 | 16.42 | 19.93 | 13.72 | 10.41 | 8.48 |
TwoStream-SLT [3] | 38.12 | 43.22 | 33.44 | 26.04 | 21.00 | 39.80 | 44.68 | 35.27 | 28.29 | 23.58 |
Download
To download the dataset, please fill in the request form and send to us via email. After receiving your request, we will send you the download link and password to access the dataset.
References
[1]: Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu, and Kevin Murphy. 2018. Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification. In ECCV, pages 305–321.
[2]: Ronglai Zuo and Brian Mak. 2022a. C2SLR: Consistency-enhanced continuous sign language recognition. In CVPR, pages 5131–5140.
[3]: Yutong Chen, Ronglai Zuo, Fangyun Wei, Yu Wu, Shujie Liu, and Brian Mak. 2022b. Two-stream network for sign language recognition and translation. In NeurIPS.