ChinaAVG Adventure & Puzzle Game Chinese Site

Title: [Chinese Localization of Game Videos #1] Compiling the AI speech recognition software Whisper

Author: shane007    Time: 2023-8-28 11:48
For years I've wanted to add subtitles to game videos that don't have any.
The conditions weren't ripe back then, but now they seem to be.

Whisper is OpenAI's open-source speech recognition model.
There is a .NET version of it, Whisper.net (a community binding built on whisper.cpp). With a few small modifications, it can turn the speech in a game video into subtitles in SRT format.
The SRT file can then be batch-translated online and adjusted a little, and the localization is done.

Project address:
https://github.com/sandrohanea/whisper.net

Build it with Visual Studio 2022; otherwise you are likely to run into a lot of .NET SDK version problems.

Once it's built, there are a few points to note:
<0> Switch the model file to the large model, ggml-large.bin; it gives noticeably better results.
    The trade-off is running time: converting a batch of files can take several hours, or even dozens of hours.

<1> Set Language to "english" (hard-coded, instead of taking it from opt.Language):
    // Original: language taken from opt.Language.
    /*  var builder = factory.CreateBuilder()
            .WithLanguage(opt.Language);*/
    // Modified: always transcribe as English.
    var builder = factory.CreateBuilder()
        .WithLanguage("english");

<2> By default it seems to accept only WAV files, and only at a 16 kHz sample rate, so convert your audio to that format beforehand; otherwise it fails with an error.
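Since a large-model run can take hours, it may be worth checking the inputs before starting. The sketch below reads the `fmt ` chunk of a RIFF/WAVE header and reports the sample rate and sample format. It assumes a standard, seekable WAV file. The extra 16-bit integer PCM requirement is my assumption about the usual target format (for example what `ffmpeg -i input.mp4 -ar 16000 -ac 1 -c:a pcm_s16le output.wav` produces), not something stated above; `WavFormatCheck` and its members are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

// Reads the "fmt " chunk of a RIFF/WAVE file. Assumes a seekable stream
// (FileStream or MemoryStream) holding a well-formed header.
public static class WavFormatCheck
{
    public record WavInfo(ushort AudioFormat, ushort Channels, int SampleRate, ushort BitsPerSample);

    public static WavInfo ReadFormat(Stream stream)
    {
        using var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);
        if (ReadTag(reader) != "RIFF") throw new InvalidDataException("Not a RIFF file");
        reader.ReadInt32(); // total RIFF size, not needed here
        if (ReadTag(reader) != "WAVE") throw new InvalidDataException("Not a WAVE file");

        // Walk the chunk list until the "fmt " chunk turns up.
        while (stream.Position + 8 <= stream.Length)
        {
            string id = ReadTag(reader);
            int size = reader.ReadInt32();
            if (id == "fmt ")
            {
                ushort format = reader.ReadUInt16();   // 1 = integer PCM
                ushort channels = reader.ReadUInt16();
                int sampleRate = reader.ReadInt32();
                reader.ReadInt32();                    // byte rate
                reader.ReadUInt16();                   // block align
                ushort bits = reader.ReadUInt16();
                return new WavInfo(format, channels, sampleRate, bits);
            }
            // Skip any other chunk; RIFF pads odd-sized chunks to an even length.
            stream.Seek(size + (size & 1), SeekOrigin.Current);
        }
        throw new InvalidDataException("No fmt chunk found");
    }

    // Assumed target: 16 kHz, 16-bit integer PCM (channel count is not checked).
    public static bool LooksWhisperReady(WavInfo info) =>
        info.AudioFormat == 1 && info.SampleRate == 16000 && info.BitsPerSample == 16;

    private static string ReadTag(BinaryReader reader) =>
        Encoding.ASCII.GetString(reader.ReadBytes(4));
}
```

Files that fail the check (for example 44.1 kHz game audio) can be sent through the conversion step first instead of failing partway through a batch.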
<3> The stock sample converts only a single example WAV file; change it to batch mode
   (iterate over all the files in a given directory).
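A minimal sketch of the batch step, assuming the WAV files sit directly in one folder (subfolders are not searched) and each result goes to a same-named .srt next to it. The `BatchPlanner` class and the skip-if-already-done behaviour are my own illustration, not part of the original code: because a large-model run can take hours, skipping files that already have an .srt lets an interrupted run pick up where it stopped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class BatchPlanner
{
    // Lists every .wav in inputDir (sorted by name) paired with the .srt path it
    // should produce, leaving out files whose .srt already exists.
    public static List<(string Wav, string Srt)> PlanJobs(string inputDir)
    {
        return Directory.EnumerateFiles(inputDir, "*.wav")
            .OrderBy(path => path, StringComparer.OrdinalIgnoreCase)
            .Select(wav => (Wav: wav, Srt: Path.ChangeExtension(wav, ".srt")))
            .Where(job => !File.Exists(job.Srt))
            .ToList();
    }
}
```

Each returned job would then be run through Whisper.net and its segments written to `job.Srt`. Creating the `factory` once for the whole batch, rather than once per file, should mean the roughly 3 GB large model (`model size = 2950.66 MB` in the log below) is loaded only once.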
<4> The output needs some tidying to conform to the SRT format.

Below is the console output for one WAV file (the opening cinematic of 幽魂, i.e. Phantasmagoria):

    whisper_init_from_file_no_state: loading model from 'ggml-large.bin'
    whisper_model_load: loading model
    whisper_model_load: n_vocab       = 51865
    whisper_model_load: n_audio_ctx   = 1500
    whisper_model_load: n_audio_state = 1280
    whisper_model_load: n_audio_head  = 20
    whisper_model_load: n_audio_layer = 32
    whisper_model_load: n_text_ctx    = 448
    whisper_model_load: n_text_state  = 1280
    whisper_model_load: n_text_head   = 20
    whisper_model_load: n_text_layer  = 32
    whisper_model_load: n_mels        = 80
    whisper_model_load: ftype         = 1
    whisper_model_load: qntvr         = 0
    whisper_model_load: type          = 5
    whisper_model_load: mem required  = 3557.00 MB (+   71.00 MB per decoder)
    whisper_model_load: adding 1608 extra tokens
    whisper_model_load: model ctx     = 2951.27 MB
    whisper_model_load: model size    = 2950.66 MB
    whisper_init_state: kv self size  =   70.00 MB
    whisper_init_state: kv cross size =  234.38 MB
    New Segment: 00:00:00 ==> 00:00:02.7600000 :  (birds chirping)
    New Segment: 00:00:03.6600000 ==> 00:00:05.9000000 :  (exhaling)
    New Segment: 00:00:05.9000000 ==> 00:00:08.6600000 :  (birds chirping)
    New Segment: 00:00:08.6600000 ==> 00:00:35.1200000 :  (gun firing)
    New Segment: 00:00:36.1200000 ==> 00:00:38.5400000 :  (gun firing)
    New Segment: 00:00:39.0600000 ==> 00:00:41.4800000 :  (gun firing)
    New Segment: 00:00:41.4800000 ==> 00:00:49.4000000 :  (tires screeching)
    New Segment: 00:00:49.4000000 ==> 00:00:58.5800000 :  (glass shattering)
    New Segment: 00:00:58.5800000 ==> 00:01:07.7400000 :  (singing in foreign language)
    New Segment: 00:01:07.7400000 ==> 00:01:11.5800000 :  (singing in foreign language)
    New Segment: 00:01:11.5800000 ==> 00:01:17 :  (tires screeching)
    New Segment: 00:01:17 ==> 00:01:24.8400000 :  (singing in foreign language)
    New Segment: 00:01:24.8400000 ==> 00:01:28.6400000 :  (panting)
    New Segment: 00:01:36.7800000 ==> 00:01:39.2000000 :  (gun firing)
    New Segment: 00:01:39.2000000 ==> 00:01:43.4600000 :  - Adrian.
    New Segment: 00:01:43.4600000 ==> 00:01:45.6200000 :  - Oh God.
    New Segment: 00:01:45.6200000 ==> 00:01:48.2000000 :  - What's the matter sweetheart?
    New Segment: 00:01:48.2000000 ==> 00:01:50.4200000 :  Oh.
    New Segment: 00:01:50.4200000 ==> 00:01:53.4600000 :  - Oh it's horrible.
    New Segment: 00:01:53.4600000 ==> 00:01:55.3000000 :  - Shh.
    New Segment: 00:01:55.3000000 ==> 00:02:02.3400000 :  It was just a bad dream.
    New Segment: 00:02:05.4200000 ==> 00:02:09.8800000 :  - You don't ever have to be afraid of anything.
    New Segment: 00:02:09.8800000 ==> 00:02:12.8000000 :  I'll always be here to protect you.
    New Segment: 00:02:12.9200000 ==> 00:02:15.5000000 :  (gentle music)
    New Segment: 00:02:16.4800000 ==> 00:02:19.0600000 :  (gentle music)
    New Segment: 00:02:19.0600000 ==> 00:02:21.6400000 :  (gentle music)
    New Segment: 00:02:21.6400000 ==> 00:02:24.2200000 :  (gentle music)
    New Segment: 00:02:24.5400000 ==> 00:02:27.1200000 :  (gentle music)
    New Segment: 00:02:27.1200000 ==> 00:02:29.7000000 :  (gentle music)
    New Segment: 00:02:29.7000000 ==> 00:02:33.1800000 :  [Music]
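The post doesn't show exactly how this output is tidied into SRT. One minimal way, assuming the console output has been saved to a text file, is to pick out the `New Segment:` lines and rewrite each one as a numbered SRT cue: index, `hh:mm:ss,mmm --> hh:mm:ss,mmm`, text, blank line. `SegmentLogToSrt` is a hypothetical helper, and its regex is written against the exact line shape shown in the log above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class SegmentLogToSrt
{
    // Matches lines like "New Segment: 00:00:03.6600000 ==> 00:00:05.9000000 :  (exhaling)".
    private static readonly Regex SegmentLine = new Regex(
        @"^New Segment:\s*(?<start>\S+)\s*==>\s*(?<end>\S+)\s*:\s*(?<text>.*)$");

    // SRT timestamps use hh:mm:ss,mmm with a comma before the milliseconds.
    public static string FormatTimestamp(TimeSpan t) =>
        string.Format(CultureInfo.InvariantCulture, "{0:00}:{1:00}:{2:00},{3:000}",
            (int)t.TotalHours, t.Minutes, t.Seconds, t.Milliseconds);

    // Converts captured console lines to SRT; model-loading lines are skipped.
    public static string ToSrt(IEnumerable<string> logLines)
    {
        var srt = new StringBuilder();
        int index = 1;
        foreach (string line in logLines)
        {
            Match m = SegmentLine.Match(line.Trim());
            if (!m.Success) continue;
            // TimeSpan.Parse accepts both "00:01:17" and "00:00:02.7600000".
            TimeSpan start = TimeSpan.Parse(m.Groups["start"].Value, CultureInfo.InvariantCulture);
            TimeSpan end = TimeSpan.Parse(m.Groups["end"].Value, CultureInfo.InvariantCulture);
            srt.Append(index++).Append('\n')
               .Append(FormatTimestamp(start)).Append(" --> ").Append(FormatTimestamp(end)).Append('\n')
               .Append(m.Groups["text"].Value.Trim()).Append("\n\n");
        }
        return srt.ToString();
    }
}
```

For example, `File.WriteAllText("intro.srt", SegmentLogToSrt.ToSrt(File.ReadLines("intro.log")))`, where both file names are placeholders. Sound cues such as `(birds chirping)` or `[Music]` still come through as text, so they will probably need to be deleted or edited by hand before translation.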

Welcome to ChinaAVG Adventure & Puzzle Game Chinese Site (https://chinaavg.com/) Powered by Discuz! X3.2