ChinaAVG (Chinese Adventure and Puzzle Game Network)

Title: [Game Video Chinese Localization #1] Compiling the AI Speech Recognition Software Whisper

Author: shane007    Posted: 2023-8-28 11:48
Years ago I had the idea of adding subtitles to game videos that ship without them.
The conditions were not ripe back then, but they now seem to be.

Whisper is OpenAI's open-source speech recognition software.
It has a .NET version; with a few modifications on top of that version, the speech in a game video can be recognized and written out as subtitles in SRT format.
After that, the SRT file is batch-translated with an online translator and lightly adjusted by hand, and the localization work is done.

The repository is here:
https://github.com/sandrohanea/whisper.net

It is best to build with VS2022; otherwise you will run into many problems with the .NET SDK version.

Once it is built, there are a few points to note:

<0> Change the model file to the large model, ggml-large.bin; this model gives better results.
    Of course, it also takes more time. I estimate that converting a batch of files needs several hours, or even dozens of hours.

<1> Set Language to "english".

    // Original: the language comes from the options object
    /*  var builder = factory.CreateBuilder()
            .WithLanguage(opt.Language);*/
    // Modified: always transcribe as English
    var builder = factory.CreateBuilder()
        .WithLanguage("english");

<2> By default it seems to support only WAV input, and the file must use a 16 kHz sample rate. Files need to be converted to this format beforehand, otherwise an error occurs.
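
The post does not say which tool was used for the conversion. As a minimal sketch, assuming ffmpeg is installed and on the PATH, the extraction could be driven from C# like this. The class and method names are hypothetical, and the mono downmix is my assumption (the post only mentions the 16 kHz sample rate).

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of Whisper.net.
static class WavConverter
{
    // Builds ffmpeg arguments for a 16 kHz, mono, 16-bit PCM WAV file.
    public static string[] BuildArgs(string input, string output) => new[]
    {
        "-y",                 // overwrite the output file if it already exists
        "-i", input,          // source video or audio file
        "-vn",                // ignore any video stream
        "-ar", "16000",       // resample to 16 kHz
        "-ac", "1",           // downmix to mono (assumption, see above)
        "-c:a", "pcm_s16le",  // plain 16-bit PCM
        output
    };

    // Runs ffmpeg and throws if it cannot be started or reports a failure.
    public static void Convert(string input, string output)
    {
        var psi = new ProcessStartInfo("ffmpeg") { UseShellExecute = false };
        foreach (var arg in BuildArgs(input, output))
            psi.ArgumentList.Add(arg);

        using var process = Process.Start(psi)
            ?? throw new InvalidOperationException("could not start ffmpeg");
        process.WaitForExit();
        if (process.ExitCode != 0)
            throw new InvalidOperationException(
                $"ffmpeg exited with code {process.ExitCode}");
    }
}
```

A call such as WavConverter.Convert("intro.avi", "intro.wav") would then produce a file in the format described in <2>; both file names are placeholders.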
<3> By default the sample only converts a single example WAV file; this needs to be changed to batch mode
    (iterating over all files in a directory).
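
One possible shape for the batch loop is sketched below. It only handles the directory traversal; the actual transcription of one file is passed in as a delegate, because the post does not show that part of the modified program. BatchRunner, Plan and Run are hypothetical names. Skipping files that already have an .srt next to them is my addition, so that a run lasting many hours can be resumed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of Whisper.net.
static class BatchRunner
{
    // Pairs every .wav file directly inside inputDir with the .srt path
    // that should be written next to it, in a stable order.
    public static IEnumerable<(string Wav, string Srt)> Plan(string inputDir) =>
        Directory.EnumerateFiles(inputDir)
            .Where(f => string.Equals(Path.GetExtension(f), ".wav",
                                      StringComparison.OrdinalIgnoreCase))
            .OrderBy(f => f, StringComparer.Ordinal)
            .Select(f => (Wav: f, Srt: Path.ChangeExtension(f, ".srt")));

    // Calls transcribeOne(wavPath, srtPath) for each file that has no
    // subtitle file yet and returns how many files were processed.
    public static int Run(string inputDir, Action<string, string> transcribeOne)
    {
        int processed = 0;
        foreach (var (wav, srt) in Plan(inputDir))
        {
            if (File.Exists(srt))
                continue; // already done in an earlier run
            transcribeOne(wav, srt);
            processed++;
        }
        return processed;
    }
}
```

The delegate would wrap whatever the single-file sample currently does, for example BatchRunner.Run(@"D:\wav", (wav, srt) => TranscribeFile(wav, srt)), where TranscribeFile and the directory are placeholders.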
<4> The output needs a little tidying to conform to the SRT format.

   Below is the console output for one WAV file (the opening cinematic of 幽魂, i.e. Phantasmagoria):

    whisper_init_from_file_no_state: loading model from 'ggml-large.bin'
    whisper_model_load: loading model
    whisper_model_load: n_vocab       = 51865
    whisper_model_load: n_audio_ctx   = 1500
    whisper_model_load: n_audio_state = 1280
    whisper_model_load: n_audio_head  = 20
    whisper_model_load: n_audio_layer = 32
    whisper_model_load: n_text_ctx    = 448
    whisper_model_load: n_text_state  = 1280
    whisper_model_load: n_text_head   = 20
    whisper_model_load: n_text_layer  = 32
    whisper_model_load: n_mels        = 80
    whisper_model_load: ftype         = 1
    whisper_model_load: qntvr         = 0
    whisper_model_load: type          = 5
    whisper_model_load: mem required  = 3557.00 MB (+   71.00 MB per decoder)
    whisper_model_load: adding 1608 extra tokens
    whisper_model_load: model ctx     = 2951.27 MB
    whisper_model_load: model size    = 2950.66 MB
    whisper_init_state: kv self size  =   70.00 MB
    whisper_init_state: kv cross size =  234.38 MB
    New Segment: 00:00:00 ==> 00:00:02.7600000 :  (birds chirping)
    New Segment: 00:00:03.6600000 ==> 00:00:05.9000000 :  (exhaling)
    New Segment: 00:00:05.9000000 ==> 00:00:08.6600000 :  (birds chirping)
    New Segment: 00:00:08.6600000 ==> 00:00:35.1200000 :  (gun firing)
    New Segment: 00:00:36.1200000 ==> 00:00:38.5400000 :  (gun firing)
    New Segment: 00:00:39.0600000 ==> 00:00:41.4800000 :  (gun firing)
    New Segment: 00:00:41.4800000 ==> 00:00:49.4000000 :  (tires screeching)
    New Segment: 00:00:49.4000000 ==> 00:00:58.5800000 :  (glass shattering)
    New Segment: 00:00:58.5800000 ==> 00:01:07.7400000 :  (singing in foreign language)
    New Segment: 00:01:07.7400000 ==> 00:01:11.5800000 :  (singing in foreign language)
    New Segment: 00:01:11.5800000 ==> 00:01:17 :  (tires screeching)
    New Segment: 00:01:17 ==> 00:01:24.8400000 :  (singing in foreign language)
    New Segment: 00:01:24.8400000 ==> 00:01:28.6400000 :  (panting)
    New Segment: 00:01:36.7800000 ==> 00:01:39.2000000 :  (gun firing)
    New Segment: 00:01:39.2000000 ==> 00:01:43.4600000 :  - Adrian.
    New Segment: 00:01:43.4600000 ==> 00:01:45.6200000 :  - Oh God.
    New Segment: 00:01:45.6200000 ==> 00:01:48.2000000 :  - What's the matter sweetheart?
    New Segment: 00:01:48.2000000 ==> 00:01:50.4200000 :  Oh.
    New Segment: 00:01:50.4200000 ==> 00:01:53.4600000 :  - Oh it's horrible.
    New Segment: 00:01:53.4600000 ==> 00:01:55.3000000 :  - Shh.
    New Segment: 00:01:55.3000000 ==> 00:02:02.3400000 :  It was just a bad dream.
    New Segment: 00:02:05.4200000 ==> 00:02:09.8800000 :  - You don't ever have to be afraid of anything.
    New Segment: 00:02:09.8800000 ==> 00:02:12.8000000 :  I'll always be here to protect you.
    New Segment: 00:02:12.9200000 ==> 00:02:15.5000000 :  (gentle music)
    New Segment: 00:02:16.4800000 ==> 00:02:19.0600000 :  (gentle music)
    New Segment: 00:02:19.0600000 ==> 00:02:21.6400000 :  (gentle music)
    New Segment: 00:02:21.6400000 ==> 00:02:24.2200000 :  (gentle music)
    New Segment: 00:02:24.5400000 ==> 00:02:27.1200000 :  (gentle music)
    New Segment: 00:02:27.1200000 ==> 00:02:29.7000000 :  (gentle music)
    New Segment: 00:02:29.7000000 ==> 00:02:33.1800000 :  [Music]
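
The tidying mentioned in <4> can be sketched as follows. An SRT entry is a running index, a line of the form start --> end with timestamps written as HH:MM:SS,mmm, the subtitle text, and a blank line. The helper below parses "New Segment" lines in the format shown in the log above and renders them as SRT. SrtTools and its method names are hypothetical; this is one way to do the cleanup, not necessarily how the author did it.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper, not part of Whisper.net.
static class SrtTools
{
    // SRT timestamps look like 00:01:39,200 (comma before the milliseconds).
    public static string FormatTime(TimeSpan t) =>
        $"{(int)t.TotalHours:00}:{t.Minutes:00}:{t.Seconds:00},{t.Milliseconds:000}";

    // Parses one console line such as
    // "New Segment: 00:00:00 ==> 00:00:02.7600000 :  (birds chirping)".
    // Returns null for any other line (for example the model loading messages).
    public static (TimeSpan Start, TimeSpan End, string Text)? ParseLine(string line)
    {
        const string prefix = "New Segment: ";
        const string arrowMark = " ==> ";
        const string textMark = " : ";
        line = line.TrimStart();
        if (!line.StartsWith(prefix, StringComparison.Ordinal)) return null;

        var rest = line.Substring(prefix.Length);
        int arrow = rest.IndexOf(arrowMark, StringComparison.Ordinal);
        if (arrow < 0) return null;
        int endFrom = arrow + arrowMark.Length;
        int mark = rest.IndexOf(textMark, endFrom, StringComparison.Ordinal);
        if (mark < 0) return null;

        var start = TimeSpan.Parse(rest.Substring(0, arrow), CultureInfo.InvariantCulture);
        var end = TimeSpan.Parse(rest.Substring(endFrom, mark - endFrom),
                                 CultureInfo.InvariantCulture);
        return (start, end, rest.Substring(mark + textMark.Length).Trim());
    }

    // Renders segments as SRT text, numbering entries from 1 and
    // skipping segments whose text is empty.
    public static string ToSrt(IEnumerable<(TimeSpan Start, TimeSpan End, string Text)> segments)
    {
        var sb = new StringBuilder();
        int index = 1;
        foreach (var (start, end, text) in segments)
        {
            var trimmed = text.Trim();
            if (trimmed.Length == 0) continue;
            sb.Append(index++).Append('\n');
            sb.Append(FormatTime(start)).Append(" --> ").Append(FormatTime(end)).Append('\n');
            sb.Append(trimmed).Append("\n\n");
        }
        return sb.ToString();
    }
}
```

With this sketch, the first segment of the log above becomes:

    1
    00:00:00,000 --> 00:00:02,760
    (birds chirping)

Sound descriptions such as (gun firing) are kept as they are; whether to drop them before translation is a separate choice.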




