
[Suggestion] [Game Video Chinese Localization #1] Compiling Whisper, the AI Speech Recognition Software

Posted 2023-8-28 11:48


Years ago I had the idea of adding subtitles to game videos that ship without them.
The tools were not ready back then, but by now they seem to be.

Whisper is OpenAI's open-source speech recognition software.
There is a .NET version of it. With a few small changes to that version, the speech in a game video can be transcribed into subtitles in SRT format.
After that, batch-translate the SRT file with an online translator, make a few adjustments, and the Chinese localization is done.

The repository is here:
https://github.com/sandrohanea/whisper.net

It is best to build with VS2022; otherwise you will run into a lot of .NET SDK version problems.

Once it builds, there are a few things to watch out for.

<0> Change the model file to the large model, ggml-large.bin. This model gives better results.
    Of course, it also takes more time; converting a batch of files will probably take several hours, or even tens of hours.

<1> Language has to be set to "english".

// Original code, commented out:
/*    var builder = factory.CreateBuilder()
        .WithLanguage(opt.Language);*/
// Language is now fixed to English:
var builder = factory.CreateBuilder()
    .WithLanguage("english");

<2> By default it seems to support only WAV input, and the audio must have a 16 kHz sample rate. Files need to be converted to this format beforehand, otherwise it fails with an error.

<3> By default the sample only converts a single example WAV file. This needs to be changed to batch processing
   (iterate over all the files in a directory).

<4> The output needs a little tidying up to conform to the SRT format.
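
For notes <2> and <3>, here is a rough sketch of one way to prepare the input, not what the Whisper.net sample itself does. It walks a directory and converts every file to a 16 kHz WAV by calling ffmpeg. Everything in it is my own assumption: the class and method names are made up for illustration, ffmpeg is assumed to be installed and on PATH, and mono 16-bit PCM is a guess on top of the 16 kHz requirement.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of Whisper.net.
public static class BatchPrep
{
    // Pairs each file in inputDir with the WAV file to create in outputDir.
    // Use a separate outputDir so that existing .wav sources are not overwritten.
    public static List<(string Source, string Wav)> Plan(
        string inputDir, string outputDir, string pattern = "*")
    {
        return Directory.EnumerateFiles(inputDir, pattern)
            .OrderBy(path => path, StringComparer.OrdinalIgnoreCase)
            .Select(path => (Source: path, Wav: Path.Combine(
                outputDir, Path.GetFileNameWithoutExtension(path) + ".wav")))
            .ToList();
    }

    // ffmpeg arguments for a 16 kHz WAV file.
    public static string[] BuildFfmpegArgs(string source, string wav) => new[]
    {
        "-y",                 // overwrite the output file if it exists
        "-i", source,         // source video or audio file
        "-vn",                // ignore any video stream
        "-ar", "16000",       // resample to 16 kHz (required per note <2>)
        "-ac", "1",           // downmix to mono (assumption)
        "-c:a", "pcm_s16le",  // 16-bit PCM samples (assumption)
        wav
    };

    // Runs ffmpeg for one file and throws if it reports failure.
    public static void Convert(string source, string wav)
    {
        var startInfo = new ProcessStartInfo("ffmpeg") { UseShellExecute = false };
        foreach (var arg in BuildFfmpegArgs(source, wav))
            startInfo.ArgumentList.Add(arg);

        using var process = Process.Start(startInfo)
            ?? throw new InvalidOperationException("Could not start ffmpeg.");
        process.WaitForExit();
        if (process.ExitCode != 0)
            throw new InvalidOperationException(
                $"ffmpeg exited with code {process.ExitCode} for {source}.");
    }

    // Converts every matching file in inputDir, one after another.
    public static void ConvertAll(string inputDir, string outputDir, string pattern = "*")
    {
        Directory.CreateDirectory(outputDir);
        foreach (var (source, wav) in Plan(inputDir, outputDir, pattern))
            Convert(source, wav);
    }
}
```

Something like `BatchPrep.ConvertAll("videos", "wav16k", "*.mp4")` would then fill the wav16k folder (both folder names are placeholders). The same `Plan` loop could presumably drive the transcription step as well, one WAV file at a time.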

   Below is the console output for one WAV file (the opening cinematic of 幽魂).

whisper_init_from_file_no_state: loading model from 'ggml-large.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5
whisper_model_load: mem required  = 3557.00 MB (+   71.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx     = 2951.27 MB
whisper_model_load: model size    = 2950.66 MB
whisper_init_state: kv self size  =   70.00 MB
whisper_init_state: kv cross size =  234.38 MB
New Segment: 00:00:00 ==> 00:00:02.7600000 :  (birds chirping)
New Segment: 00:00:03.6600000 ==> 00:00:05.9000000 :  (exhaling)
New Segment: 00:00:05.9000000 ==> 00:00:08.6600000 :  (birds chirping)
New Segment: 00:00:08.6600000 ==> 00:00:35.1200000 :  (gun firing)
New Segment: 00:00:36.1200000 ==> 00:00:38.5400000 :  (gun firing)
New Segment: 00:00:39.0600000 ==> 00:00:41.4800000 :  (gun firing)
New Segment: 00:00:41.4800000 ==> 00:00:49.4000000 :  (tires screeching)
New Segment: 00:00:49.4000000 ==> 00:00:58.5800000 :  (glass shattering)
New Segment: 00:00:58.5800000 ==> 00:01:07.7400000 :  (singing in foreign language)
New Segment: 00:01:07.7400000 ==> 00:01:11.5800000 :  (singing in foreign language)
New Segment: 00:01:11.5800000 ==> 00:01:17 :  (tires screeching)
New Segment: 00:01:17 ==> 00:01:24.8400000 :  (singing in foreign language)
New Segment: 00:01:24.8400000 ==> 00:01:28.6400000 :  (panting)
New Segment: 00:01:36.7800000 ==> 00:01:39.2000000 :  (gun firing)
New Segment: 00:01:39.2000000 ==> 00:01:43.4600000 :  - Adrian.
New Segment: 00:01:43.4600000 ==> 00:01:45.6200000 :  - Oh God.
New Segment: 00:01:45.6200000 ==> 00:01:48.2000000 :  - What's the matter sweetheart?
New Segment: 00:01:48.2000000 ==> 00:01:50.4200000 :  Oh.
New Segment: 00:01:50.4200000 ==> 00:01:53.4600000 :  - Oh it's horrible.
New Segment: 00:01:53.4600000 ==> 00:01:55.3000000 :  - Shh.
New Segment: 00:01:55.3000000 ==> 00:02:02.3400000 :  It was just a bad dream.
New Segment: 00:02:05.4200000 ==> 00:02:09.8800000 :  - You don't ever have to be afraid of anything.
New Segment: 00:02:09.8800000 ==> 00:02:12.8000000 :  I'll always be here to protect you.
New Segment: 00:02:12.9200000 ==> 00:02:15.5000000 :  (gentle music)
New Segment: 00:02:16.4800000 ==> 00:02:19.0600000 :  (gentle music)
New Segment: 00:02:19.0600000 ==> 00:02:21.6400000 :  (gentle music)
New Segment: 00:02:21.6400000 ==> 00:02:24.2200000 :  (gentle music)
New Segment: 00:02:24.5400000 ==> 00:02:27.1200000 :  (gentle music)
New Segment: 00:02:27.1200000 ==> 00:02:29.7000000 :  (gentle music)
New Segment: 00:02:29.7000000 ==> 00:02:33.1800000 :  [Music]
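
As for note <4>, the tidying could look roughly like the sketch below. It assumes the console lines keep the "New Segment: start ==> end : text" shape shown in the output above and that the timestamps parse as .NET TimeSpan values. The class name SrtWriter and its method are hypothetical names of mine, not something Whisper.net provides.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Whisper.net.
public static class SrtWriter
{
    // Matches lines such as:
    // New Segment: 00:00:00 ==> 00:00:02.7600000 :  (birds chirping)
    private static readonly Regex SegmentLine = new Regex(
        @"^New Segment: (?<start>\S+) ==> (?<end>\S+) :\s*(?<text>.*)$");

    // SRT timestamps use a comma before the milliseconds, e.g. 00:00:02,760.
    private static string SrtTime(TimeSpan time) =>
        time.ToString(@"hh\:mm\:ss\,fff", CultureInfo.InvariantCulture);

    // Turns console lines into SRT text. Lines that are not segments,
    // such as the whisper_model_load diagnostics, are skipped.
    public static string FromConsoleLines(IEnumerable<string> lines)
    {
        var srt = new StringBuilder();
        int index = 1;
        foreach (var line in lines)
        {
            var match = SegmentLine.Match(line.Trim());
            if (!match.Success) continue;

            var start = TimeSpan.Parse(match.Groups["start"].Value, CultureInfo.InvariantCulture);
            var end = TimeSpan.Parse(match.Groups["end"].Value, CultureInfo.InvariantCulture);

            // One SRT cue: running number, time range, text, blank line.
            srt.Append(index++).Append('\n');
            srt.Append(SrtTime(start)).Append(" --> ").Append(SrtTime(end)).Append('\n');
            srt.Append(match.Groups["text"].Value.Trim()).Append("\n\n");
        }
        return srt.ToString();
    }
}
```

Feeding it the saved console output, for example `File.WriteAllText("intro.srt", SrtWriter.FromConsoleLines(File.ReadLines("intro.log")))` with placeholder file names, should give a file that subtitle tools accept. Sound-effect cues like (gun firing) are kept as they are, so deleting them is still a manual step.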