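Years ago I had the idea of adding subtitles to game videos that ship without any.
The conditions were not ripe back then, but now they seem to be.
Whisper is OpenAI's open-source speech recognition software.
It has a .NET version, and with a few small modifications to that version the speech in a game video can be transcribed into an SRT subtitle file.
After that, run the SRT file through an online batch translator, make a few small adjustments, and the Chinese localization is done.
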
The repository is here:
https://github.com/sandrohanea/whisper.net

It is best to build it with VS2022; otherwise you will run into many problems with the .NET SDK version.

After building, there are a few things to note.

<0> Change the model file to the large model, ggml-large.bin; this model gives better results.
Of course, it also takes longer: converting a batch of files will probably take several hours, or even dozens of hours.

<1> Set Language to "english".

    // Original: the language came from opt.Language
    /* var builder = factory.CreateBuilder()
        .WithLanguage(opt.Language);*/
    // Modified: the language is hard-coded
    var builder = factory.CreateBuilder()
        .WithLanguage("english");

& z+ r+ m! T7 E. Z; Z( [ <2>缺省好像只支持Wav格式,而且是要16K采样率的,需要实现转换成这种格式,否则会出错。
3 Y8 H$ ^4 c+ r9 T0 d" V/ d% l6 R/ H7 w! F/ I) f
<3>缺省只提供了一个例子wav文件的转换,需要改为批量形式。 m# B) \7 i* ]! q2 h. I7 E9 j
(遍历某个目录中的所有文件)
6 r. c F+ e: ?1 Z) K0 y, L+ d, ?" _0 @5 y0 E9 {
<4>输出的文件,需要稍加整理,以符合srt格式' s0 Q" e9 Q! }
Below is the console output for one WAV file (the opening cinematic of 幽魂):

    whisper_init_from_file_no_state: loading model from 'ggml-large.bin'
    whisper_model_load: loading model
    whisper_model_load: n_vocab = 51865
    whisper_model_load: n_audio_ctx = 1500
    whisper_model_load: n_audio_state = 1280
    whisper_model_load: n_audio_head = 20
    whisper_model_load: n_audio_layer = 32
    whisper_model_load: n_text_ctx = 448
    whisper_model_load: n_text_state = 1280
    whisper_model_load: n_text_head = 20
    whisper_model_load: n_text_layer = 32
    whisper_model_load: n_mels = 80
    whisper_model_load: ftype = 1
    whisper_model_load: qntvr = 0
    whisper_model_load: type = 5
    whisper_model_load: mem required = 3557.00 MB (+ 71.00 MB per decoder)
    whisper_model_load: adding 1608 extra tokens
    whisper_model_load: model ctx = 2951.27 MB
    whisper_model_load: model size = 2950.66 MB
    whisper_init_state: kv self size = 70.00 MB
    whisper_init_state: kv cross size = 234.38 MB
    New Segment: 00:00:00 ==> 00:00:02.7600000 : (birds chirping)
    New Segment: 00:00:03.6600000 ==> 00:00:05.9000000 : (exhaling)
    New Segment: 00:00:05.9000000 ==> 00:00:08.6600000 : (birds chirping)
    New Segment: 00:00:08.6600000 ==> 00:00:35.1200000 : (gun firing)
    New Segment: 00:00:36.1200000 ==> 00:00:38.5400000 : (gun firing)
    New Segment: 00:00:39.0600000 ==> 00:00:41.4800000 : (gun firing)
    New Segment: 00:00:41.4800000 ==> 00:00:49.4000000 : (tires screeching)
    New Segment: 00:00:49.4000000 ==> 00:00:58.5800000 : (glass shattering)
    New Segment: 00:00:58.5800000 ==> 00:01:07.7400000 : (singing in foreign language)
    New Segment: 00:01:07.7400000 ==> 00:01:11.5800000 : (singing in foreign language)
    New Segment: 00:01:11.5800000 ==> 00:01:17 : (tires screeching)
    New Segment: 00:01:17 ==> 00:01:24.8400000 : (singing in foreign language)
    New Segment: 00:01:24.8400000 ==> 00:01:28.6400000 : (panting)
    New Segment: 00:01:36.7800000 ==> 00:01:39.2000000 : (gun firing)
    New Segment: 00:01:39.2000000 ==> 00:01:43.4600000 : - Adrian.
    New Segment: 00:01:43.4600000 ==> 00:01:45.6200000 : - Oh God.
    New Segment: 00:01:45.6200000 ==> 00:01:48.2000000 : - What's the matter sweetheart?
    New Segment: 00:01:48.2000000 ==> 00:01:50.4200000 : Oh.
    New Segment: 00:01:50.4200000 ==> 00:01:53.4600000 : - Oh it's horrible.
    New Segment: 00:01:53.4600000 ==> 00:01:55.3000000 : - Shh.
    New Segment: 00:01:55.3000000 ==> 00:02:02.3400000 : It was just a bad dream.
    New Segment: 00:02:05.4200000 ==> 00:02:09.8800000 : - You don't ever have to be afraid of anything.
    New Segment: 00:02:09.8800000 ==> 00:02:12.8000000 : I'll always be here to protect you.
    New Segment: 00:02:12.9200000 ==> 00:02:15.5000000 : (gentle music)
    New Segment: 00:02:16.4800000 ==> 00:02:19.0600000 : (gentle music)
    New Segment: 00:02:19.0600000 ==> 00:02:21.6400000 : (gentle music)
    New Segment: 00:02:21.6400000 ==> 00:02:24.2200000 : (gentle music)
    New Segment: 00:02:24.5400000 ==> 00:02:27.1200000 : (gentle music)
    New Segment: 00:02:27.1200000 ==> 00:02:29.7000000 : (gentle music)
    New Segment: 00:02:29.7000000 ==> 00:02:33.1800000 : [Music]
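
The tidying mentioned in note <4> could look roughly like the sketch below. It is only an illustration, not the code actually used: it assumes the console lines have the "New Segment: start ==> end : text" shape shown above, and the names SrtSketch, FormatTime and ToSrt are made up for this example (they are not part of Whisper.net). The same formatting could just as well be done directly where the segments are printed.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: turns "New Segment" console lines into SRT text.
public static class SrtSketch
{
    // Matches e.g. "New Segment: 00:00:03.6600000 ==> 00:00:05.9000000 : (exhaling)"
    private static readonly Regex SegmentLine =
        new Regex(@"^New Segment: (\S+) ==> (\S+) : (.*)$");

    // SRT timestamps look like 00:00:03,660 (comma before the milliseconds).
    public static string FormatTime(TimeSpan t) =>
        string.Format(CultureInfo.InvariantCulture, "{0:00}:{1:00}:{2:00},{3:000}",
            (int)t.TotalHours, t.Minutes, t.Seconds, t.Milliseconds);

    public static string ToSrt(IEnumerable<string> consoleLines)
    {
        var sb = new StringBuilder();
        int index = 1; // SRT cues are numbered from 1
        foreach (var line in consoleLines)
        {
            var m = SegmentLine.Match(line.Trim());
            if (!m.Success) continue; // skip the model-loading log lines

            var start = TimeSpan.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
            var end = TimeSpan.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture);

            sb.Append(index++).Append('\n');
            sb.Append(FormatTime(start)).Append(" --> ").Append(FormatTime(end)).Append('\n');
            sb.Append(m.Groups[3].Value.Trim()).Append("\n\n"); // blank line ends the cue
        }
        return sb.ToString();
    }
}
```

Feeding the captured console lines to SrtSketch.ToSrt and writing the result with File.WriteAllText gives a file that subtitle editors and online translators should accept; sound descriptions such as (gun firing) can be removed by hand afterwards if they are not wanted.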
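
For notes <2> and <3>, one possible way to prepare the input in batch is sketched below. The post does not say which tool was used for the conversion, so this assumes ffmpeg is installed and on the PATH; the mono setting (-ac 1) is also an assumption, since the post only requires the 16 kHz sample rate. BatchSketch, BuildFfmpegArgs, WavPathFor and ConvertAll are hypothetical names for this example.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper: converts every file in a directory to 16 kHz WAV with ffmpeg.
public static class BatchSketch
{
    // -ar 16000: 16 kHz sample rate; -ac 1: mono (assumption); pcm_s16le: 16-bit PCM WAV.
    public static string BuildFfmpegArgs(string input, string output) =>
        $"-y -i \"{input}\" -ar 16000 -ac 1 -c:a pcm_s16le \"{output}\"";

    // Maps e.g. "videos/intro.avi" to "<outDir>/intro.wav".
    public static string WavPathFor(string input, string outDir) =>
        Path.Combine(outDir, Path.GetFileNameWithoutExtension(input) + ".wav");

    public static void ConvertAll(string inDir, string outDir, string pattern = "*.*")
    {
        Directory.CreateDirectory(outDir);
        foreach (var input in Directory.EnumerateFiles(inDir, pattern))
        {
            var output = WavPathFor(input, outDir);
            var psi = new ProcessStartInfo("ffmpeg", BuildFfmpegArgs(input, output))
            {
                UseShellExecute = false
            };
            using var proc = Process.Start(psi);
            if (proc == null) continue; // process could not be started
            proc.WaitForExit();
            if (proc.ExitCode != 0)
                Console.Error.WriteLine($"ffmpeg failed for {input}");
        }
    }
}
```

After BatchSketch.ConvertAll has run, the resulting .wav files can be enumerated the same way with Directory.EnumerateFiles(outDir, "*.wav"), running the existing single-file recognition code once per file.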