Years ago I already had the idea of adding subtitles to game videos that ship without any. The conditions were not ripe back then, but they now seem to be.

Whisper is OpenAI's open-source speech recognition software. It has a .NET port; with a few small modifications to that port, the speech in a game video can be transcribed into an SRT subtitle file. After that, run the SRT file through an online batch translator, make a few small adjustments, and the localization work is done.

The repository is here:
https://github.com/sandrohanea/whisper.net

Build it with VS2022 if you can; otherwise you will run into a lot of .NET SDK version problems.

Once it builds, there are a few things to watch out for.

<0> Change the model file to the large model, ggml-large.bin; it gives better results. It also takes longer, of course: I estimate that converting a batch of files needs several hours, or even tens of hours.

<1> Set Language to "english".

    /* var builder = factory.CreateBuilder()
        .WithLanguage(opt.Language); */
    // Hard-code the language instead of taking it from the options.
    var builder = factory.CreateBuilder()
        .WithLanguage("english");

<2> By default only WAV input seems to be supported, and it has to be sampled at 16 kHz. The audio must be converted to this format beforehand, otherwise an error occurs.
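
One way to catch a wrong input early is to read the sample rate from the WAV header before handing the file to Whisper. The sketch below is plain .NET and does not touch Whisper.net; `WavCheck` and `ReadSampleRate` are names invented for this example. The conversion itself is left to an external tool. With ffmpeg, for instance, something like `ffmpeg -i input.mp4 -ar 16000 -ac 1 output.wav` should produce a 16 kHz file (ffmpeg and the mono setting are assumptions, not part of the original setup).

```csharp
using System;
using System.IO;
using System.Text;

// Usage: pass WAV paths on the command line; files that are not 16 kHz are flagged.
foreach (string path in args)
{
    using FileStream fs = File.OpenRead(path);
    int rate = WavCheck.ReadSampleRate(fs);
    Console.WriteLine(rate == 16000
        ? $"{path}: OK (16000 Hz)"
        : $"{path}: {rate} Hz, convert to 16000 Hz first");
}

// Hypothetical helper, not part of Whisper.net.
static class WavCheck
{
    // Returns the sample rate stored in the "fmt " chunk of a RIFF/WAVE stream.
    // Throws InvalidDataException if the stream is not a WAV file.
    public static int ReadSampleRate(Stream stream)
    {
        using var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);

        byte[] header = reader.ReadBytes(12);
        if (header.Length < 12
            || Encoding.ASCII.GetString(header, 0, 4) != "RIFF"
            || Encoding.ASCII.GetString(header, 8, 4) != "WAVE")
        {
            throw new InvalidDataException("Not a RIFF/WAVE file.");
        }

        // Walk the chunk list until the "fmt " chunk shows up.
        while (stream.Position + 8 <= stream.Length)
        {
            string chunkId = Encoding.ASCII.GetString(reader.ReadBytes(4));
            int chunkSize = reader.ReadInt32();
            if (chunkId == "fmt ")
            {
                reader.ReadInt16();        // audio format (1 = PCM)
                reader.ReadInt16();        // channel count
                return reader.ReadInt32(); // sample rate in Hz
            }
            // Skip this chunk; chunks are padded to an even number of bytes.
            stream.Seek(chunkSize + (chunkSize & 1), SeekOrigin.Current);
        }
        throw new InvalidDataException("No fmt chunk found.");
    }
}
```

Checking the header first costs almost nothing compared with a failed run on the large model.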

<3> The sample program converts only a single example WAV file by default; it needs to be changed to work in batch, iterating over all the files in a directory.
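
The batch loop could look roughly like this. It is only a sketch of the directory traversal: `BatchRunner` is a made-up name, and the actual recognition is passed in as a delegate so that the example runs without Whisper.net. Skipping files that already have an .srt next to them is an addition that seemed useful given how long the large model takes; it lets an interrupted run pick up where it stopped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Usage: the first argument is the directory holding the converted WAV files.
string inputDir = args.Length > 0 ? args[0] : ".";
List<string> done = BatchRunner.Run(inputDir, (wavPath, srtPath) =>
{
    // Placeholder: run the Whisper processor on wavPath here and write the
    // recognized segments to srtPath. This stub only reports what it would do.
    Console.WriteLine($"would transcribe {wavPath} -> {srtPath}");
});
Console.WriteLine($"{done.Count} file(s) processed");

// Hypothetical helper, not part of Whisper.net.
static class BatchRunner
{
    // Calls transcribeToSrt(wavPath, srtPath) for every *.wav file directly
    // inside inputDir and returns the SRT paths that were handed out.
    public static List<string> Run(string inputDir, Action<string, string> transcribeToSrt)
    {
        var handled = new List<string>();

        // GetFiles takes a snapshot, so files created during the run do not
        // disturb the iteration; sorting gives a predictable order.
        string[] wavPaths = Directory.GetFiles(inputDir, "*.wav");
        Array.Sort(wavPaths, StringComparer.OrdinalIgnoreCase);

        foreach (string wavPath in wavPaths)
        {
            string srtPath = Path.ChangeExtension(wavPath, ".srt");
            if (File.Exists(srtPath))
            {
                continue; // already transcribed in an earlier run
            }
            transcribeToSrt(wavPath, srtPath);
            handled.Add(srtPath);
        }
        return handled;
    }
}
```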

<4> The output needs a little tidying up to conform to the SRT format.

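
For reference, an SRT entry is a running index, a line of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`, the text, and then a blank line. A minimal sketch of a writer follows; the `Srt` class and the tuple shape are assumptions made for the example, not what the demo program uses internally.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Usage: two segments taken from a typical transcription.
var segments = new List<(TimeSpan Start, TimeSpan End, string Text)>
{
    (new TimeSpan(0, 0, 1, 39, 200), new TimeSpan(0, 0, 1, 43, 460), "- Adrian."),
    (new TimeSpan(0, 0, 1, 43, 460), new TimeSpan(0, 0, 1, 45, 620), "- Oh God."),
};
Console.Write(Srt.Build(segments));

// Hypothetical helper, not part of Whisper.net.
static class Srt
{
    // Formats a time as HH:MM:SS,mmm. SRT uses a comma before the milliseconds.
    public static string FormatTime(TimeSpan t) =>
        $"{(int)t.TotalHours:00}:{t.Minutes:00}:{t.Seconds:00},{t.Milliseconds:000}";

    // Builds the full SRT text: index, time range, text, blank line per entry.
    public static string Build(IEnumerable<(TimeSpan Start, TimeSpan End, string Text)> segments)
    {
        var sb = new StringBuilder();
        int index = 1;
        foreach (var (start, end, text) in segments)
        {
            sb.Append(index++).Append('\n');
            sb.Append(FormatTime(start)).Append(" --> ").Append(FormatTime(end)).Append('\n');
            sb.Append(text.Trim()).Append("\n\n");
        }
        return sb.ToString();
    }
}
```

The first entry comes out as `1`, then `00:01:39,200 --> 00:01:43,460`, then `- Adrian.`, followed by an empty line.
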
Below is the console output for one WAV file (the opening cinematic of 幽魂).

    whisper_init_from_file_no_state: loading model from 'ggml-large.bin'
    whisper_model_load: loading model
    whisper_model_load: n_vocab = 51865
    whisper_model_load: n_audio_ctx = 1500
    whisper_model_load: n_audio_state = 1280
    whisper_model_load: n_audio_head = 20
    whisper_model_load: n_audio_layer = 32
    whisper_model_load: n_text_ctx = 448
    whisper_model_load: n_text_state = 1280
    whisper_model_load: n_text_head = 20
    whisper_model_load: n_text_layer = 32
    whisper_model_load: n_mels = 80
    whisper_model_load: ftype = 1
    whisper_model_load: qntvr = 0
    whisper_model_load: type = 5
    whisper_model_load: mem required = 3557.00 MB (+ 71.00 MB per decoder)
    whisper_model_load: adding 1608 extra tokens
    whisper_model_load: model ctx = 2951.27 MB
    whisper_model_load: model size = 2950.66 MB
    whisper_init_state: kv self size = 70.00 MB
    whisper_init_state: kv cross size = 234.38 MB
    New Segment: 00:00:00 ==> 00:00:02.7600000 : (birds chirping)
    New Segment: 00:00:03.6600000 ==> 00:00:05.9000000 : (exhaling)
    New Segment: 00:00:05.9000000 ==> 00:00:08.6600000 : (birds chirping)
    New Segment: 00:00:08.6600000 ==> 00:00:35.1200000 : (gun firing)
    New Segment: 00:00:36.1200000 ==> 00:00:38.5400000 : (gun firing)
    New Segment: 00:00:39.0600000 ==> 00:00:41.4800000 : (gun firing)
    New Segment: 00:00:41.4800000 ==> 00:00:49.4000000 : (tires screeching)
    New Segment: 00:00:49.4000000 ==> 00:00:58.5800000 : (glass shattering)
    New Segment: 00:00:58.5800000 ==> 00:01:07.7400000 : (singing in foreign language)
    New Segment: 00:01:07.7400000 ==> 00:01:11.5800000 : (singing in foreign language)
    New Segment: 00:01:11.5800000 ==> 00:01:17 : (tires screeching)
    New Segment: 00:01:17 ==> 00:01:24.8400000 : (singing in foreign language)
    New Segment: 00:01:24.8400000 ==> 00:01:28.6400000 : (panting)
    New Segment: 00:01:36.7800000 ==> 00:01:39.2000000 : (gun firing)
    New Segment: 00:01:39.2000000 ==> 00:01:43.4600000 : - Adrian.
    New Segment: 00:01:43.4600000 ==> 00:01:45.6200000 : - Oh God.
    New Segment: 00:01:45.6200000 ==> 00:01:48.2000000 : - What's the matter sweetheart?
    New Segment: 00:01:48.2000000 ==> 00:01:50.4200000 : Oh.
    New Segment: 00:01:50.4200000 ==> 00:01:53.4600000 : - Oh it's horrible.
    New Segment: 00:01:53.4600000 ==> 00:01:55.3000000 : - Shh.
    New Segment: 00:01:55.3000000 ==> 00:02:02.3400000 : It was just a bad dream.
    New Segment: 00:02:05.4200000 ==> 00:02:09.8800000 : - You don't ever have to be afraid of anything.
    New Segment: 00:02:09.8800000 ==> 00:02:12.8000000 : I'll always be here to protect you.
    New Segment: 00:02:12.9200000 ==> 00:02:15.5000000 : (gentle music)
    New Segment: 00:02:16.4800000 ==> 00:02:19.0600000 : (gentle music)
    New Segment: 00:02:19.0600000 ==> 00:02:21.6400000 : (gentle music)
    New Segment: 00:02:21.6400000 ==> 00:02:24.2200000 : (gentle music)
    New Segment: 00:02:24.5400000 ==> 00:02:27.1200000 : (gentle music)
    New Segment: 00:02:27.1200000 ==> 00:02:29.7000000 : (gentle music)
    New Segment: 00:02:29.7000000 ==> 00:02:33.1800000 : [Music]
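
If the console output has been saved to a text file, the "New Segment" lines can also be picked apart after the fact. The sketch below assumes exactly the line layout shown in this log (`New Segment: start ==> end : text`); `SegmentLog` is a name invented for the example. The model-loading lines are simply skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Usage: a few lines in the same layout as the console output.
string[] log =
{
    "whisper_model_load: loading model",
    "New Segment: 00:01:11.5800000 ==> 00:01:17 : (tires screeching)",
    "New Segment: 00:01:39.2000000 ==> 00:01:43.4600000 : - Adrian.",
};
const string srtTime = @"hh\:mm\:ss\,fff"; // SRT-style timestamp
foreach (var s in SegmentLog.Parse(log))
{
    Console.WriteLine($"{s.Start.ToString(srtTime)} --> {s.End.ToString(srtTime)}  {s.Text}");
}

// Hypothetical helper, not part of Whisper.net.
static class SegmentLog
{
    // Start and end are TimeSpan strings; the text is everything after " : ".
    static readonly Regex SegmentLine =
        new Regex(@"^New Segment: (\S+) ==> (\S+) : (.*)$");

    // Yields (start, end, text) for every segment line and ignores the rest.
    public static IEnumerable<(TimeSpan Start, TimeSpan End, string Text)> Parse(
        IEnumerable<string> lines)
    {
        foreach (string raw in lines)
        {
            Match m = SegmentLine.Match(raw.Trim());
            if (!m.Success)
            {
                continue; // model-loading output and other noise
            }
            yield return (
                TimeSpan.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture),
                TimeSpan.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture),
                m.Groups[3].Value.Trim());
        }
    }
}
```

`TimeSpan.Parse` accepts both the short form (`00:01:17`) and the form with seven fractional digits, so the two variants seen in the log need no special handling.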