Years ago I already had the idea of adding subtitles to game videos that ship without them.
The conditions were not right back then, but by now they seem to be.

Whisper is OpenAI's open-source speech recognition software.
It has a .NET port. With a few small changes to that port, the speech in a game video can be recognized and written out as subtitles in SRT format.
After that, run the SRT file through an online batch translator, make a few manual adjustments, and the localization work is done.
The repository is here:
https://github.com/sandrohanea/whisper.net

It is best to build it with VS2022; otherwise you will run into a lot of .NET SDK version problems.

Once it builds, there are a few things to watch out for:

<0> Change the model file to the large model, ggml-large.bin; this model gives better results.
Of course, it also takes longer: expect a batch of files to need several hours, possibly dozens.

<1> Set Language to "english".

    // Original sample code: language taken from opt.Language
    /* var builder = factory.CreateBuilder()
        .WithLanguage(opt.Language);*/
    // Changed: always recognize English
    var builder = factory.CreateBuilder()
        .WithLanguage("english");

<2> By default it seems to support only WAV input, and the file must use a 16 kHz sample rate. Convert the audio to that format beforehand, or it will fail.

<3> The default code only converts a single sample WAV file; it needs to be changed to work in batch mode (iterate over all the files in a directory).

<4> The output needs a little tidying up to conform to the SRT format.
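
Points <2> and <3> can be handled before the .NET program ever runs. Here is a minimal sketch in Python (a standalone helper script, not part of whisper.net) that walks one directory and converts each file to 16 kHz WAV. It assumes ffmpeg is installed and on the PATH, which the steps above do not mention; the function names, the extension list, and the choice of mono 16-bit PCM are likewise assumptions for illustration.

```python
import subprocess
import wave
from pathlib import Path

TARGET_RATE = 16000  # sample rate the post says the tool expects


def is_target_wav(path):
    """Return True if `path` is a readable PCM WAV file sampled at 16 kHz."""
    try:
        with wave.open(str(path), "rb") as wav:
            return wav.getframerate() == TARGET_RATE
    except (wave.Error, EOFError):
        return False  # not a WAV file, or an empty/truncated one


def build_ffmpeg_command(src, dst):
    """Build the ffmpeg argument list that resamples `src` into `dst`.

    Mono (-ac 1) and 16-bit PCM (pcm_s16le) are assumptions, not from the post.
    """
    return [
        "ffmpeg", "-y", "-i", str(src),
        "-ar", str(TARGET_RATE), "-ac", "1", "-c:a", "pcm_s16le",
        str(dst),
    ]


def plan_conversions(input_dir, output_dir, extensions=(".mp4", ".avi", ".wav")):
    """Pair every matching file directly inside `input_dir` with a .wav target."""
    pairs = []
    for src in sorted(Path(input_dir).iterdir()):
        if src.is_file() and src.suffix.lower() in extensions:
            pairs.append((src, Path(output_dir) / (src.stem + ".wav")))
    return pairs


def convert_all(input_dir, output_dir):
    """Convert every planned file and verify each result is 16 kHz WAV."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for src, dst in plan_conversions(input_dir, output_dir):
        subprocess.run(build_ffmpeg_command(src, dst), check=True)
        if not is_target_wav(dst):
            raise RuntimeError(f"{dst} is not a 16 kHz WAV file")
```

Calling convert_all("videos", "wav16k") would then fill wav16k with one WAV per input file; both directory names are placeholders. Files that share a base name (a.mp4 and a.avi) would overwrite each other here, so rename them first.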

Below is the console output for one WAV file (the opening cinematic of Phantasmagoria, 幽魂):

    whisper_init_from_file_no_state: loading model from 'ggml-large.bin'
    whisper_model_load: loading model
    whisper_model_load: n_vocab = 51865
    whisper_model_load: n_audio_ctx = 1500
    whisper_model_load: n_audio_state = 1280
    whisper_model_load: n_audio_head = 20
    whisper_model_load: n_audio_layer = 32
    whisper_model_load: n_text_ctx = 448
    whisper_model_load: n_text_state = 1280
    whisper_model_load: n_text_head = 20
    whisper_model_load: n_text_layer = 32
    whisper_model_load: n_mels = 80
    whisper_model_load: ftype = 1
    whisper_model_load: qntvr = 0
    whisper_model_load: type = 5
    whisper_model_load: mem required = 3557.00 MB (+ 71.00 MB per decoder)
    whisper_model_load: adding 1608 extra tokens
    whisper_model_load: model ctx = 2951.27 MB
    whisper_model_load: model size = 2950.66 MB
    whisper_init_state: kv self size = 70.00 MB
    whisper_init_state: kv cross size = 234.38 MB
    New Segment: 00:00:00 ==> 00:00:02.7600000 : (birds chirping)
    New Segment: 00:00:03.6600000 ==> 00:00:05.9000000 : (exhaling)
    New Segment: 00:00:05.9000000 ==> 00:00:08.6600000 : (birds chirping)
    New Segment: 00:00:08.6600000 ==> 00:00:35.1200000 : (gun firing)
    New Segment: 00:00:36.1200000 ==> 00:00:38.5400000 : (gun firing)
    New Segment: 00:00:39.0600000 ==> 00:00:41.4800000 : (gun firing)
    New Segment: 00:00:41.4800000 ==> 00:00:49.4000000 : (tires screeching)
    New Segment: 00:00:49.4000000 ==> 00:00:58.5800000 : (glass shattering)
    New Segment: 00:00:58.5800000 ==> 00:01:07.7400000 : (singing in foreign language)
    New Segment: 00:01:07.7400000 ==> 00:01:11.5800000 : (singing in foreign language)
    New Segment: 00:01:11.5800000 ==> 00:01:17 : (tires screeching)
    New Segment: 00:01:17 ==> 00:01:24.8400000 : (singing in foreign language)
    New Segment: 00:01:24.8400000 ==> 00:01:28.6400000 : (panting)
    New Segment: 00:01:36.7800000 ==> 00:01:39.2000000 : (gun firing)
    New Segment: 00:01:39.2000000 ==> 00:01:43.4600000 : - Adrian.
    New Segment: 00:01:43.4600000 ==> 00:01:45.6200000 : - Oh God.
    New Segment: 00:01:45.6200000 ==> 00:01:48.2000000 : - What's the matter sweetheart?
    New Segment: 00:01:48.2000000 ==> 00:01:50.4200000 : Oh.
    New Segment: 00:01:50.4200000 ==> 00:01:53.4600000 : - Oh it's horrible.
    New Segment: 00:01:53.4600000 ==> 00:01:55.3000000 : - Shh.
    New Segment: 00:01:55.3000000 ==> 00:02:02.3400000 : It was just a bad dream.
    New Segment: 00:02:05.4200000 ==> 00:02:09.8800000 : - You don't ever have to be afraid of anything.
    New Segment: 00:02:09.8800000 ==> 00:02:12.8000000 : I'll always be here to protect you.
    New Segment: 00:02:12.9200000 ==> 00:02:15.5000000 : (gentle music)
    New Segment: 00:02:16.4800000 ==> 00:02:19.0600000 : (gentle music)
    New Segment: 00:02:19.0600000 ==> 00:02:21.6400000 : (gentle music)
    New Segment: 00:02:21.6400000 ==> 00:02:24.2200000 : (gentle music)
    New Segment: 00:02:24.5400000 ==> 00:02:27.1200000 : (gentle music)
    New Segment: 00:02:27.1200000 ==> 00:02:29.7000000 : (gentle music)
    New Segment: 00:02:29.7000000 ==> 00:02:33.1800000 : [Music]
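
For point <4>, the "New Segment" lines in the console output above already contain everything an SRT file needs. Here is a minimal sketch in Python of that cleanup, assuming the log has been saved as text and every segment sits on one line in the format shown; the function names are made up for illustration and are not part of whisper.net.

```python
import re

# Matches e.g. "New Segment: 00:00:00 ==> 00:00:02.7600000 : (birds chirping)"
SEGMENT_RE = re.compile(r"New Segment:\s*([\d:.]+)\s*==>\s*([\d:.]+)\s*:\s?(.*)$")


def to_srt_time(stamp):
    """Convert 'HH:MM:SS[.fffffff]' to the SRT form 'HH:MM:SS,mmm'."""
    clock, _, fraction = stamp.partition(".")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    millis = int((fraction + "000")[:3])  # truncate to milliseconds
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def log_to_srt(log_text):
    """Turn the segment lines of a console log into SRT text.

    Model-loading lines and anything else that does not match are skipped.
    """
    blocks = []
    for line in log_text.splitlines():
        match = SEGMENT_RE.search(line)
        if not match:
            continue
        start, end, text = match.groups()
        text = text.strip()
        if not text:
            continue  # ignore segments with no recognized text
        index = len(blocks) + 1  # SRT entries are numbered from 1
        blocks.append(
            f"{index}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{text}\n"
        )
    return "\n".join(blocks)
```

Fed the first segment above, this yields an entry numbered 1 with the time line 00:00:00,000 --> 00:00:02,760 followed by (birds chirping). Fractions are truncated to milliseconds rather than rounded.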