Years ago I had the idea of adding subtitles to game videos that ship without them.
The conditions were not right back then, but they now seem to be.

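Whisper is OpenAI's open-source speech recognition software.
It has a .NET version. With a few small modifications to that version, the speech in a game video can be transcribed into SRT format.
After that, batch-translate the SRT file with an online translator, make a few minor adjustments, and the localization is done.

The repository is at: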
https://github.com/sandrohanea/whisper.net

It is best to build with VS2022; otherwise you will run into many .NET SDK version problems.
After building, there are a few points to note:

<0> Change the model file to the large model, ggml-large.bin; this model gives better results.
Of course, it also takes more time: converting a batch of files will probably take several hours, or even dozens of hours.

<1> Set Language to "english".

    // Original: language taken from opt.Language
    /* var builder = factory.CreateBuilder()
           .WithLanguage(opt.Language);*/
    // Modified: always transcribe as English
    var builder = factory.CreateBuilder()
        .WithLanguage("english");

<2> By default it seems to support only WAV input, and the audio must be sampled at 16 kHz. Files need to be converted to this format beforehand, otherwise transcription fails.

<3> The default sample converts only a single example WAV file; it needs to be changed to batch mode
(iterating over all files in a directory).

<4> The output needs a little tidying up to conform to the SRT format.

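Points <2> and <3> can be sketched together as follows. This is only a minimal sketch, not the code from the Whisper.net sample: the class name WavBatch and the methods ReadSampleRate and PlanJobs are hypothetical names made up for illustration. It walks a directory, reads each WAV header to check for the 16 kHz sample rate, and pairs every usable file with the .srt path it should produce. The transcription call itself is deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, not part of Whisper.net.
public static class WavBatch
{
    // Reads the sample rate from a seekable RIFF/WAVE stream by walking its
    // chunks until the "fmt " chunk is found. Returns null if the stream does
    // not look like a WAV file.
    public static int? ReadSampleRate(Stream stream)
    {
        if (stream.Length < 12) return null;
        using var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);

        string riff = Encoding.ASCII.GetString(reader.ReadBytes(4));
        reader.ReadUInt32();                       // overall RIFF size (unused)
        string wave = Encoding.ASCII.GetString(reader.ReadBytes(4));
        if (riff != "RIFF" || wave != "WAVE") return null;

        while (stream.Position + 8 <= stream.Length)
        {
            string chunkId = Encoding.ASCII.GetString(reader.ReadBytes(4));
            uint chunkSize = reader.ReadUInt32();
            if (chunkId == "fmt ")
            {
                if (chunkSize < 8 || stream.Position + 8 > stream.Length) return null;
                reader.ReadUInt16();               // audio format tag
                reader.ReadUInt16();               // channel count
                return (int)reader.ReadUInt32();   // sample rate in Hz
            }
            // Skip other chunks; chunk data is padded to an even byte count.
            stream.Seek((long)chunkSize + (chunkSize & 1), SeekOrigin.Current);
        }
        return null;
    }

    // Pairs every 16 kHz .wav file in inputDir with the .srt path it should
    // produce in outputDir. Files with another sample rate are reported and skipped.
    public static IEnumerable<(string Wav, string Srt)> PlanJobs(string inputDir, string outputDir)
    {
        foreach (string wav in Directory.EnumerateFiles(inputDir, "*.wav"))
        {
            int? rate;
            using (FileStream fs = File.OpenRead(wav))
            {
                rate = ReadSampleRate(fs);
            }

            if (rate != 16000)
            {
                string shown = rate.HasValue ? rate.Value + " Hz" : "unknown";
                Console.Error.WriteLine("Skipping " + wav + ": sample rate " + shown + ", expected 16000 Hz");
                continue;
            }

            string name = Path.GetFileNameWithoutExtension(wav) + ".srt";
            yield return (wav, Path.Combine(outputDir, name));
        }
    }
}
```

Each (Wav, Srt) pair would then be handed to the modified transcription code in a loop. Skipped files have to be resampled first; assuming ffmpeg is available, `ffmpeg -i input.mp4 -ar 16000 -ac 1 output.wav` is one common way to do that.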
Below is the console output for one WAV file (the opening cinematic of 幽魂):

    whisper_init_from_file_no_state: loading model from 'ggml-large.bin'
    whisper_model_load: loading model
    whisper_model_load: n_vocab = 51865
    whisper_model_load: n_audio_ctx = 1500
    whisper_model_load: n_audio_state = 1280
    whisper_model_load: n_audio_head = 20
    whisper_model_load: n_audio_layer = 32
    whisper_model_load: n_text_ctx = 448
    whisper_model_load: n_text_state = 1280
    whisper_model_load: n_text_head = 20
    whisper_model_load: n_text_layer = 32
    whisper_model_load: n_mels = 80
    whisper_model_load: ftype = 1
    whisper_model_load: qntvr = 0
    whisper_model_load: type = 5
    whisper_model_load: mem required = 3557.00 MB (+ 71.00 MB per decoder)
    whisper_model_load: adding 1608 extra tokens
    whisper_model_load: model ctx = 2951.27 MB
    whisper_model_load: model size = 2950.66 MB
    whisper_init_state: kv self size = 70.00 MB
    whisper_init_state: kv cross size = 234.38 MB
    New Segment: 00:00:00 ==> 00:00:02.7600000 : (birds chirping)
    New Segment: 00:00:03.6600000 ==> 00:00:05.9000000 : (exhaling)
    New Segment: 00:00:05.9000000 ==> 00:00:08.6600000 : (birds chirping)
    New Segment: 00:00:08.6600000 ==> 00:00:35.1200000 : (gun firing)
    New Segment: 00:00:36.1200000 ==> 00:00:38.5400000 : (gun firing)
    New Segment: 00:00:39.0600000 ==> 00:00:41.4800000 : (gun firing)
    New Segment: 00:00:41.4800000 ==> 00:00:49.4000000 : (tires screeching)
    New Segment: 00:00:49.4000000 ==> 00:00:58.5800000 : (glass shattering)
    New Segment: 00:00:58.5800000 ==> 00:01:07.7400000 : (singing in foreign language)
    New Segment: 00:01:07.7400000 ==> 00:01:11.5800000 : (singing in foreign language)
    New Segment: 00:01:11.5800000 ==> 00:01:17 : (tires screeching)
    New Segment: 00:01:17 ==> 00:01:24.8400000 : (singing in foreign language)
    New Segment: 00:01:24.8400000 ==> 00:01:28.6400000 : (panting)
    New Segment: 00:01:36.7800000 ==> 00:01:39.2000000 : (gun firing)
    New Segment: 00:01:39.2000000 ==> 00:01:43.4600000 : - Adrian.
    New Segment: 00:01:43.4600000 ==> 00:01:45.6200000 : - Oh God.
    New Segment: 00:01:45.6200000 ==> 00:01:48.2000000 : - What's the matter sweetheart?
    New Segment: 00:01:48.2000000 ==> 00:01:50.4200000 : Oh.
    New Segment: 00:01:50.4200000 ==> 00:01:53.4600000 : - Oh it's horrible.
    New Segment: 00:01:53.4600000 ==> 00:01:55.3000000 : - Shh.
    New Segment: 00:01:55.3000000 ==> 00:02:02.3400000 : It was just a bad dream.
    New Segment: 00:02:05.4200000 ==> 00:02:09.8800000 : - You don't ever have to be afraid of anything.
    New Segment: 00:02:09.8800000 ==> 00:02:12.8000000 : I'll always be here to protect you.
    New Segment: 00:02:12.9200000 ==> 00:02:15.5000000 : (gentle music)
    New Segment: 00:02:16.4800000 ==> 00:02:19.0600000 : (gentle music)
    New Segment: 00:02:19.0600000 ==> 00:02:21.6400000 : (gentle music)
    New Segment: 00:02:21.6400000 ==> 00:02:24.2200000 : (gentle music)
    New Segment: 00:02:24.5400000 ==> 00:02:27.1200000 : (gentle music)
    New Segment: 00:02:27.1200000 ==> 00:02:29.7000000 : (gentle music)
    New Segment: 00:02:29.7000000 ==> 00:02:33.1800000 : [Music]
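
To show what the tidying-up into SRT might look like, here is a minimal sketch that turns "New Segment" lines like the ones above into SRT entries. It is an illustration under stated assumptions, not the code I actually used: the class name SegmentLogToSrt and the method ToSrt are hypothetical, and it assumes each segment is on one line in exactly the format printed above. It skips the model-loading lines, numbers the entries from 1, and rewrites the timestamps as hours:minutes:seconds,milliseconds.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Whisper.net.
public static class SegmentLogToSrt
{
    // Matches lines such as
    // "New Segment: 00:01:39.2000000 ==> 00:01:43.4600000 : - Adrian."
    private static readonly Regex SegmentLine =
        new Regex(@"New Segment:\s*(\S+)\s*==>\s*(\S+)\s*:\s?(.*)$");

    // SRT timestamps look like 00:01:39,200 (comma before the milliseconds).
    // The "hh" field wraps at 24 hours, which is fine for game cutscenes.
    private static string Stamp(TimeSpan t) =>
        t.ToString(@"hh\:mm\:ss\,fff", CultureInfo.InvariantCulture);

    public static string ToSrt(IEnumerable<string> logLines)
    {
        var srt = new StringBuilder();
        int index = 1;

        foreach (string line in logLines)
        {
            Match m = SegmentLine.Match(line);
            if (!m.Success) continue;              // e.g. whisper_model_load lines

            bool okStart = TimeSpan.TryParse(
                m.Groups[1].Value, CultureInfo.InvariantCulture, out TimeSpan start);
            bool okEnd = TimeSpan.TryParse(
                m.Groups[2].Value, CultureInfo.InvariantCulture, out TimeSpan end);
            if (!okStart || !okEnd) continue;      // malformed timestamps

            string text = m.Groups[3].Value.Trim();
            if (text.Length == 0) continue;        // nothing to show

            // One SRT entry: index, time range, text, blank line.
            srt.Append(index++).Append('\n');
            srt.Append(Stamp(start)).Append(" --> ").Append(Stamp(end)).Append('\n');
            srt.Append(text).Append("\n\n");
        }

        return srt.ToString();
    }
}
```

For example, File.WriteAllText("intro.srt", SegmentLogToSrt.ToSrt(File.ReadLines("intro.log"))) would convert a saved console log (both file names are placeholders). Sound annotations such as (gun firing) are kept as they are; whether to drop them before translation is a matter of taste.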