For years I have wanted to add subtitles to game videos that ship without them.
The conditions were not there back then, but they now appear to be.

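Whisper is OpenAI's open-source speech recognition software.
There is a .NET version of it; with a few small modifications to that version, the speech in a game video can be transcribed into subtitles in SRT format.
After that, run the SRT file through an online batch translator, make a few adjustments, and the Chinese localization is done.
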
The repository is here:
https://github.com/sandrohanea/whisper.net

It is best to build it with VS2022; otherwise you will run into a lot of .NET SDK version problems.
Once it is built, there are a few things to watch out for:

<0> Change the model file to the large model, ggml-large.bin; this model gives better results.
Of course, it also takes more time: converting a batch of files will probably take several hours, or even tens of hours.
<1> Set Language to "english".

    // Original code: the language was taken from opt.Language.
    // var builder = factory.CreateBuilder()
    //     .WithLanguage(opt.Language);

    // Modified: the language is hard-coded to English.
    var builder = factory.CreateBuilder()
        .WithLanguage("english");
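<2> By default it seems to support only WAV input, and the audio must use a 16 kHz sample rate; files need to be converted to this format beforehand, otherwise an error occurs.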

<3> By default only the conversion of a single sample WAV file is provided; this needs to be changed to batch processing
(iterating over all the files in a directory).

<4> The output needs a little tidying up so that it conforms to the SRT format.

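Notes <2> and <3> could be handled together in a small helper that runs before recognition. What follows is only a minimal sketch, not the code actually used for this project: it assumes ffmpeg is installed and on the PATH (the post does not say which converter was used), and the names `BuildFfmpegArgs`, `OutputPathFor`, `ConvertDirectory` and the `wav16k` output folder are hypothetical. Mono and 16-bit PCM are also my assumptions; the post itself only calls for 16 kHz WAV.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

// Build the ffmpeg argument list for one file.
// -vn drops video, -ar 16000 resamples to 16 kHz, -ac 1 downmixes to mono,
// -c:a pcm_s16le writes 16-bit PCM, -y overwrites an existing output file.
static List<string> BuildFfmpegArgs(string input, string output) => new()
{
    "-y", "-i", input, "-vn", "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", output
};

// Map an input file to its output path: <outputDir>/<name>.wav
static string OutputPathFor(string input, string outputDir) =>
    Path.Combine(outputDir, Path.GetFileNameWithoutExtension(input) + ".wav");

// Convert every file in inputDir (non-recursive); returns the WAV paths written.
static List<string> ConvertDirectory(string inputDir, string outputDir)
{
    Directory.CreateDirectory(outputDir);
    var written = new List<string>();
    foreach (var input in Directory.EnumerateFiles(inputDir))
    {
        var output = OutputPathFor(input, outputDir);
        var psi = new ProcessStartInfo("ffmpeg") { UseShellExecute = false };
        foreach (var arg in BuildFfmpegArgs(input, output))
            psi.ArgumentList.Add(arg); // ArgumentList takes care of quoting paths with spaces

        using var process = Process.Start(psi)
            ?? throw new InvalidOperationException("could not start ffmpeg");
        process.WaitForExit();
        if (process.ExitCode == 0)
            written.Add(output);
        else
            Console.Error.WriteLine($"ffmpeg failed for {input} (exit code {process.ExitCode})");
    }
    return written;
}

// Usage: dotnet run -- <inputDir> [outputDir]
if (args.Length >= 1)
{
    var outputDir = args.Length >= 2 ? args[1] : "wav16k";
    foreach (var wav in ConvertDirectory(args[0], outputDir))
        Console.WriteLine(wav);
}
```

Keeping the output folder separate from the input folder avoids asking ffmpeg to overwrite a file it is still reading. The same `Directory.EnumerateFiles` loop is one way the single-file demo could be turned into the batch form mentioned in <3>.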
Below is the console output for one WAV file (the opening cinematic of 幽魂):

whisper_init_from_file_no_state: loading model from 'ggml-large.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1280
whisper_model_load: n_text_head = 20
whisper_model_load: n_text_layer = 32
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 5
whisper_model_load: mem required = 3557.00 MB (+ 71.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx = 2951.27 MB
whisper_model_load: model size = 2950.66 MB
whisper_init_state: kv self size = 70.00 MB
whisper_init_state: kv cross size = 234.38 MB
New Segment: 00:00:00 ==> 00:00:02.7600000 : (birds chirping)
New Segment: 00:00:03.6600000 ==> 00:00:05.9000000 : (exhaling)
New Segment: 00:00:05.9000000 ==> 00:00:08.6600000 : (birds chirping)
New Segment: 00:00:08.6600000 ==> 00:00:35.1200000 : (gun firing)
New Segment: 00:00:36.1200000 ==> 00:00:38.5400000 : (gun firing)
New Segment: 00:00:39.0600000 ==> 00:00:41.4800000 : (gun firing)
New Segment: 00:00:41.4800000 ==> 00:00:49.4000000 : (tires screeching)
New Segment: 00:00:49.4000000 ==> 00:00:58.5800000 : (glass shattering)
New Segment: 00:00:58.5800000 ==> 00:01:07.7400000 : (singing in foreign language)
New Segment: 00:01:07.7400000 ==> 00:01:11.5800000 : (singing in foreign language)
New Segment: 00:01:11.5800000 ==> 00:01:17 : (tires screeching)
New Segment: 00:01:17 ==> 00:01:24.8400000 : (singing in foreign language)
New Segment: 00:01:24.8400000 ==> 00:01:28.6400000 : (panting)
New Segment: 00:01:36.7800000 ==> 00:01:39.2000000 : (gun firing)
New Segment: 00:01:39.2000000 ==> 00:01:43.4600000 : - Adrian.
New Segment: 00:01:43.4600000 ==> 00:01:45.6200000 : - Oh God.
New Segment: 00:01:45.6200000 ==> 00:01:48.2000000 : - What's the matter sweetheart?
New Segment: 00:01:48.2000000 ==> 00:01:50.4200000 : Oh.
New Segment: 00:01:50.4200000 ==> 00:01:53.4600000 : - Oh it's horrible.
New Segment: 00:01:53.4600000 ==> 00:01:55.3000000 : - Shh.
New Segment: 00:01:55.3000000 ==> 00:02:02.3400000 : It was just a bad dream.
New Segment: 00:02:05.4200000 ==> 00:02:09.8800000 : - You don't ever have to be afraid of anything.
New Segment: 00:02:09.8800000 ==> 00:02:12.8000000 : I'll always be here to protect you.
New Segment: 00:02:12.9200000 ==> 00:02:15.5000000 : (gentle music)
New Segment: 00:02:16.4800000 ==> 00:02:19.0600000 : (gentle music)
New Segment: 00:02:19.0600000 ==> 00:02:21.6400000 : (gentle music)
New Segment: 00:02:21.6400000 ==> 00:02:24.2200000 : (gentle music)
New Segment: 00:02:24.5400000 ==> 00:02:27.1200000 : (gentle music)
New Segment: 00:02:27.1200000 ==> 00:02:29.7000000 : (gentle music)
New Segment: 00:02:29.7000000 ==> 00:02:33.1800000 : [Music]
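The tidying-up from note <4> might look roughly like the sketch below. It is not the author's actual clean-up code. It assumes the console lines keep the `New Segment: start ==> end : text` shape from the output above, with timestamps in the default .NET `TimeSpan` text form; the names `ToSrtTime` and `ConsoleLogToSrt` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// SRT timestamps use a comma before the milliseconds: HH:MM:SS,mmm
static string ToSrtTime(TimeSpan t) =>
    $"{(int)t.TotalHours:00}:{t.Minutes:00}:{t.Seconds:00},{t.Milliseconds:000}";

// Turn "New Segment: <start> ==> <end> : <text>" console lines into SRT text.
// Lines that do not match (model loading messages, blank lines) are skipped.
static string ConsoleLogToSrt(IEnumerable<string> lines)
{
    var pattern = new Regex(
        @"New Segment:\s*(?<start>\S+)\s*==>\s*(?<end>\S+)\s*:\s*(?<text>.*)$");
    var srt = new StringBuilder();
    var index = 1;
    foreach (var line in lines)
    {
        var m = pattern.Match(line);
        if (!m.Success) continue;

        // Timestamps look like 00:00:02.7600000 or, without a fraction, 00:01:17.
        if (!TimeSpan.TryParse(m.Groups["start"].Value, CultureInfo.InvariantCulture, out var start) ||
            !TimeSpan.TryParse(m.Groups["end"].Value, CultureInfo.InvariantCulture, out var end))
            continue;

        var text = m.Groups["text"].Value.Trim();
        if (text.Length == 0) continue;

        // One SRT cue: running number, time range, text, blank line.
        srt.Append(index++).Append('\n');
        srt.Append(ToSrtTime(start)).Append(" --> ").Append(ToSrtTime(end)).Append('\n');
        srt.Append(text).Append("\n\n");
    }
    return srt.ToString();
}

var sample = new[]
{
    "whisper_model_load: loading model",
    "New Segment: 00:01:11.5800000 ==> 00:01:17 : (tires screeching)",
    "New Segment: 00:01:39.2000000 ==> 00:01:43.4600000 : - Adrian.",
};
Console.Write(ConsoleLogToSrt(sample));
```

For the three sample lines this prints two cues:

```
1
00:01:11,580 --> 00:01:17,000
(tires screeching)

2
00:01:39,200 --> 00:01:43,460
- Adrian.

```

The sketch writes plain LF line endings; if a player turns out to need CRLF, the `'\n'` appends are the place to change.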