冒险解谜游戏中文网 ChinaAVG
Title:
[Game Video Localization #1] Building Whisper, the AI Speech Recognition Software
Author:
shane007
Time:
2023-8-28 11:48
Many years ago I already had the idea of adding subtitles to game videos that ship without them.
The conditions were not ripe back then, but they now seem to be.

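Whisper is OpenAI's open-source speech recognition software.
It has a .NET version. With a few small modifications to that version, the speech in a game video can be recognized and written out as subtitles in SRT format.
After that, batch-translate the SRT file with an online translator, make a few minor adjustments, and the localization work is done.
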
The repository is here:
https://github.com/sandrohanea/whisper.net
It is best to build the project with VS2022; otherwise you will run into many problems with the .NET SDK version.

Once it is built, there are a few points to note.

<0> Change the model file to the large model, ggml-large.bin; this model gives better results.
Of course, it also takes much longer: converting a batch of files may take several hours, or even tens of hours.

<1> Language must be set to "english".

// Original: the language came from the command-line options.
/* var builder = factory.CreateBuilder()
    .WithLanguage(opt.Language);*/
// Changed: always recognize English.
var builder = factory.CreateBuilder()
    .WithLanguage("english");

<2> By default it seems to support only WAV input, and the audio must have a 16 kHz sample rate. Files need to be converted to this format beforehand, otherwise an error occurs.

<3> By default only a single sample WAV file is converted; this needs to be changed to batch mode
(iterating over all files in a directory).

<4> The output needs some tidying up to conform to the SRT format.

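Points <2> and <3> can be sketched roughly as follows. This is only a minimal illustration, not the code used in this post: the class `WavBatchPrep` and its methods are hypothetical names, it assumes the `ffmpeg` command-line tool is installed and on PATH, and the mono (`-ac 1`) and 16-bit PCM settings are my own assumptions on top of the 16 kHz requirement mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;

// Hypothetical helper (not part of Whisper.net): prepares 16 kHz WAV files in batch.
public static class WavBatchPrep
{
    // ffmpeg arguments that resample any input to a 16 kHz, mono, 16-bit PCM WAV file.
    public static string[] BuildFfmpegArgs(string input, string output) =>
        new[] { "-y", "-i", input, "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", output };

    // Pair every file in sourceDir with a same-named .wav path inside outputDir.
    public static List<(string Input, string Output)> PlanConversions(string sourceDir, string outputDir) =>
        Directory.EnumerateFiles(sourceDir)
            .OrderBy(path => path, StringComparer.Ordinal)
            .Select(path => (Input: path,
                             Output: Path.Combine(outputDir, Path.GetFileNameWithoutExtension(path) + ".wav")))
            .ToList();

    // Run ffmpeg once per planned pair. Assumes "ffmpeg" can be found on PATH.
    public static void ConvertAll(string sourceDir, string outputDir)
    {
        Directory.CreateDirectory(outputDir);
        foreach (var (input, output) in PlanConversions(sourceDir, outputDir))
        {
            var startInfo = new ProcessStartInfo("ffmpeg") { UseShellExecute = false };
            foreach (var arg in BuildFfmpegArgs(input, output))
                startInfo.ArgumentList.Add(arg);

            using var process = Process.Start(startInfo)
                ?? throw new InvalidOperationException("Could not start ffmpeg.");
            process.WaitForExit();
            if (process.ExitCode != 0)
                throw new InvalidOperationException($"ffmpeg failed for {input}");
        }
    }
}
```

Calling `WavBatchPrep.ConvertAll(sourceDir, outputDir)` with two directories of your choice would leave one 16 kHz WAV per source file in the output directory; the modified Whisper.net program can then loop over that directory in the same way.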
Below is the console output for one WAV file (the opening cinematic of 幽魂).

whisper_init_from_file_no_state: loading model from 'ggml-large.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1280
whisper_model_load: n_text_head = 20
whisper_model_load: n_text_layer = 32
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 5
whisper_model_load: mem required = 3557.00 MB (+ 71.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx = 2951.27 MB
whisper_model_load: model size = 2950.66 MB
whisper_init_state: kv self size = 70.00 MB
whisper_init_state: kv cross size = 234.38 MB
New Segment: 00:00:00 ==> 00:00:02.7600000 : (birds chirping)
New Segment: 00:00:03.6600000 ==> 00:00:05.9000000 : (exhaling)
New Segment: 00:00:05.9000000 ==> 00:00:08.6600000 : (birds chirping)
New Segment: 00:00:08.6600000 ==> 00:00:35.1200000 : (gun firing)
New Segment: 00:00:36.1200000 ==> 00:00:38.5400000 : (gun firing)
New Segment: 00:00:39.0600000 ==> 00:00:41.4800000 : (gun firing)
New Segment: 00:00:41.4800000 ==> 00:00:49.4000000 : (tires screeching)
New Segment: 00:00:49.4000000 ==> 00:00:58.5800000 : (glass shattering)
New Segment: 00:00:58.5800000 ==> 00:01:07.7400000 : (singing in foreign language)
New Segment: 00:01:07.7400000 ==> 00:01:11.5800000 : (singing in foreign language)
New Segment: 00:01:11.5800000 ==> 00:01:17 : (tires screeching)
New Segment: 00:01:17 ==> 00:01:24.8400000 : (singing in foreign language)
New Segment: 00:01:24.8400000 ==> 00:01:28.6400000 : (panting)
New Segment: 00:01:36.7800000 ==> 00:01:39.2000000 : (gun firing)
New Segment: 00:01:39.2000000 ==> 00:01:43.4600000 : - Adrian.
New Segment: 00:01:43.4600000 ==> 00:01:45.6200000 : - Oh God.
New Segment: 00:01:45.6200000 ==> 00:01:48.2000000 : - What's the matter sweetheart?
New Segment: 00:01:48.2000000 ==> 00:01:50.4200000 : Oh.
New Segment: 00:01:50.4200000 ==> 00:01:53.4600000 : - Oh it's horrible.
New Segment: 00:01:53.4600000 ==> 00:01:55.3000000 : - Shh.
New Segment: 00:01:55.3000000 ==> 00:02:02.3400000 : It was just a bad dream.
New Segment: 00:02:05.4200000 ==> 00:02:09.8800000 : - You don't ever have to be afraid of anything.
New Segment: 00:02:09.8800000 ==> 00:02:12.8000000 : I'll always be here to protect you.
New Segment: 00:02:12.9200000 ==> 00:02:15.5000000 : (gentle music)
New Segment: 00:02:16.4800000 ==> 00:02:19.0600000 : (gentle music)
New Segment: 00:02:19.0600000 ==> 00:02:21.6400000 : (gentle music)
New Segment: 00:02:21.6400000 ==> 00:02:24.2200000 : (gentle music)
New Segment: 00:02:24.5400000 ==> 00:02:27.1200000 : (gentle music)
New Segment: 00:02:27.1200000 ==> 00:02:29.7000000 : (gentle music)
New Segment: 00:02:29.7000000 ==> 00:02:33.1800000 : [Music]
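The tidying-up from point <4> could look something like the sketch below, which turns "New Segment: start ==> end : text" console lines into numbered SRT cues. It is an illustration under assumptions, not the code actually used here: `SegmentLogToSrt` and `ToSrt` are hypothetical names, the input is assumed to be the console text saved line by line, and every line that is not a segment line (such as the model-loading messages) is simply skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: converts "New Segment: ..." console lines into SRT text.
public static class SegmentLogToSrt
{
    // Captures start time, end time and subtitle text of one segment line.
    private static readonly Regex SegmentLine =
        new Regex(@"^New Segment: (\S+) ==> (\S+) : (.*)$");

    // "00:01:39.2000000" or "00:01:17" -> "00:01:39,200" / "00:01:17,000" (SRT style).
    private static string Stamp(string raw) =>
        TimeSpan.Parse(raw, CultureInfo.InvariantCulture)
            .ToString(@"hh\:mm\:ss\,fff", CultureInfo.InvariantCulture);

    public static string ToSrt(IEnumerable<string> logLines)
    {
        var srt = new StringBuilder();
        var index = 0;
        foreach (var line in logLines)
        {
            var match = SegmentLine.Match(line.Trim());
            if (!match.Success)
                continue; // not a segment line, e.g. model-loading output

            index++;
            srt.Append(index).Append('\n');
            srt.Append(Stamp(match.Groups[1].Value))
               .Append(" --> ")
               .Append(Stamp(match.Groups[2].Value))
               .Append('\n');
            srt.Append(match.Groups[3].Value.Trim()).Append('\n').Append('\n');
        }
        return srt.ToString();
    }
}
```

For example, `File.WriteAllText("intro.srt", SegmentLogToSrt.ToSrt(File.ReadLines("intro.log")))` (both file names are placeholders) would write one cue per segment. Sound descriptions such as "(gentle music)" are kept as they are; whether to drop them before translation is a separate decision.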
Author:
星之韶华
Time:
2025-3-22 18:07
Just here to learn from this.