OpenAI releases a new speech-to-text model: half the size with quality close to the original, except that Thai gets much worse

By lew

on 4 October 2024 - 13:44 Tag: OpenAI, Artificial Intelligence

OpenAI has released whisper-large-v3-turbo, a speech-to-text model slimmed down by cutting the decoder from 32 layers to 4, which reduces the parameter count from 1,550 million to just 809 million.
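As a quick check on the "half the size" claim, the sketch below works out the ratio from the two parameter counts quoted above. The weight-size estimate assumes 2 bytes per parameter (fp16 storage), which is an assumption for illustration and not something the release states; the function names are hypothetical.

```python
# Parameter counts as quoted in the article, in millions.
LARGE_V3_PARAMS_M = 1550
TURBO_PARAMS_M = 809


def size_ratio(small_m: float, large_m: float) -> float:
    """Fraction of the larger model's parameters kept by the smaller one."""
    return small_m / large_m


def weights_gb(params_m: float, bytes_per_param: int = 2) -> float:
    """Approximate weight size in GB; 2 bytes per parameter assumes fp16."""
    return params_m * 1e6 * bytes_per_param / 1e9


if __name__ == "__main__":
    ratio = size_ratio(TURBO_PARAMS_M, LARGE_V3_PARAMS_M)
    print(f"turbo keeps {ratio:.1%} of the parameters")
    print(
        f"large-v3 ~{weights_gb(LARGE_V3_PARAMS_M):.2f} GB, "
        f"turbo ~{weights_gb(TURBO_PARAMS_M):.2f} GB (fp16 assumption)"
    )
```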

After pruning, the team fine-tuned the model for two more epochs on the same training data used for large-v3, then measured overall performance. Quality recovered to a level close to that of the source model, with the exception of Thai and Cantonese, where performance dropped clearly. On the Common Voice dataset, the Thai word error rate rose nearly fourfold.
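For readers unfamiliar with the metric, word error rate is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that definition, not the evaluation code OpenAI used. It assumes both transcripts have already been split into words, which for Thai requires a separate word segmenter because the script does not put spaces between words; the function name is hypothetical.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion: reference word missing
                    curr[j - 1] + 1,     # insertion: extra hypothesis word
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(reference)


if __name__ == "__main__":
    ref = "a b c d".split()
    hyp = "a x c".split()
    # One substitution (b -> x) and one deletion (d) over four reference words.
    print(word_error_rate(ref, hyp))
```

Because the denominator is the reference length, the rate can exceed 1.0 when the output contains many extra words.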

The approach behind whisper-large-v3-turbo is adapted from the Distil-Whisper research, which trains a smaller model on the outputs of a larger one, but OpenAI relied on training with the full data instead.

whisper-large-v3-turbo is now the default model in the latest version of the openai-whisper package, so anyone working with Thai may need to take care to switch to another model.
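One way to guard against the changed default is to name the model explicitly rather than rely on whatever the package picks. The sketch below assumes the openai-whisper package is installed and uses its `whisper.load_model` and `model.transcribe` calls with the model names `turbo` and `large-v3`. The helper `pick_model`, the set of affected language codes (`th` for Thai, `yue` for Cantonese), and the audio path are illustrative assumptions, not part of the package.

```python
# Languages the article reports as clearly degraded on the turbo model.
# The set and the helper below are illustrative, not part of openai-whisper.
DEGRADED_ON_TURBO = {"th", "yue"}


def pick_model(language: str, default: str = "turbo", fallback: str = "large-v3") -> str:
    """Return the model name to load for a given language code."""
    return fallback if language in DEGRADED_ON_TURBO else default


def transcribe(audio_path: str, language: str) -> str:
    """Transcribe one audio file with an explicitly chosen model."""
    # Imported here so the helper above can be used without the package installed.
    import whisper

    model = whisper.load_model(pick_model(language))
    result = model.transcribe(audio_path, language=language)
    return result["text"]


if __name__ == "__main__":
    print(pick_model("th"))  # model chosen for Thai
    print(pick_model("en"))  # model chosen for English
    # text = transcribe("sample.wav", "th")  # hypothetical file path
```

Loading large-v3 needs more memory and runs slower than turbo, so the fallback is only worth it for the languages where accuracy matters more than speed.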

Source - OpenAI/Whisper

tontan Fri, 04/10/2024 - 14:05

For Thai, having more audio would probably give better results.

Anyone who wants to help public Thai speech-to-text models become more accurate can do so by contributing to open public datasets such as Common Voice. You can read more at

kandation Sat, 05/10/2024 - 12:51

Is Thai Common Voice like this because the dataset is small? Some of it also contains AI-generated voices; does that play a part too?

tontan Sat, 05/10/2024 - 17:03

It is mostly down to the small dataset size. That said, Thai Common Voice is admittedly not the most suitable test set for measuring model performance: its transcripts contain some typos, along with digits and words from other languages, and the audio is real-world quality with some background noise. All of this can introduce error into the measurements. It still has an upside, though: Thai Common Voice has a fairly diverse range of voices, more than 7,000 unique ones. As for the AI voices, it seems quite a lot of them have been rejected.