Stenography and Shorthand Enthusiasts' Website http://www.846762.com
Submissions: sujipx@163.com

Stenographers Are About to Be Replaced: In the Future, AI Can Transcribe Everything into Text

Published: 2020/4/21 14:47:30 Views: 1927

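Artificial intelligence is unstoppable. Although still imperfect, it is very likely to replace typists in the future, freeing people from the tedium of typing and even from being tethered to their devices. What other effects will convenient, efficient, low-cost AI transcription have on society? This article is adapted from a piece by Greg Noone published in The Atlantic.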

What is the best way to describe Rupert Murdoch having a foam pie thrown at his face? This wasn’t much of a problem for the world’s press, who were content to run articles depicting the incident during the media mogul’s testimony at a 2011 parliamentary committee hearing as everything from high drama to low comedy. It was another matter for the hearing’s official transcriptionist. Typically, a transcriptionist’s job only involves typing out the words as they were actually said. After the pie attack, either by choice or hemmed in by the conventions of house style, the transcriptionist decided to go the simplest route by marking it as an “[interruption].”

Across professional fields, a whole multitude of conversations (meetings, interviews, and conference calls) need to be transcribed and recorded for future reference. This can be a daily, onerous task, but for those willing to pay, the job can be outsourced to a professional transcription service. The service, in turn, will employ staff to transcribe audio files remotely or, as in my own couple of months in the profession, attend meetings to type out what is said in real time.

Despite the recent emergence of browser-based transcription aids, transcription is an area of drudgery in the modern Western economy where machines can’t quite squeeze human beings out of the equation. That is, until last year, when Microsoft built one that could.

Automatic speech recognition, or ASR, is an area that has gripped the firm’s chief speech scientist, Xuedong Huang, since he entered a doctoral program at Scotland’s Edinburgh University. “I’d just left China,” he says, remembering the difficulty he had in using his undergraduate knowledge of American English to parse the Scottish brogue of his lecturers. “I wished every lecturer and every professor, when they talked in the classroom, could have subtitles.”

In order to reach that kind of real-time service, Huang and his team would first have to create a program capable of retrospective transcription. Advances in artificial intelligence allowed them to employ a technique called deep learning, wherein a program is trained to recognize patterns from vast amounts of data. Huang and his colleagues used their software to transcribe the NIST 2000 CTS test set, a bundle of recorded conversations that has served as the benchmark for speech recognition work for more than 20 years. The error rates of professional transcriptionists in reproducing two different portions of the test are 5.9 and 11.3 percent. The system built by the team at Microsoft edged past both.

“It wasn’t a real-time system,” acknowledges Huang. “It was very much like we wanted to see, with all the horsepower we have, what is the limit. But the real-time system is not that far off.”

Indeed, the promise of ASR programs capable of accurately transcribing interviews or meetings as they happen no longer seems so outlandish. At Microsoft’s Build conference last month, the company’s vice-president, Harry Shum, demonstrated a PowerPoint transcription service that would allow the spoken words of the presentation to be tied to individual slides. The firm is also in a close race with the likes of Apple and Google to perfect the transcripts produced by its real-time mobile translation app.

Huang believes the point at which transcription software will overtake human capabilities is open to interpretation. “The definition of a perfect result would be controversial,” he says, citing the error rates among human transcriptionists. “How ‘perfect’ this is depends on the scenario and the application.”

An ASR system tasked with transcribing speech in real time is only deemed successful if every word is interpreted correctly, something that largely has been achieved with mobile assistants like Cortana and Siri, but has yet to be mastered in real-time translation apps. However, a growing number of computer scientists are realizing that standards do not need to be as high when it comes to the automatic transcription of recorded audio, where any mistakes in the text can be amended after the fact.

Two companies, the London start-up Trint and Baidu, the Chinese internet giant with an application called SwiftScribe, have begun to offer browser-based tools that can convert recordings of up to an hour into text with a word-error rate of 5 percent or less. On the page, their output looks very similar to the raw documents I typed out in real time during the many meetings I attended as a freelance transcriptionist: at best, a Joycean stream-of-consciousness marvel, and at worst, gobbledygook. But by turning the user from a scribe into an editor, both programs can shave hours off an onerous and distracting task.

The amount of time saved, of course, is contingent on the quality of the audio. Trint and SwiftScribe tend to make short work of face-to-face interviews with the bare minimum of ambient noise, but struggle to transcribe recordings of crowded rooms, telephone interviews with bad reception, or anyone who speaks with an accent that isn’t American or British English. My attempt to run a recording of a German-accented speaker through Trint, for example, saw the engine interpret “it was rather cold, but the atmosphere was great” as “That heart is also all barf. Yes. His first face.”
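
For readers curious how a figure such as a 5 percent word-error rate is usually arrived at, here is a minimal sketch in Python. It assumes the common definition (word-level edit distance divided by the number of words in the reference transcript). The names `tokenize` and `word_error_rate` are illustrative only and are not taken from Trint, SwiftScribe, or the NIST scoring tools, whose text normalization and scoring rules may well differ.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word dropped (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# The German-accented example quoted in the article.
reference = "it was rather cold, but the atmosphere was great"
hypothesis = "That heart is also all barf. Yes. His first face."
print(f"{word_error_rate(reference, hypothesis):.2f}")  # 1.11
```

Under this definition the German-accented example scores 10/9, or roughly 111 percent: none of the nine spoken words survive, and the engine produced ten words in their place. A word-error rate can therefore exceed 100 percent when the output is both wrong and longer than what was said.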

“We don’t claim that this turnaround in a couple of minutes of an interview like this is perfect,” says Jeff Kofman, Trint’s CEO. “But, with good audio, it can be close to perfect. You can search it, you can hear it, you [can] find the errors, and you know within seconds what was actually said.”

According to Kofman, most of the people using Trint are journalists, followed by academics doing qualitative research and clients in business and healthcare. In other words, these are professions expected to transcribe a large volume of audio on tight deadlines. That’s in keeping with the anonymized data on user behavior being collected by the developer Ryan Prenger and his colleagues at SwiftScribe. While there is a long tail of users who Prenger speculates are simply AI enthusiasts eager to test out SwiftScribe’s capabilities, he’s also spotted several “power users” who are running audio through the program on almost a daily basis. It’s left him optimistic about the range of people the tool could attract as ASR technology continues to improve.

“That’s the thing with transcription technology in general,” says Prenger. “Once the accuracy gets above a certain bar, everyone will probably start doing their transcriptions that way, at least for the first several rounds.” He predicts that, ultimately, automated transcription tools will increase both the supply of and the demand for transcripts. “There could be a virtuous circle where more people expect more of their audio that they produce to be transcribed, because it’s now cheaper and easier to get things transcribed quickly. And so, it becomes the standard to transcribe everything.”

It’s a future that Trint is consciously maneuvering itself to exploit. The company just raised $3.1 million in seed money to fund its next round of expansion. Kofman and his team plan to demonstrate its capabilities later this month at the Global Editors Network in Vienna. Their aim is to have the transcription of the event’s keynote address up on the Washington Post’s website within the hour.

It’s difficult to predict precisely what this new order could look like, although casualties are expected. The stenographer would likely join the ranks of the costermonger and the iceman in the list of forgotten professions. Journalists could spend more time reporting and writing, aided by a plethora of assistive writing tools, while detectives could analyze the contradictions in suspect testimony earlier. Captioning on YouTube videos could be standard, while radio shows and podcasts could become accessible to the hard of hearing on a mass scale. Calls to acquaintances, friends, and old flames could be archived and searched in the same way that social-media messages and emails are, or intercepted and hoarded by law-enforcement agencies.

For Huang, transcription is just one of a whole range of changes ASR is set to provide that will fundamentally change society itself, one that can already be glimpsed in voice assistants like Cortana, Siri, and Amazon’s Alexa. “The next wave, clearly, is beyond the devices that you have to touch,” he says, envisioning computing technology discreetly woven into a range of working environments. “UI technology that can free people from being tethered to the device will be in the front and center.”

For the moment, however, the engineers behind automated transcribers will have to content themselves with more germane users: the journalist sweating a deadline, or the transcriptionist working out the right way to describe a man being pied in a parliamentary select committee.
