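When Eric Booth’s 90-year-old grandmother came to visit him, her hearing had deteriorated so badly that, even with hearing aids, she struggled to understand what people were saying. He watched her lean in toward whoever was speaking and try to read their lips, straining to follow what was said. When more than one speaker was involved, she often lost track of the conversation.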

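Then Eric, a senior business development manager for cloud computing at Micron, had an idea. His grandmother had a smartphone, so why not let it “listen” for her? He opened her notes app, pressed the microphone button, and showed her how it transcribed his speech into text on the screen.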

“She was so excited that she couldn’t stop smiling. Now she was able to participate in conversations where, in the past, she couldn’t,” he says. “This is how this technology can really improve the quality of life for people with speech, language and hearing disabilities.”

The technology to transcribe speech to text may seem simple and easy to overlook, but it is a complex process that has taken decades to advance to where it is today.


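It has been a long time since the first speech recognition (SR) device, Audrey, made its debut. Bell Labs introduced Audrey in 1952. The six-foot-tall computer could recognize only single digits. Rather than generating text, it flashed lights corresponding to the digit spoken: nine blinks of the light for the word “nine,” for example.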

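Even a few years ago, SR technology was not very user-friendly: it was often inaccurate, could not filter out even the slightest ambient sound, and was slow to transcribe. SR had a long way to go before it was truly useful.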

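Today, SR is made possible by advances in artificial intelligence, virtual assistant technology, 5G cellular technology, and memory, storage and computer processing. This enables us to do many things we couldn’t before: communicate in languages we’ve never spoken, transcribe long recordings almost instantly, and order, merely by speaking words into the air, almost anything we want for delivery to our front door.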

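Now, generative AI is taking this technology further. While speech recognition parses audio into text, generative AI processes that text to actually understand its meaning. Not just, what was said? But, what do the words mean? Are the words asking a question? If so, what is the answer?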

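This type of machine learning can create text, video, images, computer code and other content based on user prompts or dialogue. Building generative AI on top of speech recognition takes learning to a new level, opening up possibilities for this technology to further help people with speech or hearing disabilities.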

While nimble speech recognition ingests language that may not follow normal speech patterns, generative AI and natural language processing (NLP) make sense of it and turn it into relevant recommendations. This process makes holistic, highly personalized speech therapy possible.

Eric’s daughter has also been through speech therapy, so he knows firsthand the time and effort it takes. These experiences inspired him to enroll in a doctoral program at Boise State University in Idaho to research ways that technology can help children with speech disabilities.

“In speech therapy, we used to think that the therapist would give the student content to read and then a tool would score how well they did in pronunciation and enunciation,” Eric explains. “But with generative AI, there is promise of a tool that could handle the whole process. It is good at recognizing patterns, so it can tell whether a student, for example, always mispronounces the ‘o’ sound.”


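Until recently, speech recognition meant you needed a large server with lots of memory, and all the data collected had to be uploaded to the cloud. Now, speech recognition is built into your phone. Compute has gotten faster, memory has gotten faster, and what used to be a data center process now runs on your phone.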

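Soon, generative AI processes will also run on your phone or other endpoint devices. The training process for AI models is not just about making more complex models; it is also about simplifying them so they can work on endpoint devices such as phones or PCs. As these large language models grow, training them outside a cloud environment is not feasible. But once a model is trained and then simplified, it can move to the endpoint device.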
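One common way to simplify a trained model is quantization: storing each weight in fewer bits. The article does not say which techniques are used, so the Python sketch below is only an illustration under that assumption. The function names and the example weights are hypothetical, and real toolchains are considerably more sophisticated.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] plus one shared scale.

    Assumes a non-empty list of floats (illustrative only).
    """
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole list; fall back to 1.0 if every weight is zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the 8-bit integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.003, 1.0]  # made-up example values
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Each restored weight is off by at most half a quantization step.
worst_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, scale, worst_error)
```

Stored this way, each weight takes one byte instead of the four bytes of a 32-bit float, at the cost of a small rounding error per weight.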

In the last few years, there’s been tremendous advancement in large language models:

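“These models are key to generative AI chatbots and advanced search functions,” says Eric. “Large language models have trillions of parameters. A few years ago, a trillion parameters was unthinkable; it couldn’t be processed. Today, a trillion is the benchmark. And of course, the bigger the model, the smarter it is. That’s exactly what’s driving compute and memory demand.”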

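Natural language processing and generative AI require training powerful large language models, and the more parameters a model has, the more memory it needs (see Figure 1).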
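To put the link between parameter count and memory in rough numbers, the Python sketch below multiplies the number of parameters by the bytes used to store each one. The precisions and byte sizes are standard, but the function name is hypothetical, and the estimate covers only the stored weights. Training needs considerably more memory for gradients, optimizer state and activations.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: int, precision: str = "fp16") -> float:
    """Estimate memory for the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for num_params in (1_000_000_000, 1_000_000_000_000):
    for precision in ("fp32", "fp16", "int8"):
        gb = weight_memory_gb(num_params, precision)
        print(f"{num_params:>17,} params @ {precision}: {gb:,.0f} GB")
```

By this estimate, a trillion-parameter model stored at 16-bit precision needs about 2,000 GB for its weights alone.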

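Figure 1: Diagram of a natural language generative AI model

Transfer learning trains a model on one task or dataset, then uses those parameters to train another model on a different task or dataset. With ChatGPT, for example, the model has been pre-trained on huge amounts of conversational data from the internet so that it can answer general questions; it then adapts to the current conversation based on the additional context it receives from the prompts it is given. This gives the model a head start instead of starting from scratch. The result is a robust model that needs only a small amount of additional data.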


Today, many AI researchers are focused on generative AI. This is not just because of the buzz around ChatGPT; it’s also because of the profound potential applications in healthcare and other industries.




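More than a million children in the U.S. receive professional help in school for speech and language disorders, according to the American Speech-Language-Hearing Association. Overall, 8% of children have a speech delay or disability, Eric says.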


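“You can’t just go on the open market and buy a speech therapy package of technology for children,” he says. “It doesn’t exist.” The technology is needed, he says, especially for children from low-income families. Evaluating a child takes at least two hours, Eric says, but government programs may pay for only 30 minutes.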


“A lot of things that take up the therapist’s time could be done by a computer to free up the therapist to do more long-term planning and more focused therapy sessions,” he says.


Children with learning disabilities such as dyslexia can also benefit from having their spoken words transcribed into text, according to the Learning Disabilities Resources Foundation. Like the ingenious use of talk-to-text to help Eric’s grandmother join in conversations, this foundational AI technology has many untapped and as-yet-unimagined use cases.




Today, Micron offers ever-denser, ever-faster memory and storage that increasingly allow language processing to occur right on a person’s phone rather than in the cloud, saving data transfer time.


To power these endpoint devices, Micron’s low-power double data rate 5X (LPDDR5X) memory delivers a balance of power efficiency and performance for a seamless user experience. LPDDR5X is the fastest, most advanced mobile memory, with peak speeds of 8.533 gigabits per second (Gbps), 33% faster than the previous generation. LPDDR5X’s speed and bandwidth are essential to have powerful generative AI (literally) at hand.
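The 33% figure can be checked with simple arithmetic. The short Python sketch below assumes the previous generation is LPDDR5 at a peak of 6.4 Gbps per pin. That baseline is not stated in the article, and the helper function is hypothetical.

```python
LPDDR5X_GBPS = 8.533  # peak speed quoted in the text
LPDDR5_GBPS = 6.4     # assumption: previous-generation (LPDDR5) peak speed


def percent_faster(new: float, old: float) -> float:
    """Return how much faster `new` is than `old`, as a percentage."""
    return (new / old - 1.0) * 100.0


print(f"{percent_faster(LPDDR5X_GBPS, LPDDR5_GBPS):.1f}% faster")  # prints "33.3% faster"
```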


With generative AI, SR is getting closer and closer to working as quickly and as accurately as the human brain. But huge hurdles remain before that goal is reached, especially in processing the speech of children, people with accents, and people with hearing or speech disabilities. Projects like the one Eric is working on can truly change the way generative AI technologies can enrich the lives of all people.


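But generative AI is using deep learning to produce text from speech that is increasingly natural, more like human speech. In the past, AI models excelled at ingesting large amounts of data, recognizing patterns and identifying root causes from a diagnostic perspective. Today, generative AI “reads” text and uses that data to make contextual inferences from human communications. In essence, it is “training” itself. To do this, it needs the ability to access and ingest huge amounts of data simultaneously, drawing on large amounts of memory to determine an appropriate response. Micron technology is making these advances possible.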


Micron’s high-density DDR5 DRAM modules and multi-terabyte SSD storage enable the speed and capacity required to train generative AI models in the data center. The newly announced HBM3E raises performance further, delivering 50% more capacity at more than 1.2 terabytes per second of bandwidth, which can reduce training time for multitrillion-parameter AI models by more than 30%. As these technologies get faster and more accurate, more people can “speak” and be heard.


“We’re going to see disruptive leaps in performance in generative AI and SR technology in the near future,” Eric predicts. “It’s really cool to see this technology enriching people’s lives.”

