Abstract: [Objective] This paper aims to summarize the latest progress in open-source Multimodal Large Language Models (MLLMs) and to explore how they can be put into practice in the news domain. [Methods] The author first introduces the research background of MLLMs and compares the performance of representative open-source and closed-source models on different benchmarks, then analyzes the model architecture, including its components and working principles, next discusses training strategies and the data they require, and finally looks ahead to application scenarios and research directions. [Results/Conclusions] The analysis shows that open-source MLLMs have the potential, and a clear development path, to catch up with closed-source commercial models, and that they have broad application prospects in the news domain, providing strong language understanding and generation capabilities across the entire news gathering and editing workflow; related technologies can be deployed in real-world scenarios going forward.
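The architecture referred to in the abstract follows the layout common to open-source MLLMs: a pretrained vision encoder, a lightweight projector, and an LLM backbone, as popularized by LLaVA-style models. The sketch below is a minimal, illustrative PyTorch rendering of that layout only; the class name, module sizes, and the stand-in encoder and transformer are placeholders chosen for this example, not code or hyperparameters from any particular model.

```python
import torch
import torch.nn as nn

class MiniMLLM(nn.Module):
    """Minimal sketch of the typical open-source MLLM layout:
    vision encoder -> projector -> LLM. All sizes are illustrative."""

    def __init__(self, patch_dim=588, vision_dim=256, llm_dim=512, vocab_size=1000):
        super().__init__()
        # Stand-in for a pretrained vision encoder (e.g. a CLIP-style ViT).
        self.vision_encoder = nn.Linear(patch_dim, vision_dim)
        # Projector: maps visual features into the LLM's embedding space
        # (LLaVA-1.5, for instance, uses a small MLP here).
        self.projector = nn.Sequential(
            nn.Linear(vision_dim, llm_dim), nn.GELU(), nn.Linear(llm_dim, llm_dim)
        )
        # Stand-in for a decoder-only LLM backbone (no causal mask in this toy version).
        self.text_embed = nn.Embedding(vocab_size, llm_dim)
        self.llm = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=llm_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.lm_head = nn.Linear(llm_dim, vocab_size)

    def forward(self, image_patches, text_ids):
        vis = self.vision_encoder(image_patches)          # (B, N_img, vision_dim)
        vis_tokens = self.projector(vis)                  # (B, N_img, llm_dim)
        txt_tokens = self.text_embed(text_ids)            # (B, N_txt, llm_dim)
        # Visual tokens are prepended to the text tokens and the joint
        # sequence is processed by the language model.
        seq = torch.cat([vis_tokens, txt_tokens], dim=1)  # (B, N_img + N_txt, llm_dim)
        hidden = self.llm(seq)
        return self.lm_head(hidden)                       # next-token logits

# Toy usage: 2 images of 16 flattened patches each, plus 8 text tokens.
model = MiniMLLM()
patches = torch.randn(2, 16, 588)
text_ids = torch.randint(0, 1000, (2, 8))
print(model(patches, text_ids).shape)  # torch.Size([2, 24, 1000])
```

The design point the sketch illustrates is that the projector turns visual features into tokens living in the same embedding space as text, so the language model can attend over both modalities without changing its decoding loop; this is the working principle the abstract alludes to.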