Since ChatGPT arrived in late 2022, large language models (LLMs) have continued to raise the bar for what generative AI systems can accomplish. For example, GPT-3.5, which powered ChatGPT, had an accuracy of 85.5% on common sense reasoning datasets, while GPT-4 in 2023 achieved around 95% accuracy on the same datasets. While GPT-3.5 and GPT-4 primarily focused on text processing, GPT-4o, launched in May 2024, is multimodal, allowing it to handle text, images, audio and video.
Despite the impressive advancements of the GPT family of models and other open-source large language models, Gartner, in its 2024 hype cycle for artificial intelligence, notes that "generative AI has passed the peak of inflated expectations, although hype about it continues." Some reasons for disillusionment include the high costs associated with the GPT family of models, privacy and security concerns regarding data, and issues with model transparency. Small language models (SLMs) with fewer parameters than these LLMs are one potential solution to these challenges.
Smaller language models are easier and more cost-effective to train. Additionally, smaller models can be hosted on-premises, providing greater control over the data shared with these language models. One challenge with smaller models is that they tend to be less accurate than their larger counterparts. To harness the strengths of smaller models while mitigating their weaknesses, enterprises are looking at domain-specific small models, which need to be accurate only in the specialization and use cases they support. This domain specialization can be enabled by taking a pre-trained small language model and fine-tuning it with domain-specific data, or by using prompt engineering for further performance gains.
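To make the prompt-engineering path concrete, below is a minimal sketch of assembling a few-shot, domain-specific prompt for a small model. The domain (insurance claims), the example pairs, and the function name are all hypothetical illustrations, not part of any particular model's API; the resulting string would be sent to whatever small language model the enterprise hosts.

```python
def build_domain_prompt(question: str, examples: list[tuple[str, str]]) -> str:
    """Assemble a few-shot prompt that specializes a small model to one domain.

    `examples` holds (input, output) pairs drawn from the target domain.
    All domain content here is hypothetical and for illustration only.
    """
    parts = ["You are an assistant specialized in insurance claims."]
    # In-context examples teach the model the domain's format and judgments.
    for claim, assessment in examples:
        parts.append(f"Claim: {claim}\nAssessment: {assessment}")
    # The final, unanswered item is the actual query for the model to complete.
    parts.append(f"Claim: {question}\nAssessment:")
    return "\n\n".join(parts)


shots = [
    ("Windshield cracked by road debris.",
     "Covered under comprehensive; deductible applies."),
    ("Engine failure from missed oil changes.",
     "Not covered; classified as wear and tear."),
]
prompt = build_domain_prompt("Hail damage to the roof of a parked car.", shots)
```

The same specialization could instead be baked into the model weights by fine-tuning on many such domain pairs; prompt engineering is the lighter-weight option since it requires no training run.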