Harrison's Blog

Apple's OpenELM / MS's Phi-3 / Meta's Llama 3 Release

Created: 2024-04-27 10:41

Recent Notable Large Language Model Releases

Over the past week, tech giants Apple, Microsoft, and Meta have each unveiled new large language models, sending ripples through the AI industry. Let's take a closer look at the key features and significance of these newly released models.

Apple's OpenELM

On April 25th, Apple unveiled its self-developed OpenELM language model family. It comprises four models of varying sizes – 0.27B, 0.45B, 1.08B, and 3.04B parameters – so even the largest member stops at roughly 3 billion parameters. Considering that most prominent large language models today run to tens or even hundreds of billions of parameters, OpenELM is indeed a compact family.
This is because Apple developed OpenELM with on-device deployment as its primary goal. While increasing the parameter count was once the main route to high performance, the current trend is shifting toward smaller, lighter designs. Apple has also emphasized openness by publicly releasing not only the model weights and inference code but also the training dataset and framework.
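
Since the weights are on the Hugging Face Hub, trying the smallest model takes only a few lines. Below is a minimal sketch, not an official Apple example: it assumes the apple/OpenELM-270M checkpoint, and pairs it with a Llama 2 tokenizer, which Apple's own sample code uses (the tokenizer repo is gated and requires accepting Meta's license).

```python
# Minimal sketch: loading the smallest OpenELM checkpoint with
# Hugging Face transformers. Assumes the "apple/OpenELM-270M" model ID
# on the Hub; OpenELM ships no tokenizer of its own, so Apple's sample
# code pairs it with the (gated) Llama 2 tokenizer.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M",
    trust_remote_code=True,  # OpenELM uses custom modeling code
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Once upon a time there was", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

At 270M parameters, the smallest variant occupies only a few hundred megabytes in half precision, which is exactly the on-device niche Apple is targeting.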

MS's Phi-3 Series

Microsoft, for its part, released the Phi-3 Mini model (3.8B parameters) on April 23rd, with the 7B Phi-3 Small and 14B Phi-3 Medium models to follow. Phi-3 Mini is an open model, freely available for commercial use by anyone. All of the new Phi-3 series models will also be offered through MS's cloud service, Azure.
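
Because Phi-3 Mini is openly licensed, it can be run locally as well as through Azure. Here is a minimal single-turn chat sketch, assuming the microsoft/Phi-3-mini-4k-instruct checkpoint (the 4K-context instruct variant) on the Hugging Face Hub:

```python
# Minimal sketch: a single-turn chat with Phi-3 Mini via transformers,
# assuming the "microsoft/Phi-3-mini-4k-instruct" checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use the dtype stored in the checkpoint
    trust_remote_code=True,  # Phi-3 shipped with custom modeling code
)

messages = [{"role": "user", "content": "In one sentence, what is a small language model?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids, max_new_tokens=60)
# Strip the prompt tokens and decode only the model's reply.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

At 3.8B parameters, quantized builds of the model are small enough to run on a phone, which is the point of the "Mini" tier.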

Meta's Llama 3

Meta (formerly Facebook) released the 8B and 70B versions of Llama 3 on April 18th, with a larger 400B model slated to arrive in the summer. Notably, the 8B model has drawn praise from the developer community for its exceptional performance despite its small size.
This is attributed to Meta's extensive training data and an efficient model architecture: a result of prioritizing data and algorithm optimization over simply increasing the parameter count.

xAI's Grok 1.5

Announced on March 28th, xAI's Grok 1.5 model can process contexts of up to 128K tokens, enabling complex and lengthy prompts. Where language model development once focused on simply scaling up parameter counts, Grok 1.5 points in a new direction: stronger long-context understanding.

The recent wave of large language model releases from leading companies like Apple, MS, and Meta has diversified the directions in which AI technology is evolving. New approaches are emerging on several fronts: smaller and lighter models, data and algorithm optimization, and enhanced context comprehension. The future evolution of the AI ecosystem will be worth watching.
