This is an AI-translated post.
Apple's OpenELM / MS's Phi-3 / Meta's Llama 3 Released
- Writing language: Korean
- Base country: All countries
- Information Technology
Summarized by durumis AI
- Major tech companies including Apple, Microsoft, and Meta have recently released new large language models, driving significant change in the AI industry.
- Each company is differentiating its models by shrinking model size, optimizing data and algorithms, or improving contextual understanding.
- Notably, Apple's OpenELM is designed for small devices, while Meta's Llama 3 delivers strong performance despite its small size thanks to an efficient model structure.
Recent Notable Releases of Large Language Models
Over the past week, major tech companies including Apple, Microsoft, and Meta have unveiled new large language models in quick succession, causing a significant stir in the AI industry. Let's take a closer look at the key features and significance of these newly released models.
Apple's OpenELM
On April 25th, Apple unveiled its own OpenELM language model suite, consisting of four models of varying sizes: 0.27B, 0.45B, 1.08B, and 3.04B parameters. Even the largest model has only about 3 billion parameters. Considering that most current large language models run to tens of billions of parameters, OpenELM is quite compact.
This is because Apple developed OpenELM primarily for use on small devices. In the past, increasing the parameter count was the main route to higher performance, but the trend has recently shifted toward smaller, lighter models. Apple has also emphasized openness by releasing the entire package, including model weights, inference code, datasets, and frameworks.
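To see why a 3B-parameter ceiling matters for on-device use, a back-of-the-envelope memory estimate helps. The parameter counts below come from Apple's release; the bytes-per-parameter figures for fp16 and 4-bit quantization are standard rules of thumb, and the exact precision Apple targets on-device is an assumption here, not something the announcement specifies.

```python
# Rough memory footprint of the OpenELM weights at two precisions.
PARAM_COUNTS_B = {
    "OpenELM-270M": 0.27,
    "OpenELM-450M": 0.45,
    "OpenELM-1.1B": 1.08,
    "OpenELM-3B": 3.04,
}

def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory (GiB) needed just to hold the weights."""
    return params_billions * 1e9 * bytes_per_param / 1024**3

for name, n in PARAM_COUNTS_B.items():
    fp16 = weight_memory_gb(n, 2)    # 16-bit floats: 2 bytes per parameter
    int4 = weight_memory_gb(n, 0.5)  # 4-bit quantized: 0.5 bytes per parameter
    print(f"{name}: {fp16:.2f} GiB (fp16), {int4:.2f} GiB (int4)")
```

Even the largest model fits in a few gigabytes at fp16, and well under 2 GiB when 4-bit quantized, which is what makes phone- and laptop-class deployment plausible.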
MS's Phi-3 Series
Microsoft released the Phi-3 Mini model (3.8B parameters) on April 23rd, with Phi-3 Small (7B parameters) and Phi-3 Medium (14B parameters) to follow. Phi-3 Mini is an open model that anyone can use for commercial purposes free of charge, and all new Phi-3 series models will be offered through Microsoft's cloud service, Azure.
Meta's Llama 3
Meta (formerly Facebook) released the 8B and 70B versions of Llama 3 on April 18th, with a larger 400B model planned for release in the summer. The 8B model in particular has been praised by the developer community for its outstanding performance despite its small size.
This is believed to be the result of Meta investing in a vast amount of training data on top of an efficient model structure. Rather than increasing the parameter count, Meta focused on data and algorithm optimization.
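The data-over-parameters trade-off can be made concrete with the common training-compute approximation FLOPs ≈ 6·N·D (N = parameters, D = training tokens). The roughly 15-trillion-token figure for Llama 3 below is an assumption based on Meta's public statements, not something stated in this article; the comparison point of about 20 tokens per parameter is the Chinchilla-style rule of thumb.

```python
# Back-of-the-envelope training compute, using the standard
# approximation FLOPs ~= 6 * N * D.
def training_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

# Small model, very large dataset (15T tokens is an assumed figure):
llama3_8b = training_flops(8e9, 15e12)

# Larger model trained at ~20 tokens/parameter for comparison:
compute_optimal_70b = training_flops(70e9, 1.4e12)

print(f"8B on 15T tokens:   ~{llama3_8b:.1e} FLOPs")
print(f"70B on 1.4T tokens: ~{compute_optimal_70b:.1e} FLOPs")
```

Under these assumptions, the small 8B model absorbs more training compute than a far larger model trained on a "compute-optimal" token budget, which is one way to explain strong performance at a small parameter count.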
xAI's Grok 1.5
The Grok 1.5 model, announced on March 28th, can process contexts of up to 128K tokens, enabling complex and lengthy prompts. While language model development has largely focused on simply increasing parameter counts, Grok 1.5 presents a new direction of enhancing long-context understanding.
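To put a 128K-token window in everyday terms, a rough conversion to words and pages is useful. The ~0.75 words-per-token ratio is a common rule of thumb for English text and the 500 words-per-page density is an assumption; actual tokenizer behavior varies by model.

```python
# What roughly fits in a 128K-token context window.
CONTEXT_TOKENS = 128_000
WORDS_PER_TOKEN = 0.75  # assumption: typical English tokenization
WORDS_PER_PAGE = 500    # assumption: dense single-spaced page

words = CONTEXT_TOKENS * WORDS_PER_TOKEN
pages = words / WORDS_PER_PAGE
print(f"~{words:,.0f} words, roughly {pages:.0f} pages of text")
```

By this estimate the window holds on the order of a short book, which is what makes whole-document and multi-file prompting practical.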
As such, the release of new large language models by leading companies like Apple, Microsoft, and Meta over the past week has diversified the directions of AI technology development. New approaches are being explored on several fronts, including smaller and lighter models, data and algorithm optimization, and enhanced long-context understanding. It remains to be seen how the AI ecosystem will evolve from here.