Apple just unveiled these nifty little AI language models called OpenELM. They’re pretty compact, so you can run them right on your smartphone instead of needing some beefy cloud servers.
It’s all part of this growing trend of “small language models” that are gaining traction in the AI world.
OpenELM models are still in the research phase, but they could be the foundation for some seriously cool on-device AI features from Apple down the line. Microsoft’s doing something similar with their Phi-3 models, aiming to pack a punch in terms of language understanding and processing, all while keeping things local.
Some of these OpenELM models are super tiny, ranging from just 270 million to 3 billion parameters. That’s nothing compared to behemoths like OpenAI’s GPT-3, which weighs in at 175 billion parameters, or Meta’s Llama 3, whose smallest variant has 8 billion. But here’s the thing: recent research is all about making these smaller models as capable as their much larger counterparts.
Apple’s giving us eight different OpenELM models to play with. Four of them are “pretrained,” which is basically the raw, next-token-prediction version. The other four are instruction-tuned, making them better suited for building AI assistants and chatbots.
- OpenELM-270M
- OpenELM-450M
- OpenELM-1_1B
- OpenELM-3B
- OpenELM-270M-Instruct
- OpenELM-450M-Instruct
- OpenELM-1_1B-Instruct
- OpenELM-3B-Instruct
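To get a feel for why these parameter counts matter for running on a phone, here’s a back-of-envelope sketch. It estimates each model’s raw weight footprint as parameter count times bytes per parameter; this ignores activations, the KV cache, and runtime overhead, and the byte widths are just common quantization choices, not anything Apple has specified.

```python
# Rough weight-memory estimates for the OpenELM family.
# Footprint ≈ parameter count × bytes per parameter (weights only).
MODELS = {
    "OpenELM-270M": 270e6,
    "OpenELM-450M": 450e6,
    "OpenELM-1_1B": 1.1e9,
    "OpenELM-3B": 3.0e9,
}

def weight_footprint_gb(params, bytes_per_param):
    """Return the approximate size of the model weights in GiB."""
    return params * bytes_per_param / 1024**3

for name, params in MODELS.items():
    fp16 = weight_footprint_gb(params, 2)    # 16-bit weights
    int4 = weight_footprint_gb(params, 0.5)  # 4-bit quantized weights
    print(f"{name}: ~{fp16:.2f} GB fp16, ~{int4:.2f} GB int4")
```

Even the 3B model drops to well under 2 GB with 4-bit quantization, which is the kind of arithmetic that makes on-device inference plausible in the first place.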
OpenELM models can handle a context window of up to 2,048 tokens at a time. Apple trained them on publicly available datasets, including RefinedWeb, a deduplicated version of PILE, and subsets of RedPajama and Dolma v1.6. All in all, we’re talking about 1.8 trillion tokens of data. That’s a mind-boggling amount of information for these AI models to chew on.
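In practice, that 2,048-token limit means anything longer has to be split into windows before the model sees it. Here’s a minimal sketch of that chunking step; the overlap size is an arbitrary illustrative choice, and the integer token IDs stand in for a real tokenizer’s output.

```python
# Illustrative sketch: splitting a long token sequence into windows that fit
# a 2048-token context, with overlapping neighbours so each window keeps some
# surrounding context.
CONTEXT_LEN = 2048

def chunk_tokens(token_ids, window=CONTEXT_LEN, overlap=256):
    """Split token_ids into windows of at most `window` tokens,
    each sharing `overlap` tokens with the previous window."""
    if window <= overlap:
        raise ValueError("window must be larger than overlap")
    step = window - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break
    return chunks

tokens = list(range(5000))       # pretend these came from a tokenizer
chunks = chunk_tokens(tokens)
print(len(chunks))               # number of windows produced
print(max(len(c) for c in chunks))  # no window exceeds 2048 tokens
```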
Now, here’s where it gets really interesting. Apple’s using a “layer-wise scaling strategy” for OpenELM. In short, instead of making every transformer layer identical, this strategy allocates parameters unevenly across the model’s layers, squeezing better accuracy out of the same parameter and compute budget. Get this: OpenELM managed to beat Allen AI’s OLMo 1B by 2.36 percent in accuracy, and it only needed half the pre-training tokens to do it. That’s some seriously efficient AI right there.
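To make the idea concrete, here’s a toy sketch loosely in the spirit of layer-wise scaling: early layers get fewer attention heads and a smaller feed-forward expansion, later layers get more, interpolated linearly across depth. All the numbers below are made up for illustration and are not Apple’s actual OpenELM hyperparameters.

```python
# Toy layer-wise scaling schedule (illustrative values, not Apple's).
# Per-layer head counts and feed-forward multipliers grow linearly with depth
# instead of being constant across the whole network.
def layerwise_config(num_layers, min_heads=4, max_heads=16,
                     min_ffn_mult=1.0, max_ffn_mult=4.0):
    """Linearly interpolate per-layer attention heads and FFN multipliers."""
    configs = []
    for i in range(num_layers):
        t = i / (num_layers - 1)  # 0.0 at the first layer, 1.0 at the last
        heads = round(min_heads + t * (max_heads - min_heads))
        ffn_mult = round(min_ffn_mult + t * (max_ffn_mult - min_ffn_mult), 2)
        configs.append({"layer": i, "heads": heads, "ffn_mult": ffn_mult})
    return configs

for cfg in layerwise_config(8):
    print(cfg)
```

The payoff of this kind of schedule is that parameters go where they help most, which is how a model can match a uniformly sized competitor while spending less on training.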
But wait, there’s more! Apple’s not just giving us the OpenELM models; they’re also sharing the code for CoreNet, the library they used to train these bad boys. Plus, they’ve included step-by-step recipes to recreate the model weights, which is pretty rare for a big tech company. Apple’s all about transparency with this release, aiming to “empower and enrich the open research community.”
Of course, Apple’s not naive. They know that since these models were trained on public datasets, there’s a chance they might spit out some inaccurate, harmful, or biased stuff. But hey, that’s part of the learning process.
We haven’t seen Apple integrate this cutting-edge AI into their devices yet, but rumor has it that iOS 18 might come packed with some nifty on-device AI features. And who knows, maybe they’ll even team up with Google or OpenAI to give Siri a much-needed upgrade.
Information taken from Ars Technica, Times of India, and FirstPost.