Snowflake Releases Arctic: A New Open Source LLM

Snowflake, a major provider of cloud data solutions, has made a significant move into the AI domain with the introduction of Arctic, its large language model (LLM) boasting 480 billion parameters.

Until now, Snowflake had only offered features for using and integrating third-party LLMs (such as OpenAI’s models) within its tools; it had not released an LLM of its own.

This marks a major milestone for Snowflake and positions the company as a new, and potentially key, player in this competitive landscape.

Arctic: A Hybrid Architecture

I think we can safely consider Arctic an important addition to the (long) list of LLMs, not “just” another one.

Arctic is a 480B-parameter model built on a hybrid Dense-Mixture of Experts (MoE) architecture with 128 experts, making it one of the largest and most sophisticated models available in the open-source community. Only about 17 billion parameters are active for any given token, which keeps the model efficient without sacrificing performance.

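The model combines a 10-billion-parameter dense transformer with a residual MoE MLP (Multi-Layer Perceptron) layer made up of 128 experts of 3.66 billion parameters each. Instead of running every expert on every input, a gating network routes each token to a small number of experts (Snowflake uses top-2 gating), so only a fraction of the weights are used per token. This hybrid setup keeps the capacity of a very large model while spending compute closer to that of a much smaller one.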
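The headline figures follow directly from this layout. The Python sketch below checks the arithmetic (10B + 128 × 3.66B ≈ 478B total; 10B + 2 × 3.66B ≈ 17B active per token, assuming top-2 routing as Snowflake describes it) and adds a toy top-k router to show what “activating only some experts” means. The router, its name `top_k_route`, and the example scores are illustrative assumptions, not Arctic’s actual gating code.

```python
import math

# Figures quoted in the article, in billions of parameters.
DENSE_PARAMS_B = 10.0
NUM_EXPERTS = 128
EXPERT_PARAMS_B = 3.66
TOP_K = 2  # experts selected per token (top-2 gating, as described by Snowflake)


def total_params_b(dense: float, n_experts: int, expert: float) -> float:
    """All experts' weights are stored, so every expert counts toward the total."""
    return dense + n_experts * expert


def active_params_b(dense: float, top_k: int, expert: float) -> float:
    """Per token, only the dense core plus the top-k routed experts are used."""
    return dense + top_k * expert


def top_k_route(logits: list[float], k: int) -> list[tuple[int, float]]:
    """Toy top-k gating (hypothetical, for illustration): keep the k
    highest-scoring experts and softmax-normalise their scores so the
    mixing weights sum to 1."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]


if __name__ == "__main__":
    total = total_params_b(DENSE_PARAMS_B, NUM_EXPERTS, EXPERT_PARAMS_B)
    active = active_params_b(DENSE_PARAMS_B, TOP_K, EXPERT_PARAMS_B)
    print(f"total  ~ {total:.2f}B")   # 478.48B, i.e. the "480B" headline
    print(f"active ~ {active:.2f}B")  # 17.32B, i.e. the "17B active" figure
    # Made-up router scores for a single token over 8 experts.
    print(top_k_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 0.2], TOP_K))
```

In a real MoE layer the token’s output is the weighted sum of the selected experts’ outputs (plus, in Arctic’s case, the dense path), so the other 126 experts cost nothing for that token.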

Performance Benchmarks: Competing head-to-head with Incumbents

In performance tests, Arctic shows solid overall results. It competes head-to-head with leading open-source models such as Llama 3 70B, Mixtral 8x7B, and DBRX, particularly on enterprise-centric tasks such as SQL generation and complex instruction following. Its ability to generate SQL from plain-language questions stands out: it scores 79% on the Spider text-to-SQL benchmark, a result that rivals the best models in its class.
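To make a text-to-SQL score more concrete: benchmarks like Spider compare the SQL a model generates against a reference query. One common way to do this is execution matching, where both queries run against the same database and their results are compared. The Python sketch below is a simplified illustration with a hypothetical schema and hypothetical queries; it is not Spider’s official evaluation script (which also reports exact set match and handles ordered results differently), and it does not assume which metric is behind Snowflake’s reported number.

```python
import sqlite3
from collections import Counter


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """True if both queries return the same multiset of rows.

    Simplification: row order is ignored, which is only appropriate
    when the reference query has no ORDER BY.
    """
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # generated SQL that fails to run counts as a miss
    gold = conn.execute(gold_sql).fetchall()
    return Counter(predicted) == Counter(gold)


def execution_accuracy(conn: sqlite3.Connection, pairs: list[tuple[str, str]]) -> float:
    """Fraction of (predicted, gold) query pairs whose results match."""
    hits = sum(execution_match(conn, p, g) for p, g in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Ada", "UK"), (2, "Lin", "SG"), (3, "Bo", "UK")],
    )
    pairs = [
        # "How many customers are in the UK?": different SQL, same answer
        ("SELECT COUNT(*) FROM customers WHERE country = 'UK'",
         "SELECT COUNT(id) FROM customers WHERE country = 'UK'"),
        # "List all customer names": same rows, different order
        ("SELECT name FROM customers ORDER BY name DESC",
         "SELECT name FROM customers"),
        # Wrong filter: counted as a miss
        ("SELECT name FROM customers WHERE country = 'SG'",
         "SELECT name FROM customers WHERE country = 'UK'"),
    ]
    print(f"execution accuracy: {execution_accuracy(conn, pairs):.2f}")  # 2 of 3 match
```

Execution matching rewards queries that are written differently but return the right answer, which is closer to what a business user cares about than string comparison.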

It is interesting to see that, even though this is a “general” LLM, Snowflake’s focus is clearly enterprise AI and, more specifically, its use within or alongside Snowflake’s database solutions. The model excels in business-operations areas such as SQL generation, code automation, and sophisticated data analysis. This makes it an interesting choice for companies looking to integrate advanced AI capabilities directly into their operational workflows.

Snowflake Arctic inference efficiency (Source: Snowflake blog)

(Cost-) Efficiency

One of the most interesting aspects of Arctic’s development is its cost-effectiveness. According to Snowflake, Arctic was developed in under three months with a training compute budget of less than two million dollars, setting a new reference point for efficient model training. Trained on Amazon EC2 P5 instances, Arctic shows that state-of-the-art models can be built quickly and at a fraction of the cost traditionally associated with such efforts.
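Part of this efficiency comes from the MoE design: training compute scales with the parameters that are active per token (17B), not with the total (480B). A back-of-the-envelope Python sketch using the widely used approximation of about 6 FLOPs per active parameter per training token (forward plus backward pass) makes the effect visible. The token count is a hypothetical placeholder, not Arctic’s real training set size, and the estimate ignores MoE overheads such as routing and expert communication, so it illustrates orders of magnitude rather than reproducing Snowflake’s cost figure.

```python
FLOPS_PER_PARAM_PER_TOKEN = 6.0  # rule-of-thumb: ~2 forward + ~4 backward


def training_flops(active_params: float, tokens: float) -> float:
    """Approximate training compute C ~= 6 * N_active * D."""
    return FLOPS_PER_PARAM_PER_TOKEN * active_params * tokens


TOKENS = 1e12  # hypothetical token count, for illustration only

moe = training_flops(17e9, TOKENS)     # Arctic-style: 17B active per token
dense = training_flops(480e9, TOKENS)  # hypothetical dense model of the same total size

print(f"MoE, 17B active:  {moe:.2e} FLOPs")
print(f"dense 480B:       {dense:.2e} FLOPs")
print(f"ratio:            {dense / moe:.1f}x")
```

Whatever token count you plug in, the ratio stays at 480 / 17 ≈ 28, because the token count cancels out.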

Arctic is Open Source

Arctic is released under the permissive Apache 2.0 license, so it can be used, modified, and distributed freely. This open approach facilitates broader adoption and encourages innovation within the community, allowing developers and researchers to build on a cutting-edge foundation without the barriers often imposed by proprietary systems.

Go To Market and Availability

With Snowflake’s extensive customer base, which includes major enterprises like Adobe and Mastercard, Arctic could soon be at the forefront of enterprise AI applications. The model is available for serverless inference on Snowflake Cortex and is also accessible via Hugging Face, broadening its reach within the developer community.
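On the Cortex side, Snowflake exposes LLMs through SQL functions such as SNOWFLAKE.CORTEX.COMPLETE, which takes a model name and a prompt. The Python sketch below only builds the query; the helper name `build_arctic_complete` is hypothetical, and it assumes the snowflake-connector-python package with its default client-side `%s` (pyformat) parameter style and that `snowflake-arctic` is the model identifier available in your account and region. Check the current Cortex documentation, since model availability varies by region and changes over time.

```python
def build_arctic_complete(prompt: str, model: str = "snowflake-arctic") -> tuple[str, tuple[str, str]]:
    """Return a parameterised query that calls Cortex COMPLETE.

    Binding the model name and prompt as parameters (instead of pasting
    them into the SQL string) avoids quoting bugs and SQL injection from
    user-supplied prompts.
    """
    sql = "SELECT SNOWFLAKE.CORTEX.COMPLETE(%s, %s)"
    return sql, (model, prompt)


if __name__ == "__main__":
    sql, params = build_arctic_complete("Write a SQL query that counts orders per customer.")
    print(sql, params)
    # With a live connection (not executed here; credentials are placeholders):
    # import snowflake.connector
    # conn = snowflake.connector.connect(account="...", user="...", password="...")
    # cur = conn.cursor()
    # cur.execute(sql, params)
    # print(cur.fetchone()[0])  # the model's completion as text
```

Because inference is serverless, there is no model to host: the call runs inside Snowflake, next to the data it is asked about.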

Here is the link to the model repository on Hugging Face:

Snowflake/snowflake-arctic-base (huggingface.co/Snowflake/snowflake-arctic-base)

Both the Base and Instruct models are available.

You can get started right away by following the Hugging Face link above! Arctic is also available to Snowflake Cortex users.

Conclusion

Snowflake’s Arctic represents a major advancement, particularly for enterprise applications, combining solid overall performance with open-source availability. As companies increasingly rely on data-driven insights and automation, especially in information systems built on databases and on Snowflake in particular, Arctic could start playing a pivotal role in improving team efficiency and become a major additional enterprise tool.

In Plain English

Thank you for being a part of the In Plain English community!