A wave of investment and acquisition has recently swept the AI field. Salesforce took part in Anthropic's $450 million funding round, and Runway raised $141 million. Snowflake announced that it had completed its acquisition of Neeva, and the Chinese giant Meituan acquired the AI company Lightyear for RMB 2.065 billion.

The most eye-catching deal, however, is the acquisition of the startup MosaicML by the big data giant Databricks for about US$1.3 billion. The price is roughly six times MosaicML's previous valuation and makes this the largest generative AI acquisition announced in the first half of this year. What justifies such a high valuation for a company that is only two years old and has just over 60 employees?

1> Databricks acquires MosaicML to accelerate the democratization of generative AI technology

Databricks recently announced that it would acquire the generative AI startup MosaicML for approximately US$1.3 billion (approximately RMB 9.3 billion) so that it can help enterprises build their own ChatGPT-like tools.

After the acquisition, the entire MosaicML team and its technology will join Databricks and become part of the Databricks Lakehouse platform. The combined offering is meant to give enterprises a single platform for managing their data assets and for using their own proprietary data to build, own, and secure their own generative AI models.

MosaicML is a very young generative AI company. It was founded in San Francisco in 2021, has publicly disclosed only one round of financing, and has only 62 employees. That round valued the company at $220 million, so the $1.3 billion acquisition price is roughly six times its previous valuation. The deal is the largest acquisition announced in generative AI so far this year, and it comes shortly after the cloud computing giant Snowflake announced its acquisition of another generative AI company, Neeva. After several months of investment fever, a wave of large companies acquiring generative AI startups appears to be starting.

Databricks was founded in 2013 by the creators of the Apache Spark project at UC Berkeley. As a data storage and analytics giant, it helps large companies such as AT&T, Shell, and Walgreens process data, and it recently open-sourced its own large model, Dolly, which aims to achieve ChatGPT-like results with fewer parameters. As cloud computing took off, the "lakehouse" concept (a data lake and a data warehouse combined in one platform) that Databricks championed deeply influenced a generation of big data startups, and the company quickly grew into one of the world's hottest data infrastructure companies. Its funding round in August 2021 valued it at $38 billion, and it was valued at $31 billion as of 2022. Last year, Databricks announced annual revenue of more than $1 billion.

2> Advantages of MosaicML MPT series models

MosaicML's MPT series models subclass the HuggingFace `PreTrainedModel` base class and are fully compatible with the HuggingFace ecosystem. MPT-7B, one of MosaicML's most popular models, has roughly 7 billion parameters and can handle more than 2,000 natural language processing tasks. Its optimized layers include FlashAttention and low-precision layer norms, which can make training 2-7 times faster than traditional methods. Because training scales almost linearly with added compute, models with billions of parameters can be trained in hours rather than days. MosaicML has also released MPT-30B, a new open source large language model licensed for commercial use, which has 30 billion parameters and which MosaicML says outperforms GPT-3.

Data source: MT-Bench's evaluation of MosaicML mainstream models

The advantage of the MPT series lies in its efficiency and low cost. The complexity of AI models trained on large amounts of data has risen sharply, and training one now costs at least millions of dollars, which puts it out of reach of most companies other than the largest. The MPT models let enterprises train their own language models more cheaply and efficiently, which makes generative AI easier to apply and can improve business performance.

The models also stand out for context length. Most open source language models can only handle sequences of at most a few thousand tokens (see Figure 1). With the MosaicML platform and a single node of 8xA100-40GB, however, users can easily fine-tune MPT-7B to handle context lengths of up to 65k tokens. This ability to adapt to extreme context lengths comes from ALiBi (Attention with Linear Biases), one of the key architectural choices in MPT-7B.
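To make the role of ALiBi more concrete, the following is a minimal sketch of the idea in plain Python. Instead of learned positional embeddings, ALiBi adds a penalty to each attention score that grows linearly with the distance between the query and the key, so the same rule applies to sequence lengths never seen in training. The sketch follows the recipe in the ALiBi paper for power-of-two head counts; it is not MosaicML's implementation, and the function names `alibi_slopes`, `alibi_bias`, and `biased_softmax` are hypothetical names chosen for illustration.

```python
import math


def alibi_slopes(num_heads):
    """Per-head slopes: a geometric sequence starting at 2^(-8/num_heads)."""
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    ratio = 2.0 ** (-8.0 / num_heads)
    return [ratio ** (h + 1) for h in range(num_heads)]


def alibi_bias(seq_len, slope):
    """bias[i][j] = -slope * (i - j) for keys at or before query i, else -inf."""
    return [
        [-slope * (i - j) if j <= i else -math.inf for j in range(seq_len)]
        for i in range(seq_len)
    ]


def biased_softmax(scores_row, bias_row):
    """Softmax over one query's attention scores after adding the ALiBi bias."""
    logits = [s + b for s, b in zip(scores_row, bias_row)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative values only: 8 heads, a 4-token sequence, and equal raw scores.
slopes = alibi_slopes(8)
bias = alibi_bias(4, slopes[0])
weights = biased_softmax([0.0, 0.0, 0.0, 0.0], bias[3])
print(slopes)
print(weights)
```

With equal raw scores, the last query puts the most weight on the nearest token and progressively less on more distant ones. Because the bias depends only on the distance `i - j`, nothing in the computation is tied to a maximum sequence length.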

For example, the full text of The Great Gatsby is just under 68k tokens. In one test, the StoryWriter model read The Great Gatsby and generated an epilogue; one of the generated epilogues is shown in Figure 2. StoryWriter read the whole novel in about 20 seconds (about 150k words per minute). Because of the longer sequence length, its "typing" speed is slower than that of the other MPT-7B models, at about 105 words per minute. Although StoryWriter was fine-tuned with a context length of 65k tokens, ALiBi lets the model extrapolate to inputs longer than those it was trained on: 68k tokens in the case of The Great Gatsby, and up to 84k tokens in testing.

Figure 2: MPT-7B-StoryWriter-65k+ writes an epilogue for The Great Gatsby. The epilogue results from providing the full text of The Great Gatsby (about 68k tokens) as input to the model, followed by the word “epilogue”, and allowing the model to continue generating.
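The reading-speed figures quoted for StoryWriter can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated above (about 68k tokens, about 20 seconds, about 150k words per minute). The word count and the words-per-token ratio it prints are implied by those figures, not stated in the source, and the variable names are hypothetical.

```python
# Figures quoted in the text above (all approximate).
prompt_tokens = 68_000            # The Great Gatsby, about 68k tokens
read_seconds = 20                 # time StoryWriter took to read the prompt
read_words_per_minute = 150_000   # quoted reading speed

# Throughput in tokens, and the word count implied by the quoted speed.
tokens_per_second = prompt_tokens / read_seconds
implied_words = read_words_per_minute * read_seconds / 60
implied_words_per_token = implied_words / prompt_tokens

print(f"{tokens_per_second:.0f} tokens per second")
print(f"{implied_words:.0f} words implied for the full text")
print(f"{implied_words_per_token:.2f} words per token implied")
```

Under these figures the model reads about 3,400 tokens per second, and the quoted speed implies a text of about 50,000 words, or roughly 0.74 words per token.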

3> Popularization of generative AI technology

Generative AI is a branch of artificial intelligence that uses large amounts of data and deep learning algorithms to automatically generate content such as original text, images, and computer code. It makes data easier to process and analyze and helps meet a wide range of human needs. With the rapid development of big data and artificial intelligence, generative AI is now widely used in fields such as natural language processing, image generation, and virtual reality. In natural language processing, for example, GPT-4 has become one of the most popular generative AI models and can be used to write articles, translate between languages, and answer questions. In image generation, StyleGAN2 can produce high-quality images for use in game development, film and television production, and virtual reality.

Naveen Rao, CEO of MosaicML, has said that since 2018 the complexity of AI models trained on large amounts of data has risen sharply. Training a model now costs at least millions of dollars, which small and medium-sized enterprises generally cannot afford. After the acquisition, the joint product of Databricks' Lakehouse platform and MosaicML's technology will let enterprises use their own proprietary data to train and build generative AI models simply, quickly, and at low cost, while keeping control and ownership of both their data and their custom models. According to Databricks, with the combined platform and technical support of the two companies, the cost for enterprises of training and using LLMs will drop significantly and is expected to fall to a few thousand dollars. This should help generative AI spread more widely.

4> The significance of Databricks' acquisition of MosaicML

The main purpose of the acquisition is to accelerate the development and democratization of generative AI. By combining the technologies and resources of the two companies, Databricks can better meet customer needs and offer more efficient and convenient solutions. Specifically, the acquisition is expected to bring the following changes:

1. More efficient large language model

With MosaicML on board, Databricks can integrate the MPT series models into its Lakehouse platform and offer customers more efficient, lower-cost large language models. This will help enterprises handle natural language processing tasks better and improve the efficiency and accuracy of their operations.

2. Faster model training speed

MosaicML's MPT series models train quickly, which will help Databricks offer faster model training services. This matters most to companies that need to respond quickly to market demand.

3. Greater democratization

The acquisition also means that generative AI will become more widely accessible. MosaicML's MPT series models make it easier for small and medium-sized enterprises to train their own language models, so they can apply generative AI more effectively and improve their business performance. This in turn will promote the development and wider adoption of artificial intelligence.

Summary

Generative AI applications are designed to generate original text, images, and computer code from users' natural language prompts. Interest in the technology has surged since the AI startup OpenAI launched ChatGPT, an online generative AI chatbot, last November. "Every organization should be able to benefit from the AI revolution and have more control over how their data is used. Databricks and MosaicML have an incredible opportunity to democratize AI and make Lakehouse the best place to build generative AI," said Ali Ghodsi, co-founder and CEO of Databricks.

The significance of Databricks' acquisition of MosaicML lies not only in accelerating the development and democratization of generative AI, but also in combining the technologies and resources of the two companies to give customers more efficient and convenient solutions. As artificial intelligence develops and spreads, generative AI will play an increasingly important role, and the acquisition reflects how much importance companies attach to this direction and how heavily they are investing in it. Companies like Anthropic and OpenAI license ready-made language models to businesses, which then build generative AI applications on top of them. Strong commercial demand for these models has created opportunities for startups like MosaicML. The successive acquisitions by Snowflake and Databricks suggest that large technology companies are gradually moving from in-house research and development and strategic investment toward mergers and acquisitions in generative AI.

 

References:

https://www.databricks.com/company/newsroom/press-releases/databricks-signs-definitive-agreement-acquire-mosaicml-leading-generative-ai-platform

https://mattturck.com/mosaic/

https://twitter.com/lmsysorg/status/1672077353533730817/photo/1

https://www.mosaicml.com/blog/mpt-7b#appendix-eval

https://www.mosaicml.com/blog/mpt-30b
