Hugging Face today announced that it has acquired Seattle-based XetHub, a collaborative development platform founded by former Apple researchers to help machine learning teams work with large datasets and models more efficiently.
The exact value of the deal was not disclosed, but CEO Clem Delangue told Forbes in an interview that this is the company’s largest acquisition to date.
The HF team plans to integrate XetHub’s technology into their platform and upgrade their storage backend to enable developers to host larger models and datasets than they can today with minimal effort.
“The XetHub team will help us unlock the next five years of growth in HF datasets and models by switching to our own, superior version of LFS as the storage backend for the hub’s repository,” wrote the company’s CTO, Julien Chaumond, in a blog post.
What does XetHub bring to Hugging Face?
Founded in 2021 by Yucheng Low, Ajit Banerjee and Rajat Arya, who previously worked on Apple’s in-house ML infrastructure, XetHub has made a name for itself by providing businesses with a platform to explore, understand and manipulate large-scale models and datasets.
The service provides Git-like version control for repositories up to terabytes in size, allowing teams to track changes, collaborate, and maintain reproducibility in their ML workflows.
Over the last three years, XetHub has garnered a large customer base, including leading companies like Tableau and Gather AI, due to its ability to address complex scalability needs as tools, files and artifacts increase. It has improved storage and transfer processes using advanced techniques such as content-defined chunking, deduplication, instant repository mounting and file streaming.
As a result of this acquisition, the XetHub platform will cease to exist, and its data and model handling capabilities will move to the Hugging Face Hub, upgrading the model and dataset sharing platform with a more optimized storage and versioning backend.
On the storage side, the HF Hub currently uses Git LFS (Large File Storage) as its backend. It was launched in 2020, but Chaumond says the company has known for a while that the storage system would eventually fall short, given the ever-growing volume of large files in the AI ecosystem. It was a good place to start, but the company needed an upgrade, and XetHub will provide it.
Currently, the XetHub platform supports individual files larger than 1 TB and total repository sizes well over 100 TB. This is a significant upgrade over Git LFS, which supports files of only up to 5 GB and repositories of up to 10 GB, and it will enable the HF Hub to host much larger datasets, models, and files than it can today.
Combined with XetHub’s additional storage and transfer features, this makes the package even more attractive.
For example, the platform’s content-defined chunking and deduplication capabilities let users upload only the chunks that contain new rows when a dataset is updated, rather than re-uploading the entire set of files (which would be very time-consuming). The same is true for model repositories.
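To make the idea concrete, here is a minimal Python sketch of content-defined chunking with deduplication. It is not XetHub’s implementation: the gear-style rolling hash, the chunk-size parameters, the `chunk` and `upload` helpers, and the in-memory `store` dictionary are all assumptions chosen for illustration.

```python
import hashlib
import random
from typing import Dict, List

# Hypothetical parameters for illustration; not XetHub's real settings.
MASK = (1 << 8) - 1   # cut when the low 8 bits of the rolling hash are zero
MIN_CHUNK = 64        # never cut before this many bytes
MAX_CHUNK = 1024      # always cut once a chunk reaches this many bytes

# Gear table: one fixed pseudo-random 32-bit value per possible byte value.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes) -> List[bytes]:
    """Split data at boundaries chosen by its content, not by fixed offsets."""
    chunks = []
    start = 0
    h = 0
    for i, b in enumerate(data):
        # Gear-style rolling hash: older bytes shift out of the low bits,
        # so the cut decision depends only on the most recent bytes.
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def upload(data: bytes, store: Dict[str, bytes]) -> int:
    """Store only chunks not seen before; return the number of bytes sent."""
    sent = 0
    for c in chunk(data):
        key = hashlib.sha256(c).hexdigest()
        if key not in store:
            store[key] = c
            sent += len(c)
    return sent


if __name__ == "__main__":
    rng = random.Random(1)
    v1 = bytes(rng.getrandbits(8) for _ in range(20000))
    new_rows = bytes(rng.getrandbits(8) for _ in range(500))
    v2 = v1[:10000] + new_rows + v1[10000:]  # rows inserted mid-file

    store: Dict[str, bytes] = {}
    print("first upload, bytes sent:", upload(v1, store))
    print("second upload, bytes sent:", upload(v2, store))
```

Because boundaries follow the content, an insertion in the middle of a file shifts the bytes after it without changing most of the chunks they fall into, so the second upload only needs to send the chunks around the edit. With fixed-size chunks, every chunk after the insertion point would change.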
“As the field moves towards trillion-parameter models in the coming months (thanks to Maxime Labonne for the new BigLlama-3.1-1T), we expect this new technology to open up new scale for both communities and enterprises,” said the CTO, adding that the two companies will work closely together to launch solutions to enable teams to collaborate on HF Hub assets and track their evolution.
Currently, the Hugging Face Hub hosts 1.3 million models, 450,000 datasets, and 680,000 spaces, for a total LFS footprint of 12PB.
It will be interesting to see how this number grows as the storage backend is enhanced to enable support for larger models and datasets. The timeline for the integration and release of other supporting features is unknown at this stage.