BLOOM: Inside the radical new project to democratize AI


But Meta’s model is available only on request, and its license restricts its use to research purposes. Hugging Face goes a step further. Minutes of meetings detailing its work over the past year are uploaded online, and anyone can download the model for free and use it for research or to build commercial applications.

A big focus of BigScience is to embed ethical considerations into the model from the start, rather than treating them as an afterthought. LLMs are trained on large amounts of data collected from the internet. This can be problematic because these datasets contain a lot of personal information and often reflect dangerous biases. The group developed a data governance structure specifically for LLMs that should make it clearer what data is being used and who it belongs to, and it sourced different datasets from around the world that are not readily available online.

The group has also introduced a new Responsible AI License, which is similar to a terms-of-service agreement. It is designed to deter the use of BLOOM in high-risk areas such as law enforcement or healthcare, or to harm, deceive, exploit, or impersonate people. Danish Contractor, an AI researcher who volunteered for the project and co-created the license, said the license is an experiment in self-regulating LLMs before laws catch up. But in the end, nothing stops anyone from abusing BLOOM.

Giada Pistilli, an ethicist at Hugging Face who drafted BLOOM’s ethics charter, said the project had its own code of ethics from the start to serve as a guiding principle for model development. For example, it emphasized recruiting volunteers from diverse backgrounds and locations, ensuring that outsiders can easily replicate the project’s findings, and publishing its results openly.

All aboard

This philosophy translates into one major difference between BLOOM and other LLMs available today: the large number of human languages the model can understand. It can handle 46 of them, including French, Vietnamese, Mandarin, Indonesian, Catalan, 13 Indian languages such as Hindi, and 20 African languages. Just over 30% of the training data is in English. The model can also understand 13 programming languages.

This is very unusual in the world of large language models, where English dominates. That dominance is another consequence of building LLMs by scraping data from the internet: English is the most commonly used language online.

BLOOM was able to improve on this because the team brought together volunteers from around the world to build suitable datasets in other languages, even when those languages are not well represented online. For example, Hugging Face organized workshops with African AI researchers to try to find datasets, such as records from local authorities or universities, that could be used to train models on African languages, says Chris Emezue, a Hugging Face intern and a researcher at Masakhane, an organization dedicated to natural language processing for African languages.
