Meta AI launches the Segment Anything Model (SAM): a new AI model that can segment any object in a photo or video with a single click


Computer vision relies heavily on segmentation, the process of determining which pixels in an image belong to a particular object, for uses ranging from analyzing scientific imagery to creating artistic edits. However, building an accurate segmentation model for a given task usually requires technical experts with access to AI training infrastructure and large volumes of carefully annotated, in-domain data.

Recent Meta AI research introduces a project called “Segment Anything,” an attempt to “democratize segmentation” by providing a new image segmentation task, dataset, and model. The release comprises the Segment Anything Model (SAM) and the Segment Anything 1-Billion mask dataset (SA-1B), the largest segmentation dataset ever assembled.

Previously, there were two main categories of approaches to segmentation. The first, interactive segmentation, can segment any class of object but requires a human operator to iteratively refine the mask. The second, automatic segmentation, allows segmentation of predefined object categories, but it requires a large number of manually annotated masks, along with compute resources and technical expertise, to train the segmentation model. Neither approach offered a general, fully automatic route to segmentation.


SAM subsumes both of these broader categories of methods. It is a single model that performs both interactive and automatic segmentation. Thanks to its flexible, promptable interface, the model can be applied to a wide range of segmentation tasks simply through appropriate prompt engineering. In addition, SAM can generalize to new types of objects and images because it is trained on a diverse, high-quality dataset of more than a billion masks. Thanks to this ability to generalize, practitioners will generally not need to collect their own segmentation data and fine-tune a model for their use case.
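As a concrete illustration of this promptable interface, here is a minimal sketch using Meta's open-source `segment-anything` Python package; the checkpoint filename, image path, and click coordinates are placeholders, and ViT-H is just one of the released backbones.

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load a released SAM checkpoint (ViT-H is the largest of the three variants).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# SAM expects an RGB uint8 array; OpenCV loads images as BGR, so convert.
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)  # the heavy image encoder runs once per image

# A single foreground click at pixel (x=500, y=375) is enough to request masks.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),   # 1 = foreground point, 0 = background point
    multimask_output=True,        # return several candidate masks with scores
)
```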

These properties allow SAM to transfer to new domains and perform different tasks. Some of SAM’s capabilities are as follows:

  1. SAM enables object segmentation with a single click, or by interactively selecting points to include in or exclude from the object. A bounding box can also be used as a prompt (see the sketch after this list).
  2. For real-world segmentation problems, SAM’s ability to output multiple valid masks when the object being segmented is ambiguous is an important advantage.
  3. SAM can automatically detect and mask all objects in an image (also sketched below).
  4. After precomputing the image embedding, SAM can generate a segmentation mask for any prompt instantly, enabling real-time interaction with the model.
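
Continuing the sketch above, a bounding box can serve as the prompt (capability 1), and the package also ships a fully automatic generator that segments everything it can find in an image (capability 3); the box coordinates are placeholders.

```python
from segment_anything import SamAutomaticMaskGenerator

# Bounding-box prompt (capability 1): [x_min, y_min, x_max, y_max] in pixels.
box_masks, box_scores, _ = predictor.predict(
    box=np.array([100, 150, 400, 500]),
    multimask_output=False,  # a box is usually unambiguous, so ask for one mask
)

# Fully automatic mode (capability 3): propose masks for every object found.
mask_generator = SamAutomaticMaskGenerator(sam)
all_masks = mask_generator.generate(image)  # list of dicts: "segmentation", "area", "bbox", ...
print(f"SAM found {len(all_masks)} masks automatically")
```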

To train the model, the team needed a large and diverse dataset, and SAM itself was used to collect it. Specifically, annotators used SAM to interactively annotate images, and the resulting data were then used to update and improve SAM. This loop was run many times to iteratively improve both the model and the data.

New segmentation masks can be collected at remarkable speed with SAM in the loop. The team’s tool makes interactive mask annotation quick and easy, taking only about 14 seconds per mask. This is 6.5 times faster than COCO’s fully manual polygon-based mask annotation and 2 times faster than the previous largest data annotation effort, which was also model-assisted.

A dataset of 1 billion masks could not be built from interactively annotated masks alone. As a result, the researchers developed a data engine for collecting the data in SA-1B. This data “engine” has three “gears.” In the first, the model assists human annotators. In the second, fully automatic annotation is combined with human assistance, which broadens the diversity of the collected masks. In the third, fully automatic mask generation allows the dataset to scale.

The final dataset contains 1.1 billion segmentation masks over more than 11 million licensed, privacy-preserving images. Human evaluation studies confirmed that the masks in SA-1B are of high quality and diversity, comparable in quality to masks from earlier, smaller, fully manually annotated datasets. SA-1B has 400 times as many masks as any existing segmentation dataset.
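
For readers who download SA-1B, the sketch below shows one way to decode its masks. It assumes the per-image JSON layout described on the dataset's release page, with masks stored in COCO run-length encoding; the filename is a placeholder.

```python
import json
from pycocotools import mask as mask_utils  # pip install pycocotools

# SA-1B ships one JSON annotation file alongside each image.
with open("sa_000001.json") as f:
    record = json.load(f)

for ann in record["annotations"]:
    # "segmentation" holds {"size": [h, w], "counts": ...} in COCO RLE format.
    binary_mask = mask_utils.decode(ann["segmentation"])  # HxW uint8 array of 0/1
```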

The researchers trained SAM to produce an accurate segmentation mask in response to a variety of prompts, including foreground/background points, a rough box or mask, and free-form text, and they observed that the pre-training task and interactive data collection imposed specific constraints on the model design. For annotators to use SAM efficiently during annotation, the model must run in real time on a CPU in a web browser.

A lightweight prompt encoder instantly converts any prompt into an embedding vector, while the image encoder produces a one-time embedding of the image. A lightweight decoder then combines these two sources of information into a segmentation mask prediction. Once the image embedding has been computed, SAM can respond to any prompt in a web browser in under 50 milliseconds.
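
To make that amortization concrete, here is a rough timing sketch reusing the predictor from the earlier example; actual numbers depend on hardware, and the sub-50-millisecond figure refers to the lightweight decoder running in the browser.

```python
import time

# The expensive image embedding is paid once per image...
predictor.set_image(image)

# ...after which every prompt reuses that embedding and only runs the
# lightweight prompt encoder and mask decoder.
n_prompts = 5
start = time.perf_counter()
for i in range(n_prompts):
    predictor.predict(
        point_coords=np.array([[100 + 100 * i, 300]]),  # five different clicks
        point_labels=np.array([1]),
        multimask_output=True,
    )
elapsed = time.perf_counter() - start
print(f"average per-prompt latency: {elapsed / n_prompts * 1000:.1f} ms")
```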

SAM has the potential to support future applications in any domain that requires locating and segmenting arbitrary objects in arbitrary images. For example, SAM could be integrated into larger AI systems for a more general, multimodal understanding of the world, such as understanding both the visual and textual content of a web page.


Check out the Paper, Demo, Blog, and GitHub. All credit for this research goes to the researchers on this project. Also, don’t forget to join our 18k+ ML SubReddit, Discord channel, and email newsletter, where we share the latest AI research news, cool AI projects, and more.


Tanushree Shenwai is a consulting trainee at MarktechPost. She is currently pursuing her Bachelor of Technology at the Indian Institute of Technology (IIT), Bhubaneswar. She is passionate about data science and has a keen interest in the applications of artificial intelligence across various fields. She enjoys exploring new developments in technology and their real-world applications.

