Computer vision goes ‘common sense’ with Facebook’s latest research – TechCrunch

Machine learning can do all kinds of things, as long as you have the data to teach it. That’s not always easy, and researchers are always looking for ways to add a bit of “common sense” to AI, so you don’t have to show it 500 pictures of a cat before it gets it. Facebook’s newest research is a major step toward reducing the data bottleneck.

The company’s formidable AI research department has been working for years on how to advance and scale things like computer vision algorithms, and it has made steady progress, which it typically shares with the rest of the research community. One interesting development that Facebook has pursued in particular is what’s called “semi-supervised learning.”


Generally, when you think of training an AI, you think of something like the 500 cat pictures mentioned above: images that have been selected and labeled (which could mean outlining the cat, placing a box around the cat, or just saying that there is a cat in there somewhere) so that the machine learning system can put together an algorithm to automate the process of cat recognition. Naturally, if you want to do dogs or horses, you need 500 dog pictures, 500 horse pictures, and so on. It scales linearly, which is a word you never want to see in tech.

Semi-supervised learning, which is related to “unsupervised” learning, involves figuring out the important parts of a dataset without any labeled data. It doesn’t just run wild; there is still structure. For example, suppose you give the system a thousand sentences to study and then show it 10 more sentences with several words missing. The system could probably do a decent job of filling in the blanks based on what it saw in the previous thousand. But that’s not so easy to do with photos and videos, which are neither as straightforward nor as predictable.
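The fill-in-the-blank idea can be sketched in a few lines of Python. This is only a toy illustration, not anything Facebook built: the three-sentence corpus, the `___` blank marker and the function names `learn_context` and `fill_blank` are all made up for the example. It simply counts which word was seen between each pair of neighboring words in unlabeled text, then uses those counts to guess a missing word.

```python
from collections import Counter, defaultdict


def learn_context(sentences):
    """Count which word appears between each (previous, next) word pair."""
    table = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, word, nxt in zip(words, words[1:], words[2:]):
            table[(prev, nxt)][word] += 1
    return table


def fill_blank(table, sentence, blank="___"):
    """Replace each blank with the word most often seen in that context."""
    words = sentence.lower().split()
    for i, word in enumerate(words):
        # Only blanks with a neighbor on both sides have a usable context.
        if word == blank and 0 < i < len(words) - 1:
            candidates = table.get((words[i - 1], words[i + 1]))
            if candidates:
                words[i] = candidates.most_common(1)[0][0]
    return " ".join(words)


# Hypothetical unlabeled "training" text: no one has tagged anything here.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat slept on the sofa",
]
table = learn_context(corpus)
print(fill_blank(table, "the bird sat ___ the fence"))
```

No sentence in the corpus was labeled; the structure of the text itself supplies the training signal. Images and video lack such tidy, discrete context, which is why the same trick is harder there.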

But Facebook researchers have shown that while it may not be easy, it can be done. In fact, the DINO system (which stands, rather unconvincingly, for “distillation of knowledge with no labels”) can learn to find objects of interest in videos of people, animals and things quite well, without any labeled data.

The animation shows four videos and an AI interpretation of the objects in them.

Image credits: Facebook

It does this by considering the video not as a sequence of images to be analyzed one by one, but as a complex, interrelated set, like the difference between “a series of words” and “a sentence.” By attending to the middle and the end of the video as well as the beginning, the system can pick up on things like “an object with this general shape moves from left to right.” That information feeds into other knowledge: for example, when an object on the right overlaps the first one, the system recognizes that they are not the same thing, just touching in those frames. And that knowledge can in turn be applied to many other situations. In other words, it develops a basic sense of visual meaning, and does so with very little training on new objects.

The result is a computer vision system that is not only effective (it performs well compared with traditionally trained systems) but also more relatable and explainable. For example, an AI trained with 500 dog images and 500 cat images will recognize both, but it has no idea that they are similar in any way. DINO, although it cannot name them specifically, understands that they have more in common with one another than they do with cars, and that metadata and context is visible in its memory. Dogs and cats are “closer” in its digital perception space than dogs and mountains. You can see a few of those concepts here; note how those of a kind stick together:

An animated diagram illustrates how the concepts in the machine learning model are close to each other.

Image credits: Facebook
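To make the notion of concepts sitting “closer” to one another concrete, here is a small Python sketch using cosine similarity, a common way to compare vectors. The four-dimensional vectors below are hand-picked, hypothetical values chosen purely for illustration; they are not outputs of DINO, and the article does not say which distance measure Facebook’s visualization uses.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for this example (not real model outputs).
embeddings = {
    "dog": [0.9, 0.8, 0.1, 0.0],
    "cat": [0.8, 0.9, 0.1, 0.0],
    "car": [0.1, 0.0, 0.9, 0.2],
    "mountain": [0.0, 0.1, 0.2, 0.9],
}

dog_cat = cosine_similarity(embeddings["dog"], embeddings["cat"])
dog_car = cosine_similarity(embeddings["dog"], embeddings["car"])
dog_mountain = cosine_similarity(embeddings["dog"], embeddings["mountain"])

print(f"dog vs cat:      {dog_cat:.2f}")
print(f"dog vs car:      {dog_car:.2f}")
print(f"dog vs mountain: {dog_mountain:.2f}")
```

With these made-up numbers, the dog and cat vectors point in almost the same direction while the car and mountain vectors do not, which is the kind of relationship the diagram conveys.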

This has its own benefits, of a technical sort that we won’t get into here. If you’re curious, there is more detail in the papers linked in Facebook’s blog post.

There is also an adjacent research project, a training method called PAWS, that further reduces the need for labeled data. PAWS combines some of the ideas of semi-supervised learning with the more conventional supervised approach, essentially giving the training a boost by letting it learn from both labeled and unlabeled data.
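As a rough sketch of what learning from both labeled and unlabeled data can look like, the Python below assigns “soft” labels to unlabeled points based on how similar they are to a handful of labeled ones, then keeps only the confident guesses. This is a generic pseudo-labeling toy, not Facebook’s PAWS implementation; the two-dimensional points, the `temperature` and `threshold` values, and the function names are all assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def soft_pseudo_label(point, labeled, temperature=0.1):
    """Score each class by a softmax over similarity to the labeled examples."""
    weights = [(math.exp(cosine(point, vec) / temperature), label)
               for vec, label in labeled]
    total = sum(w for w, _ in weights)
    scores = {}
    for w, label in weights:
        scores[label] = scores.get(label, 0.0) + w / total
    return scores


def expand_labels(labeled, unlabeled, threshold=0.9):
    """Return the labeled set plus unlabeled points whose best guess is confident."""
    expanded = list(labeled)
    for point in unlabeled:
        scores = soft_pseudo_label(point, labeled)
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            expanded.append((point, best))
    return expanded


# Hypothetical data: two labeled examples and three unlabeled ones.
labeled = [([1.0, 0.1], "cat"), ([0.1, 1.0], "dog")]
unlabeled = [[0.9, 0.2], [0.2, 0.8], [0.6, 0.5]]

expanded = expand_labels(labeled, unlabeled)
for vec, label in expanded:
    print(vec, label)
```

The ambiguous third point falls below the confidence threshold and is left out, so only two labeled examples end up contributing four training examples. A real system would go on to train a model on the enlarged set.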

Facebook, of course, needs good and fast image analysis for its many user-facing, image-related products. But these general advances in the computer vision world will no doubt be welcomed by the developer community for other purposes.
