Computer Vision Study 2
1. What are some real-world use cases of generative modeling?
One example is the paper "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks", which introduces CycleGAN.
In this work, the researchers present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. Their goal is to learn a mapping G: X → Y such that, under an adversarial loss, the distribution of images G(X) is indistinguishable from the distribution Y. Because this mapping is highly under-constrained, they couple it with an inverse mapping F: Y → X and introduce a cycle consistency loss that pushes F(G(X)) ≈ X (and vice versa, G(F(Y)) ≈ Y). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, and photo enhancement. CycleGAN fundamentally hallucinates part of the content it creates: its outputs are predictions of "what might it look like if ...", and these predictions, though plausible, may differ substantially from the ground truth. CycleGAN should therefore be used only with great care and calibration in domains where critical decisions are made based on its output.
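The cycle consistency term described above can be illustrated with a small NumPy sketch. This is not the paper's implementation: the two "generators" are hypothetical toy functions standing in for neural networks, the names `cycle_consistency_loss`, `G`, `F_inverse` and `F_poor` are made up for illustration, and the L1 distance is averaged over pixels (a common normalisation) rather than summed.

```python
import numpy as np


def cycle_consistency_loss(G, F, x_batch, y_batch):
    """Mean absolute (L1) cycle error: |F(G(x)) - x| + |G(F(y)) - y|."""
    forward = np.mean(np.abs(F(G(x_batch)) - x_batch))   # x -> G(x) -> F(G(x)) should return to x
    backward = np.mean(np.abs(G(F(y_batch)) - y_batch))  # y -> F(y) -> G(F(y)) should return to y
    return forward + backward


# Hypothetical toy mappings (assumptions for illustration, not real generators).
def G(x):
    return 2.0 * x + 1.0          # stand-in for the X -> Y generator


def F_inverse(y):
    return (y - 1.0) / 2.0        # exact inverse of G: cycle error close to 0


def F_poor(y):
    return y                      # ignores G entirely: large cycle error


rng = np.random.default_rng(0)
x = rng.random((4, 8, 8, 3))      # fake batch of 4 tiny RGB "images" from domain X
y = rng.random((4, 8, 8, 3))      # fake batch from domain Y

loss_good = cycle_consistency_loss(G, F_inverse, x, y)
loss_poor = cycle_consistency_loss(G, F_poor, x, y)
print(f"consistent pair: {loss_good:.6f}, inconsistent pair: {loss_poor:.6f}")
```

In training, this term would be added to the two adversarial losses with a weighting factor, so that G and F are rewarded both for producing realistic images and for being approximate inverses of each other.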
Using CycleGAN, the researchers map Monet paintings to landscape photographs from Flickr, render input images in the artistic styles of Monet, Van Gogh, Ukiyo-e, and Cezanne, perform object transfiguration between horses and zebras, translate between driving scenes in different styles, and transfer seasons in Flickr photos of Yosemite.
The model does not work well when a test image looks unusual compared to the training images. Handling more varied and extreme transformations, especially geometric changes, is an important problem for future work on CycleGAN.
2. Explain how game-playing machines can use computer vision to win.
In my view, computer vision helps a game-playing machine win mainly by detecting and interpreting the state of the game from images.
As an example, consider a small program (about 50 KB, spread over 4 files) that uses computer vision algorithms for image analysis and perspective transformation and a convolutional neural network for digit classification. It then solves a logic puzzle in a few seconds, a task that takes a person at least a few minutes.
The project is called "LogicGamesSolver". The software is written in Python and uses OpenCV 4.0.1 and the TensorFlow 2.3.0 library. It can solve three games: Sudoku, Star Wars and Skyscraper.
The first step is to detect the puzzle in the input image. The idea is to find the largest contour, i.e. the largest polygon in the image. This step is easier for the software if the scene is clean, with as little noise and as few other objects as possible. The contours are found with the cv2.findContours method using the cv2.RETR_EXTERNAL retrieval mode, which considers only the extreme outer contours; the contours are then sorted by area and the largest one is taken. Once the puzzle is found, its four vertices are used to compute a perspective transformation, and cv2.warpPerspective is applied to the image of the polygon.
Once we have a flat image of the puzzle, we can analyze it to extract the information already given in the grid. Sudoku and Skyscraper puzzles require reading numbers. To recognize which numbers appear in the image, the software uses a convolutional neural network that classifies handwritten digits and is trained on the well-known MNIST dataset: 60,000 grayscale training images of 28×28 pixels, each showing a single handwritten digit between 0 and 9. TensorFlow provides a way to convert the images into arrays suitable for the CNN, and an exclude_classes array is provided as well.
The final step is to solve the puzzle as a constraint satisfaction problem.
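The final constraint-solving step can be illustrated, for Sudoku only, with a plain backtracking search. This is a sketch under the assumption that the digits have already been read into a 9×9 grid with 0 for empty cells; it is not necessarily the method LogicGamesSolver uses, and the function names `candidates` and `solve` are hypothetical.

```python
def candidates(grid, r, c):
    """Digits allowed in cell (r, c) by the row, column and 3x3 box constraints."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]


def solve(grid):
    """Fill grid in place (0 = empty); return True if a solution was found."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for d in candidates(grid, r, c):
                    grid[r][c] = d
                    if solve(grid):
                        return True
                grid[r][c] = 0     # undo and backtrack
                return False
    return True                    # no empty cell left: solved


puzzle = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]
givens = [row[:] for row in puzzle]   # keep a copy of the original clues
solved = solve(puzzle)
for row in puzzle:
    print(" ".join(map(str, row)))
```

Backtracking is enough for a 9×9 grid; larger or harder puzzles would typically benefit from constraint propagation or a dedicated solver.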