OpenCV is an open-source computer vision library that is extensively used by the research community as well as by commercial ventures. Applications are quite varied and include stitching multiple images into a panoramic view; video stabilization; super-resolution imaging; object detection, tracking, and recognition; motion capture from video (e.g., Kinect); and machine learning applications. The underlying objects of interest are images, so there are numerous tools and extensive support for working with images (and frames from videos).
This programming assignment will provide some hands-on experience with tracking the movement of people. This is a well-studied problem, and it is becoming even more important: as shelter-in-place restrictions around the world slowly begin to lift, we are still advised to maintain social distancing (SD), and to wear face masks (FM) when social distancing is difficult. OpenCV has tools that can potentially address both SD and FM. However, as with real-world (vs. academic) problems, solutions are not always perfect -- please keep this in mind -- this may be one of the few academic assignments you will get that is quite open-ended. For program 4, we will focus on SD (we initially considered FM, but that may be too hard). There are some minimum requirements as well as plenty of room to explore further.
You are provided with two pieces of starter code, method 1 and method 2, for detecting moving objects in a video source. Method 1 (motion differencing) is an image-processing approach that detects motion by differencing adjacent video frames. Anything that moves is detected -- this can be people (our target of interest), cars, birds, etc. Method 2 (cascade classifier) is based on feature extraction: the outputs of multiple simple feature detectors are combined in a cascade (think of this as a pipeline or assembly-line process in which output from earlier stages is fed to later stages) to classify whether a feature is interesting or not. This method is based on the Viola-Jones cascade classifier. The body classifier included in the OpenCV distribution has 38 cascade stages. To process a frame of a video, method 2 uses the classifier to search the entire frame for bodies. Both methods were developed some time ago and have since been improved upon by more recent work (e.g. https://www.youtube.com/watch?v=9vp0a0u3tns, https://www.youtube.com/watch?v=tq0BgncuMhs). Even so, they provide plausible results and are a good basis for getting an appreciation of the challenges.
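To make the idea behind method 1 concrete, here is a minimal sketch of frame differencing using only NumPy (the starter code presumably uses the equivalent OpenCV calls such as cv2.absdiff and cv2.threshold; the function name and threshold value here are illustrative, not taken from the starter code):

```python
import numpy as np

def motion_mask(prev, curr, thresh=25):
    """Binary mask of pixels whose grayscale intensity changed by more
    than `thresh` between two adjacent frames -- the core idea of
    method 1 (motion differencing)."""
    # Cast to a signed type so the subtraction cannot wrap around.
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    return (diff > thresh).astype(np.uint8)

# Tiny synthetic example: a 5x5 scene where one pixel "moves".
prev = np.zeros((5, 5), dtype=np.uint8)
curr = prev.copy()
curr[2, 2] = 255          # a bright object appears at (2, 2)
mask = motion_mask(prev, curr)
print(int(mask.sum()))    # → 1  (exactly one pixel flagged as moving)
```

In the actual pipeline, connected regions of this mask are grouped (e.g., with cv2.findContours) into the bounding boxes that later tasks operate on.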
Your first task is to combine these two methods. Specifically, use method 1 to identify subregions of a frame (based on the resulting bounding boxes), which are then fed to method 2. In effect, for each frame, method 2 searches a list of bounding boxes (of moving objects) to determine whether what's moving is a person or persons. This should help speed up the processing time of method 2. The caveat, though, is that we implicitly ignore people who remain stationary across frames.
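The combined pipeline can be sketched as follows. The body detector is passed in as a function so the sketch stays self-contained; in the real assignment it would wrap the cascade classifier's detectMultiScale call on each cropped region. All names here are illustrative, not from the starter code:

```python
import numpy as np

def detect_people_in_motion_boxes(frame, motion_boxes, detect_bodies):
    """Run the body detector only inside the motion bounding boxes from
    method 1, translating each hit back to full-frame coordinates.
    `detect_bodies(roi)` returns (x, y, w, h) boxes relative to the ROI."""
    people = []
    for (bx, by, bw, bh) in motion_boxes:
        roi = frame[by:by + bh, bx:bx + bw]       # crop the moving region
        for (x, y, w, h) in detect_bodies(roi):
            people.append((bx + x, by + y, w, h))  # ROI → frame coordinates
    return people

# Demo with a fake detector that "finds" one body per ROI.
frame = np.zeros((240, 320), dtype=np.uint8)
fake_detector = lambda roi: [(5, 5, 20, 40)]
boxes = detect_people_in_motion_boxes(frame, [(100, 50, 60, 100)], fake_detector)
print(boxes)   # → [(105, 55, 20, 40)]
```

The speedup comes from the inner loop: the classifier scans only the cropped regions rather than the whole frame.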
The output of your combined method will also be a list of bounding boxes, each containing one or more people. Your next task is to count how many people are in each of the resulting bounding boxes. There are multiple strategies for this task. A simple approach, though sensitive to noise, is to count how many full-body boxes lie within the bounding box. Think about other, more robust methods for counting.
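The simple containment-based count mentioned above can be sketched in a few lines (boxes are assumed to be (x, y, w, h) tuples; a more robust variant might use overlap fraction instead of strict containment):

```python
def contains(outer, inner):
    """True if box `inner` lies fully inside box `outer`; boxes are (x, y, w, h)."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh

def count_people(motion_box, body_boxes):
    """Count full-body detections fully contained in a motion bounding box."""
    return sum(contains(motion_box, b) for b in body_boxes)

# Two bodies fit inside the 100x100 motion box; the third sticks out.
n = count_people((0, 0, 100, 100),
                 [(10, 10, 20, 40), (50, 5, 20, 40), (90, 90, 30, 30)])
print(n)   # → 2
```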
Identify whether two or more people are too close to each other. This is a hard problem to solve in general (it depends heavily on camera attributes, etc.), so we will need to make simplifying assumptions. For this assignment, we will assume that: (a) if a bounding box contains a single person, then that person is observing SD, and (b) if a bounding box contains more than one person, then those people are too close to each other. To flag the latter case, change the color of the bounding box to red.
Related to this task (i.e., part of the distance-check rubric) is simple contact tracing. Here, a contact happens as soon as SD is violated. Therefore, each person who appears in a red box will remain bounded by a red box from then on, even if they later observe SD. As reported in the press, COVID-19 can be transmitted via droplets between people in close proximity. Since anyone (unless they're immune) is a potential carrier, we'd like to flag everyone who has ever violated SD. Think about which Python data structure can best represent and handle such situations. Note -- we are only doing tracking, not recognition (i.e., while technology exists to identify individuals, we are not interested in that). So, what we are doing is similar to the Dot simulation from prog2, except that we are working with video instead of dots.
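One natural choice for the "remember everyone who ever violated SD" behavior is a set of tracked-person IDs. The sketch below assumes you already assign a stable ID to each tracked person (how you do that is part of the assignment); the function names are illustrative:

```python
violators = set()   # IDs of everyone who has ever violated SD

def box_color(ids_in_box):
    """Color for a bounding box: red if it currently holds more than one
    person, or if any occupant violated SD earlier; green otherwise."""
    if len(ids_in_box) > 1:
        violators.update(ids_in_box)   # flag everyone in the crowded box
    return "red" if violators & set(ids_in_box) else "green"

print(box_color({1, 2}))  # → red    (two people together: SD violated)
print(box_color({3}))     # → green  (alone, never violated SD)
print(box_color({1}))     # → red    (person 1 violated SD earlier)
```

A set works well because membership tests and unions are O(1) amortized per element, and once an ID is added it stays flagged for the rest of the video.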
Finally, we want to obtain a visual representation of the trajectory of each person in a scene. Think of people as particles in your firework program (prog3). We want to draw a trail of each person's trajectory up to the current point in the video. The output here is similar to that found in https://www.youtube.com/watch?v=tq0BgncuMhs, where the center of the bounding box is used as the location of the person. Instead of drawing the trail over a short duration in the past, draw the trail from the start of the video up to the current instant. The color of the trail may change among red (too close), yellow (maybe too close), and green (SD observed) to reflect the different states along the path. If you want to assign a different color to each person, as in the video above, just be sure not to use the three reserved colors.
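The bookkeeping for per-person trails can be a dictionary mapping person ID to the list of box centers seen so far, with the SD state stored alongside each point so segments can be colored individually. The dictionary and function names are illustrative; actual drawing would use cv2.line between consecutive centers:

```python
trails = {}   # person ID -> list of (cx, cy, state) samples since video start

def record_center(pid, box, state):
    """Append the center of this frame's bounding box (x, y, w, h) to the
    person's trail, tagged with the current SD state ('red'/'yellow'/'green')."""
    x, y, w, h = box
    trails.setdefault(pid, []).append((x + w // 2, y + h // 2, state))

# Per frame, the trail would then be drawn segment by segment, e.g.:
#   for (x0, y0, s), (x1, y1, _) in zip(trail, trail[1:]):
#       cv2.line(frame, (x0, y0), (x1, y1), bgr_of[s], 2)

record_center(7, (10, 20, 40, 80), "green")
record_center(7, (14, 22, 40, 80), "red")
print(trails[7])   # → [(30, 60, 'green'), (34, 62, 'red')]
```

Because the trail list is never truncated, the drawn path covers the whole video up to the current instant, as the assignment requires.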
Rubric:
Combined method: 25
Counting people: 25
Distance checks: 25
Tracking people: 25
Bonus: 5 (submit by 11:59pm, Saturday, June 6, 2020)
Mark Allamanno: A - Ch
Michelle Kwong: Co - I
Singaravelavan Rajesh: J - M
Andre Navid Assadi: N - Sh
Kyle Nagao Oda: Si - Z
Last modified Wednesday, 27-May-2020 21:18:09 PDT.