Surveillance cameras have an identity problem, fueled by an inherent tension between utility and privacy. These small, powerful devices are turning up everywhere, and machine learning tools have automated video content analysis at massive scale. Yet as surveillance grows, there are no legally enforceable rules to limit privacy invasions.
Security cameras can do a lot. They have grown far smarter and more capable than the grainy ghosts of their predecessors, the "hero tools" of crime media ("See that blurry blue blob in the right-hand corner of that densely packed frame? We got him!"). Video surveillance can help health officials measure the proportion of people wearing masks, let transportation departments monitor the flow of pedestrians and vehicles, and give businesses better insight into their customers' shopping habits. Privacy, meanwhile, has been treated as an afterthought.
The status quo is to retrofit video with black boxes or blurred faces. Not only does this prevent analysts from asking some genuine questions (e.g., are people wearing masks?), it also doesn't always work: the system may miss some faces and leave them unblurred for all to see. Dissatisfied with this state of affairs, researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) devised a system to better guarantee privacy in video captured by surveillance cameras. The system, called "Privid," lets analysts submit video data queries and adds a small amount of noise (extra data) to the final result to ensure that no individual can be identified. It builds on a formal definition of privacy, "differential privacy," which permits access to aggregate statistics about private data without revealing personally identifiable information.
Ordinarily, analysts would have access to the entire video and could do whatever they wanted with it; Privid makes sure the video isn't a free buffet. Honest analysts get access to the information they need for their work, but that access is restricted enough that malicious analysts cannot learn too much. Instead of running the analyst's code over the entire video in one shot, Privid breaks the video into small chunks and runs the code on each chunk separately. Rather than returning results from each chunk, it aggregates them and adds noise to the aggregate. The system also reports an error bound on the result, say a 2-percent margin of error given the added noise.
The code might output, for example, the number of people observed in each video chunk; the aggregation might be the "sum," to count the total number of people wearing face coverings, or the "average," to estimate the density of crowds.
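To make the chunk-and-aggregate idea concrete, here is a minimal sketch in Python of how such a pipeline might look. It is not Privid's actual implementation: the function names, the per-chunk cap, the use of a Laplace mechanism, and the way the error bound is derived are all illustrative assumptions.

```python
import numpy as np

def count_people(chunk):
    """Analyst-supplied code: count people in one video chunk.
    (A hypothetical stand-in for an arbitrary neural-network detector.)"""
    return len(chunk["detections"])

def private_query(chunks, per_chunk_cap, epsilon, aggregation="sum"):
    """Run the analyst's code on each chunk separately, aggregate the
    per-chunk results, and add Laplace noise calibrated to the cap.
    Illustrative sketch of a chunk-and-aggregate pipeline, not Privid itself."""
    # Clip each chunk's output so no single chunk can contribute more
    # than the declared cap (this bounds the query's sensitivity).
    results = [min(count_people(c), per_chunk_cap) for c in chunks]

    if aggregation == "sum":
        value = float(sum(results))
        sensitivity = per_chunk_cap              # one chunk can shift the sum by at most the cap
    elif aggregation == "average":
        value = float(np.mean(results))
        sensitivity = per_chunk_cap / len(results)
    else:
        raise ValueError("unsupported aggregation")

    # Laplace mechanism: noise scale = sensitivity / epsilon.
    scale = sensitivity / epsilon
    noisy_value = value + np.random.laplace(0.0, scale)

    # Rough error bound: Laplace noise stays within ~3 scale units
    # of zero about 95% of the time.
    return noisy_value, 3.0 * scale

# Example: 24 one-hour chunks with simulated pedestrian detections.
rng = np.random.default_rng(0)
chunks = [{"detections": list(range(rng.integers(50, 200)))} for _ in range(24)]
count, err = private_query(chunks, per_chunk_cap=300, epsilon=1.0)
print(f"Estimated pedestrians: {count:.0f} (+/- {err:.0f})")
```

Capping each chunk's output bounds how much any single chunk can sway the aggregate, which is what makes it possible to choose the noise scale, and hence the reported error bound, up front.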
Privid also lets analysts use their own deep neural networks, which are commonplace in video analytics today, so they can ask questions that Privid's designers didn't anticipate. Across a range of videos and queries, Privid's accuracy came close to that of a non-private system.
"We are at a point right now when cameras are almost everywhere," says Frank Cangialosi, a PhD student at MIT CSAIL. If there were a camera on every corner and in every place you go, he notes, anyone who could process all of those videos together could build a timeline of where and when a person has been. People are already concerned about location privacy with GPS; video data collected everywhere could capture not only your location history, but also your moods, behaviors, and more.
Privid introduces a new notion of "duration-based privacy," which decouples the definition of privacy from its enforcement. With obfuscation, the enforcement mechanism must do some work to locate the people it is supposed to protect, and it may not do so perfectly. With duration-based privacy, you don't need to fully specify everything up front, and you aren't hiding more information than you need to.
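One way to read "duration-based" is that the noise is calibrated to how long a protected event can last on camera, rather than to any attempt to detect the people in the frame. The sketch below illustrates that idea under the simplifying assumption that the noise scale grows with the number of chunks a person of a given maximum on-screen duration could touch; Privid's exact formulation differs and is spelled out in the paper.

```python
import math

def laplace_scale_for_duration(protected_duration_s, chunk_length_s,
                               per_chunk_cap, epsilon):
    """Noise scale for a sum query when any individual may appear on
    camera for up to `protected_duration_s` seconds.

    Illustrative assumption: a person visible that long can intersect
    at most ceil(duration / chunk_length) + 1 chunks, so their maximum
    influence on the sum is that many chunks times the per-chunk cap."""
    max_chunks_touched = math.ceil(protected_duration_s / chunk_length_s) + 1
    sensitivity = max_chunks_touched * per_chunk_cap
    return sensitivity / epsilon

# Protect anyone on screen for up to 60 seconds, with 10-second chunks
# whose outputs are capped at 1:
print(laplace_scale_for_duration(60, 10, per_chunk_cap=1, epsilon=1.0))  # 7.0
```

Under this reading, protecting longer durations means more noise, but the mechanism never needs to find or recognize the individuals it is protecting.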
Suppose we have a video overlooking a street. Two analysts, Alice and Bob, both claim they want to count the number of people who pass by each hour, so each submits a video processing module and requests a sum aggregation.
The first analyst is the city planning department, which hopes to use this data to understand footfall patterns and plan sidewalks for the city. Their model counts people and outputs this count for each video chunk.
The other analyst is malicious and wants to identify every time Charlie passes the camera. Their model looks only for Charlie's face, outputting a large number if Charlie is present (i.e., the "signal" they are trying to extract) and zero otherwise. Their hope is that the sum will be non-zero whenever Charlie has passed by.
From Privid's perspective, these two queries look identical. It is hard to reliably determine what the models are doing internally or what the analyst intends to do with the data. This is where the noise comes in: Privid executes both queries and adds the same amount of noise to each. In the first case, because Alice was counting all people, the noise has only a small effect on the result and is unlikely to hurt its usefulness. In the second case, the "signal" Bob is after, the presence of a single person, is on the same scale as the noise, so he cannot reliably tell whether Charlie was there.
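A small simulation makes it concrete why the same noise affects the two analysts so differently. The numbers below (chunk count, per-chunk cap, noise scale, the chunk where Charlie appears) are made up for illustration and are not taken from Privid.

```python
import numpy as np

rng = np.random.default_rng(7)

# Made-up numbers: 1440 one-minute chunks, per-chunk outputs capped at 10,
# and the same Laplace noise scale applied to both analysts' queries.
n_chunks, cap, noise_scale = 1440, 10, 20.0

# Alice (city planning): genuine per-chunk pedestrian counts.
alice_outputs = rng.integers(0, 8, size=n_chunks)
alice_noisy = alice_outputs.sum() + rng.laplace(0.0, noise_scale)

# Bob (malicious): outputs the cap only in the chunk where Charlie appears.
def bob_query(charlie_present):
    outputs = np.zeros(n_chunks)
    if charlie_present:
        outputs[720] = cap              # the "signal" Bob hopes to extract
    return outputs.sum() + rng.laplace(0.0, noise_scale)

print(f"Alice: true={alice_outputs.sum()}, noisy={alice_noisy:.0f}")
print(f"Bob with Charlie present: {bob_query(True):.0f}")
print(f"Bob with Charlie absent:  {bob_query(False):.0f}")
# Alice's relative error stays within a few percent, while Bob's two results
# overlap heavily: the noise scale (20) exceeds the most one person could
# change his sum (10), so they reveal little about Charlie.
```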