Technology Exploration

CityEyes: An Embarrassingly Parallel Cloud Service for Smart Surveillance Video Applications

ITRI ICT Lab / Tse-Shih Chen

ITRI “CityEyes” is a unique Cloud-based (IaaS/PaaS/SaaS) video analysis service that has been deployed to and used by local police in Taiwan. CityEyes leverages the existing police surveillance video infrastructure and adds Cloud Computing and DNN features to make video analysis more scalable and flexible. Police officers can replace time-consuming manual work with automated processes and work more efficiently. The CityEyes PaaS provides the analysis capability for video applications: videos can be processed in parallel according to a designated analysis workflow. The Video Analysis Engines (VAEs) deployed on the PaaS can be built with traditional image processing algorithms or DNN models. All in all, CityEyes’ Cloud-native deployment and intuitive interface provide extensible capabilities, most importantly massive-scale video analysis, that increase the efficiency and accuracy of the whole process.

Embarrassingly parallel

CityEyes is unique in its use of a Platform-as-a-Service (PaaS) architecture. This PaaS consists of three major components. First, similar to MapReduce, the PaaS is backed by a computation server pool composed of a scalable number of virtual machines that perform the video analysis tasks. Second, the job queue component in the middle of the system manages and schedules the assigned jobs. Third, the front-end interface offers application programming interfaces (APIs) for developers of Cloud service applications. Developers can call the corresponding APIs to dispatch many jobs at once and retrieve their results. Because the PaaS coordinates the parallel back-end tasks, users can dispatch a large number of tasks in parallel and reduce processing time significantly.

For instance, when detecting moving objects in a video recording, portions of the recording with no movement are deleted, which significantly reduces the viewing time required. With a PaaS platform, 100 hours of video recording can be divided into 100 sections and 100 analysis engines can start working on them simultaneously. The whole 100-hour recording can then be analyzed within 10 minutes, a hundredfold increase in video processing speed. In short, a PaaS system for parallel video analysis partitions computationally intensive video analysis tasks into many independent subtasks and processes them at the same time.
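The partitioning idea above can be illustrated with a minimal Python sketch. It is not the CityEyes implementation: the function names are hypothetical, `analyze_segment` is a placeholder for a real analysis engine, and a local thread pool merely stands in for the pool of virtual machines.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(total_seconds, num_segments):
    """Return (start, end) offsets in seconds that cover the whole recording."""
    base, extra = divmod(total_seconds, num_segments)
    segments, start = [], 0
    for i in range(num_segments):
        length = base + (1 if i < extra else 0)  # spread any remainder evenly
        segments.append((start, start + length))
        start += length
    return segments


def analyze_segment(segment):
    """Hypothetical stand-in for one analysis engine working on one section."""
    start, end = segment
    # A real engine would decode and analyze the frames in [start, end).
    return {"start": start, "end": end, "duration": end - start}


def analyze_in_parallel(total_seconds, num_segments):
    segments = split_into_segments(total_seconds, num_segments)
    # One worker per section; sections share no state, so no coordination is needed.
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        return list(pool.map(analyze_segment, segments))


# 100 hours of video split into 100 one-hour sections.
results = analyze_in_parallel(total_seconds=100 * 3600, num_segments=100)
print(len(results), results[0], results[-1])
```

Because no section depends on any other, the workload is embarrassingly parallel: adding workers shortens the wall-clock time without changing the per-section logic.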

Service scenarios

Nowadays, digital surveillance systems are installed everywhere and continuously generate huge amounts of video data. Governments use this video data for crime prevention and as evidence of accidents or crimes. Although automatic techniques exist to help detect potential security issues, human inspection of the recorded video is very often still required to detect threats, and processing these data remains a highly demanding task. In practice, users are also hindered by the fact that different brands of surveillance cameras produce different video formats, with many different resolutions and system settings.

For instance, in the New Taipei City Police Department, CityEyes integrates six major types of NVR/DVR systems and manages over 30,000 surveillance cameras. One can imagine how diverse the video sources are in such a large camera network. Therefore, the greatest effort in implementing CityEyes went into developing video crawlers that retrieve videos from the NVRs/DVRs of different vendors. Fortunately, under the CityEyes framework, migrating the applications between different physical environments requires only minor modifications, e.g. reconfiguration of workflows. Our service can crawl video files in parallel and convert proprietary video file formats into a standard video format. Officers can then use the video files for various types of analysis.

Fig. 1. PaaS architecture.

The CityEyes PaaS provides the analytical power for video analysis applications. Video analysis can be processed with a user-specified workflow. Result collection and status reporting are also handled by the PaaS system. Fig. 1 illustrates the system architecture of the CityEyes PaaS; its main components and workflow are explained as follows:

– PaaS Controller (PC): The PaaS Controller is the intermediate interface that receives requests from front-end applications and sends back results. SaaS applications submit their video analysis jobs with the relevant inputs, such as SaaS ID, engine ID, engine workflow, engine parameters and job priority, depending on the selected video analysis engines and related actions. For instance, in Fig. 1, a video analysis task may consist of three stages corresponding to different engines (A-B-C). The PaaS Controller monitors and logs the progress of the whole workflow. It also monitors work status and generates reports, which front-end applications can query when needed.

– Computation Unit (CU): A Computation Unit is a virtual machine or container in which a variety of video analysis engines are installed and video analysis tasks are carried out. CUs may differ in operating system and resources in order to host different kinds of engines and meet their needs. The orchestrator, an administrative program running on each CU, is responsible for fetching jobs from the job queues (described below), launching the engines specified in the workflow, monitoring job progress, collecting results and handling errors.

– Job Queue (JQ): Jobs received from applications, each with its own engine workflow configuration, are put into the Job Queue after a validation process. To serve jobs with different priorities, we designed a probability-based method for the JQ in which higher-priority jobs have a higher probability of being picked up by a Computation Unit, and lower-priority jobs a lower one. A higher-priority job is put into a queue with a higher processing ratio, and a Computation Unit picks up jobs from the queues in an order determined by the defined probabilities. Moreover, if the queue picked by a Computation Unit is empty, the queue of the next priority is checked to avoid the starvation problem.

– Video Analysis Engine (VAE) and other supporting/DNN engines: The proposed PaaS system does not impose any restrictions on the types of supported engines as long as they can be launched by the orchestrator through a command line interface (CLI). A VAE may also communicate with the orchestrator by writing messages that comply with a pre-defined format to the standard output or error streams. For example, a license plate recognition engine can notify the orchestrator of its current progress or of a recognized plate number by printing messages to stdout. In addition to VAEs, a video analysis task may also need other supporting engines, such as an FTP client, to retrieve videos and complete the whole workflow.
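The interaction between the orchestrator and a CLI-launched engine can be sketched roughly as follows. The article does not specify the pre-defined message format, so the `KEY=VALUE` lines used here are purely an assumption, as are the function names. A small Python one-liner stands in for a real license plate recognition engine.

```python
import subprocess
import sys

# Hypothetical message format: one "KEY=VALUE" pair per line on stdout.
# A tiny Python one-liner stands in for a real license plate recognition engine.
FAKE_ENGINE = [
    sys.executable, "-c",
    "print('PROGRESS=50'); print('PLATE=ABC-1234'); print('PROGRESS=100')",
]


def run_engine(command):
    """Launch an engine through its CLI and collect the messages it prints."""
    proc = subprocess.run(command, capture_output=True, text=True, check=False)
    messages = []
    for line in proc.stdout.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that do not follow the assumed format
            messages.append((key.strip(), value.strip()))
    return proc.returncode, messages


def run_workflow(commands):
    """Run engines in order (e.g. A-B-C) and stop at the first failure."""
    collected = []
    for command in commands:
        returncode, messages = run_engine(command)
        collected.extend(messages)
        if returncode != 0:
            return False, collected
    return True, collected


ok, messages = run_workflow([FAKE_ENGINE])
print(ok, messages)
```

Treating engines as opaque command lines keeps the orchestrator independent of the language or framework an engine is written in, which matches the "no restrictions on engine types" goal described above.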

The PaaS is designed for scalability and flexibility. For instance, users who want to speed up analysis can add computation nodes without interrupting existing work. In addition, any VAE or DNN engine can be easily integrated into the PaaS to perform various kinds of image or video object recognition or segmentation applications.
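The probability-based scheduling described for the Job Queue can be sketched as below. This is one possible reading, not the actual CityEyes scheduler: the class name, the priority labels and the 6/3/1 weights are all assumptions, and the fallback order (highest remaining priority first) is a guess at what "the next priority queue" means.

```python
import random
from collections import deque


class PriorityJobQueue:
    """Several FIFO queues, one per priority, sampled by assumed weights."""

    def __init__(self, weights, rng=None):
        # weights: priority name -> relative chance of being tried first
        self.weights = dict(weights)
        self.queues = {name: deque() for name in self.weights}
        self.rng = rng or random.Random()

    def put(self, job, priority):
        self.queues[priority].append(job)

    def get(self):
        """Return the next job, or None if every queue is empty."""
        names = list(self.weights)
        first = self.rng.choices(names, weights=[self.weights[n] for n in names])[0]
        # Try the drawn queue first, then the others from highest to lowest weight.
        fallback = sorted((n for n in names if n != first),
                          key=self.weights.get, reverse=True)
        for name in [first] + fallback:
            if self.queues[name]:
                return self.queues[name].popleft()
        return None


jq = PriorityJobQueue({"high": 6, "medium": 3, "low": 1}, rng=random.Random(0))
for i in range(1000):
    jq.put(f"high-{i}", "high")
    jq.put(f"low-{i}", "low")

first_hundred = [jq.get() for _ in range(100)]
high_share = sum(job.startswith("high") for job in first_hundred) / 100
print(f"share of high-priority jobs among the first 100: {high_share:.2f}")
```

Drawing the queue at random, rather than always serving the highest priority first, means low-priority jobs still get picked occasionally even while high-priority jobs keep arriving.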

The CityEyes applications

With the CityEyes PaaS framework, it is easy to develop various smart applications on top of it. Here we briefly describe the service architecture and the applications developed under this programming paradigm.

(a) The job processing flow
(b) Camera anomaly detection service
(c) Vehicle detection and tracking

Fig. 2. The CityEyes Services

Fig. 2 (a) shows the job processing flow of CityEyes. First, the source video files on the NVRs/DVRs are crawled by the corresponding crawler engines. Users can dispatch a large number of crawler jobs by selecting an area on the map. The CityEyes application then sends requests to the PaaS to bring up crawler engines. After a video file has been crawled, CityEyes converts its file format and runs the VAE steps required by the applications. Fig. 2 (b) and Fig. 2 (c) show “Camera anomaly detection” and “Vehicle detection and tracking” respectively. The service functions are described in detail as follows:

Camera anomaly detection: Maintaining a large surveillance camera network is important to ensure that each surveillance camera delivers good image quality and the correct field of view. To minimize the effort required of system administrators, we have built a map-based web application that enables users to bring up live video feeds easily and to locate malfunctioning surveillance cameras with broken connections. Beyond hardware failures, we also adopted an image-based approach to automatically detect camera anomaly events such as spray painting, blockage and defocusing.

Vehicle detection and tracking: In this service, we combine automatic license plate recognition technology with the geographic information of street surveillance cameras to recover the trajectory of a vehicle given its license plate number. Ideally, such a system can greatly simplify and accelerate the investigation of certain security issues, e.g. searching for stolen cars. However, its success also depends heavily on factors such as the density and image quality of the cameras.
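As a rough illustration of how plate recognitions and camera locations combine into a trajectory, consider the sketch below. The `Sighting` record, the camera IDs, the coordinates and the plate numbers are all made up for the example. The article does not describe the actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sighting:
    plate: str        # plate number reported by the recognition engine
    camera_id: str    # camera that produced the frame
    timestamp: float  # seconds since a common reference time


def recover_trajectory(sightings, camera_locations, plate):
    """Return [(timestamp, camera_id, (lat, lon)), ...] ordered by time."""
    hits = [s for s in sightings
            if s.plate == plate and s.camera_id in camera_locations]
    hits.sort(key=lambda s: s.timestamp)
    return [(s.timestamp, s.camera_id, camera_locations[s.camera_id]) for s in hits]


# Hypothetical camera coordinates (latitude, longitude).
camera_locations = {
    "cam-01": (25.012, 121.465),
    "cam-02": (25.015, 121.470),
    "cam-03": (25.020, 121.478),
}
sightings = [
    Sighting("ABC-1234", "cam-03", 300.0),
    Sighting("XYZ-9876", "cam-02", 120.0),
    Sighting("ABC-1234", "cam-01", 60.0),
    Sighting("ABC-1234", "cam-02", 180.0),
]

trajectory = recover_trajectory(sightings, camera_locations, "ABC-1234")
for timestamp, camera_id, location in trajectory:
    print(timestamp, camera_id, location)
```

The sketch also shows why camera density matters: the trajectory is only as fine-grained as the set of cameras that happened to read the plate.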

Video summarization: Due to the rapid development of video capturing technology, surveillance cameras are installed everywhere and continuously generate a large number of videos. Relying on human inspection for threat detection is becoming not only impractical but also intractable. Video summarization techniques address this issue by eliminating irrelevant video content before human inspection. We apply the background subtraction technique to delete still frames that contain no moving objects and produce a compact version of the source videos without loss of salient information.
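A very small pure-Python sketch of background subtraction is given below to make the idea concrete. It uses a running-average background model, which is only one of several common variants and is not necessarily the one used in CityEyes. The thresholds and the toy 100-pixel frames are assumptions chosen for illustration.

```python
def frames_with_motion(frames, alpha=0.05, pixel_threshold=25,
                       min_changed_fraction=0.01):
    """Return the indices of frames that differ enough from the background.

    frames: iterable of equal-length lists of grayscale values (0-255).
    """
    background = None
    kept = []
    for index, frame in enumerate(frames):
        if background is None:
            background = [float(p) for p in frame]  # first frame seeds the model
            continue
        changed = sum(1 for p, b in zip(frame, background)
                      if abs(p - b) > pixel_threshold)
        if changed / len(frame) >= min_changed_fraction:
            kept.append(index)
        # Running average lets the background adapt slowly to lighting changes.
        background = [(1 - alpha) * b + alpha * p
                      for p, b in zip(frame, background)]
    return kept


# Toy data: a flat background, with a bright object in frames 2 and 3.
still = [10] * 100
moving = still[:40] + [200] * 10 + still[50:]
frames = [still, still, moving, moving, still, still]

kept = frames_with_motion(frames)
print(kept)
```

Only the frames listed in `kept` would be written to the summarized video. A production engine would work on real decoded frames and typically add noise filtering, which this sketch omits.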

To date, CityEyes has been deployed to support stolen vehicle tracking, tailgating monitoring, and suspicious activity monitoring. It turns activities that can be very resource-consuming for public safety and law enforcement agencies into accurate, automatically generated, actionable information for field officers. By following state-of-the-art human machine interface (HMI) best practices, CityEyes ensures that field users can operate quickly and accurately and retrieve the information relevant to their needs.

Future prospects

CityEyes is built upon an open analytical platform that can be extended to include additional functions. For example, additional video analytic or AI/DNN engines (Fig. 3) can be used to identify object colors, shapes or motions. It is a Cloud-native service that leverages the inherent capabilities of Cloud Computing. As a result, it can be deployed rapidly and scaled up to handle sudden demand spikes, supporting the needs of public safety and law enforcement agencies. All in all, CityEyes’ Cloud-native deployment and intuitive interface provide extensible capabilities, most importantly massive-scale video analysis, that increase the efficiency and accuracy of the whole process.

Fig. 3. CityEyes with DNN applications