Computer Vision and Machine Learning

Computer vision is a field of artificial intelligence (AI) that trains computers to interpret and understand the visual world. It has become closely intertwined with machine learning (ML), which learns from and improves on data: computer vision draws heavily on ML techniques, and some ML techniques have been developed specifically for computer vision. ML has therefore broadened the boundaries of computer vision by offering effective methods for acquiring, processing, analysing, and understanding digital images and videos. Deep learning, a subset of ML, has grown in popularity for its ability to achieve highly accurate results on challenging computer vision problems such as object detection, image classification, and face recognition.

Human-Assisted AI-ML

Although AI and ML undoubtedly play a significant role in today's data-driven analytical world, it is challenging to build standalone AI-ML systems that achieve the required accuracy. AI and ML systems are not intelligent on their own; they simulate intelligence that they have learned. Most AI systems require some element of human intervention to function successfully, and they are only as good as the data and training methods employed [1]. The quality of the training data is the factor with the greatest impact on the accuracy of an AI model, yet it is challenging to obtain a training data set that is representative of the entire population. Human-assisted AI helps AI models get smarter over time through feedback from human expertise: humans continuously interact with an AI-driven service and provide information that creates more accurate AI models, making this an iterative, cyclic process.

Street Sign Detection

Applying AI-ML approaches to road feature detection is growing in popularity because of its important use cases (a road feature is any object or point of interest that can be identified from a data set of geo-referenced photos or videos collected by a Mobile Mapping System). These use cases include inventory management, damage assessment, and change and inconsistency detection, and can therefore benefit entities such as local governments and private organizations.

In this case study, Cognitivo worked with a local council in Melbourne, Victoria, Australia to detect and classify parking signs (a subcategory of street signs) as part of an inventory management use case. Parking sign detection is known to be a relatively difficult problem compared to other street sign detection tasks. The majority of research [2,3] on street sign detection excludes parking signs, and only a limited number of parking sign data sets are available publicly, so little is known about how an AI-ML model would perform at parking sign detection. We employed computer vision and machine learning techniques to detect and classify street signs, and developed an interface through which human expertise on the AI-ML detections iteratively improves our models.


Whittlesea council provided a dataset of approximately 1.5 million geo-referenced images that extensively covered the council's streets. The council requires an asset database of geo-referenced parking signs. We built an AI-ML model that detects parking signs in the images and classifies them into types and subtypes, together with a geo-spatial visualization tool to further curate and interact with the data, as shown in Fig. 2.

Figure 2

3. Methodology

An overview of the method is shown in Fig. 3 below. Firstly, we develop an object detection model that identifies the parking signs in images. Secondly, the output of the object detection model is fed to the object classification model, which further divides the detections into parking sign types and subtypes.
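The two-stage structure can be sketched as a simple function composition: the detector produces bounding boxes, and the classifier labels each crop. The function names and fixed return values below are illustrative stand-ins, not the actual models.

```python
def detect_signs(image):
    """Stand-in detector: returns a list of bounding boxes (x, y, w, h)."""
    # In the real pipeline this is the YOLO model; here we return fixed boxes.
    return [(40, 60, 32, 48), (200, 55, 30, 45)]

def classify_sign(image, box):
    """Stand-in classifier: maps a cropped bounding box to a sign type."""
    # The real model is a classifier trained on labelled sign crops.
    return "No-Stopping"

def detect_and_classify(image):
    """Run detection first, then classify each detected box."""
    results = []
    for box in detect_signs(image):
        results.append({"box": box, "type": classify_sign(image, box)})
    return results

signs = detect_and_classify(image=None)  # a real image array in practice
print(len(signs))  # one entry per detected-and-classified sign
```

The key design point is that the classifier never sees the full image, only the crops the detector proposes, which keeps the two models independently trainable.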

Figure 3

Training Data preparation and Labeling

The initial data set of 1.5 million images was reduced to 500K images by removing unusable images, such as images of pavements. A large proportion of the project was spent creating the training and testing data sets from scratch. We labelled 25K images (a yield of 5% of the entire usable data set) for training purposes.
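Once labelled, the images have to be divided into training and testing sets. A minimal, reproducible way to do this is a seeded shuffle-and-split; the 80/20 ratio below is an assumption for illustration, not a figure from the project.

```python
import random

def split_labelled(images, train_frac=0.8, seed=42):
    """Shuffle the labelled images deterministically and split them
    into training and testing sets."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    shuffled = images[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

labelled = [f"img_{i}.jpg" for i in range(25000)]
train_set, test_set = split_labelled(labelled)
print(len(train_set), len(test_set))  # 20000 5000
```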

Parking Sign Detection

We developed an object detection model to detect parking signs in the images. Object detection is a computer vision technique for detecting instances of semantic objects of a given class in digital photographs and videos. Although well-researched object detection domains such as face and pedestrian detection exist, the existing work offers limited solutions for detecting street signs, especially parking signs. We experimented with the object detection algorithms described below.

  • Region-Based Convolutional Neural Networks (R-CNNs): a family of techniques for addressing object localization and recognition tasks.
  • You Only Look Once (YOLO): a second family of techniques for object detection, designed for speed and real-time use.

From our experiments with R-CNN and YOLO, we were most impressed with the performance and detections of YOLO. YOLO applies a single neural network to the full image. The network divides the image into regions and predicts bounding boxes and probabilities for each region, and the bounding boxes are weighted by the predicted probabilities.
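The "divides the image into regions" step can be illustrated with a toy grid-cell mapping in the style of YOLO: each object's box center is assigned to one cell of an S×S grid, and that cell is responsible for predicting the box. This is a conceptual sketch, not the Darknet implementation, and the grid size S=7 is just the classic YOLO default.

```python
def grid_cell(box_center, image_size, S=7):
    """Map a bounding-box center (x, y) to its (row, col) cell
    in an S x S grid laid over the image."""
    x, y = box_center
    w, h = image_size
    # Scale pixel coordinates to grid units; clamp to the last cell
    # so a center exactly on the right/bottom edge stays in range.
    col = min(int(x / w * S), S - 1)
    row = min(int(y / h * S), S - 1)
    return row, col

# A sign centered in a 640x480 image falls in the middle grid cell.
print(grid_cell((320, 240), (640, 480)))  # (3, 3)
```

Each responsible cell then predicts box coordinates and a confidence, which is why a single forward pass over the whole image yields all detections at once.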

The use cases for this model are quite broad: although we developed YOLO in Darknet for parking sign detection, as shown in Fig. 4, it can be trained on any object of interest, and YOLO also provides a foundation for various kinds of video analytics. In our use case, we trained the model to detect parking signs of the types Parking, No-Stopping, Clearway, and Zone. The model outputs a bounding box for each detected parking sign, which is fed as the input to the classification model described next.

Figure 4

Parking Sign Classification

Figure 5

The set of parking signs detected in the previous phase is used as the input to the image classification model. Image classification is a supervised ML problem in which a set of target classes is defined and a model is trained on a set of labelled images to classify new data into those classes. In our work, we classify signs into four parking sign categories: Parking, No-Stopping, Clearway, and Zone, as shown in Fig. 5.

Removing Duplicated Signs

The Mobile Mapping System employed for data collection takes four images every 10 m, referred to as Front Left, Front Right, Rear Left, and Rear Right. It is highly likely that the same sign is captured multiple times by the Front and Rear cameras; additionally, depending on camera angles and visibility, the same sign can be captured at multiple locations. By keeping only the signs captured by the Front cameras (or only those from the Rear cameras), we can reduce the duplication to a certain extent. However, removing duplicates within the Front or Rear set is challenging and out of the scope of our work at this stage.
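The Front-only filter described above is a one-line predicate over the detections. The record layout below (camera name plus coordinates) is an assumed shape for illustration, not the project's actual schema.

```python
detections = [
    {"sign_id": 1, "camera": "Front Left",  "lat": -37.68, "lon": 145.10},
    {"sign_id": 1, "camera": "Rear Left",   "lat": -37.68, "lon": 145.10},
    {"sign_id": 2, "camera": "Front Right", "lat": -37.69, "lon": 145.11},
]

def keep_front_only(detections):
    """Drop Rear-camera detections, removing Front/Rear duplicates
    of the same sign."""
    return [d for d in detections if d["camera"].startswith("Front")]

print(len(keep_front_only(detections)))  # 2 (the Rear duplicate is dropped)
```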

4. Results

The training phase used the 10K parking signs labelled from the 25K images. We now explain the evaluation of the parking sign detection and classification models. Our parking sign detection model produced fewer than 15% false detections at an Intersection over Union (IoU) threshold of 0.5, where IoU is the ratio of the area of overlap to the area of union between the predicted and ground-truth bounding boxes. The accuracy of our classifier in assigning the detected parking signs (from the object detection model) to the four sign categories of interest (Parking, No-Stopping, Zone, and Clearway) was 98% on average.
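The IoU metric used in this evaluation is straightforward to compute from two axis-aligned boxes. The sketch below uses (x1, y1, x2, y2) corner coordinates and made-up boxes to show that a prediction shifted a few pixels from the ground truth still clears the 0.5 threshold.

```python
def iou(pred, truth):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(pred[0], truth[0]), max(pred[1], truth[1])
    ix2, iy2 = min(pred[2], truth[2]), min(pred[3], truth[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((pred[2] - pred[0]) * (pred[3] - pred[1])
             + (truth[2] - truth[0]) * (truth[3] - truth[1]) - inter)
    return inter / union if union else 0.0

# A 5-pixel shift on a 40x40 box still counts as a true detection at 0.5.
score = iou((10, 10, 50, 50), (15, 15, 55, 55))
print(round(score, 3), score >= 0.5)  # 0.62 True
```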

Some of the predictions made by our models are shown below in Fig. 6.

Figure 6

5. Challenges

One challenge we faced is that a relatively small training set offers limited representation of the entire data set. Another is the quality of the images for sign recognition: a sign occupies only a small proportion of a given image, zoomed images appear noisy, and the images are strongly affected by glare, varying lighting conditions, and camera angles, while some signs are distorted in shape (Fig. 7). Moreover, state-of-the-art street sign detection uses much closer images than our data set (Fig. 8). All of these challenges reduce the accuracy of the parking sign detection and classification models.

Figure 7
Figure 8

6. Human-Assisted AI with Geo-Spatial AI Tool

Earlier we highlighted that, due to the nature of the data collection mechanism (the Mobile Mapping System takes four images every 10 m), the same sign is very likely captured in multiple images, which leads to duplicated detections. We have developed a geospatial application for reviewing the detected signs, in which the user can manually add, modify, or delete signs prior to exporting them to a road feature database. In other words, detections are curated through the geo-spatial application, and our models can be retrained on an extended and more accurate training set.
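Conceptually, the curation step reduces to replaying a log of human add/modify/delete edits over the model's detections. The sketch below assumes a minimal record shape (an `id` plus a sign `type`) and an edit format invented for illustration; the actual application works on geo-referenced records.

```python
def apply_curation(detections, edits):
    """Apply human add/modify/delete edits to the detected signs,
    returning the curated list."""
    curated = {d["id"]: dict(d) for d in detections}
    for edit in edits:
        if edit["action"] == "delete":
            curated.pop(edit["id"], None)       # drop a false detection
        elif edit["action"] == "modify":
            curated[edit["id"]].update(edit["fields"])  # fix a label
        elif edit["action"] == "add":
            curated[edit["id"]] = edit["fields"]        # add a missed sign
    return list(curated.values())

detections = [{"id": 1, "type": "Zone"}, {"id": 2, "type": "Parking"}]
edits = [
    {"action": "modify", "id": 1, "fields": {"type": "Clearway"}},
    {"action": "delete", "id": 2},
    {"action": "add", "id": 3, "fields": {"id": 3, "type": "No-Stopping"}},
]
print(len(apply_curation(detections, edits)))  # 2
```

The curated output doubles as new labelled data, which is what closes the human-in-the-loop retraining cycle.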

This addresses one of the challenges identified earlier: a training data set that is not representative of the entire population. We make use of human intervention through an AI-driven user interface, allowing the AI models to become smarter, and in turn more accurate, over time from the feedback provided by human expertise.

At Cognitivo, as part of this project, we have developed an image labelling and curation platform that allows users to iteratively build training data (through labelling) and curate predicted outputs. This user interface is a crucial step in linking human activity into the AI/ML value chain (refer to Fig. 10).

Figure 9
Figure 10

7. Conclusion

In this study, we performed street sign identification using human-assisted AI-ML, achieving fewer than 15% false detections and 98% accuracy in classifying the different parking sign types. We have further developed an AI-driven geo-spatial tool that integrates human expertise to extend the training data set and improve model performance in the future.

8. References


Blog by Iresha Pasquel Mohottige
