Deep Learning for Enrichment of Vector Spatial Databases

Spatial analysis and pattern recognition with vector spatial data are particularly useful to enrich raw data. In road networks, for instance, many patterns and structures are only implicit in the road line features, and among them highway interchanges have proved very complex to recognize with vector-based techniques. The goal is to find the roads that belong to an interchange, such as the slip roads and the highway roads connected to the slip roads. To go beyond state-of-the-art vector-based techniques, this article proposes to use raster-based deep learning techniques to recognize highway interchanges. The contribution of this work is to study how to optimally convert vector data into small images suitable for state-of-the-art deep learning models. Image classification with a convolutional neural network (i.e., is there an interchange in this image or not?) and image segmentation with a U-Net (i.e., find the pixels that cover the interchange) are experimented with, and both give better results than existing vector-based techniques in this specific use case (99.5% against 74%).


INTRODUCTION
Spatial analysis of vector data remains very complex because of the diversity of configurations and the heterogeneity of datasets. Among other applications, map generalization (i.e., the process of simplifying map information when scale is reduced) is particularly dependent on vector pattern recognition [14] to make explicit the implicit structures of the map (e.g., buildings aligned along a road). This vector pattern recognition task is very complex, and the existing techniques are not completely satisfactory [27]. Pattern recognition in images was recently revolutionized by deep convolutional neural networks (CNNs) and more generally deep learning techniques. Even if CNNs cannot directly work on vector data for now, can they also revolutionize vector spatial analysis?
Deep learning has already been widely used on spatial information, but generally on raster spatial information or on 3D point clouds: for style transfer (e.g., a Google Earth image rendered as a Google Maps image) [10,17], for image segmentation (i.e., finding buildings, roads, crosswalks, or other features in images) [2,9,12], or for image classification [3,4,18]. But it is also possible to use deep learning techniques when vector spatial data is concerned, with images generated from representations of the vector data: the machine learning model learns to process the images of the vector data and not the vector data themselves. Applications to the assessment of OpenStreetMap data quality [30], car trajectory analysis [16], or map generalization [23,29] were recently proposed. This work follows the same principles (i.e., generating images of the vector data) to explore the possibilities offered by deep learning for vector spatial analysis. The workflow is the following: first convert vector data to raster optimally, then process the raster data with deep learning techniques, and finally reinject the results into the vector dataset. We focus on a use case that is particularly difficult to tackle with vector data analysis: automatically identifying the road sections that belong to a highway interchange.
This article is structured as follows. Section 2 describes the use case and the past attempts using vector-based or geometrical spatial analysis. Then, Section 3 presents an image classification model that identifies images containing highway interchanges. Section 4 details an image segmentation model that identifies the pixels of the image containing highway interchanges. Then, Section 5 discusses the optimal generation of training images based on vector spatial data. Finally, Section 6 draws some conclusions and discusses future research.

HIGHWAY INTERCHANGE DETECTION IN VECTOR DATASETS
Usually, roads are modeled in geographic datasets with minimal semantics. But they are important features of most maps, not only topographic maps, because of their key role for transportation. This is why road network enrichment by spatial analysis has been an important topic for years in geographic information science [7,8,21,26,31,32]. Past research focused on various types of road structures and patterns: continuous road sections or strokes [26], complex crossroads [8,21], ring roads [7,32], grid-like patterns [7,31,32], and dual carriageways [31], among others.
Among the patterns and structures that are implicit in a network of road sections, highway interchanges are particularly interesting (Figure 1). Highway interchange road sections are the roads that connect a highway with other highways or other simple roads. Highway interchange patterns can be very diverse, and even if most of them can be classified into well-known patterns [5], local modifications of the patterns make them hard to recognize. Why do we need to detect highway interchange roads? The main application is map generalization, because highway interchanges can be represented very differently across scales (Figure 2), and a generalization process requires the identification of these patterns prior to their graphical abstraction [15,27]. The recognition of highway interchanges is also useful to enrich data in car navigation applications [5].
To detect the roads belonging to an interchange, road directions can be effectively used, as slip roads and highway sections are mainly one-way [25]. The shape and the curvature of the road sections can also be used to detect interchanges [21]; however, in this work, the road network is already limited to highway sections plus slip roads thanks to an attribute value in the road dataset.
When there is no semantic information on road direction or road function, two other methods exist in the literature to recognize the road sections that belong to a highway interchange, the second one being a specialization of the first one [15,27]. Both methods are based on the classification of each junction between road sections according to its shape and connectivity. For instance, y-shaped junctions are connected with three road sections that form angles similar to the shape of the letter y. The first step of the detection method is to recognize the y-shaped or fork-shaped junctions. Then, the second step is to cluster the close detected junctions using the distance in the road network as the clustering distance. Next, all roads intersecting the convex hull of the large clusters are considered as belonging to a highway interchange.
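As an illustration, the junction classification step could be sketched as follows; the angle thresholds are hypothetical and only illustrate the principle, not the exact rules of [15,27].

```python
def is_y_junction(azimuths_deg, tol_deg=30.0):
    """Heuristic test for a y-shaped junction (illustrative thresholds).

    azimuths_deg: outgoing directions (in degrees) of the road sections
    incident to the junction. A y-junction is a degree-3 node with a
    small angle between its two branches and two similar wide angles
    between each branch and the stem.
    """
    if len(azimuths_deg) != 3:
        return False  # y-junctions are degree-3 by definition
    az = sorted(a % 360.0 for a in azimuths_deg)
    # The three angles between consecutive edges (they sum to 360 degrees)
    angles = sorted([az[1] - az[0], az[2] - az[1], 360.0 - (az[2] - az[0])])
    # Small branch-branch angle, and two similar branch-stem angles
    return angles[0] < 90.0 and abs(angles[1] - angles[2]) < tol_deg

# Branches at 60 and 120 degrees, stem at 270: a typical "y"
assert is_y_junction([60.0, 120.0, 270.0])
# A regular trifurcation (120-degree angles) is not a "y"
assert not is_y_junction([0.0, 120.0, 240.0])
```

The subsequent clustering and convex-hull steps would then operate on the junctions flagged by such a test.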
In this use case, the road data we work with is produced by the French national mapping agency 1 and covers the whole French territory. In addition to the polylines, the road sections are labeled with an importance value that distinguishes highways from less important roads, but the slip roads that connect the network to the highway have varying importance values, which prevents the use of this semantic information in the detection (Figure 3). In this dataset, we can also find an interesting geographic layer with the large complex junctions, which include some highway interchanges, represented with a point geometry (Figure 3). This multiple representation of highway interchanges is a key feature to automatically derive training datasets for machine learning [29]. But only approximately half of the French highway interchanges are considered as large junctions, so this information cannot really be used in the vector-based detection method. Figure 4 shows some results on the use case dataset with the existing vector-based method from Touya [27]. The results are generally unsatisfying. Around 40% of the interchanges are correctly delineated, as in Figure 4(a), but most interchange instances are identified but incorrectly delineated, as in Figure 4(b); there is even a significant part of the detected interchange instances that are in fact not highway interchanges, as in Figure 4(c). These results confirm that a better method is required, and the paradigm shift brought by CNNs provides that better method, as shown in the following section.

Classification of Road Network Images
The initial breakthrough of deep learning models was in the domain of image classification, so our first idea was to classify small images of the road network into two classes: interchange, which means that there is at least one instance of interchange in the image, and no interchange, which means that there is no instance of interchange in the image. Such a formulation of our problem does not provide the road sections that belong to an interchange. But if the image is small enough, it should limit the parts of the network that are processed with the vector-based method and could improve its results. At least it should avoid false-positive instances like the one in Figure 4(c).
Our problem is reduced to an image classification problem, so we decided to use a deep CNN that has proved successful for such problems. We empirically selected a network close to the LeNet-5 network that was proposed for handwritten digit recognition [13]. The version of the network we used is described in Figure 5. It can be noted that we added a dropout layer that reduces overfitting, which was particularly important for the no-interchange images, which can be more diverse than the ones with interchanges, so the training images might not represent the diverse patterns with enough instances. A dropout layer randomly drops out some of the units of the network to prevent co-adaptations during the early steps of training [24]. Table 1 shows the parameters used in the CNN.
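The dropout mechanism itself can be illustrated with a minimal NumPy sketch of so-called inverted dropout; the Keras layer we used handles this internally.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: randomly zero a fraction `rate` of the units
    during training and rescale the survivors by 1/(1-rate) so the
    expected activation is unchanged at inference time."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate  # boolean mask of surviving units
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
a = np.ones((4, 8))
out = dropout(a, 0.5, rng)
# Surviving units are scaled to 1/(1-0.5) = 2, the rest are zeroed
assert set(np.unique(out)).issubset({0.0, 2.0})
```

At inference (`training=False`) the input passes through unchanged, which is exactly the behavior of the Keras `Dropout` layer.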

Generating a Training Dataset
As mentioned earlier, the dataset for this use case is composed of road line sections that cover the whole French territory and of 2,835 points that correspond to the most significant highway interchanges of the country. We also have access to a complete topographic dataset, and we will see later that it is useful to automatically generate the training dataset for our model. The first step is to generate the instances of the first class (i.e., the interchange images). We use the 2,835 points to generate the same number of images of the interchange class. We generate images of 256 × 256 pixels with a scale of 1.2 × 1.2 km per image (Figure 6). These two values for image size and scale give a good compromise between images that should be big enough to contain all interchange roads but small enough to make the vector-based detection effective. The images are centered on the interchange point and then slightly deviated from the center at random, as the images that the model will predict after training might contain interchanges near their border. We decided to generate black and white images with the background in black and the roads in white, with a 1 pixel width as the road symbol. All of these choices to generate the images are discussed in Section 5.
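A simplified stand-in for this rasterization step might look like the following, assuming road polylines in a metric coordinate system; the actual images of the article were rendered with the Mapnik library.

```python
import numpy as np

def rasterize_roads(polylines, center, img_size=256, extent_m=1200.0):
    """Rasterize road polylines into a binary image (roads = 1, background = 0).

    polylines: list of [(x, y), ...] coordinates in meters.
    center: (x, y) of the image center (e.g., the interchange point).
    """
    img = np.zeros((img_size, img_size), dtype=np.uint8)
    ppm = img_size / extent_m  # pixels per meter
    x0 = center[0] - extent_m / 2.0
    y0 = center[1] - extent_m / 2.0

    def to_px(p):
        col = int((p[0] - x0) * ppm)
        row = img_size - 1 - int((p[1] - y0) * ppm)  # image rows grow downward
        return row, col

    for line in polylines:
        for a, b in zip(line[:-1], line[1:]):
            (r1, c1), (r2, c2) = to_px(a), to_px(b)
            n = max(abs(r2 - r1), abs(c2 - c1), 1)
            for t in range(n + 1):  # sample along the segment, 1-pixel width
                r = r1 + (r2 - r1) * t // n
                c = c1 + (c2 - c1) * t // n
                if 0 <= r < img_size and 0 <= c < img_size:
                    img[r, c] = 1
    return img

# A straight 1 km east-west road through the image center
img = rasterize_roads([[(-500.0, 0.0), (500.0, 0.0)]], center=(0.0, 0.0))
```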
Then, the second step is to generate the training examples of the second class, no interchange. We need to generate images that do not contain any interchange. Rather than picking some random points in our road network, a more controlled process was used. First, tunnel points available in the dataset were extracted; tunnels in France are mostly located in mountainous areas where there are very few highways. Only the tunnel points that do not have an interchange point around were kept. Images were generated with the same process as with interchange points (Figure 7). The number of these points was not large enough, and the road network around tunnels, mostly located in mountainous areas, did not represent the diversity of network structures well. Then, to obtain more images of urban networks, we selected school points, as they are abundant and mostly located in urban areas. Once again, only the school points that do not have an interchange point around were used (Figure 8). Using these points, 3,410 images were generated for the class no interchange.
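The filtering of candidate points can be sketched as follows; the exclusion radius is an assumption, as the article does not specify the distance used to decide that no interchange point is "around".

```python
import math

def far_from_interchanges(candidates, interchanges, min_dist_m=2000.0):
    """Keep candidate points (tunnels, schools) that have no interchange
    point within `min_dist_m` meters. The 2 km threshold is an assumed
    value for illustration."""
    kept = []
    for c in candidates:
        if all(math.hypot(c[0] - i[0], c[1] - i[1]) >= min_dist_m
               for i in interchanges):
            kept.append(c)
    return kept

interchanges = [(0.0, 0.0)]
candidates = [(500.0, 0.0), (5000.0, 0.0)]
# Only the distant candidate survives the filter
assert far_from_interchanges(candidates, interchanges) == [(5000.0, 0.0)]
```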
In terms of implementation, scripts using the Mapnik library 2 were developed to interact with the data stored in a PostGIS database. Mapnik is a library to generate tiled raster maps from geographic information and can be easily hacked to generate images for deep learning.
To process the vector data classified in the interchange class, it is necessary to keep track of the original vector data related to a given image. Several options are possible: generating a geotiff image (i.e., a geolocated image) or keeping track of the coordinates of the extent of each image in a separate file. We chose a third option and generated a geojson file containing the vector data related to each training image, using the same Mapnik library. The created training dataset is made openly available, as well as the scripts and the model, in the DeepMapGen platform (https://github.com/umrlastig/DeepMapGen).

Results
To test the model and the training dataset, we separated the images of each class randomly, keeping a little more than 1,500 images in each class for the training phase, 500 in each class for testing during the training phase, and another 500 images in each class for a further assessment of the trained model. The results presented in this section are all from these last 500 images with interchanges and 500 without. The model was implemented with the Keras Python library, with a TensorFlow core. We performed the random split of our dataset several times, and the results were extremely similar with different images in the training, validation, and test samples. The best results obtained with this dataset plateaued around 96% of overall accuracy on the test data, so we added some data augmentation with random rotation of the input images, which yielded better results with 99.6% overall accuracy, with a loss of 0.036, on the test data. The detailed results presented in Table 2 show that images containing interchange instances are correctly classified at an even higher rate of 99.8%, while images without an interchange are classified as false positives only 0.8% of the time. This result is interesting because the aim of this classification model was not to provide an automated result for interchange detection, but just an optimized subset of the road network to improve the vector-based method, and this is what is achieved with this model.
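The overall accuracy and per-class rates reported above are derived from a binary confusion matrix; a minimal sketch with toy predictions:

```python
def confusion_and_accuracy(y_true, y_pred):
    """Binary confusion counts and overall accuracy, with class 1 =
    'interchange' and class 0 = 'no interchange'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": (tp + tn) / len(y_true)}

# Toy example: 4 interchange images, 4 without, one false positive
m = confusion_and_accuracy([1, 1, 1, 1, 0, 0, 0, 0],
                           [1, 1, 1, 1, 1, 0, 0, 0])
assert m["fp"] == 1 and m["accuracy"] == 0.875
```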
The best results were obtained with a batch size of 64 and 100 epochs. To select these optimal values, experiments were carried out with batch sizes ranging from 16 to 128 and epochs ranging from 20 to 500.
To assess how well this model performed, there was no existing baseline, so we defined two ourselves. The first one is based on the vector-based method: the vector-based interchange detection is triggered on each vector extract corresponding to an image of the validation dataset, and the extract is classified as interchange when there is at least one interchange instance detected and no interchange when there is none. The results of this baseline are presented in Table 3. The CNN classifier is clearly better than the vector baseline, and the difference is even bigger, as expected, on the images that do not contain any interchange.
The second baseline also analyzes the vector data rather than its image, and uses a random forests classifier. We derived a set of descriptors of the vector road sections in each extract of the dataset. We used the following descriptors in our baseline:
• the number of intersections (as interchanges contain a high density of intersections);
• the total length of roads in the extract (to describe the global density of the network);
• the mean and the standard deviation of the road section lengths;
• the mean and standard deviation of the road section sinuosity (the proxy for sinuosity used here is the distance between the extreme points of the polyline divided by the length of the polyline); and
• the total length of the road sections with maximum importance (as highways are mainly labeled with a maximum importance).
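The sinuosity proxy among these descriptors can be sketched as:

```python
import math

def sinuosity_proxy(polyline):
    """Straight-line distance between the polyline's endpoints divided
    by its curvilinear length: close to 1 for straight roads, lower for
    winding ones (such as slip roads)."""
    length = sum(math.hypot(b[0] - a[0], b[1] - a[1])
                 for a, b in zip(polyline[:-1], polyline[1:]))
    straight = math.hypot(polyline[-1][0] - polyline[0][0],
                          polyline[-1][1] - polyline[0][1])
    return straight / length if length > 0 else 1.0

straight_road = [(0.0, 0.0), (100.0, 0.0)]
bent_road = [(0.0, 0.0), (50.0, 50.0), (100.0, 0.0)]
assert sinuosity_proxy(straight_road) == 1.0
assert sinuosity_proxy(bent_road) < 1.0
```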
The random forests classifier was trained with the same train and test datasets and then assessed on the same validation dataset. This second baseline gives an accuracy of 88.3% on the 500 extracts of the validation dataset, and the confusion matrix is presented in Table 4. The results of the second baseline are much better than the first one's, but once again our proposed CNN model clearly outperforms this baseline. We believe that the ability of the CNN to detect the cluttered areas of the image explains the performance difference, as the descriptors used in this baseline are not really able to convey the local clutter caused by the rendering of interchange road sections.
We also used the Grad-CAM algorithm [22] to visually assess what was learned by the CNN. Figure 9 shows heat maps that correspond to what the last layer of the CNN "sees", and it clearly learns to highlight the locations of the interchanges, which confirms the good classification results.

Detecting Interchange Roads in Predicted Images
Now that the classification gives a subset of the road network that is very likely to contain highway interchange instances (i.e., the roads contained in an image classified as interchange), we need to check that restricting the vector-based method to this subset of roads improves its results. We processed all extracts that were related to an image classified as interchange and analyzed the results. Although there is a major improvement compared to the initial processing of the complete road network, the results are still unsatisfying (Figure 10).
It is pretty clear that both the classification results and the vector-based detection could be improved by adapting the method to process small extracts of the network that do contain an interchange instance. We believe that the way to optimize our results is both to further filter the roads to process in the vector method and to adapt the vector method to these filtered roads. In this work, we decided to first focus on further filtering the roads on which to apply the vector-based method. We propose to use an image segmentation model, which is presented in the following section.

Segmentation of Road Images
Image segmentation by deep learning has been a very active field in the past 10 years, and models are now able to delineate features in photographs, scientific images, or videos. The images generated from the road network are not close to the photographs that attract most research attention but are closer to the scientific images processed in the work introducing the U-Net architecture [20]. U-Nets were already used with images generated from vector spatial data [16,23]. U-Nets provide a classification probability for each pixel of an input image and are based on a sequence of down convolutions, as in CNNs, followed by up convolutions (Figure 11). The compactness of the features segmented by the model is ensured by so-called concatenate layers that are connected to the neurons of the down convolution layers.
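The interplay of down convolutions, up convolutions, and concatenations can be illustrated by tracking feature-map shapes through a toy U-Net; the depth and channel counts here are illustrative, not those of our model.

```python
def unet_shapes(input_hw=256, base_channels=64, depth=3):
    """Track (spatial size, channels) through a toy U-Net: each down step
    halves the spatial size and doubles the channels; each up step does
    the reverse and concatenates the matching down-path feature map,
    hence the doubled channel count after each up step."""
    down = []
    hw, ch = input_hw, base_channels
    for _ in range(depth):
        down.append((hw, ch))
        hw, ch = hw // 2, ch * 2
    shapes = list(down) + [(hw, ch)]          # bottleneck
    for skip_hw, skip_ch in reversed(down):   # up path with skip concatenation
        hw, ch = hw * 2, ch // 2
        shapes.append((hw, ch + skip_ch))     # channels after concatenation
    return shapes

s = unet_shapes()
assert s[0] == (256, 64)      # input level
assert s[3] == (32, 512)      # bottleneck
assert s[-1] == (256, 128)    # back to input resolution, concatenated channels
```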
We also briefly tried other network architectures dedicated to segmentation problems [1,19], but there was no clear improvement compared to our U-Net, so we decided to limit the investigations with these networks. Future work is required to know if a finer architecture can improve our results.

Generating a Training Dataset
In this case, a training example consists of an input image of the road network and a label image showing the pixels of the input image that belong to an interchange (Figure 12). We first used the same images as in the classification model, but the results were not optimal, so we changed only the color model, switching from black and white images to RGB images, with a white background and roads colored according to their importance value in the dataset. As highway interchanges are usually located around important roads, this substantially improved the segmentation results.
Regarding the label images, they are black and white images, where a white pixel means that the pixel does not belong to an interchange, and a black one means that the pixel belongs to an interchange. The label images were generated from the interchange points used to train the classification model, and a black square was generated around each point. The extent of an interchange can vary a lot from small to very complex interchanges, but to keep the dataset generation automatic, we had to choose a fixed size for the black square. The size of the square is kept rather small to make sure it does not cover areas where there is no interchange. The drawback is that it does not completely cover all of the roads of the interchange. Label images that do not contain any interchange are left totally white.
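The automatic label generation can be sketched as follows; the square size is an assumption, as the article only states that a small fixed size was chosen.

```python
import numpy as np

def make_label_mask(center_px, img_size=256, square_px=64):
    """Generate a label image: white (255) background, black (0) square
    centered on the interchange pixel. The 64-pixel square size is an
    assumed value for illustration."""
    mask = np.full((img_size, img_size), 255, dtype=np.uint8)
    half = square_px // 2
    r, c = center_px
    r0, r1 = max(r - half, 0), min(r + half, img_size)
    c0, c1 = max(c - half, 0), min(c + half, img_size)
    mask[r0:r1, c0:c1] = 0  # black square marks the interchange pixels
    return mask

mask = make_label_mask((128, 128))
assert mask[128, 128] == 0 and mask[0, 0] == 255
```

Label images without an interchange are simply left entirely white, i.e., `np.full((256, 256), 255)`.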

Results
The experiment setup was the same as the one for image classification: similarly to the classification experiment, we separated the images of each class randomly into train, test, and validation datasets, and the U-Net model was also implemented with the Keras Python library, with a TensorFlow core. The best results obtained after 30 epochs are the following: an accuracy of 97.8% and a loss value of 0.0696 on the training data, and an accuracy of 94.3% and a loss value of 0.3773 on the validation data. Figure 13 shows some good results on four of the interchange images of the test data. The pixels with the highest probabilities (the darker shades of grey in the image) are clearly the ones that intersect the roads of the interchange. Even the very small interchange on the right of the image is correctly segmented. And even when the highway is not colored in red because the importance value is not the one usually applied to such roads (maybe an error in the data), the interchange is segmented (second image from the left). Similar good results appear in a large majority of the tested samples: sometimes the pixels with a high probability cover a little more than the interchange, and sometimes they cover a little less, but as the raster-based segmentation is just a first step to filter the roads to include in the vector-based method, the results really correspond to what was expected. Figure 14 shows four other examples that confirm the good results even with very complex or unusual structures. The image on the right shows the correct segmentation of two different interchanges in the same image. And the third image shows that nothing is segmented when there is no interchange in the image.
However, both figures show that the segmented area is always mostly a square due to the shape we gave to our training masks. In the examples of Figures 13 and 14, the square roughly captures the interchange location, but in some rare cases the delineation is visually incorrect. We compared the segmentation results with interchange delineations performed manually in Figure 15. Sometimes the segmented square is not located on the interchange (in the left image, the model segmented the roundabout instead of the small interchange at the bottom left), and sometimes the square is off-center (in the other three images).

DERIVING TRAINING IMAGES FROM VECTOR SPATIAL DATASETS
The previous two sections presented two deep learning models to detect highway interchanges in road networks that we trained with images derived from vector data. The way images are generated was determined empirically, and it might be sub-optimal. Several variables govern image generation: the scale and the resolution of the image, the style of the rasterized vector data, data selection, and the way the label area is created in the segmentation use case. In this section, we discuss different alternatives for these variables to generate the training images from vector data and how they perform in our use cases.

Scale and Image Resolution
The first two variables to set when generating images from the vector data are the scale and the resolution of the image. The scale is simply the ratio between the width (or height) of the image and the length of the same geographic extent on the ground. The image resolution is the ratio between the width (or height) of the image and the number of pixels. To assess the importance of those two variables on the effectiveness of the images, we tested two types of variations:
• Scale variations: same image size and resolution but different ground extents.
• Resolution variations: same ground extent but different numbers of pixels.
Table 5 shows the results obtained with resolution variations with the image classification model for the same scale (images cover 1.2 × 1.2 km). The resolution we used (256 × 256 pixels) gives the best results. A smaller resolution gives worse results, but they are still very good. On the contrary, a larger resolution gives much worse results, which suggests that images should not be larger than 256 × 256 pixels. A resolution smaller than 128 × 128 pixels for such a scale was not tested, but we believe that it would give bad results, as many highway interchanges would be covered by very few pixels, making them hard to recognize in the image.
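For reference, the ground resolution of the tested configurations follows directly from the extent and the pixel count:

```python
def meters_per_pixel(extent_m, n_pixels):
    """Ground size covered by one pixel of a square image."""
    return extent_m / n_pixels

# Configurations discussed in the resolution experiment (1.2 km extent):
# 128 px -> 9.375 m/px, 256 px -> 4.6875 m/px, 512 px -> 2.34375 m/px
assert meters_per_pixel(1200.0, 256) == 4.6875
```

At 256 × 256 pixels, a 1-pixel road symbol thus stands for a strip of roughly 4.7 m on the ground.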
The scale variation was also tested, but the range for scale variation in our use case is not that big. Indeed, a highway interchange can be quite large, and too large a scale might cause incomplete interchanges in the images, which is a problem. However, too small a scale might cause too much coalescence between the road symbols, unless the resolution is high enough (but we have just seen that a high resolution gives bad results). This is why only slight variations of the scale were tested: Figure 16 shows images that are 1.2 km wide and images that are 0.8 km wide. The results with these two scales are extremely similar, so we conclude that as long as the scale remains in a reasonable range, this variable has a low impact on the effectiveness of the deep learning models.

Styling Vector Data
Besides scale and resolution, another key variable is the way vector data is styled prior to rasterization. In the experiments described in the previous sections, we tested two different styles:
• a black background and white symbols for roads (1 pixel width), and
• a white background and roads colored according to importance (from black to red, with 1 pixel width).
The first style has the advantage of simplicity, as the color information is coded with only two values (0 or 1), whereas a colored image is encoded with a triplet of values between 0 and 255. The drawback of this simple style is that it does not use the semantic information about road importance, whereas highways are coded as important roads. It was enough to get great classification results but not enough for the segmentation model.
The second style makes a complete use of the convolutional layers, as the model learns that interchange roads often lie around roads colored in red (the color of the most important roads). However, we have seen in the results that not all interchange instances contain roads coded as important (Figure 14), so we have to make sure that colors do not introduce a learning bias.
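A hypothetical importance-to-color ramp for this second style could look like the following; the article only states that colors run from black (minor roads) to red (highways), so the intermediate values are assumptions.

```python
# Assumed RGB ramp from red (most important) to black (least important);
# only the two endpoints are stated in the article.
IMPORTANCE_COLORS = {
    1: (255, 0, 0),    # highways: red
    2: (190, 50, 0),
    3: (130, 80, 0),
    4: (70, 40, 20),
    5: (0, 0, 0),      # least important roads: black
}

def road_color(importance):
    """Map a road's importance value to an RGB triplet; the white
    background is drawn separately."""
    return IMPORTANCE_COLORS.get(importance, (0, 0, 0))

assert road_color(1) == (255, 0, 0)
```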
Another way to modify the style is to change the width of the symbols applied to roads rather than the colors. Figure 17 shows an example of an interchange represented with a symbol width of 1 pixel on the left and 2 pixels on the right. It is visually clear that a symbol width of 2 pixels introduces symbol coalescence, and this coalescence might blur the location of interchanges. A quick test on the classification model with 2 pixel symbols confirmed our doubts, with much lower classification results. However, this quick test does not rule out the usefulness of larger widths for other use cases where a larger scale is possible.

Data Selection
The last variable for the generation of input images is the selection of the features of the dataset to render in the image. First, if we only focus on the roads to display in the image, there are different possibilities:
• display only the n closest roads to the interchange point (the option chosen in our classification experiment, with n = 200),
• display all of the roads that fall into the envelope of the image, or
• display only the most important roads.
We did not test the last option of data selection (i.e., selecting only the important roads). Indeed, a simple filter on the importance attribute often filtered out too many roads, even when removing only the least important roads (the black ones in the colored examples presented in this article). Road network generalization provides methods to select the most important and salient roads, going beyond a basic semantic filter [27]. It would be very interesting to test the training of the segmentation models with images where the road network was generalized to get rid of the minor roads of the network.
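The first selection option can be sketched as follows; measuring the distance to a road's first vertex is a simplification for illustration (a real implementation would use the distance to the whole polyline).

```python
import math

def n_closest_roads(roads, point, n=200):
    """Select the n road sections closest to the interchange point
    (n = 200 was the value used in our classification experiment).
    roads: list of polylines, each a list of (x, y) vertices."""
    def dist(road):
        x, y = road[0]  # simplification: distance to the first vertex
        return math.hypot(x - point[0], y - point[1])
    return sorted(roads, key=dist)[:n]

roads = [[(float(i), 0.0)] for i in range(10)]
closest = n_closest_roads(roads, (0.0, 0.0), n=3)
assert closest == [[(0.0, 0.0)], [(1.0, 0.0)], [(2.0, 0.0)]]
```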
If we broaden the scope of data selection, it is possible to display other types of geographic features in the image. For instance, buildings are rarely built inside interchanges, so we could display the buildings in the image. We did not carry out any experiment with buildings for time reasons, but it would be interesting to verify if it can improve the segmentation results, as the classification results are already very good.

Segmentation Labels
Regarding the segmentation problem, we tried different types of outputs to learn, all derived from the interchange point in our initial dataset. Figure 19 shows some of the labels we tested before selecting the one presented in Section 4.2. All of the methods that use the roads in the label were less effective than the more abstract ones. We also compared the automatically created labels to labels segmented manually on a very small number of examples (Figure 20). On this small number of examples, the manually segmented labels allowed much better results than our best automatic method (a small black square on a white background). Although such results cannot be generalized without experiments, we believe that using a large number of manually segmented images would improve the results of the segmentation model. But creating such a large manual training dataset is costly.

Data Augmentation
This last section is not strictly about variables to generate training images but about the possibilities to augment the dataset to have enough training examples. Contrary to photographs, it is very easy to augment a dataset derived from vector data with simple translations and rotations during rasterization. Figure 21 shows an illustration with three images generated for the same interchange point thanks to three different slight translations of the image center from the interchange point.
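The translation-based augmentation can be sketched as follows; the maximum offset is an assumption, chosen so that the interchange stays inside the 1.2 km image extent.

```python
import math
import random

def augmented_centers(point, n=3, max_offset_m=150.0, seed=42):
    """Derive several image centers from one interchange point by small
    random translations, so that each rasterization yields a different
    image of the same interchange. The 150 m offset range is an assumed
    value for illustration."""
    rng = random.Random(seed)
    centers = []
    for _ in range(n):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        radius = rng.uniform(0.0, max_offset_m)
        centers.append((point[0] + radius * math.cos(angle),
                        point[1] + radius * math.sin(angle)))
    return centers

cs = augmented_centers((0.0, 0.0))
assert len(cs) == 3
```

Rotations can be applied the same way, by rotating the vector coordinates around the center before rasterization rather than resampling the pixels of an already-rendered image.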

CONCLUSION AND FUTURE WORK
The experiments presented in this article show that deep learning greatly improves the detection of highway interchanges in vector road data. First, a simple image classification CNN model greatly reduces the number of false-positive interchange detections when coupled with the vector-based detection method. Then, the U-Net segmentation model clearly improves the delineation of the interchange roads, even with an automatically generated training dataset. Beyond the progress on this specific issue, the main contribution of this research is the exploration of different methods to generate images adapted to deep learning models from vector datasets. It opens up new perspectives on the use of deep convolutional networks for pattern recognition with vector spatial data.
There is a lot to do to go further. Regarding the use case on highway interchanges, a first step would be to improve the training datasets used for the segmentation model by data augmentation. Another way to improve the training dataset is to use the vector-based method to find interesting areas in the network: it could add new interchange instances that do not have an interchange point in our dataset, and also add false-positive cases (i.e., areas recognized as interchanges by the vector-based method that are not), because it would help the model learn how to process such cases. Another way to improve the training dataset is to add new instances. We already used all instances in the French national mapping agency dataset, but we can use the worldwide OpenStreetMap dataset, where features carry the tag value "motorway_junction" for the tag "highway" (Figure 22). These features differ from the ones used in this article, but images centered on such features would contain the interchange roads. At the time of writing, there were 171,784 instances 3 of motorway junctions that could be used to generate new examples.
We already mentioned it several times in this article, but another obvious future work is to improve the vector-based method to obtain optimal results once the roads have been properly filtered by the deep learning models. Our idea is to combine our existing method with the one using road shapes and curvature [21], which also processes filtered roads (filtered semantically in their case). More generally, we can improve the segmentation model by going deeper in the experiments on the model architecture. First, it could be interesting to test residual U-Nets [33] as an improvement of the segmentation model: residual U-Nets outperformed traditional U-Nets on different binary segmentation problems. Regarding the loss function, we used a basic binary cross-entropy function, and we plan to test more functions, such as the one proposed for the seminal U-Net [20] or overlap measures. To segment geometrically realistic delineations of the interchanges, it could also be interesting to test generative adversarial networks, which perform well at generating realistic map images [10,11]. We also plan to test the model on other regions of the world that might present other types of interchange patterns. In addition to these tests on other regions, we want to try the model on other datasets, such as OpenStreetMap, to check if there are some transfer learning issues.
A straightforward follow-up to this work would be to use deep learning techniques to recognize other types of patterns in road structures, such as complex crossroads, ring roads, grid-like patterns, or dual carriageways, as existing vector-based methods [7,8,21,31,32] can really be improved.
A similar approach could also be used for other types of vector analysis problems. For instance, urban block classification is quite useful for map generalization purposes, but the current approaches based on classical supervised learning techniques fall quite short because the optimal descriptors to classify a building block are not easy to identify. Another interesting example is the inference of landmarks from a set of polygon buildings, i.e., finding the ones that are visually salient during a navigation or map reading task [6,28]; once again, classical machine learning techniques give imperfect results because it is extremely complex to define the descriptors of what makes a landmark for the human brain.
Finally, one last avenue for future research would be to generalize the experiments and discussions of Section 5.1: more experiments could help us define general guidelines on the variables for image generation (scale, resolution, styling, data selection).