Automatic Ferroelectric Domain Pattern Recognition Based on the Analysis of Localized Nonlinear Optical Responses Assisted by Machine Learning

Second‐harmonic generation (SHG) is a nonlinear optical method that allows the study of local structure, symmetry, and ferroic order in noncentrosymmetric materials such as ferroelectrics. Combining SHG microscopy with local polarization analysis is particularly efficient for deriving the local polarization orientation. This, however, entails tedious and time‐consuming modeling of the nonlinear optical emission. Moreover, extracting the complex domain structures often observed in thin films requires a pixel‐by‐pixel analysis and the fitting of numerous polar plots to assign a polarization angle to each pixel. Here, the domain structure of GeTe films is studied using SHG polarimetry assisted by machine learning. The method is applied to two film thicknesses: a thick film containing large domains visible in SHG images, and a thin film in which the domain size is below the SHG resolution limit. Machine learning‐assisted methods show that both samples exhibit the same four domain variants. This result is confirmed for the thick film both by manual pixel‐by‐pixel analysis and by piezoresponse force microscopy. The proposed approach opens new prospects for optical studies by enabling enhanced sensitivity and high‐throughput analysis.


Introduction
Ferroic materials are, in essence, functional materials owing to the switchable character of their ferroic order under external fields. The domain structure formed by the arrangement of the local order parameter at the nano- and mesoscale is of the highest importance, since it controls the functional properties of these materials. [1] For instance, a nonlinear optical crystal can be designed through the periodic arrangement of the polarization in a ferroelectric material, where the domain period controls the operating wavelength of the resulting photonic crystal. [2] Domain boundary regions can also present extraordinary properties that differ from those of the adjacent domains, which further increases the functionality of the material. [3] Advanced imaging methods are required to study domain configurations in ferroelectric materials because of the small size and complexity of the domain structures. Modern aberration-corrected transmission electron microscopy (TEM) is undeniably the most suitable technique for resolving complex polarization textures such as vortices, [4,5] polar waves, [6] or chiral wall arrangements. [7] Nevertheless, this method cannot be applied routinely, since sample preparation for TEM measurements is usually difficult, time-consuming, and invasive. Moreover, the information obtained is limited to a small area, which may not represent the overall sample properties.
Piezoresponse force microscopy (PFM) can, to some extent, present a good alternative. [8] In particular, vector PFM provides a 3D mapping of the polarization [9] that can allow for a detailed description of the domain structure. Despite its essential role in probing ferroelectric domains, PFM has a few downsides that hinder its application in some systems. These include the need for a bottom electrode, the strong probe-sample interaction, which in some cases requires precise modeling, and the difficulty of distinguishing small piezoelectric responses from electromechanical deformations induced by nonpiezoelectric phenomena such as electrostatic or electrochemical effects.
Low-energy electron microscopy and X-ray photoemission electron microscopy allow for non-contact and non-invasive observation of polar domain structures with a lateral resolution better than 50 nm, as demonstrated in BiFeO3 thin films. [10] However, these methods are surface sensitive and require thorough treatment of the sample under ultrahigh vacuum before the measurements to reduce surface contamination and external charge screening, which can strongly affect the quality of the detected signal.
In contrast, optical studies are user-friendly, and no substantial sample preparation is required. While optical methods are generally shunned because of their limited lateral resolution, their combination with advanced data analysis methods has recently extended their capabilities by substantially enhancing their resolving power. Principal component analysis has proven efficient in capturing Raman signatures of ferroic domain walls in LiNbO3 single crystals [11] despite their sub-10 nm scale. The recent development of open-source algorithms based on artificial intelligence has further impacted the development of contemporary Raman and surface-enhanced Raman scattering sensors. [12] Similarly, machine learning methods have boosted the sensitivity of second-harmonic generation (SHG) microscopy to obtain phase information in reference-free experiments, [13] to provide automatic cancer diagnosis in breast tissues using high-throughput SHG polarimetry studies, [14] and to achieve correlative imaging at the nanoscale by combining SHG polarimetry and scanning probe microscopy. [15] This study demonstrates the high potential of laterally resolved SHG polarimetry assisted by machine learning methods for investigating ferroic order through the analysis of the domain structure of germanium telluride (GeTe), a high-T_c non-oxide ferroelectric. Such Rashba-type ferroelectrics have recently attracted great attention owing to their potential technological applications based on the ferroelectric control of Rashba-type spin-orbit coupling. [16][17][18] Even though the ferroelectricity of crystalline α-GeTe is well established, little is known about the domain structure of epitaxial GeTe films and its effect on the functionality of the material. Here, we study the domain structure of GeTe thin films epitaxially grown on Si(111).
The coexistence of four different domain variants is demonstrated: the main c-domain with out-of-plane polarization, and a-domains forming nano-stripes along three different orientations and exhibiting in-plane polarization along their short axis (see Figure 1). The local polarization orientation of each domain is derived using SHG microscopy and pixel-by-pixel polarization analysis. Unsupervised machine learning methods are used to automatically extract the different polar domain variants from the SHG anisotropy data cubes. The results obtained by machine learning were confirmed by the artisanal pixel-by-pixel analysis of the local SHG polar plots in the case of a thick film containing sufficiently large domains. Vertical and lateral piezoresponse force microscopy results further corroborate the SHG results. In addition to minimizing human intervention in the image analysis process and speeding up data processing by avoiding the tedious pixel-by-pixel fitting procedure, machine learning methods made it possible to resolve domain types in the thinner film that were not detectable by the human eye in conventional SHG analysis. This work shows that advanced data processing assisted by machine learning increases the sensitivity and resolution of the optical method and enables fast domain analysis.

Results and Discussion
Given the rhombohedral distortion and the electric dipole along the [111] direction of germanium telluride, eight possible polar domain orientations, corresponding to the eight ⟨111⟩-type directions, are expected in this system. This was unambiguously confirmed by the observation of herringbone domain configurations in bulk GeTe crystals. [19,20] The epitaxial growth of thin films on single-crystal substrates often leads to the development of strain fields, providing an extra degree of freedom to control the domain structure. Various domain patterns can be obtained, depending on the balance between tensile or compressive epitaxial strain and the electrostatic and mechanical boundary conditions. Strain relaxation in thick epitaxial ferroelectric films is often accompanied by the formation of ferroelastic domains and twins showing a/c-domain boundaries, with the main c-domain exhibiting out-of-plane polarization and fine a-domain incursions with in-plane polarization. [21][22][23] While such domain patterns have been intensively studied in oxide perovskites, here we show that similar multidomain structures can be observed in GeTe, based on SHG microscopy and polarimetry analysis combined with simulations relying on the analytic form of SHG and symmetry arguments.
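The count of eight polar orientations follows from the ⟨111⟩-type body diagonals, whose three pseudocubic components can each be +1 or −1. The short Python sketch below illustrates only this counting argument, in an idealized pseudocubic frame that ignores the small rhombohedral distortion: it enumerates the directions and groups antiparallel pairs into polar axes.

```python
import itertools

# <111>-type body diagonals in an idealized pseudocubic frame:
# each component is +1 or -1, giving 2**3 = 8 polar orientations.
directions = list(itertools.product((1, -1), repeat=3))

# Group antiparallel pairs (v, -v); each pair shares one polar axis.
axes = {max(v, tuple(-c for c in v)) for v in directions}

print(len(directions), len(axes))  # 8 4
```

Each of the four polar axes can carry either polarity, which is why eight orientations, rather than four, are expected.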
Even though SHG can probe nanoscale structures in particular cases owing to its quadratic sensitivity to local electric fields, [24] the lateral resolution of this nonlinear optical method is rather limited (≈300 nm). Since the domain size strongly decreases with film thickness, observing the domain structure in thin ferroelectric films is clearly challenging. A key solution to this challenge is provided by machine learning, which has recently been demonstrated to be a pertinent approach for investigating ferroelectric materials. [25][26][27][28][29][30] Here, machine learning algorithms are applied to SHG data recorded for two film thicknesses: an 1800 nm-thick GeTe film exhibiting 300 nm-wide stripe domains and a 200 nm-thick film showing 50 nm-wide stripe domains.
In the following, we first recall the basic principle of SHG polarimetry and its application to ferroelectric domain imaging. Then, a detailed artisanal analysis of the pixel-by-pixel SHG polarization response is presented for the 1800 nm-thick GeTe film. The SHG polar plots derived at each pixel are manually fitted using a complex analytic form of the SHG that accounts for both light-focusing effects and the mixed contribution of different domains within a given pixel. Given the tedious and complex character of this task, different approaches are followed to simplify and automate the analysis of the localized nonlinear optical response. We first use a simplified artisanal method that minimizes the effects of focusing and of multiple domain contributions through a careful choice of the measurement geometry. Finally, we test two analysis methods based on unsupervised machine learning for the automatic recognition of the domain variants in the thick GeTe film, containing 300 nm-wide domains, and in the thin GeTe film, in which the domain size is below the resolution limit of SHG.
Figure 1 (partial caption): … (Equation (7)) provides the polarization angle as an output parameter of the fit. A threshold of R² = 0.90 is set to select the best fitting results; in this case, more than 50 % of the selected results exhibit R² > 0.96. g) The selected fit results are presented as bidirectional arrows forming a polarization map superimposed on the SHG microscopy image. The scale bars are common to all images and correspond to 2 μm.

Revealing Polar Nanodomains by Means of Second-Harmonic Microscopy with Polarization Analysis
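SHG microscopy has proven efficient for analyzing polarization states and symmetry in various systems, [31][32][33] including thermotropic phase boundaries, [34] phase coexistence in thin films [35] and heterostructures, [36] phase transitions, [37,38] non-Ising and chiral ferroelectric domain walls, [39,40] as well as polar domain boundaries in centrosymmetric materials. [41] The variation of the SHG signal with the light polarization provides a unique way to probe the nonlinear optical susceptibility tensor and, with it, to gain information on the local structure, symmetry, and polarization orientation in ferroelectric materials. SHG polarimetry measurements can consist of anisotropy plots (simultaneous rotation of the analyzer and polarizer, kept either parallel or perpendicular to each other), analyzer plots at a given incident light polarization, or polarizer plots, in which the incident light polarization is varied and its effect on the second-harmonic emission is observed at a given analyzer angle. The SHG intensity as a function of the incident laser polarization angle α and the analyzer angle φ is given by:

$$
I_{\mathrm{SHG}}(\alpha,\varphi) \propto \left| \hat{\mathbf{e}}_{\varphi}\cdot\mathbf{P}(2\omega) \right|^{2}, \qquad \hat{\mathbf{e}}_{\varphi} = (\cos\varphi,\ \sin\varphi,\ 0)
\tag{1}
$$

The tensor form of the second-order polarization P(2ω) induced by the light-matter interaction reads:

$$
\begin{pmatrix} P_x(2\omega) \\ P_y(2\omega) \\ P_z(2\omega) \end{pmatrix} \propto
\begin{pmatrix}
d_{11} & d_{12} & d_{13} & d_{14} & d_{15} & d_{16} \\
d_{21} & d_{22} & d_{23} & d_{24} & d_{25} & d_{26} \\
d_{31} & d_{32} & d_{33} & d_{34} & d_{35} & d_{36}
\end{pmatrix}
\begin{pmatrix} E_x^2 \\ E_y^2 \\ E_z^2 \\ 2E_yE_z \\ 2E_xE_z \\ 2E_xE_y \end{pmatrix}
\tag{2}
$$

where E_i = E_i(ω) is the electric field of the fundamental wave, and d_ij are the elements of the nonlinear optical susceptibility tensor written in Voigt notation, 2d_ij = χ^(2)_ikl. The indices i, k, l refer to the Cartesian laboratory coordinates (x, y, z), and j = 1, …, 6 is the contracted index of the pair (k, l). The complete dependence of the SHG response on the analyzer angle and the input polarization is obtained from Equation (2) using the Jones formalism, which accounts for the rotation of a linear analyzer acting on the second-harmonic field generated by the incident field E(ω):

$$
I_{\mathrm{SHG}}(\alpha,\varphi) \propto \left|
\begin{pmatrix} \cos^2\varphi & \sin\varphi\cos\varphi \\ \sin\varphi\cos\varphi & \sin^2\varphi \end{pmatrix}
\begin{pmatrix} P_x(2\omega) \\ P_y(2\omega) \end{pmatrix}
\right|^{2},
\qquad
\mathbf{E}(\omega) = E_0 \begin{pmatrix} \cos\alpha \\ \sin\alpha \\ 0 \end{pmatrix}
\tag{3}
$$

GeTe films exhibit trigonal 3m point-group symmetry.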
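As an illustration of Equations (1) to (3) in the scalar (unfocused) limit, the Python sketch below builds the Voigt field vector, applies a 3 × 6 tensor d_ij, and projects the result on the analyzer axis. It assumes normal incidence along z and a unit-amplitude fundamental, and it drops all constant prefactors; the function names and the one-element test tensor are hypothetical and not taken from the authors' code.

```python
import numpy as np

def voigt_field(E):
    """Voigt-ordered field products (Ex^2, Ey^2, Ez^2, 2EyEz, 2ExEz, 2ExEy)."""
    Ex, Ey, Ez = E
    return np.array([Ex**2, Ey**2, Ez**2, 2 * Ey * Ez, 2 * Ex * Ez, 2 * Ex * Ey])

def shg_intensity(d, alpha, phi):
    """Scalar-model SHG intensity for polarizer angle alpha and analyzer angle phi (radians).

    d is a 3x6 susceptibility tensor in Voigt notation; prefactors are dropped.
    """
    E = np.array([np.cos(alpha), np.sin(alpha), 0.0])   # fundamental, propagating along z
    P = d @ voigt_field(E)                               # Eq. (2), up to a constant
    e_phi = np.array([np.cos(phi), np.sin(phi), 0.0])    # analyzer transmission axis
    return float(abs(P @ e_phi) ** 2)                    # Eqs. (1) and (3)

# Hypothetical toy tensor with a single element d_11 = 1, for checking only
d_toy = np.zeros((3, 6))
d_toy[0, 0] = 1.0
parallel = [shg_intensity(d_toy, a, a) for a in np.radians([0, 45, 90])]
print(np.round(parallel, 4))
```

Keeping φ = α reproduces an anisotropy plot, while fixing α and scanning φ gives an analyzer plot.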
The susceptibility tensor of c-domains (polarization along the z-axis) can thus be written as

$$
\tilde{d}^{\,c\text{-domains}} =
\begin{pmatrix}
0 & 0 & 0 & 0 & d_{15} & -d_{22} \\
-d_{22} & d_{22} & 0 & d_{15} & 0 & 0 \\
d_{31} & d_{31} & d_{33} & 0 & 0 & 0
\end{pmatrix}
\tag{4}
$$

For a-domains with in-plane polarization oriented at an arbitrary angle with respect to the x-axis of the laboratory coordinate system, the susceptibility tensor is derived using the rotation and transformation matrices, as explained in Ref. [42] Under moderate focusing of the fundamental beam, which is the case when objectives with a numerical aperture (NA) of 0.70 or smaller are used, a scalar model of the electric field E(ω) based on the analytic form of SHG (Equations (1) and (2)) is sufficient to model the local polarimetry response in SHG experiments. [43] Spectromicroscopy, nevertheless, often requires high-resolution imaging and local spectroscopic analysis. For example, an objective with 0.85 NA was necessary to resolve the fine a-domain incursions in this study. In the case of focused light, the electric field in Equation (2) takes the vectorial form of Equation (6) (see Ref. [44]), where I_1 is a focusing-related parameter (I_1 = 0.1 for a 0.85 NA objective). [44] The scalar model, valid for an unfocused beam or at the optimum measurement geometry, is retrieved by taking I_1 = 0.
The domain structure of GeTe films is derived from SHG polarization analysis by fitting the local polar plots at each pixel with a semi-analytic model based on the analytic form of SHG. This model is given by Equations (2) and (3), in which the d_ij susceptibility tensor elements are replaced by those of d̃^(c-domains) or d̃^(a-domains). We also take into account the vector character of the electric field due to the tight focusing of the incident wave (substitution of the electric field given by Equation (6) in Equation (2)). The fitting function derived in this way also considers the mixed a- and c-domain fractions contributing to the SHG response. The mixed contribution is accounted for using weighting factors W_a and W_c for a- and c-domains, respectively, where W_a + W_c = 1.
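The tensor handling described above can be sketched in Python as follows. The sketch assumes the standard 3m tensor form with the polar axis along z and the mirror plane normal to x, and a rotation that tilts the polar axis into the film plane at an angle θ from x; the transformation convention of Ref. [42] may differ. It is a scalar-model illustration only (no focusing correction), the function names are hypothetical, and the tensor coefficients are chosen purely for checking.

```python
import numpy as np

# Voigt index j -> Cartesian pair (k, l)
VOIGT = [(0, 0), (1, 1), (2, 2), (1, 2), (0, 2), (0, 1)]

def d_3m(d15, d22, d31, d33):
    """3x6 Voigt tensor of point group 3m: polar axis z, mirror plane normal to x."""
    return np.array([
        [0.0, 0.0, 0.0, 0.0, d15, -d22],
        [-d22, d22, 0.0, d15, 0.0, 0.0],
        [d31, d31, d33, 0.0, 0.0, 0.0],
    ])

def to_chi(d):
    """Full rank-3 tensor chi_ikl with 2 d_ij = chi_ikl (symmetric in k, l)."""
    chi = np.zeros((3, 3, 3))
    for j, (k, l) in enumerate(VOIGT):
        chi[:, k, l] = 2 * d[:, j]
        chi[:, l, k] = 2 * d[:, j]
    return chi

def in_plane_rotation(theta):
    """Rotation taking the crystal z axis to the in-plane direction (cos theta, sin theta, 0)."""
    ry = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]])  # z -> x
    c, s = np.cos(theta), np.sin(theta)
    rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])           # x -> (c, s, 0)
    return rz @ ry

def rotate(chi, R):
    """Tensor transformation chi'_ijk = R_ia R_jb R_kc chi_abc."""
    return np.einsum("ia,jb,kc,abc->ijk", R, R, R, chi)

def anisotropy(chi, alpha):
    """Scalar-model SHG intensity with polarizer and analyzer both at angle alpha."""
    E = np.array([np.cos(alpha), np.sin(alpha), 0.0])
    P = 0.5 * np.einsum("ikl,k,l->i", chi, E, E)   # equals d @ (Voigt field vector)
    return float(P @ E) ** 2

# Hypothetical coefficients: a pure-d22 c-domain and a pure-d33 a-domain along x
chi_c = to_chi(d_3m(d15=0.0, d22=1.0, d31=0.0, d33=0.0))
chi_a = rotate(to_chi(d_3m(0.0, 0.0, 0.0, 1.0)), in_plane_rotation(0.0))
```

With these choices the c-domain response reduces to sin²(3α), the sixfold pattern of the d_22 term at normal incidence, while the d_33-only a-domain gives cos⁶(α − θ), a twofold lobe along its in-plane polar axis.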
This leads to

$$
I_{\mathrm{SHG}} = W_c\, I_{\mathrm{SHG}}^{c} + W_a\, I_{\mathrm{SHG}}^{a}
\tag{7}
$$

where I_SHG^c and I_SHG^a represent the SHG responses of the individual (pure) domains. The measurement geometry presented in Figure 1 allows for the detection of the four domain variants in a single anisotropy plot, in which the analyzer and the polarizer are rotated simultaneously while kept parallel to each other. In this case, the analytic forms of the SHG responses are given by Equations (8) and (9), where the intrinsic anisotropy factors D_i correspond to the normalized nonlinear optical susceptibility factors, D_1 = d_22 … (Figure S1, Supporting Information).
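With the pure-domain responses held fixed, the weighting described above reduces to a one-parameter linear least-squares problem. The sketch below estimates W_c (and W_a = 1 − W_c) for a single polar plot; it assumes I_SHG^c and I_SHG^a have already been computed on the same angle grid, whereas the authors' fit also adjusts the polarization angle and other parameters, so it is only a simplified illustration with a hypothetical function name and toy curves.

```python
import numpy as np

def fit_domain_weight(I_meas, I_c, I_a):
    """Least-squares W_c in I = W_c * I_c + (1 - W_c) * I_a, constrained to [0, 1].

    For a one-parameter convex quadratic, clipping the unconstrained optimum
    to the interval gives the constrained optimum.
    """
    I_meas, I_c, I_a = (np.asarray(x, dtype=float) for x in (I_meas, I_c, I_a))
    basis = I_c - I_a
    W_c = np.dot(I_meas - I_a, basis) / np.dot(basis, basis)
    W_c = float(np.clip(W_c, 0.0, 1.0))
    return W_c, 1.0 - W_c

# Synthetic check with toy pure-domain responses (hypothetical shapes)
alpha = np.linspace(0.0, 2 * np.pi, 72, endpoint=False)
I_c = np.sin(3 * alpha) ** 2          # sixfold, c-domain-like pattern
I_a = np.cos(alpha) ** 6              # twofold, a-domain-like pattern
I_mix = 0.3 * I_c + 0.7 * I_a
W_c, W_a = fit_domain_weight(I_mix, I_c, I_a)
```

Because the constraint W_a + W_c = 1 is built into the parametrization, only one free weight remains per pixel.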

Manual Data Processing Method
In practice, local SHG polarimetry analysis is conducted by recording a set of images as a function of the polarizer and analyzer angles. In this study, local anisotropy plots were obtained by rotating the polarizer and analyzer simultaneously while keeping them parallel to each other, leading to the so-called SHG anisotropy plots. The images of the recorded stack are subdivided into a checkerboard of identical squares of multiple pixels (typically 3 × 3 pixels). A polar plot is derived at each of these superpixels by integrating the SHG intensity over the superpixel in every image of the stack. These plots are then displayed at their superpixel center positions to form a polar plot map, as shown in Figure 1e. Fitting each local polar plot provides the polarization angle as an output fit parameter, so that a bidirectional arrow with a given orientation can be assigned to each superpixel. This results in a polarization map that can be superimposed on the SHG microscopy image (Figure 1d) for better clarity, as shown in Figure 1f. This conventional analysis method, which we refer to as the artisanal method, shows the existence of four domain variants in the 1800 nm-thick GeTe films. The main domain exhibits an out-of-plane polarization (c-domain), and the three additional stripe domains (a1-, a2-, and a3-domains) exhibit in-plane polarization along the short axis of the stripe (see Figure 1g inset). Vector piezoresponse force microscopy (PFM) measurements carried out on the 1800 nm-thick GeTe sample confirm that the out-of-plane polarization of the c-domain is uniform throughout the film, with no phase contrast visible in vertical PFM measurements. The lateral PFM measurements, meanwhile, show a distinct phase signal at the twin domains, corresponding to a polarization oriented across the width of the stripe domains, in agreement with the SHG results.
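The superpixel procedure can be sketched with NumPy as below: a stack of shape (n_angles, H, W) is cropped to a multiple of the superpixel size s, summed over s × s blocks, and each resulting polar plot is fitted by a grid search over the in-plane polarization angle, with the R² acceptance threshold of 0.90 quoted for Figure 1. The single-term model cos⁶(α − θ) (an a-domain with only d_33, scalar model, parallel polarizer and analyzer) is a hypothetical stand-in for the authors' full fit function, which also includes focusing and mixed-domain terms; all names are illustrative.

```python
import numpy as np

def bin_superpixels(cube, s=3):
    """Sum an (n_angles, H, W) stack over non-overlapping s x s superpixels."""
    n, H, W = cube.shape
    H, W = H - H % s, W - W % s                        # crop so that s divides H and W
    blocks = cube[:, :H, :W].reshape(n, H // s, s, W // s, s)
    return blocks.sum(axis=(2, 4))                     # one polar plot per superpixel

def toy_model(alpha, theta):
    """Hypothetical single-term a-domain response (d33 only, parallel polarizers)."""
    return np.cos(alpha - theta) ** 6

def fit_angle(alpha, y, n_grid=360):
    """Grid search over theta in [0, pi); amplitude by linear least squares.

    Returns (theta, r2). Assumes y is not constant (nonzero total variance).
    """
    ss_tot = np.sum((y - y.mean()) ** 2)
    best_theta, best_r2 = 0.0, -np.inf
    for theta in np.linspace(0.0, np.pi, n_grid, endpoint=False):
        m = toy_model(alpha, theta)
        amp = np.dot(y, m) / np.dot(m, m)
        r2 = 1.0 - np.sum((y - amp * m) ** 2) / ss_tot
        if r2 > best_r2:
            best_theta, best_r2 = theta, r2
    return best_theta, best_r2

# Synthetic 6 x 6 image whose pixels all have theta = 60 degrees
alpha = np.linspace(0.0, 2 * np.pi, 36, endpoint=False)
cube = toy_model(alpha, np.pi / 3)[:, None, None] * np.ones((1, 6, 6))
binned = bin_superpixels(cube, s=3)                    # shape (36, 2, 2)
theta_map = np.zeros(binned.shape[1:])
accepted = np.zeros(binned.shape[1:], dtype=bool)
for i in range(binned.shape[1]):
    for j in range(binned.shape[2]):
        theta, r2 = fit_angle(alpha, binned[:, i, j])
        theta_map[i, j] = theta
        accepted[i, j] = r2 >= 0.90                    # R^2 acceptance threshold
```

The accepted entries of theta_map are what would be drawn as bidirectional arrows on the polarization map; the nested loop over superpixels is exactly the step whose cost grows with image size and fit complexity.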
From measurements at different cantilever-sample rotation angles (see Figure S1 and the related text in the Supporting Information), we obtain a full picture of the polar axis assignments in these structures. The two sample regions presented in Figure 2 show that a1 and a2 twin domains can present either identical or reversed contrasts with respect to each other, corresponding to parallel or antiparallel orientations of their in-plane polarization components.

Minimizing the Effects of Mixed Domain Contributions and SHG Polarimetry Distortions Due to Focusing
Additional polarimetry measurements are performed by recording the variation of the second-harmonic response as a function of the analyzer angle at a fixed incident light polarization of 90°. These measurements are repeated three times such that the stripe domains a1, a2, and a3 are in turn aligned with the x-axis. In this configuration, the SHG signal arises mainly from the horizontally aligned domains (i.e., parallel to the x-axis), thereby minimizing the mixed contributions of the c-domain and of the two other a-domains. In addition, this geometry is expected to reduce the effects of light focusing and thus to yield distortion-free polar plots, in a similar way as demonstrated for non-Ising domains presenting in-plane polarization. [44] Figure 3 shows that the second-harmonic responses of the three a-domains exhibit the same uniaxial anisotropy, with a maximum intensity along the short axis of the stripe domains. This result indicates that the three stripe domains share the same nature and the same polarization orientation relative to the stripe axis. The polar plots obtained for the a1, a2, and a3 domains (dots in Figure 3g-i) are accurately fitted by both the vectorial and the scalar models of SHG. The fits show that the polarization lies along the short axis of the stripes. The close overlap of the curves obtained with the two fit models confirms that this measurement geometry reduces focusing effects, as in the case of non-Ising domain walls. [44] It is worth noting that, in these measurement configurations, the modeling and interpretation of the results are greatly simplified with respect to the method presented in the previous section. Yet, obtaining complete information on the domain structure requires a large number of measurements at different sample orientations to properly probe each domain variant.
Figure 3 (partial caption): …, and (c), allowing the specific probe of g) a1-, h) a2-, and i) a3-domain components, respectively, while minimizing the distortion of the polar plots due to focusing effects. [44] The experimental polar plots are represented by scattered dots, while the fit results are presented as blue continuous lines (scalar model) and orange dashed lines (vectorial model).
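The uniaxial analyzer response can be checked against the scalar model with a short Python sketch. For an a-domain whose polar axis lies along y (the short axis of a stripe aligned with x) and an incident polarization fixed at 90°, the fundamental field is parallel to the polar axis, so only d_33 contributes and the analyzer plot follows d_33² sin²φ, peaking along the short axis. The 3m tensor convention and the numerical coefficients are assumptions chosen for illustration, and the focusing correction of the vectorial model is not included.

```python
import numpy as np

def d_3m(d15, d22, d31, d33):
    """3x6 Voigt tensor of point group 3m (polar axis z, mirror plane normal to x)."""
    return np.array([
        [0.0, 0.0, 0.0, 0.0, d15, -d22],
        [-d22, d22, 0.0, d15, 0.0, 0.0],
        [d31, d31, d33, 0.0, 0.0, 0.0],
    ])

def voigt_field(E):
    """Voigt-ordered field products (Ex^2, Ey^2, Ez^2, 2EyEz, 2ExEz, 2ExEy)."""
    Ex, Ey, Ez = E
    return np.array([Ex**2, Ey**2, Ez**2, 2 * Ey * Ez, 2 * Ex * Ez, 2 * Ex * Ey])

# Rotation taking the crystal polar axis z to the laboratory y axis (z -> x -> y)
R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]) @ \
    np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]])

d = d_3m(d15=0.2, d22=0.3, d31=0.4, d33=1.0)     # hypothetical coefficients
alpha = np.radians(90.0)                         # fixed incident polarization
E_lab = np.array([np.cos(alpha), np.sin(alpha), 0.0])
P_lab = R @ (d @ voigt_field(R.T @ E_lab))       # P' = R P(R^T E): work in the crystal frame

phi = np.radians(np.arange(0, 360, 10))
I_phi = (P_lab[0] * np.cos(phi) + P_lab[1] * np.sin(phi)) ** 2   # analyzer plot
```

The d_15, d_22, and d_31 values do not affect the result in this geometry, which is one way to see why the fixed-polarization analyzer plots are simpler to interpret than the full anisotropy plots.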

Machine Learning-Assisted Second-Harmonic Generation Polarimetry Analysis
In the following subsections, we demonstrate that machine-learning-assisted analysis of the data obtained from a single measurement geometry can be sufficient to retrieve the domain structure. The complete analysis workflow presented in this study is implemented in Python 3. The program contains code to load and pre-process the SHG data cube (the stack of images recorded at different polarizer and analyzer angles), as well as K-means clustering and non-negative matrix factorization routines implemented via open-access Python packages. [45]
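A minimal pre-processing step of the kind described above might reshape the (n_angles, H, W) data cube into a matrix with one polar plot per row, the input format expected by most clustering and factorization routines. Clipping negative values and normalizing each polar plot to its maximum are assumptions made here for illustration and are not stated in the text; normalization in particular makes the methods compare polar-plot shapes rather than absolute intensities. The function names and the random stand-in stack are hypothetical.

```python
import numpy as np

def cube_to_matrix(cube, normalize=True):
    """Reshape an (n_angles, H, W) SHG stack into an (H*W, n_angles) matrix.

    Row r*W + c holds the polar plot of pixel (r, c).
    """
    n, H, W = cube.shape
    X = cube.reshape(n, H * W).T.astype(float)
    X = np.clip(X, 0.0, None)                     # SHG intensity cannot be negative
    if normalize:
        peak = X.max(axis=1, keepdims=True)
        X = X / np.where(peak > 0, peak, 1.0)     # leave all-zero (dark) pixels at zero
    return X, (H, W)

def labels_to_image(labels, shape):
    """Map one label (or fraction) per row back onto the image grid."""
    return np.asarray(labels).reshape(shape)

rng = np.random.default_rng(0)
cube = rng.random((36, 4, 5))                     # stand-in for a measured stack
X, shape = cube_to_matrix(cube)
```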

Deriving the Domain Structure Using the K-Means Clustering Method
Ferroelectric thin films and superlattices develop complex domain configurations and topological structures; in GeTe thin films, for example, several variants and intricate domain patterns are expected. Human perception may not be sufficient to segment the SHG polarimetry data precisely and without bias. Machine-learning-based techniques have recently proven efficient in such cases. In particular, clustering methods are well suited to identifying groups with distinct properties in a given data set, based on a measure of similarity between the elements of each cluster. [15,30,46,47] In the present analysis, K-means clustering was used to segment the SHG data sets into regions of interest with distinct behaviors corresponding to ferroelectric domain variants. A Euclidean distance criterion is used to segment the data set into spatially indexed clusters, with centroids encoding the mean behavior within each cluster. The algorithm is initialized with k randomly distributed centroids, and each point is assigned to the cluster of the closest centroid. The centroids are then moved to the centers of the resulting clusters. This expectation-maximization-like process is repeated until convergence, and the cluster number of each parameter-space vector is assigned to its spatial position. When applied to polarization-dependent SHG microscopy measurements, the K-means method assigns each measured polar plot to its nearest centroid, thereby minimizing the distances between the measured polar plots and the centroids. At the end of the iterations, we obtain centroids corresponding to the typical average polar plots contained in the system, and clusters corresponding to the domains associated with a given polar plot type, as detailed in Figure S2 and the related text in the Supporting Information.
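The assignment and update loop described above is Lloyd's algorithm. The NumPy sketch below implements it directly on the (pixels × angles) matrix of polar plots; it is a generic illustration, not the authors' code, and the synthetic polar-plot shapes are hypothetical. In practice, a library implementation such as scikit-learn's KMeans (with k-means++ initialization and several restarts) would normally be preferred.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-means with Euclidean distance on the rows of X (pixels x angles).

    Returns (labels, centroids); centroids are the cluster-mean polar plots.
    """
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every polar plot
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Update step: move each centroid to the mean of its members
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Final assignment consistent with the returned centroids
    labels = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
    return labels, centroids

# Synthetic check: two polar-plot types with a little noise (hypothetical shapes)
rng = np.random.default_rng(1)
alpha = np.linspace(0.0, 2 * np.pi, 36, endpoint=False)
type_1 = np.sin(3 * alpha) ** 2
type_2 = np.cos(alpha) ** 6
X = np.vstack([type_1 + 0.02 * rng.standard_normal((50, 36)),
               type_2 + 0.02 * rng.standard_normal((50, 36))])
labels, centroids = kmeans(X, k=2)
```

For an image, labels.reshape(H, W) gives the cluster map, and each row of centroids is the average polar plot of one cluster. The hard assignment of every pixel to exactly one centroid is also what washes out mixed or gradually varying responses.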
The K-means analysis of the SHG polarimetry data set of the 1800 nm-thick GeTe film is shown in Figure 4 for numbers of clusters ranging from k = 2 to k = 4. At the lowest cluster count, k = 2, the resulting map displays two color-coded clusters: one containing the main c-domain (gray background in Figure 4), and the other representing the vertical a-stripe domains (blue vertical stripes in Figure 4). Increasing the number of clusters k identifies a larger number of regions with distinct SHG signals, thus revealing additional domain variants. Setting k = 4 yields a cluster distribution that corresponds to the one identified by eye in the artisanal approach detailed above. The four clusters, formed by the main c-domain (gray, corresponding to the polar plot labeled 1 in Figure 4), the vertical a1-domain (blue, corresponding to the polar plot labeled 2 in Figure 4), and the two oblique a2- (red, corresponding to the polar plot labeled 3 in Figure 4) and a3-domains (green, corresponding to the polar plot labeled 4 in Figure 4), are fully recovered without any manual input except the number of clusters. The K-means method takes only a few seconds (2.5 s) to retrieve the domain variants present in the system, whereas the pixel-by-pixel polarimetry analysis leading to polarization maps such as that presented in Figure 1g can take up to 48 h. Nonetheless, the main drawback of the K-means approach is that it assigns each polar plot to a single cluster even if the plot exhibits a slightly different behavior. [29] Therefore, small or gradual polarization variations from one pixel to the next are washed out by the K-means method. Moreover, in the case of different coexisting domains of small size, such as those expected in the 200 nm-thick GeTe film (sub-100 nm domains), the K-means method fails (see Figure S3, Supporting Information).
More advanced dimensionality reduction techniques, such as the non-negative matrix factorization method discussed below, could be applied in these challenging cases to separate intermixed behaviors.

Improving the Sensitivity to Nanodomains Using Non-negative Matrix Factorization
Non-negative matrix factorization (NMF) is an algorithm that factorizes an input matrix into the product of two matrices with non-negative elements. In the case of laterally resolved SHG polarimetry, the input matrix is the polarimetry data cube, and the two output matrices are the SHG polar plots and the fraction of each polar plot at a given pixel (see Figure S4 and the related text in the Supporting Information). This method is particularly suited to SHG data analysis, since it assumes non-negative matrix elements, consistent with a non-negative SHG intensity. Moreover, owing to its decomposition procedure, the NMF method is particularly adapted to the study of mixed signals per pixel, such as the mixed a/c-domain contribution to the SHG signal in GeTe films. It is also worth noting that this method provides the 2D distribution of the polar plot fractions (i.e., the percentage of a given polar plot type per pixel), whereas the K-means method only indicates the presence or absence of a given polar plot type (cluster). We therefore represent the NMF results with a different color code.
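A minimal NMF of the (pixels × angles) matrix can be written with the Lee and Seung multiplicative update rules for the Frobenius-norm objective, as sketched below. This is a generic illustration rather than the authors' implementation: the number of components, the random initialization, and the rescaling of each component polar plot to unit integrated intensity (so that the per-pixel weights can be read as fractions) are assumptions, and the synthetic mixtures are hypothetical. The NMF class in scikit-learn's sklearn.decomposition module provides a maintained implementation of the same factorization.

```python
import numpy as np

def nmf(X, n_components, n_iter=1000, seed=0, eps=1e-12):
    """Factorize X >= 0 (pixels x angles) as X ~ W @ H by multiplicative updates.

    Rows of H are component polar plots; W holds their weight at each pixel.
    """
    rng = np.random.default_rng(seed)
    n_pix, n_ang = X.shape
    W = rng.random((n_pix, n_components)) + eps
    H = rng.random((n_components, n_ang)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)      # update keeps H non-negative
        W *= (X @ H.T) / (W @ H @ H.T + eps)      # update keeps W non-negative
    # Rescale so each component polar plot has unit integrated intensity
    scale = H.sum(axis=1, keepdims=True)
    return W * scale.T, H / scale

def fractions(W):
    """Per-pixel fraction of each component (rows sum to 1)."""
    total = W.sum(axis=1, keepdims=True)
    return W / np.where(total > 0, total, 1.0)

# Synthetic mixtures of two polar-plot types (hypothetical shapes and fractions)
alpha = np.linspace(0.0, 2 * np.pi, 36, endpoint=False)
plots = np.vstack([np.sin(3 * alpha) ** 2, np.cos(alpha) ** 6])
f = np.linspace(0.0, 1.0, 50)[:, None]
X = np.hstack([f, 1.0 - f]) @ plots               # 50 pixels x 36 angles
W, H = nmf(X, n_components=2)
F = fractions(W)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Unlike the K-means labels, each row of F is a continuous mixture, which is what allows several domain types to be resolved within one pixel. NMF solutions are not unique in general, so the recovered components should be checked against physically expected polar plots.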
The NMF method was first tested on the 1800 nm-thick GeTe film so that the results could be compared to those obtained with the artisanal and K-means methods. This method