Analytical Prediction of Steel Grid-Shell Stability and Dynamic Behaviors Using Neural Networks – Part 1



Introduction
It is not easy to trace back the origin of grid-shells, but Shukhov's diagrids are probably the most agreed-upon starting point (Edemskaya 2016). Especially when referring to modern steel grid-shells, his roof for the Vyksa workshop can be regarded as the first double-curvature lattice roof. Yet it was not until the end of the 1980s, and especially the 1990s, that they became really popular, through the work of engineers like Schlaich and Schober (Schlaich 1996).
Grid-shells are transparent, thin and typically exhibit high structural efficiency. Their design and fabrication are high-precision jobs where tolerances are tight and flexibility is low (Schlaich 2009). It is for these reasons that a suitable concept design that understands the failure mode is paramount down the line. This paper is the first one (out of three) of a broader research effort where artificial intelligence was applied to the stability and dynamic analyses of steel grid-shells of paraboloid shape supported on a horizontal plane (see Fig. 1.1). In that study, three Artificial Neural Networks (ANN) with 8 inputs were independently designed for the analytical prediction of a single target variable, namely: (i) the critical (i.e. for the 1st mode) buckling factor for uniform loading (i.e. over the entire roof), (ii) the critical buckling factor for uniform loading over half of the roof, and (iii) the fundamental frequency of the structure. This paper provides a set of equations to obtain the critical buckling factor of the structure under uniform loading, where the latter is defined as the critical load / (external load + self-weight). That factor provides a good indication of the stability of a structure, even though a geometrically non-linear analysis is still mandatory for the final design. The 1st buckling mode identified in the analysis can be global or local, whichever is the lowest. The ANN was designed for a 1098-point dataset obtained via finite element analyses. The characteristics of the finite element (FE) modelling carried out for data gathering are defined in section 2.1. The FE models meet some predetermined variables defining the scope under which the performed research is valid (section 2.2).
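Stated compactly (in our notation, merely restating the definition just given), the buckling factor reads

$$\lambda_{cr} = \frac{q_{cr}}{q + g}\,,$$

where $q_{cr}$ is the critical load, $q$ the external (additional) load and $g$ the self-weight; values above 1 indicate that the applied loading lies below the critical one.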
The state of the art regarding the stability of grid-shells was thoroughly expounded by Gioncu (1994). Most of the available analytical solutions predicting the buckling of reticulated shells resort to the homogenization technique, treating the structure as a continuum shell (Dulácska and Kollár 2000, Kato 2005, Lefevre 2015). The analytical solution presented in this paper does not rely on this simplification. To the knowledge of the authors, analytical models describing the buckling or dynamic behaviors of the family of grid-shells addressed in the current study have not yet been formulated.

Modelling techniques
A topology has been defined parametrically in Rhinoceros 3D (McNeel 2014) + Grasshopper (Rutten 2014), as illustrated in Fig. 1.2. The base geometry is a paraboloid shell obtained by means of two translational parabolas, the generatrix and the directrix. The translational technique has the virtue of leading to flat quadrilateral surfaces that can be easily covered with planar glass panes (Pottmann 2014, Schober 2016).
One hundred and sixty (160) different paraboloids have been generated, where the main dimensions L1, L2 and h have been varied along with the spacing s between beam nodes (see Fig. 1.2), the latter remaining constant in each model. The domains considered for those variations are addressed in section 2.2. The aforementioned geometries were exported to the FE package GSA (Oasys 2010), where all line segments (beams) were first transformed into beam FEs with 6-DOF (degrees of freedom) nodes, and later split into 3 equal elements as a result of the sensitivity study explained in section 2.3. At this stage, a Visual Basic (VB) script was run to rotate each rectangular beam (i.e., around its longitudinal axis) by an angle illustrated in Fig. 1.3 as the angle between the default beam local axis (following global Z) and the cross product u x v, u being the vector joining the nodes transversally adjacent to the beam, and v the vector joining the beam nodes. This technique does not provide the mathematically exact normal to the paraboloid surface (Makin 2006), but it leads to equal angles between the bar width and the supported glass panes, which is convenient engineering-wise. The GSA software is valid for analyzing grid-shell structures, since it has been successfully used on numerous occasions in the past, either as a design or as a validation tool. Examples range from steel shells (Dini 2013, Olsson 2012), to timber shells (Kuijvenhoven 2009, Toussaint 2007) or composite material shells (du Peloux 2013, Tayeb 2015).
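As an illustration of the rotation just described, the following minimal MATLAB sketch (node coordinates and vector names are hypothetical, not taken from the authors' VB script) computes the angle for one beam:

% Minimal sketch: rotation angle for one beam, assuming nA/nB are the
% beam end nodes and nT a transversally adjacent node (hypothetical).
nA = [0; 0; 0];  nB = [1; 0; 0.1];  nT = [0.5; 1; 0.15];

v = nB - nA;                 % vector joining the beam nodes
u = nT - (nA + nB)/2;        % vector towards the transversally adjacent node
zDefault = [0; 0; 1];        % default beam local axis (following global Z)

n = cross(u, v);             % target orientation of the beam local axis
n = n / norm(n);             % unit vector

% angle (rad) between the default local axis and the cross product u x v
alpha = atan2(norm(cross(zDefault, n)), dot(zDefault, n));
fprintf('Rotation angle: %.2f deg\n', rad2deg(alpha));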

Modelling inputs
The decision about the number of inputs to consider was a trade-off between (i) the versatility of the final ANN tool, and (ii) the time needed to perform all numerical simulations for data gathering: the more input variables, the more data points (i.e., distinct FE models) are needed to guarantee acceptable accuracy. The following parameters were deemed unchanged in all numerical simulations:
- Beam material: structural steel with linear elastic properties according to EN 1993-1-1 (2005), namely Young's modulus E = 210 GPa, shear modulus G = 81 GPa, and Poisson's ratio ν = 0.3. The analyzed models include the self-weight of the steel, considering a material density of 7.85 t/m3.
- Roofing panels: the weight of the panels is small compared to that of the structural steel (if made of glass or polycarbonate) or even negligible (in the case of ETFE). This weight is to be computed as part of the additional load q.
- Boundary conditions: the paraboloids lie on a horizontal plane, defining an ellipse (see top view in Fig. 1.2). All nodes belonging to that plane were translationally fixed and rotationally constrained with the same bending stiffness (Kn) used for all grid-shell nodes.
- Bracings: no (cable-)bracings were applied to any grid-shell. Examples of this type of structure are Cabot Circus in Bristol and the Joe and Rika Mansueto Library in Chicago, among others.
Tab. 1.1 shows the 8 (independent) input variables considered in all parts of this research, along with the corresponding upper and lower limit values they can take.
Each of the 1098 distinct FE models corresponds to a specific combination of values taken by those variables. Tab. 1.1 also indicates the ANN input node corresponding to each variable. Further considerations about those variables read (recall Fig. 1.2):
- Main dimensions of the paraboloid footprint (L1/2 and L2/2): these set the aspect ratio of the elliptical footprint.
- Height of the paraboloid (h): the rise/span ratio is limited to 0.15 ≤ h/L2 ≤ 0.5. This is the range recommended by Schober (2016) for dome caps under uniform loading. For ratios below 0.14, the material usage and the risk of buckling increase considerably.
- Beam spacing (s): the beam spacing in both directions (or the dimension of all grid-shell planar, squared panes; see Fig. 1.2); the values taken lie approximately in the range observed in the long list of built projects referenced by Schober (2016).
- External load (q): uniformly distributed over the roof surface, vertical and pointing downwards. It takes random values between 0 kPa and 3.5 kPa.
- Beam cross-section dimensions (a and b): the cross-section of all beams is rectangular and solid. Its dimensions come (roughly) from the range of values employed in the shells cited by Schober (2016).
- Bending stiffness of grid-shell nodes (Kn): considering an appropriate connection stiffness is quite important, as demonstrated by Hwang (2010) when investigating its effects on grid-shells. For the several bolted systems he studied, the rotational stiffness was approximately in the range of 30 to 130 kNm/rad. Since the present study intends to be applicable to stiffer connections as well (bolted or welded), the adopted Kn took values within 20-50000 kNm/rad, following the distribution shown in Tab. 1.2. For the sake of computational time, the rotational stiffness was not split into two variables (one for each bending axis), having assumed KnXX = KnYY = Kn. The 1098-point dataset considered in ANN simulations is available in Abambres and Cabello (2020).

Sensitivity studies
Two sensitivity studies were carried out prior to the 1098 FE analyses in order to better decide which mesh density and node stiffness values to adopt in the final FE models. Three models with different geometry were used for that purpose, including model 323. The first analysis aimed to understand how sensitive the FE models were to the mesh density. The grid-shell beams (aka bars) between nodes were subdivided into 1, 2, 3 and 4 beam FEs, and Kn = 990 kNm/rad was adopted for all models. Assuming that 4 elements give the most accurate results, Tab. 1.3 (left) shows the remaining results as a percentage of the former. It turned out that any loss in accuracy when predicting the buckling factor with fewer than 4 subdivisions is indiscernible, since it lies within the noise of the convergence. Likewise, there is not much difference in the prediction of the fundamental frequency. Since the computational time rises with the number of FEs, the authors opted for 3 FEs per beam.
The second study (Tab. 1.3, right) allowed the determination of which node bending stiffness (Kn) yields results close enough to those obtained with fixed connections (i.e., infinite stiffness). Three subdivisions were adopted for the beam elements. It was observed that Kn = 50000 kNm/rad yields differences lower than 1% when the results are compared with the fixed-connection counterparts, and for that reason it was the adopted upper bound for Kn.
3. Artificial Neural Networks

3.1 Introduction

The general ANN structure consists of several nodes arranged in L vertical layers (input layer, hidden layers, and output layer) and connected between them, as depicted in Fig. 2 (Flood 2008). Associated to each node in layers 2 to L, also called neuron, is a linear or nonlinear transfer (also called activation) function, which receives the so-called net input and transmits an output (see Fig. 5). All ANNs implemented in this work are of the feedforward type, since data presented in the input layer flows in the forward direction only, i.e. every node only connects to nodes belonging to layers located at the right-hand side of its layer, as shown in Fig. 2. ANNs' computing power makes them suitable to efficiently solve small to large-scale complex problems such as multi-variate nonlinear regression, besides not requiring a good knowledge of the function shape being modelled; this can be attributed to their (i) massively parallel distributed structure and (ii) ability to learn and generalize, i.e., produce reasonably accurate outputs for inputs not used during the learning (also called training) phase. Further information on Artificial Neural Networks may be found in previous publications by Abambres et al. or Haykin (2009).

Learning
Each connection between 2 nodes is associated to a synaptic weight (a real value), which, together with each neuron's bias (also a real value), are the most common types of neural network unknown parameters to be determined through learning. Learning is nothing else than determining the network unknown parameters through some algorithm in order to minimize the network's performance measure, typically a function of the difference between predicted and target (desired) outputs. When ANN learning has an iterative nature, it consists of three phases: (i) training, (ii) validation, and (iii) testing. From previous knowledge, examples or data points are selected to train the neural net, grouped in the so-called training dataset. Those examples are said to be 'labelled' or 'unlabelled', whether they consist of inputs paired with their targets, or just of the inputs themselves; learning is called supervised (e.g., functional approximation, classification) or unsupervised (e.g., clustering), whether the data used is labelled or unlabelled, respectively. During iterative learning, while the training dataset is used to tune the network unknowns, a process of cross-validation takes place by using a set of data completely distinct from the training counterpart (the validation dataset), so that the generalization performance of the network can be attested.

Implemented ANN features
The 'behavior' of any ANN depends on many 'features'; 15 ANN features were implemented in this work (including data pre/post-processing ones). For those features, it is important to bear in mind that no ANN guarantees good approximations via extrapolation (either in functional approximation or classification problems), i.e. the implemented ANNs should not be applied outside the input variable ranges used for network training. Since there are no objective rules dictating which method per feature guarantees the best network performance for a specific problem, an extensive parametric analysis (composed of nine parametric sub-analyses) was carried out to find 'the optimum' net design. A description of all implemented methods, selected from the state-of-the-art literature on ANNs (including both traditional and promising modern techniques), is presented next; Tabs. 2-4 present all features and methods per feature. The whole work was coded in MATLAB (The Mathworks, Inc. 2017), making use of its neural network toolbox when dealing with popular learning algorithms (1-3 in Tab. 4). Each parametric sub-analysis (SA) consists of running all feasible combinations (also called 'combos') of pre-selected methods for each ANN feature, in order to get performance results for each designed net, thus allowing the selection of the best ANN according to a certain criterion.
The best network in each parametric SA is the one exhibiting the smallest average relative error (called performance) for all learning data.
It is worth highlighting that, in this manuscript, whenever a vector is added to a matrix, it means the former is to be added to all columns of the latter (valid in MATLAB).

Dimensional Analysis (feature 2)
The most widely used form of dimensional analysis is Buckingham's π-theorem, which was implemented in this work as described in Bhaskar and Nigam (1990).

Input Dimensionality Reduction (feature 3)
When designing any ANN, it is crucial for its accuracy that the input variables are independent and relevant to the problem (Gholizadeh et al. 2011, Kasun et al. 2016).
There are two types of dimensionality reduction, namely (i) feature selection (a subset of the original set of input variables is used), and (ii) feature extraction (transformation of initial variables into a smaller set).In this work, dimensionality reduction is never performed when the number of input variables is less than six.The implemented methods are described next.

Linear Correlation
In this feature selection method, all possible pairs of input variables are assessed with respect to their linear dependence, by means of the Pearson correlation coefficient RXY, where X and Y denote any two distinct input variables. For a set of n data points (xi, yi), the Pearson correlation is defined by

$$R_{XY}=\frac{\sum_{i=1}^{n}\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}\sqrt{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}}\,,\qquad(1)$$

where $\bar{x}$ and $\bar{y}$ are the mean values of each variable. Whenever two variables are found to be strongly correlated, one of the variables involved must be disregarded in the subsequent steps for variable removal.
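As an illustration, a minimal MATLAB sketch of this screening (dataset and threshold are hypothetical; the built-in corrcoef computes pairwise Pearson coefficients):

% Minimal sketch, assuming Y1 is the Q1 x P input dataset (variables in
% rows, patterns in columns) and rhoMax a user-chosen threshold.
Y1 = rand(8, 1098);          % hypothetical data
rhoMax = 0.95;

R = corrcoef(Y1');           % Q1 x Q1 matrix of pairwise Pearson coefficients
[i, j] = find(triu(abs(R) > rhoMax, 1));  % strongly correlated pairs

% For each flagged pair, one of the two variables involved would be
% disregarded in the subsequent steps for variable removal.
disp([i j]);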

Auto-Encoder
This feature extraction technique itself uses a 3-layer feedforward ANN called an auto-encoder (AE). After training, the hidden layer output (y2p) for the presentation of each problem's input pattern (y1p) is a compressed vector (Q2 x 1) that can be used to replace the original input layer by a (much) smaller one, thus reducing the size of the ANN model. In this work, Q2 = round(Q1/2) was adopted, round being a function that rounds the argument to the nearest integer. The implemented AE was trained using the 'trainAutoencoder(…)' function from MATLAB's neural net toolbox. In order to select the best AE, 40 AEs were simulated and their performance compared by means of the performance variable defined in sub-section 3.4. Each AE considered distinct (random) initialization parameters; half of the models used the 'logsig' hidden transfer function, and the other half used the 'satlin' counterpart, the identity function being the common option for the output activation. In each AE, the maximum number of epochs (the number of times the whole training dataset is presented to the network during learning) was set independently of the amount of data. Concerning the learning algorithm used for all AEs, no L2 weight regularization was employed, which was the only default specification not adopted in 'trainAutoencoder(…)'.
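A minimal sketch of this reduction, assuming an 8 x 1098 input dataset as in this paper ('logsig' shown; per the text, half of the 40 runs used 'satlin'):

% Minimal sketch of the auto-encoder-based reduction; trainAutoencoder
% expects one column per sample.
Y1 = rand(8, 1098);                          % hypothetical data
Q2 = round(size(Y1, 1) / 2);                 % compressed dimension, as in the text

ae = trainAutoencoder(Y1, Q2, ...
    'EncoderTransferFunction', 'logsig', ... % 'satlin' in half of the 40 runs
    'L2WeightRegularization', 0);            % the only non-default setting

Y2 = encode(ae, Y1);                         % Q2 x P compressed inputs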

Orthogonal and Sparse Random Projections
This is another feature extraction technique aiming to reduce the dimension of the input data Y1 (Q1 x P) while retaining the Euclidean distance between data points in the new feature space. This is attained by projecting all data along the (i) orthogonal or (ii) sparse random matrix A (Q1 x Q2, Q2 < Q1), as described by Kasun et al. (2016).

Training, Validation and Testing Datasets (feature 4)

Four distributions of data (methods) were implemented, namely pt-pv-ptt = {80-10-10, 70-15-15, 60-20-20, 50-25-25}, where pt-pv-ptt represent the amounts of training, validation and testing examples as a percentage of all learning data (P), respectively. Aiming to divide the learning data into training, validation and testing subsets according to a predefined distribution pt-pv-ptt, the following algorithm was implemented (all variables are involved in these steps, including qualitative ones after conversion to numeric; see 3.3.1):
1) Reduce pt-pv-ptt values by 10 units each.
2) For each variable q (row) in the complete input dataset, compute its minimum and maximum values.
3) Select all patterns (if any) from the learning dataset where each variable takes either its minimum or maximum value. Those patterns must be included in the training dataset, regardless of what pt is. However, if the number of patterns is lower than round(pt * P / 100), more patterns should be added to the training set in the following way:
a. Compute the number of patterns (Lpt) that need to be added to the initially selected training patterns to equal round(pt * P / 100).
b. Randomly select 10,000 combinations of Lpt patterns from all those not included in the training set defined prior to a).
c. For each combination/scenario in b), add those Lpt patterns to the set of training patterns defined prior to a), and label all remaining learning patterns as "validation+testing".
d. For each scenario in c), and for each pattern labeled "validation+testing", check whether that pattern has at least one input variable taking a value not taken by any pattern in the training set. If it does not, that pattern should be moved to the training set.
e. Among all 10,000 scenarios of training and "validation+testing" subsets addressed in b) to d), the "winner" is the one guaranteeing an amount of training data (Pt*) closest to round(pt * P / 100).
f. If the winning training set selected in e) guarantees |Pt*/P - pt| ≤ 0.2, it becomes the training data to be taken for simulation. Otherwise, the training data should be selected according to step 2 in sub-section 3.3.4 of Abambres et al. (2018).
4) Increase pt-pv-ptt values by 10 units each (to re-obtain the original input values; recall step 1).
5) In order to select the validation patterns, randomly select pv / (pv + ptt) of those patterns not belonging to the previously defined training dataset. The remainder defines the testing dataset.
It might happen that the actual distribution pt-pv-ptt used in the simulation is not equal to the one imposed a priori (before step 1).
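The following simplified MATLAB sketch illustrates the core idea of the algorithm above (mandatory extreme-value patterns first, then random completion); it omits the 10,000-scenario search of steps a-f, and all sizes are hypothetical:

% Simplified sketch of the splitting idea: patterns holding a variable's
% min or max go to training first, then the set is completed at random.
P = 1098;  pt = 70;  pv = 15;                 % hypothetical distribution
Y1 = rand(8, P);                              % hypothetical input dataset

isExtreme = any(Y1 == min(Y1, [], 2) | Y1 == max(Y1, [], 2), 1);
train = find(isExtreme);                      % mandatory training patterns

rest = setdiff(randperm(P), train, 'stable'); % remaining patterns, shuffled
nTr  = max(round(pt * P / 100) - numel(train), 0);
train = [train rest(1:nTr)];                  % complete the training set
rest  = rest(nTr+1:end);

nVal = round(numel(rest) * pv / (pv + (100 - pt - pv)));
valid = rest(1:nVal);  test = rest(nVal+1:end);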

Input Normalization (feature 5)
The progress of training can be impaired if the training data defines a region that is relatively narrow in some dimensions and elongated in others, which can be alleviated by normalizing each input variable across all data patterns. The implemented techniques are the following:

Linear Max Abs

Lachtermacher and Fuller (1995) proposed a simple normalization technique given by

$$\{Y_1\}_n(i,:) = \frac{Y_1(i,:)}{\max\left(\left|Y_1(i,:)\right|\right)}\,,$$

where {Y1}n(i, :) and Y1(i, :) are the normalized and non-normalized values of the i-th input variable for all learning patterns, respectively; the notation ':' in the column index indicates the selection of all columns (learning patterns).

Nonlinear
Proposed by Pu and Mesbahi (2006), although in the context of output normalization, the only nonlinear normalization method implemented for input data is written in terms of (i) Y1(i, j), the non-normalized value of input variable i for pattern j, (ii) t, the number of digits in the integer part of Y1(i, j), (iii) sign(…), which yields the sign of the argument, and (iv) C(i), the average of two values concerning variable i, C1(i) and C2(i), where the former leads to a minimum normalized value of 0.2 for all patterns, and the latter leads to a maximum normalized value of 0.8 for all patterns.
Linear Mean Std

Tohidi and Sharifi (2014) proposed the following technique:

$$\{Y_1\}_n(i,:) = \frac{Y_1(i,:) - \mu_1(i)}{\sigma_1(i)}\,,$$

where μ1(i) and σ1(i) are the mean and standard deviation of all non-normalized values (all patterns) stored by variable i.
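For illustration, two of the normalizations above can each be written in one MATLAB line (hypothetical data; implicit expansion assumed):

% Minimal sketch of two of the implemented input normalizations, assuming
% Y1 is the Q1 x P input dataset (variables in rows).
Y1 = rand(8, 1098) * 50;                      % hypothetical data

% Linear Max Abs: divide each variable by its maximum absolute value
Y1maxabs = Y1 ./ max(abs(Y1), [], 2);

% Linear Mean Std (Tohidi and Sharifi 2014): zero mean, unit std per variable
Y1meanstd = (Y1 - mean(Y1, 2)) ./ std(Y1, 0, 2);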

Logistic
The most usual form of transfer function is of sigmoid type. An example is the logistic function, given by

$$\varphi(s) = \frac{1}{1 + e^{-s}}\,. \qquad (7)$$

Hyperbolic Tang
The Hyperbolic Tangent function is also of sigmoid type, being defined as

$$\varphi(s) = \tanh(s) = \frac{e^{s} - e^{-s}}{e^{s} + e^{-s}}\,. \qquad (8)$$

Identity
The Identity activation is often employed in output neurons, reading

$$\varphi(s) = s\,. \qquad (10)$$
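For illustration, the transfer functions above as MATLAB anonymous functions (a sketch; s may be a scalar, vector or matrix):

logistic = @(s) 1 ./ (1 + exp(-s));   % eq. (7), output in ]0, 1[
hypTan   = @(s) tanh(s);              % eq. (8), output in ]-1, 1[
identity = @(s) s;                    % eq. (10), used in output neurons

s = linspace(-3, 3, 7);
disp([logistic(s); hypTan(s); identity(s)]);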

Output Normalization (feature 7)
Normalization can also be applied to the output variables so that, for instance, the amplitude of the solution surface at each variable is the same. Otherwise, training may tend to focus (at least in the earlier stages) on the solution surface with the greatest amplitude (Flood and Kartam 1994a). Normalization ranges not including the zero value might be a useful alternative, since convergence issues may arise due to the presence of many small (close to zero) target values (Mukherjee et al. 1996). Four normalization methods were implemented. The first three follow eq. (4) for different normalization ranges, the last of them being [-1, 1]; the fourth normalization method implemented is the one described by eq. (6).

Multi-Layer Perceptron Network (MLPN)
This is a feedforward ANN exhibiting at least one hidden layer. Fig. 2 depicts a 3-2-1 MLPN (3 input nodes, 2 hidden neurons and 1 output neuron), where units in each layer link to nodes located ahead only. The network is said to be partially-connected (PC) since no connections across layers are allowed (between the source and output layers, in this case). At this point, it is appropriate to define the concept of a fully-connected (FC) ANN. Although traditionally the network shown in Fig. 2 would be called FC, in this work an FC feedforward network is characterized by having each node connected to every node in any layer placed forward; any other type of feedforward network is said to be PC. According to Wilamowski (2009), PC MLPNs are less powerful than MLPNs where connections across layers are allowed, which usually lead to smaller networks (fewer neurons).
Fig. 4 represents a generic MLPN composed of L layers, where l (l = 1,…, L) is a generic layer and 'ql' a generic node, q = 1,…, Ql being its position in layer l (1 is reserved for the top node). Fig. 5 represents the model of a generic neuron (l = 2,…, L), where (i) p represents the data pattern presented to the network, (ii) subscripts m = 1,…, Qn and n = 1,…, l-1 are summation indexes representing all possible nodes connecting to neuron 'ql' (recall Fig. 4), (iii) bql is the neuron's bias, and (iv) wmnql represents the synaptic weight connecting units 'mn' and 'ql'. The neuron's net input for the presentation of pattern p (Sqlp) is defined as

$$S_{qlp} = \sum_{n=1}^{l-1}\sum_{m=1}^{Q_n} y_{mnp}\, w_{mnql} + b_{ql}\,,$$

where ym1p is the value of the m-th network input concerning example p. The output of a generic neuron can then be written as (l = 2,…, L)

$$y_{qlp} = \varphi_l\!\left(S_{qlp}\right),$$

where φl is the transfer function used for all neurons in layer l.
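A minimal sketch of a forward pass through such a network (adjacent-layer connections only; weights, biases and sizes are hypothetical, not the trained values):

% Minimal sketch of a forward pass through a 3-2-1 partially-connected
% MLPN, with W{l} (Q_l x Q_{l+1}) and b{l} (Q_{l+1} x 1) hypothetical.
W = {rand(3, 2), rand(2, 1)};  b = {rand(2, 1), rand(1, 1)};
phi = {@(s) tanh(s), @(s) s};             % hidden and output activations

y = [0.3; 0.7; 0.1];                      % one input pattern (Q1 x 1)
for l = 1:numel(W)
    S = W{l}' * y + b{l};                 % net input of layer l+1
    y = phi{l}(S);                        % layer output
end
disp(y)                                   % network output y_L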

Radial-Basis Function Network (RBFN)
Although having similar topologies, RBFN and MLPN behave very differently due to distinct hidden neuron models: unlike the MLPN, the RBFN has hidden neurons that behave differently from its output neurons. According to Xie et al. (2011), RBFN (i) are specially recommended in functional approximation problems when the function surface exhibits regular peaks and valleys, and (ii) perform more robustly than MLPN when dealing with noisy input data. Although traditional RBFN have 3 layers, a generic multi-hidden-layer (see Fig. 4) RBFN is allowed in this work, the generic hidden neuron model concerning node 'l1l2' (l1 = 1,…, Ql2, l2 = 2,…, L-1) being presented in Fig. 6.
In this model, (i) ξl1l2p and vl1l2 (the latter called the RBF center) are vectors of the same size (vz,l1l2 denotes the z-component of vector vl1l2, and it is a network unknown), the former being associated with the presentation of data pattern p; (ii) σl1l2 is called the RBF width (a positive scalar) and also belongs, along with the synaptic weights and RBF centers, to the set of network unknowns to be determined through learning; (iii) φl2 is the user-defined radial basis (transfer) function (RBF), described in eqs. (20)-(23); and (iv) yl1l2p is the neuron's output when pattern p is presented to the network. In ANNs not involving learning algorithms 1-3 in Tab. 4, vectors ξl1l2p and vl1l2 are defined explicitly (two versions of ξl1l2p were implemented and the one yielding the best results was selected), whereas the RBFNs implemented through the MATLAB neural net toolbox (involving learning algorithms 1-3 in Tab. 4) are based on the toolbox's own definitions of those vectors.
Lastly, according to the implementation carried out for initialization purposes (described in 3.3.12), (i) the RBF center vectors per hidden layer (one per hidden neuron) are initialized as columns of a matrix (termed the RBF center matrix) having the same size as the weight matrix linking the previous layer to that specific hidden layer, and (ii) the RBF widths (one per hidden neuron) are initialized as components of a vector (called the RBF width vector) with the same size as a hypothetical bias vector.

Hidden Nodes (feature 9)
Inspired by several heuristics found in the literature for the determination of a suitable number of hidden neurons in a single-hidden-layer net (Aymerich and Serra 1998, Rafiq et al. 2001, Xu and Chen 2008), each value in hntest, defined in eq. (15), was tested in this work as the total number of hidden nodes in the model, i.e. the sum of the nodes in all hidden layers (initially defined with the same number of neurons). The number yielding the smallest performance measure for all patterns (as defined in 3.4, with outputs and targets not postprocessed) is adopted as the best solution. In eq. (15), (i) Q1 and QL are the number of input and output nodes, respectively, (ii) P and Pt are the number of learning and training patterns, respectively, and (iii) F13 is the number of feature 13's method (see Tab. 4).

Connectivity (feature 10)
For this ANN feature, three methods were implemented, namely (i) adjacent layers: only connections between adjacent layers are allowed; (ii) adjacent layers + input-output: only connections between (ii1) adjacent layers and (ii2) the input and output layers are allowed; and (iii) fully-connected (all possible feedforward connections).

Hidden Transfer Functions (feature 11)
Besides the functions (i) Logistic (eq. (7)), (ii) Hyperbolic Tangent (eq. (8)), and (iii) Bilinear (eq. (9)), defined in 3.3.6, the ones defined next were also implemented as hidden transfer functions. During software validation it was observed that some hidden node outputs could be infinite or NaN (not-a-number in MATLAB; e.g., 0/0 = Inf/Inf = NaN), due to numerical issues concerning some hidden transfer functions and/or their calculated input.
In those cases, it was decided to convert infinite values to unitary values and NaNs to zero (the only exception was the bipolar sigmoid function, where NaNs were converted to -1).
Another implemented trick was to convert possible NaN inputs of the Gaussian function to zero.

Identity-Logistic
In Gunaratnam and Gero (1994), issues associated with flat spots at the extremes of a sigmoid function were eliminated by adding a linear function to the latter, reading

$$\varphi(s) = \frac{1}{1 + e^{-s}} + s\,. \qquad (16)$$

Positive Saturating Linear
In the MATLAB neural net toolbox, the so-called Positive Saturating Linear transfer function, ranging in [0, 1], is defined as

$$\varphi(s) = \begin{cases} 1, & s \geq 1 \\ s, & 0 < s < 1 \\ 0, & s \leq 0 \end{cases}$$

Sinusoid
Concerning less popular transfer functions, reference is made in Bai et al. (2014) to the sinusoid, which in this work was implemented as

$$\varphi(s) = \sin(s)\,.$$

Radial Basis Functions (RBF)
Although the Gaussian activation often exhibits desirable properties as an RBF, several authors (e.g., Schwenker et al. 2001) have suggested alternatives. Following the nomenclature used in 3.3.8, (i) the Thin-Plate Spline function is defined by

$$\varphi(s) = s^{2} \ln(s)\,, \qquad (20)$$

(ii) a Gaussian-type function (eq. (21)) is employed when learning algorithms 4-7 are used (see Tab. 4), (iii) the Multiquadratic function is given by eq. (22), and (iv) the Gaussian-type function (called 'radbas' in the MATLAB toolbox) used by RBFNs trained with learning algorithms 1-3 (see Tab. 4) is defined by

$$\varphi(s) = e^{-s^{2}}\,, \qquad (23)$$

where s involves the Euclidean distance ||…|| in all functions.
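For illustration, a sketch of two of these functions evaluated on the Euclidean distance between a pattern and an RBF center (all values, and the way the width enters, are illustrative assumptions):

% Sketch: two RBFs applied to the distance between a hypothetical input
% pattern xi and RBF center v; sigma is the RBF width.
xi = [0.2; 0.5; 0.9];  v = [0.3; 0.4; 1.0];  sigma = 0.8;

s = norm(xi - v);                   % Euclidean distance ||xi - v||
radbas    = @(s) exp(-s.^2);        % Gaussian-type RBF, eq. (23)
thinPlate = @(s) s.^2 .* log(s);    % Thin-Plate Spline, eq. (20) (s > 0)

fprintf('radbas: %.4f, thin-plate: %.4f\n', radbas(s/sigma), thinPlate(s));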

Parameter Initialization (feature 12)
The initializations of (i) weight matrices (Qa x Qb, Qa and Qb being the node numbers in the layers a and b being connected, respectively), (ii) bias vectors (Qb x 1), (iii) RBF center matrices (Qc-1 x Qc, c being the hidden layer the matrix refers to), and (iv) RBF width vectors (Qc x 1) are independent and in most cases randomly generated. For each ANN design carried out in the context of each parametric analysis combo, and whenever the parameter initialization method is not 'Mini-Batch SVD', ten distinct simulations varying the (random) initialization values are carried out, in order to find the best solution. The implemented initialization methods are described next.

Midpoint, Rands, Randnc, Randnr, Randsmall
These are all MATLAB built-in functions. Midpoint is used to initialize weight and RBF center matrices only (not vectors). All columns of the initialized matrix are equal, each entry being equal to the midpoint of the (training) output range leaving the corresponding initial layer node; recall that in weight matrices, columns represent each node in the final layer being connected, whereas rows represent each node in the initial layer counterpart. Rands generates random numbers with uniform distribution in [-1, 1]. Randnc (only used to initialize matrices) generates random numbers with uniform distribution in [-1, 1] and normalizes each array column to 1 (unitary Euclidean norm). Randnr (only used to initialize matrices) generates random numbers with uniform distribution in [-1, 1] and normalizes each array row to 1 (unitary Euclidean norm). Randsmall generates random numbers with uniform distribution in [-0.1, 0.1].

Rand [-lim, lim]
This function is based on the proposal in Waszczyszyn (1999), and generates random numbers with uniform distribution in [-lim, lim], lim being layer-dependent and defined as a function of a and b, where a and b refer to the initial and final layers integrating the matrix being initialized, and L is the total number of layers in the network. In the case of a bias or RBF width vector, lim is always taken as 0.5.

SVD
Although Deng et al. (2016) proposed this method for a 3-layer network, it was implemented in this work regardless of the number of hidden layers.

Mini-Batch SVD
Based on Deng et al. (2016), this scheme is an alternative version of the former SVD. Now, the training data is split into min{Qb, Pt} chunks (or subsets) of equal size Pti = max{floor(Pt / Qb), 1}, where floor rounds the argument down to the nearest integer, each chunk being used to derive Qbi = 1 hidden node.
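The chunking arithmetic reads, in MATLAB terms (sizes hypothetical):

% Sketch of the chunking described above: training data with Pt patterns
% is split into min{Qb, Pt} chunks of size max{floor(Pt/Qb), 1}.
Pt = 768;  Qb = 11;                       % hypothetical sizes
nChunks   = min(Qb, Pt);
chunkSize = max(floor(Pt / Qb), 1);
first = 1 + (0:nChunks-1) * chunkSize;    % first pattern index of each chunk
fprintf('%d chunks of %d patterns each\n', nChunks, chunkSize);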

Learning Algorithm (feature 13)
The most popular learning algorithm is called error back-propagation (BP), a first-order gradient method. Second-order gradient methods are known to have higher training speed and accuracy (Wilamowski 2011); the most employed is called Levenberg-Marquardt (LM). All these traditional schemes were implemented using the MATLAB toolbox (The Mathworks, Inc. 2017).

Back-Propagation (BP, BPA), Levenberg-Marquardt (LM)
Two types of BP schemes were implemented, one with constant learning rate (BP), 'traingd' in MATLAB, and another with iteration-dependent rate, named BP with adaptive learning rate (BPA), 'traingda' in MATLAB. The learning parameters set differently from their default values are: (i) Learning Rate = 0.01 / cs^0.5, cs being the chunk size, as defined in 3.3.15; this scaling relates the mini-batch rate to the online learning rate, and that proposal was adopted in this work. Based on the proposal of Liang et al. (2006), the constant chunk size (cs) adopted for all chunks in mini-batch mode reads cs = min{mean(hn) + 50, Pt}, hn being a vector storing the number of hidden nodes in each hidden layer at the beginning of training, and mean(hn) the average of all values in hn.
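In MATLAB terms, the two settings above read (hn and Pt hypothetical):

% Sketch of the mini-batch settings above.
hn = [11 11 11];                          % hidden nodes per hidden layer
Pt = 768;                                 % number of training patterns

cs = min(mean(hn) + 50, Pt);              % chunk size (Liang et al. 2006)
learningRate = 0.01 / cs^0.5;             % rate adopted for BP/BPA
fprintf('cs = %g, learning rate = %.4g\n', cs, learningRate);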

Network Performance Assessment
Several types of results were computed to assess the network outputs, namely (i) maximum error, (ii) % errors greater than 3%, and (iii) performance, which are defined next. All the aforementioned errors are relative errors (expressed in %) based on the following definition, concerning a single output variable and data pattern:

$$e_{qp} = \left|\frac{d_{qp} - y_{qLp}}{d_{qp}}\right| \times 100\,, \qquad (25)$$

where (i) dqp is the q-th desired (or target) output when pattern p within iteration i (p = 1,…, Pi) is presented to the network, and (ii) yqLp is the network's q-th output for the same data pattern. Moreover, the denominator in eq. (25) is replaced by 1 whenever |dqp| < 0.05, while dqp in the numerator keeps its real value. This exception to eq. (25) aims to reduce the apparent negative effect of large relative errors associated to target values close to zero. Even so, this trick may still lead to (relatively) large solution errors while apparently good results are depicted in regression plots (target vs. predicted outputs).

Maximum Error
This variable measures the maximum relative error, as defined by eq. ( 25), among all output variables and learning patterns.

Percentage of Errors > 3%
This variable measures the percentage of relative errors, as defined by eq. ( 25), among all output variables and learning patterns, that are greater than 3%.

Performance
In functional approximation problems, network performance is defined as the average relative error, as defined in eq. ( 25), among all output variables and data patterns being evaluated (e.g., training, all data).
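A minimal sketch of the three assessment variables defined in this section (hypothetical targets and outputs; the |dqp| < 0.05 exception included):

% Sketch for a single output variable, following eq. (25).
d = [1.8 0.02 3.1 0.9];                   % hypothetical targets
y = [1.7 0.10 3.0 1.0];                   % hypothetical network outputs

den = abs(d);  den(den < 0.05) = 1;       % denominator replaced by 1
relErr = abs(d - y) ./ den * 100;         % relative errors in %

maxError    = max(relErr);                % (i) maximum error
pctAbove3   = mean(relErr > 3) * 100;     % (ii) % of errors > 3%
performance = mean(relErr);               % (iii) performance (average error)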

Software Validation
Several benchmark datasets/functions were used to validate the developed software, involving low- to high-dimensional problems and small to large volumes of data. Validation results are not presented herein, but they were made public in Researcher (2018).
Moreover, several papers involving the successful application of this software have already been published by Abambres and his co-workers.

Parametric Analysis Results
Aiming to reduce the computing time by cutting the number of combos to be run (note that all features combined lead to hundreds of millions of combos), the whole parametric simulation was divided into nine parametric SAs, where in each one feature 7 only takes a single value. This measure aims to make the performance ranking of all combos within each 'small' analysis more 'reliable', since the results used for comparison are based on target and output datasets as used in ANN training and yielded by the designed network, respectively (they are free of any postprocessing that eliminates output normalization effects on relative error values). Whereas (i) the 1st and 2nd SAs aimed to select the best methods from features 1, 2, 5, 8 and 13 (all combined), while adopting a single popular method for each of the remaining features (F3: 6, F4: 2, F6: {1 or 7}, F7: 1, F9: 1, F10: 1, F11: {3, 9 or 11}, F12: 2, F14: 1, F15: 1; see Tabs. 2-4) - SA 1 involved learning algorithms 1-3 and SA 2 involved the ELM-based counterpart, (ii) the 3rd-7th SAs combined all possible methods from features 3, 4, 6 and 7, and concerning all other features, adopted the methods integrating the best combination from the aforementioned SAs 1-2, (iii) the 8th SA combined all possible methods from features 11, 12 and 14, and concerning all other features, adopted the methods integrating the best combination (results compared after postprocessing) among the previous five sub-analyses, and lastly (iv) the 9th SA combined all possible methods from features 9, 10 and 15, and concerning all other features, adopted the methods integrating the best combination from the previous analysis. Summing up the ANN feature combinations for all parametric SAs, a total of 475 combos were run for this work (note that this value is much lower than the total number of ANNs simulated).
The ANN feature methods used in the best combo from each of the abovementioned nine parametric sub-analyses are specified in Tab. 5 (the numbers represent the method number as in Tabs. 2-4). Tab. 6 shows the corresponding relevant results for those combos, namely (i) maximum error, (ii) % errors > 3%, (iii) performance (all described in section 3, and evaluated for all learning data), (iv) total number of hidden nodes in the model, and (v) average computing time per example (including data pre- and post-processing). All results shown in Tab. 6 are based on target and output datasets computed in their original format, i.e. free of any transformations due to output normalization and/or dimensional analysis. The computer used in this work has the following features: OS: Win10 Home 64 bits, RAM: 128 GB, Local Disk Memory: 1 TB, CPU: Intel® Core™ i9 7960X @ 2.80-4.20 GHz.
Tab. 5. ANN feature (F) methods used in the best combo from each parametric sub-analysis (SA).
Tab. 6. Performance results for the best design from each parametric sub-analysis.

Proposed ANN-Based Model
The proposed model is the one, among the best ones from all parametric SAs, exhibiting the lowest maximum error (SA 9). That model is characterized by the ANN feature methods {1, 2, 1, 2, 2, 7, 5, 1, 3, 3, 3, 5, 3, 1, …} in Tabs. 2-4. The proposed model is a single MLPN with 5 layers and a distribution of nodes/layer of 8-11-11-11-1. Concerning connectivity, the network is fully-connected, and the hidden and output transfer functions are all Hyperbolic Tangent and Identity, respectively. The network was trained using the LM algorithm (1500 epochs). After design, the average network computing time concerning the presentation of a single example (including data pre/postprocessing) is 1.06x10^-4 s; Fig. 7 depicts a simplified scheme of some of the network's key features. Lastly, all relevant performance results concerning the proposed ANN are illustrated in 3.7.4. The obtained ANN solution for every data point can be found in Abambres and Cabello (2020), making it possible to compute the exact (with all decimal figures) approximation errors. It is worth recalling that, in this manuscript, whenever a vector is added to a matrix, it means the former is to be added to all columns of the latter (valid in MATLAB).

Input Data Preprocessing
For future use of the proposed ANN to simulate new data Y1,sim (an 8 x Psim matrix) concerning Psim patterns, the same data preprocessing (if any) performed before training must be applied to the input dataset. That preprocessing is defined by the methods used for ANN features 2, 3 and 5 (respectively 2, 1 and 2; see Tab. 2), which should be applied after all (eventual) qualitative variables in the input dataset are converted to numerical ones (using feature 1's method). Next, the necessary preprocessing to be applied to Y1,sim, concerning features 2, 3 and 5, is fully described.

Dimensional Analysis and Dimensionality Reduction
Since dimensional analysis (d.a.) was not carried out, and the dimensionality reduction (d.r.) attempt did not yield any result (as described in 3.3.3), one has {Y1,sim}d.r. after = Y1,sim. Concerning input normalization, the new input dataset {Y1,sim}n after is defined as a function of the previously determined {Y1,sim}d.r. after, and they have the same size, where one recalls that the operator '.x' multiplies component i in vector rab by all components in row i of the subsequent term (an analogous definition holds for './').
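In MATLAB terms, the '.x' and './' operators correspond to implicit expansion (values hypothetical):

% Component i of vector r scales all entries of row i of the matrix.
r = [2; 0.5; 10];                 % 3x1 vector
Y = ones(3, 4);                   % 3x4 matrix
Yscaled  = r .* Y;                % '.x': row i of Y multiplied by r(i)
Ydivided = Y ./ r;                % './': analogous definition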

ANN-Based Analytical Model
Once the preprocessed input dataset {Y1,sim}n after (8 x Psim matrix) is determined, the next step is to present it to the proposed ANN to obtain the predicted output dataset {Y5,sim}n after (1 x Psim vector), which will be given in the same preprocessed format of the target dataset used in learning. In order to convert the predicted outputs to their 'original format' (i.e., without any transformation due to normalization or dimensional analysis; the only transformation visible will be the (eventual) qualitative variables written in their numeric representation), some postprocessing is needed, as described in detail in 3.7.3.
Next, the mathematical representation of the proposed ANN is given, so that any user can implement it to determine {Y5,sim}n after, thus dispelling the idea that ANNs are 'black boxes'. The arrays Wj-s and bs (weight matrices and bias vectors) are stored online in Abambres (2020), aiming to avoid an overlong article and to ease the model's implementation by any interested reader.
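For illustration, a minimal sketch of the proposed network's forward pass (fully-connected, tanh hidden layers, identity output); the zero arrays are placeholders to be filled with the values stored in Abambres (2020), and the input pattern is hypothetical:

% Sketch of the 8-11-11-11-1 fully-connected MLPN, with W{j,l} linking
% layer j to layer l and bias vectors b{l} (to be loaded from storage).
Q = [8 11 11 11 1];  L = numel(Q);
W = cell(L, L);  b = cell(L, 1);
for j = 1:L-1, for l = j+1:L, W{j,l} = zeros(Q(j), Q(l)); end, end
for l = 2:L, b{l} = zeros(Q(l), 1); end

y = cell(L, 1);  y{1} = rand(8, 1);       % one preprocessed input pattern
for l = 2:L
    S = b{l};
    for j = 1:l-1, S = S + W{j,l}' * y{j}; end    % all forward connections
    if l < L, y{l} = tanh(S); else, y{l} = S; end % tanh hidden, identity out
end
disp(y{L})                                % predicted output {Y5,sim}n after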

Output Data Postprocessing
In order to transform the output dataset obtained by the proposed ANN, {Y5,sim}n after (1 x Psim vector), to its original format (Y5,sim), i.e. without the effects of the dimensional analysis and/or output normalization (possibly) performed in target dataset preprocessing prior to training, the postprocessing addressed next must be performed.

Non-normalized (just after dimensional analysis) and Original formats
Once {Y5,sim}n after is obtained, the following relations hold for its transformation to its non-normalized format {Y5,sim}d.a. after, i.e. just after the dimensional analysis stage, and to its original format:

$$Y_{5,sim} = \{Y_{5,sim}\}^{after}_{d.a.} = \{Y_{5,sim}\}^{after}_{n}\,, \qquad (30)$$

since neither output normalization nor dimensional analysis were carried out.

Performance Results
Finally, the results yielded by the proposed ANN, in terms of the performance variables defined in sub-section 3.4, are presented in this section in the form of several graphs: (i) a regression plot (Fig. 8), where network target and output data are plotted, for each data point, as x- and y-coordinates, respectively; a measure of linear correlation is given by the Pearson correlation coefficient (R), as defined in eq. (1); (ii) a performance plot (Fig. 9), where performance (average error) values are displayed for several learning datasets; and (iii) an error plot (Fig. 10) for functional approximation problems, where the values concern all data: (iii1) maximum error and (iii2) % of errors greater than 3%.

Fig. 1.1. 1st buckling mode of a paraboloid shell (model 1030) supported along its perimeter.

Tab. 1.1. Variables and ranges of values considered in the dataset.

Tab. 1.3. Sensitivity analysis to determine the suitable (i) number of FEs in each beam (left), and (ii) Kn upper bound (aiming to integrate a fixed connection scenario).
