Automatic sensor-based detection and classification of climbing activities

This article presents a method to automatically detect and classify climbing activities using inertial measurement units (IMUs) attached to the wrists, feet and pelvis of the climber. The IMUs record limb acceleration and angular velocity. Detection requires a learning phase with manual annotation to construct the statistical models used in the cusum algorithm. Full-body activity is then classified based on the detection of each IMU.

states and more of the static time actively resting (i.e. limb shaking), compared with climbers of intermediate skill.
However, a point of difference with [12] and [13] is that in [14] the variability in limb behaviour was associated with exploration or a more functional use of climbing wall properties. This suggests that during stops climbers may exhibit behaviours that are dedicated to more than managing fatigue.
Ascending requires skills in route finding, which reveals the ability of climbers to interpret the ever-changing structure of the climbing wall design [9], [10]. Route finding is a critical climbing skill that can be identified by differentiating exploratory movements and performatory movements [2]. In [2], a distinction between exploratory and performatory movements was made according to whether a potential hold on a climbing wall was touched, regardless of whether it was used as a support. For example, the authors of [3] reported that skilled climbers tended to touch fewer than three surface holds before grasping the functional one.
Clearly, an excessive duration spent immobile for route finding, hold exploration or posture regulation is likely to compromise climbing fluency and lead to the onset of fatigue. The aim of this article is to propose a method to automatically detect and quantify some of the major climbing activities: immobility, postural regulation, hold exploration, hold change and traction. As stated in [12] and [13], these activities can only be defined by taking into account the activities of both the limbs and the pelvis.

A. State of the Art
Previous studies such as [11] and [15] focused on rock climbing and analysed climbers' behaviour. This was accomplished by making video recordings of the climb and having experts manually perform the analysis. This method has several drawbacks: the possibilities for analysis are limited, the results are of relatively low accuracy, and the process itself is long and tedious. Moreover, a full view of the climbing wall might not be available for video recording in outdoor studies without the use of drones or similar devices.
Automatic measures were reported in [1] and [16], with force sensors placed inside the holds on the climbing walls. In addition to the high cost of the experimental devices for long routes, this method is not usable outdoors and requires a long set-up for indoor walls that are not yet equipped.
Given the disadvantages of these methods, wireless sensors placed on the climber might be a good solution, offering easy measurement set-up and quick adaptation to the route environment (indoors or outdoors). In [17], for example, the climber carried a single miniature accelerometer that was used to evaluate different performance coefficients. However, that article did not present an analysis of behaviour or a procedure for distinguishing activities and their distribution along the climb. It also did not assess the activities of the different limbs, which is needed to determine the climber's state on the climbing wall. The classical references dealing with the detection and classification of events in sport science, e.g. [18], do not take into account the simultaneous activity of several limbs and do not aim to reconstruct a behavioural state of the subject. This article presents a method using multiple IMUs placed on several body sites, with each IMU containing an accelerometer, a gyroscope and a magnetometer, to detect limb and pelvic activities based on their acceleration or angular velocity. This step, presented in Section III, requires learning a statistical model from a labelled set of climbs, which is obtained by manual annotation of video recordings by climbing experts. However, these videos are no longer needed once the learning protocol is completed. Section IV presents a procedure for determining full-body activity by combining the independent detections of limb activity.
1558-1748 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.

B. Protocol
Two male climbers of ability 6a on the French Rating Scale of Difficulty (F-RSD) [19], which corresponds to an intermediate performance level [20], undertook an easy, top-roped route (grade 5c on the F-RSD) composed of 20 handholds over a height of 10 m. The route was identifiable by colour and set on an artificial indoor climbing wall by two certified route setters who ensured that it matched an intermediate level of climbing performance. The participants were instructed to self-pace their ascent and to climb fluently and without falling. The ascents were preceded by 3 minutes of route preview, as pre-ascent visual inspection is a key parameter of climbing performance [8]. Procedures were explained to the climbers, who then gave written informed consent to participate. Each climber was considered to be in one of the following states at any given time: immobility, postural regulation, hold exploration, hold change or traction. As stated previously, a separate detection of the activity of each limb is required to describe the full-body state.
Accelerations and angular velocities were collected from the four limbs and pelvis using IMUs located on the right and left wrists, right and left feet, and pelvis. The IMUs (MotionPod, Movea©, Grenoble, France) combined a triaxial accelerometer (±8 g), a triaxial gyroscope (1600°/s) and a triaxial magnetometer referenced to magnetic North, and were sampled at 100 Hz. Wireless transmission to a controller enabled recording with MotionDevTool software (Movea©, Grenoble, France). A wearable device is required to measure climbing activities on a 10m-high climbing wall.
Fig. 1. Example of the frame difference between the ground frame (red), where the gravity is known, i.e. can be removed, and the sensor (grey box) frame (blue), where the gravity components are unknown and cannot be removed from the recorded acceleration (sum of the green vectors).

C. Recording and Preprocessing
The acceleration of each sensor was determined from the recorded signals. These data were used to synchronise the video recording of the climb with the IMU signals in the learning phase of the detection, as well as to detect activity. Therefore, this method had to be used even on unlabelled sets of climbing.
Although the sensors directly record the acceleration in the sensor frame, the recording cannot be used directly as proper acceleration because of the gravity component. The norm of this component is well known (9.81 m/s²), but it cannot be removed from the sensor frame without knowing the orientation of the sensor in the ground reference frame.
Let $a_s$ be the recorded acceleration in the sensor frame. By denoting $R$ the rotation matrix describing the sensor frame in the Earth reference frame (magnetic North, West and vertical up direction), the acceleration $a$ in the Earth reference frame is then defined as $a = R a_s$. Once $a$ is obtained, the gravity component can easily be removed (see Figure 1).
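The change of frame can be illustrated with a minimal sketch. It assumes that the rotation matrix is already available, that the third Earth axis points up, and that a sensor at rest reads +9.81 m/s² along that axis; the function name `remove_gravity` and the test values are hypothetical and not taken from the article.

```python
import numpy as np

GRAVITY = 9.81  # m/s^2, norm of the gravity component

def remove_gravity(a_sensor, R):
    """Rotate a sensor-frame acceleration into the Earth frame, then subtract gravity."""
    a_earth = R @ a_sensor
    # Assumption: the third Earth axis is vertical (up) and a sensor at rest
    # measures +g along it, so gravity is removed by a simple subtraction.
    return a_earth - np.array([0.0, 0.0, GRAVITY])

# Sensor at rest, rotated by 90 degrees about the x axis:
# gravity is read on the sensor's y axis instead of its z axis.
R = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
a_s = np.array([0.0, GRAVITY, 0.0])
linear = remove_gravity(a_s, R)  # no motion, so nothing should remain
```

With a wrong orientation estimate the subtraction leaves a residual, which is why the orientation has to be tracked before the acceleration norm is used for detection.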
To determine $R$, a complementary-filter-based algorithm is used [21], [22], which relies on the three sensor information sources (i.e. accelerometer, gyroscope and magnetometer). The gyroscope measures precise angular changes over very short time durations but cannot be used to track the angle changes by integration due to drift. The accelerometer provides absolute, albeit noisy, measurements of acceleration. By combining the two sensor information sources, it was possible to reduce the drift of the gyroscope for sensor orientation tracking. When magnetometer information was added, it was possible to compute the sensor orientation with respect to the fixed Earth reference frame. Figure 2 provides an example of the recorded norm of the processed acceleration signal.
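The principle of a complementary filter can be sketched on a single tilt angle. This is a deliberate simplification: the algorithm of [21], [22] estimates a full 3D orientation and also uses the magnetometer, whereas the sketch below only fuses a gyroscope rate with an accelerometer-derived angle. The function name, the weight of 0.98 and the simulated bias are assumptions chosen for illustration.

```python
import numpy as np

def complementary_filter(gyro_rate, accel_angle, dt, weight=0.98):
    """Fuse a gyroscope rate (rad/s) with an accelerometer-derived angle (rad)."""
    angle = accel_angle[0]
    estimates = np.empty(len(gyro_rate))
    for i, (rate, acc) in enumerate(zip(gyro_rate, accel_angle)):
        # Integrate the gyroscope for short-term accuracy, then pull the
        # result towards the accelerometer angle to cancel long-term drift.
        angle = weight * (angle + rate * dt) + (1.0 - weight) * acc
        estimates[i] = angle
    return estimates

# Static sensor held at 0.3 rad, gyroscope bias of 0.05 rad/s, sampled at 100 Hz.
n, dt = 2000, 0.01
gyro = np.full(n, 0.05)
acc = np.full(n, 0.3)
fused = complementary_filter(gyro, acc, dt)
integrated = 0.3 + np.cumsum(gyro) * dt  # gyroscope alone: drifts steadily
```

In this toy case the integrated gyroscope angle drifts by about 1 rad in 20 s, while the fused estimate stays within a few hundredths of a radian of the true angle.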
Based on the processed acceleration and the angular velocity, we automatically detected the limb motion using the cusum [23] method.

III. ACTIVITY DETECTION
Activity was detected from each sensor independently. As this method requires a learning phase, the latter is also described in detail. Once the learning phase is completed, the detection described in Paragraph III-A can be used directly.

A. Cusum-Based Detection
The cusum algorithm is often used to determine a change point in a time series of independent random variables based on statistical models. In this case, we assume that a limb is either immobile (state $H_0$) or mobile, i.e. in motion (state $H_1$), at a given time. It is further assumed that these states are exhaustive and exclusive, i.e. for each sample, the limb is in one and only one of these states. The idea is thus to estimate the state of the limb based on the recorded signals. Let $x_t$ be the considered signal. In our case, it will be the norm of either the acceleration or the angular velocity.
The main idea is to assume that $x_t$ is a random variable sampled from a distribution fixed by the state of the limb. In other words, if the limb is in state $H_0$ (respectively $H_1$), then $x_t \sim p(\cdot|H_0)$ (respectively $x_t \sim p(\cdot|H_1)$). If $p(\cdot|H_0)$ and $p(\cdot|H_1)$ are known, a likelihood ratio can be computed for a given $t$ (we directly consider $t$ as a discrete variable, due to the sampling process) to estimate the distribution from which $x_t$ was sampled:
$$L_t = \frac{p(x_t|H_1)}{p(x_t|H_0)}.$$
It is assumed that the state changing periods are much longer than the sampling time, so that the log likelihood ratios can be accumulated to increase the detection performance. Let $S^x_t$ be the cumulative sum of the log likelihood ratio:
$$S^x_t = \sum_{i=1}^{t} \log \frac{p(x_i|H_1)}{p(x_i|H_0)}. \quad (1)$$
The monotonic trend of $S^x_t$ indicates the state the limb should be assumed to be in: with this convention, the sum tends to decrease in state $H_0$ and to increase in state $H_1$. A change in monotonicity implies a change in the state. An algorithm looking directly at a change in monotonicity would be subject to many false detections. Instead, the use of positive thresholds $\lambda_0$ and $\lambda_1$ helps to reduce false alarms.
For example, given state $H_0$ at $t = 0$, state $H_1$ is detected when
$$S^x_t - \min_{0 \le i \le t} S^x_i > \lambda_1. \quad (2)$$
When a change point is detected, the process starts again by taking the detection time as the new time origin and starting the cumulative sum again from this point.
Although thresholds can be chosen to be equal, we consider a more general framework here by taking different values. These thresholds can influence the false positive detection rate and therefore should be chosen according to a given performance measure or some prior model of detection.
Thresholds as well as the distributions $p(\cdot|H_0)$ and $p(\cdot|H_1)$ are not known in advance in this case, which is why a learning step is required. A labelled set is used to estimate the distributions and thresholds based on manual annotation, described in the next section.
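A two-state cusum detector of this kind can be sketched as follows. The sketch makes several assumptions that the text does not fix: the log likelihood ratio is taken as log p(x|H1) minus log p(x|H0), the distributions are Gamma laws with known shape and scale, the threshold λ1 triggers the switch to H1 and λ0 the switch back to H0, and the sum restarts from zero at each detection. All names and numerical values are hypothetical.

```python
import math
import numpy as np

def gamma_logpdf(x, k, theta):
    """Log density of a Gamma law with shape k and scale theta."""
    return (k - 1.0) * np.log(x) - x / theta - math.lgamma(k) - k * math.log(theta)

def cusum_states(x, params0, params1, lam0, lam1, initial_state=0):
    """Return one state per sample: 0 = immobile (H0), 1 = mobile (H1)."""
    llr = gamma_logpdf(x, *params1) - gamma_logpdf(x, *params0)
    states = np.empty(len(x), dtype=int)
    state = initial_state
    s = 0.0        # cumulative sum since the last change point
    extreme = 0.0  # running minimum (in H0) or maximum (in H1) of s
    for t, step in enumerate(llr):
        s += step
        if state == 0:
            extreme = min(extreme, s)
            if s - extreme > lam1:       # sum rose enough: switch to H1
                state, s, extreme = 1, 0.0, 0.0
        else:
            extreme = max(extreme, s)
            if extreme - s > lam0:       # sum fell enough: switch to H0
                state, s, extreme = 0, 0.0, 0.0
        states[t] = state
    return states

# Synthetic signal: low values (rest), high values (motion), low values again.
x = np.concatenate([np.full(200, 0.1), np.full(200, 2.0), np.full(200, 0.1)])
states = cusum_states(x, params0=(2.0, 0.05), params1=(2.0, 1.0), lam0=20.0, lam1=20.0)
```

On this synthetic signal the switch to H1 is immediate, whereas the return to H0 is detected a few samples late, which illustrates how the thresholds trade detection delay against false alarms.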

B. Construction of the Labelled Set
The climbs were video recorded in their entirety, and experienced climbers manually annotated three different climbs by two different climbers. They indicated for each frame of the video the state of each limb.
Because the camera and sensors were not synchronised, the delay between the frame-based manual annotations and the acceleration had to be estimated. The videos were recorded using a fixed camera facing the wall, with a red light attached to the climber's pelvis. The position of the red light on the image therefore gave the position of the pelvis on the wall, up to some correction. A classical Kalman filter was used to track the red light in the frame, and an adaptive filter was required to counterbalance similar colours in the surrounding environment (in this case, the presence of televisions in the recorded picture). Once the red-light position on the image was obtained, lens distortion had to be corrected, followed by parallax correction. An example of automatic tracking after correction is presented in Figure 3.
Based on the obtained trajectory, an approximate acceleration of the pelvis was determined. Using a maximum correlation measure with the sensor-based lateral and vertical accelerations of the pelvis (after a pre-processing to get rid of the aliasing effect due to the pixel-based tracking), the delay between each signal was estimated. Therefore, the manual annotation (based on the video) and the recorded signals (based on the sensors) could be synchronised. An example of correlation is presented in Figure 4.
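The delay estimation by maximum correlation can be sketched as below. The sketch assumes that both signals have already been resampled to a common rate and uses a synthetic pulse rather than real recordings; the function name `estimate_delay` and the 2.2 s shift are illustrative choices only.

```python
import numpy as np

def estimate_delay(sensor, video, fs):
    """Delay in seconds of `sensor` relative to `video`, both sampled at fs Hz."""
    s = sensor - sensor.mean()
    v = video - video.mean()
    corr = np.correlate(s, v, mode="full")
    # Output index i of the full correlation corresponds to lag i - (len(v) - 1).
    lag = int(np.argmax(corr)) - (len(v) - 1)
    return lag / fs

# Synthetic test: the sensor signal is the video-based signal shifted by 220 samples.
fs = 100.0
video = np.zeros(1000)
video[300:320] = 1.0
sensor = np.roll(video, 220)
delay = estimate_delay(sensor, video, fs)
```

A positive result means that events appear later in the sensor signal than in the video-based signal, so the annotations have to be shifted by that amount before they are compared with the sensor data.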
An example of the synchronised annotation and signal is presented in Figure 6. Based on these annotations, a model was determined for each state. In this example, it appears clearly that the high values of the accelerations match the $H_1$ hypothesis (mobility). These three annotated climbs, along with the recorded IMU signals, will be used as a labelled set for the learning phase of activity detection.
Fig. 3. [...] It is also apparent that the red light can be hidden, for example, due to substantial body rotation so that the light no longer faces the camera. On this image, the lens distortion and the parallax were corrected.
Fig. 4. Example of the correlation between the sensor acceleration and the video tracking-based acceleration. In this case, the delay maximising the correlation is around 2.2 s.

C. Learning Protocol
For now, we consider that $x_t \in \mathbb{R}^+$ is either the norm of the acceleration or the norm of the angular velocity. Using the manual annotations, an example of the histogram of the acceleration norm for each hypothesis $H_0$ and $H_1$ is presented in Figure 5.
Instead of considering general distributions for $p(\cdot|H_i)$, $i = 0, 1$, parametric distributions are considered. A $\chi^2$ test is performed for each sensor and each climb and the results are presented in Table I. The values presented in Table I are averages of the p-values for the three annotated climbs. Table I compares the fit of the norm of the acceleration and of its squared value to the Gamma distribution, the exponential distribution (which seems to fit the samples in the $H_0$ hypothesis), and the $\chi^2$ distribution with three degrees of freedom (corresponding, for the squared norm, to a model where each component of the signal is sampled from a Gaussian). It indicates that the Gamma distribution-based model seems a good choice to describe the signals.
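To be more specific, we consider for both states the Gamma distributions $p(\cdot|H_i)$, $i = 0, 1$, given by
$$p(x|H_i) = \frac{x^{k_i - 1} e^{-x/\theta_i}}{\Gamma(k_i)\,\theta_i^{k_i}},$$
where $\theta_i$ and $k_i$ are positive real numbers and $i = 0, 1$ represents the state. The coefficients $\theta_i$ and $k_i$ are determined by maximum likelihood. By considering a second order approximation of the digamma function $\Gamma'(k)/\Gamma(k)$ [24], they are given by
$$\hat{k}_i = \frac{3 - s_i + \sqrt{(s_i - 3)^2 + 24 s_i}}{12 s_i}, \quad (3)$$
$$\hat{\theta}_i = \frac{1}{\hat{k}_i N_i} \sum_{t \in H_i} x_t, \quad (4)$$
with
$$s_i = \log\Big(\frac{1}{N_i} \sum_{t \in H_i} x_t\Big) - \frac{1}{N_i} \sum_{t \in H_i} \log x_t,$$
where the sums run over the $N_i$ samples annotated in state $H_i$. Clearly, these parameters change with the nature of the signal $x_t$ (norm of the acceleration or norm of the angular velocity).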
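The parameter estimation can be sketched with a widely used closed-form approximation of the Gamma maximum-likelihood estimate, which avoids solving the digamma equation numerically. Whether this is exactly the approximation of [24] is an assumption; the function name `fit_gamma` and the synthetic sample are hypothetical.

```python
import numpy as np

def fit_gamma(x):
    """Approximate maximum-likelihood shape k and scale theta of a Gamma law."""
    x = np.asarray(x, dtype=float)
    # s is non-negative by Jensen's inequality and vanishes only for constant data.
    s = np.log(x.mean()) - np.log(x).mean()
    k = (3.0 - s + np.sqrt((s - 3.0) ** 2 + 24.0 * s)) / (12.0 * s)
    theta = x.mean() / k  # the fitted mean k * theta equals the sample mean
    return k, theta

# Synthetic check on samples drawn from a known Gamma law (shape 2, scale 0.5).
rng = np.random.default_rng(0)
sample = rng.gamma(shape=2.0, scale=0.5, size=20000)
k_hat, theta_hat = fit_gamma(sample)
```

In the learning phase such a fit would be run once on the samples annotated as immobile and once on those annotated as mobile, for each sensor and each signal.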
Based on the signal $x_t$, using these Gamma distributions with parameters determined via Equations (3) and (4), the cumulated log likelihood ratio $S^x_t$ can be determined. An example is presented, along with the annotations, in Figure 6.
The method presented in Paragraph III-A leads to the determination of a cumulated log likelihood ratio (see Equation (1)) for each signal: $S^{acc}_t$ for the norm of the acceleration and $S^{ang}_t$ for the norm of the angular velocity. Both are combined into a single weighted sum
$$S_t = \alpha S^{acc}_t + (1 - \alpha) S^{ang}_t, \quad (5)$$
where $\alpha \in [0, 1]$. For example, if $\alpha = 0$ (respectively 1), then only the angular velocity (respectively acceleration) is considered for detection. A comparison of the performances depending on $\alpha$ is presented in Section III-D. It should be noted that although $S^{acc}_t$ and $S^{ang}_t$ are cumulated log likelihood ratios, $S_t$ is no longer one as the only possible distribution associated with such a likelihood would not be unitary. Therefore, maximising (5) does not correspond to maximising a likelihood. The minimisation of $S_t$ should be regarded as a minimum contrast estimation method.
Fig. 7. Example of a ROC figure, representing the true positive rate with respect to the false positive rate. Each dot represents an estimation for a given set of thresholds. The best set of thresholds maximising (6) is the one maximising the distance to the diagonal. In this example, the norm of the angular velocity was the signal used for detection and the sensor was attached to the left foot.
Running the cusum algorithm with different thresholds $\lambda_i$ will clearly lead to different estimates. To measure the performance of the detection, the coefficient
$$c = \frac{TP}{P} - \frac{FP}{N} \quad (6)$$
is used, where $TP$ is the number of true positives (detecting $H_1$ when it is $H_1$), $P$ (respectively $N$) the number of elements in $H_1$ (respectively $H_0$) in the manual annotation, and $FP$ the number of false positives (detecting $H_1$ when it is $H_0$). This coefficient is used as the performance measure in the learning protocol because, in a ROC curve, it is proportional to the distance between the operating point and the diagonal, which corresponds to a random decision (Figure 7). The thresholds are then determined as the ones maximising $c$. A comparison of the recorded signal, the annotation and the (optimal) detection is presented in Figure 8. This figure also illustrates the difference between the detection based on other annotations, where all the parameters are learnt from a distinct labelled set, and the optimal detection, where the annotation of the signal itself is used to learn the parameters. This optimal detection is not achievable in practice as it requires the annotation of the considered climb.
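A performance coefficient of this type can be computed per sample as sketched below. The sketch assumes that the coefficient is the true positive rate minus the false positive rate, built from the quantities TP, P, FP and N named in the text; the function name and the toy annotation are hypothetical.

```python
import numpy as np

def performance_coefficient(detected, annotated):
    """True positive rate minus false positive rate, from per-sample binary states."""
    detected = np.asarray(detected, dtype=bool)
    annotated = np.asarray(annotated, dtype=bool)
    tp = np.sum(detected & annotated)    # H1 detected when annotated H1
    fp = np.sum(detected & ~annotated)   # H1 detected when annotated H0
    p = np.sum(annotated)                # samples annotated H1
    n = np.sum(~annotated)               # samples annotated H0
    return tp / p - fp / n

annotated = np.array([0, 0, 0, 0, 1, 1, 1, 1])
perfect = performance_coefficient(annotated, annotated)
all_mobile = performance_coefficient(np.ones(8), annotated)
late = performance_coefficient([0, 0, 0, 0, 0, 1, 1, 1], annotated)
```

A perfect detection scores 1, a detector that always answers "mobile" scores 0 like a random decision, and a detection that starts one sample late scores 0.75 here. Thresholds could then be selected by evaluating this score over a grid of candidate values.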
The next section presents the performance of this detection via a cross-validation method.

D. Performance and Limitations
A cross-validation method was used to evaluate the performance of the learning algorithm. The method consists of learning the different parameters ($k_i$, $\theta_i$, $\lambda_i$ for $i = 0, 1$) on the concatenation of two of the three annotated climbs and applying these parameters to the third one. The annotation for the third climb makes it possible to determine a performance measure via the coefficient from (6). An optimal coefficient $c$ can also be determined by learning all the parameters directly on the third climb alone. This gives an idea of the best achievable performance. Then, permutations between the learning and testing climbs are made and the average coefficient $c$ is considered as the final performance of the algorithm. Table II presents the results from the cross-validation for all the sensors for $\alpha = 0$, $\alpha = 1$ and the value of $\alpha$ maximising the score (which is therefore at least as good as both extreme values). It appears that the score is roughly similar for the different values of $\alpha$. However, the optimal value of $\alpha$ (different for each sensor) will be used from now on. For further applications with non-annotated climbs (see Paragraph IV-C), the learning phase is carried out on all three annotated climbs.
Fig. 8. [...] In this case, the sensor was attached to the right hand. For better readability, the weighted coefficient $\alpha = 0$ was chosen; therefore, only the angular velocity was used in the determination of $S_t$. It is notable that the optimal estimation still differs from the annotation.
The optimal detection did not provide a very high performance measure in all cases. This may have had several causes:
• Missing a Movement During Manual Annotation: Due to the lack of visibility of the concerned limb.
• Delay or Different Movement Period: As the annotation is manual, a delay might occur between a movement and its detection by the person annotating the video.
• Sensor Is Hit During the Climb: For example, this occurs when the climber claps his hands together, creating an acceleration peak on both wrist sensors.
• Defective Orientation Estimation: As the acceleration signal requires the sensor orientation, a wrong orientation estimation will directly add bias to the acceleration because the gravity component will no longer be aligned with the vertical (according to the sensor).
Another source of error comes from a wrong estimation of the delay between the manual annotation and the sensor-based signals using the correlation (Figure 4). A wrong delay could lead to a less distinct separation between $p(\cdot|H_0)$ and $p(\cdot|H_1)$, strongly decreasing the performance as the log likelihood ratio would be closer to 0. However, following the correlation analysis in Figure 4, we point out that the error due to this estimated delay is smaller than the error induced by the delay of the manual annotation.
The next section presents how these binary detections (state $H_1$ or $H_0$) from each sensor can be classified to describe a full-body state and how this is used to measure exploration during a climb.

A. Full-Body Activity
Based on state detection of the four limbs and pelvis, we defined four exclusive states matching the different activities of the climber:
• Immobility: All limbs are immobile and the pelvis is immobile.
• Postural Regulation: All limbs are immobile and the pelvis is moving.
• Hold Interaction: At least one limb is moving and the pelvis is immobile.
• Traction: At least one limb is moving and the pelvis is moving.
Immobility is the state when the climber is not moving at all. He might be resting due to fatigue or looking at the route to determine his climbing path, while his limbs remain immobile.
Postural regulation is the adjustment of the climber's centre of mass while his limbs stay on the same holds. This might consist of a body rotation to be able to catch a hold that would not be reachable with the previous body configuration.
Hold interaction is the movement of a limb while maintaining the pelvis (and therefore the global position of the climber on the wall) immobile. This is a change in the hold in use before the next traction, a change in the position and orientation of the hand/foot on a hold for better adapted use of the hold, or successive limb movements to determine which hold is most appropriate for the next traction. In the next section, we present a more detailed classification to differentiate actual use of a hold from hold exploration.
Traction is the state when the climber is moving (generally upward) using at least one limb. Although the limb might not be moving as substantially as during a hold change, this state is still easily detected and its definition seems to fit the climber's actual traction phases on the video.
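The four full-body states follow directly from the binary detections and can be written as a small decision function. This is a sketch only: the function name, the boolean encoding (True for mobile) and the lowercase state labels are choices made for illustration.

```python
def full_body_state(limbs_mobile, pelvis_mobile):
    """Map binary detections (True = mobile) to one of the four full-body states."""
    limb_moving = any(limbs_mobile)  # at least one of the four limbs is moving
    if pelvis_mobile:
        return "traction" if limb_moving else "postural regulation"
    return "hold interaction" if limb_moving else "immobility"

# Order of the limbs is arbitrary here: left wrist, right wrist, left foot, right foot.
at_rest = full_body_state([False, False, False, False], pelvis_mobile=False)
reaching = full_body_state([False, True, False, False], pelvis_mobile=False)
```

Applied sample by sample to the five detected state sequences, this function yields the full-body state sequence of a climb.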

B. Limb State
It quickly appears that the state of hold interaction covers a wide spectrum of activities. It is not directly possible to determine whether the climber is actually using a hold or moving a limb towards different holds to check whether they are reachable and, if so, how to use them (in this case, the limb can remain on the same hold but change its orientation). Consequently, new states for each limb are considered as sub-states of the hold interaction defined in Section IV-A. These sub-states are the following:
• Immobility: When a limb is detected as being immobile.
• Use: When a limb is moving during traction.
• Change: The last movement before traction, or the final change in hold (or change in limb orientation on the same hold) before being used.
• Exploration: All movements except the last one before traction. An example is the case when the climber is trying several holds before choosing the one he will be using for traction.
A summary of the different states is presented in Figure 9 and an example of the full state analysis of a climb is presented in Figure 10.
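One possible way to assign these sub-states per sample is sketched below. It assumes that a movement is a run of consecutive mobile samples outside traction, that the last such run before a traction phase is the Change and earlier ones are Exploration, and that the end of the recording is treated like the start of a traction phase. These conventions, like the function name, are assumptions and may differ from the authors' implementation.

```python
def limb_substates(limb_mobile, traction):
    """Label each sample of one limb: immobility, use, change or exploration."""
    n = len(limb_mobile)
    labels = ["immobility"] * n
    change_open = True  # scanning backwards, the next movement met is the last before traction
    in_run = False
    run_label = "change"
    for t in range(n - 1, -1, -1):
        if traction[t]:
            if limb_mobile[t]:
                labels[t] = "use"
            change_open, in_run = True, False
        elif limb_mobile[t]:
            if not in_run:  # a new movement starts (seen from its end)
                in_run = True
                run_label = "change" if change_open else "exploration"
                change_open = False
            labels[t] = run_label
        else:
            in_run = False
    return labels

limb = [0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
body_traction = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
labels = limb_substates(limb, body_traction)
```

In this toy sequence the limb makes two exploratory movements, then one change just before traction, and is in use during the first traction sample.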

C. Example of Application
This section presents a simple example of application based on the previous algorithm: measuring the ratio between exploratory and performatory movements.
As previously noted, this ratio can be used to differentiate skilled and unskilled climbers [3], which makes automatic detection of the ratio very useful. Based on the present algorithm, Exploration and Change encompass all the exploratory movements for each limb, whereas Use reflects only the performatory movements. The results based on a total of 94 climbs by three experts and three beginners on a 10m-high climbing route are presented in Figure 11. A distinction between experts and beginners clearly emerges based on the ratio between these quantities.
It should be noted that some of the values from this figure seem rather high for a 10m-high climbing route. This is partly due to the computation of the sensor orientation, which is still somewhat inexact. If the orientation is not properly determined, the computed acceleration contains residuals from the gravity component, which induce spurious detections of mobile phases and mislead the counting process. One possibility to eliminate this issue might be to only consider the angular velocity, as the performance does not differ significantly between the use of angular velocity and acceleration. Another factor might be individual differences between the climbers from the labelled set used for the learning protocol and the climbers used in this application. To avoid this dependence on the individual, the learning phase can be carried out on a recording of a given climber and the learnt parameters applied only to that climber.
Fig. 11. Comparison between the number of holds touched by the hands (exploratory movement) and the number of holds used (performatory movement) during a climb. Results for the feet are quite similar. A classification clearly appears based on the ratio between these quantities.

V. CONCLUSION
This article presents a method for the automatic detection and classification of climbers' activities based on multiple IMUs. From a learning phase requiring manual annotations, a statistical model is built for the norms of acceleration and angular velocity. This model is used in a cusum algorithm to detect a binary movement state for each limb with an attached sensor. The concatenation of the states of each sensor is used to determine and classify full-body activity. A more detailed classification is then used to measure exploration during the climb.
Determining exploratory activity during climbing is useful as it provides a measure of skill and learning performances.
Future work will focus on the use of this method in a learning protocol for indoor climbing, measuring the occurrence and distribution of exploration and immobility in the participants. Another study on immobility is planned to determine which body configurations are mainly used during learning and how these configurations evolve from climb to climb during a learning protocol.