Machine learning (ML) has considerable potential to support business processes effectively. At a joint SAP and IBsolution hackathon (October 18-20), participating companies had the chance to validate the use of ML in personnel planning. The goal of the event was to develop a model that automatically predicts whether an employee will leave the company within the next twelve months. HR is particularly well suited to machine learning because a large amount of employee data is available there: job changes within the company, time recording, completed training, and so on. In addition, the data can be anonymized or pseudonymized with SAP HANA Cloud so that only authorized persons can see all attributes.
The companies participating in the hackathon were Zeiss, Testo Industrial Services, Bundeswehr, Bosch Rexroth and Syskron, the digitalization unit of Krones AG. The individual teams took the opportunity to familiarize themselves with innovative machine learning technologies. In advance, SAP and IBsolution had defined a business case for the hackathon participants to work on. At the center of this case was an HR business analyst who wanted to predict which employees might leave the company within the next twelve months. Such a forecast would enable managers to take timely countermeasures to prevent unwanted employee turnover.
The analyst expects machine learning to uncover hidden patterns in employee data that provide valuable insights into an employee’s potential departure from the company. The data can be prepared in SAP Data Warehouse Cloud (DWC), and the predictive models can be built directly on the database using either the Smart Predict capabilities of SAP Analytics Cloud (SAC) or the machine learning capabilities of SAP HANA. The HR business analyst wants to visualize the predictions of the ML models in a SAC dashboard so that management can easily consume them and make quick decisions about employees. For this purpose, the data generated during model validation can also be accessed, for example to identify which columns in the data set affect the outcome and how.
The first step in tackling the task was to load the existing data into SAP DWC, the SAP HANA Cloud database, or directly into SAP Analytics Cloud and to prepare it for machine learning. Some teams also enriched the data with information from other data sets via joins. Columns whose data quality was insufficient were excluded using projections so that they would not distort the models.
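The join and projection steps described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SAP HANA Cloud or DWC, which is an assumption made only so the example runs anywhere. The tables, columns, and values (employees, trainings, free_text_note, ml_input) are hypothetical and were not part of the hackathon data.

```python
import sqlite3

# In-memory database as a stand-in for the real data platform (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (
    employee_id    INTEGER PRIMARY KEY,
    department     TEXT,
    tenure_years   REAL,
    free_text_note TEXT   -- hypothetical low-quality column
);
CREATE TABLE trainings (
    employee_id INTEGER,
    course      TEXT
);
INSERT INTO employees VALUES
    (1, 'Sales', 2.5, NULL),
    (2, 'R&D',   7.0, '??'),
    (3, 'Sales', 0.8, NULL);
INSERT INTO trainings VALUES
    (1, 'Negotiation'),
    (1, 'CRM basics'),
    (2, 'Python');

-- Join: enrich each employee with the number of completed trainings.
-- Projection: list only the columns the model should see, which leaves
-- the low-quality free_text_note column out of the result.
CREATE VIEW ml_input AS
SELECT e.employee_id,
       e.department,
       e.tenure_years,
       COUNT(t.course) AS trainings_completed
FROM employees AS e
LEFT JOIN trainings AS t ON t.employee_id = e.employee_id
GROUP BY e.employee_id, e.department, e.tenure_years;
""")

cursor = conn.execute("SELECT * FROM ml_input ORDER BY employee_id")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)
for row in rows:
    print(row)
```

The LEFT JOIN keeps employees who have no training records, so they appear with a count of zero instead of dropping out of the training data.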
Machine learning models were built using the machine learning libraries built into SAP HANA: the Automated Predictive Library (APL) and the Predictive Analysis Library (PAL). Participants took one of two approaches. Some used SAC’s predictive scenarios, which are aimed at citizen data scientists, i.e., employees from business departments without a technical background. Others accessed the APL and PAL packages in SAP HANA Cloud directly from Python or R, working in a data scientist’s usual environment such as Jupyter Notebooks or RStudio.
While APL automatically selects the best model and parameters based on the data, PAL takes a more flexible approach: it lets the data scientist choose the model freely and fine-tune its parameters.
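The contrast between automated selection and manual choice can be sketched in plain Python. This is a conceptual illustration only: it does not call APL or PAL, and the candidate rules, feature names, and validation rows are invented stand-ins for real models and data.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
Model = Callable[[Row], int]  # returns 1 if the employee is predicted to leave

# Hypothetical validation set: (features, actually_left).
VALIDATION: List[Tuple[Row, int]] = [
    ({"tenure_years": 0.5, "overtime_hours": 30.0}, 1),
    ({"tenure_years": 1.0, "overtime_hours": 5.0}, 1),
    ({"tenure_years": 6.0, "overtime_hours": 25.0}, 0),
    ({"tenure_years": 8.0, "overtime_hours": 2.0}, 0),
    ({"tenure_years": 3.0, "overtime_hours": 28.0}, 1),
    ({"tenure_years": 4.0, "overtime_hours": 4.0}, 0),
]

# Toy candidate "models": simple rules standing in for real algorithms.
CANDIDATES: Dict[str, Model] = {
    "short_tenure": lambda r: int(r["tenure_years"] < 2.0),
    "high_overtime": lambda r: int(r["overtime_hours"] > 20.0),
    "tenure_and_overtime": lambda r: int(
        r["tenure_years"] < 2.0
        or (r["tenure_years"] < 5.0 and r["overtime_hours"] > 20.0)
    ),
}


def accuracy(model: Model, data: List[Tuple[Row, int]]) -> float:
    """Share of validation rows the model classifies correctly."""
    correct = sum(1 for features, label in data if model(features) == label)
    return correct / len(data)


def auto_select(candidates: Dict[str, Model],
                data: List[Tuple[Row, int]]) -> Tuple[str, float]:
    """Automated route: score every candidate and keep the best one."""
    scores = {name: accuracy(model, data) for name, model in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Automated route: the procedure decides which candidate wins.
best_name, best_score = auto_select(CANDIDATES, VALIDATION)

# Manual route: the data scientist picks a specific model deliberately.
manual_name = "high_overtime"
manual_score = accuracy(CANDIDATES[manual_name], VALIDATION)

print(best_name, best_score)
print(manual_name, manual_score)
```

In practice the manual route is attractive when domain knowledge or interpretability requirements favor a particular algorithm, even if an automated search would rank another candidate higher on a single metric.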
Note that SAC predictive scenarios use APL in SAP HANA Cloud, the same library that is available to the data scientist. For binary classification, APL often delivers the best results because it uses sophisticated techniques such as gradient boosting. In some cases, however, other algorithms perform better, for example a Random Forest; the data scientist may then need to implement the model directly on SAP HANA instead of going the SAC route.
The final presentations made it clear that the participants had worked intensively on the specified business case and on the machine learning functionality of SAP HANA Cloud during the three days of the hackathon. They succeeded in developing predictive models that forecast whether an employee will leave the company within the next twelve months. The participants also took away many valuable ideas about the potential machine learning holds for other areas of the company and how that potential can be leveraged.