Announcing the release of Florence, TruQua’s Machine Learning platform built to integrate with SAP Systems

CHICAGO, IL, APRIL 03, 2018 - TruQua Enterprises, LLC, a leading SAP software and services firm, today announced the release of Florence, the first Machine Learning platform built to integrate with SAP Systems.  Deployed on the cloud, Florence provides a platform for the simple and secure deployment of Machine Learning algorithms, with secure connectivity to SAP business systems.

As Machine Learning capabilities evolve, so does the need to integrate Data Science, Finance and Logistics groups quickly, with minimal disruption to existing business processes, which is how the idea for Florence originated. TruQua developed Florence to simplify the usage of Machine Learning models within SAP. Utilizing the Florence platform, SAP technologies such as S/4HANA, SAP HANA, Cloud Applications and Business Intelligence applications can now integrate seamlessly with Machine Learning predictions, without the need for additional infrastructure or software licenses.

With the announcement of Florence, TruQua has created three targeted offerings designed to help customers on their Machine Learning journey. These offerings include:

  1. First Steps with Florence – For customers who are just getting started with Machine Learning. A six-week engagement where customers work with Data Scientists to define the use case, build an initial model and integrate it with their business systems.
  2. Targeted Business Need – A five-week engagement to create an initial model and integrate it with existing business systems.
  3. Bridging Data Science and Finance – A two-week engagement to integrate Machine Learning models into a customer’s existing business systems.

For more information, including demos, documents and use cases, visit florence.truqua.com

About TruQua Enterprises
TruQua Enterprises is an IT services, consulting, and licensed SAP development partner that specializes in providing “True Quality” SAP solutions to Fortune 500 companies with integrated, end-to-end analytic solutions.  Through project management, software innovation, thought leadership, implementation and deployment strategies, TruQua’s team delivers high-value services through its proprietary knowledge base of software add-ons, development libraries, best practices, solution research and blueprint designs. TruQua has also been certified as a Great Place to Work and ranked #11 by Fortune Magazine’s “The 50 Best Companies to Work for in Chicago” in the Small and Medium Sized Companies category. For more information, please visit www.TruQua.com or follow us on Twitter @TruQuaE.

#  #  #

TruQua Enterprises Media Contact
Allison Martin
Marketing
allison.martin@truqua.com

Key Business Factors in Machine Learning: Part 2- Testing 3 Machine Learning Techniques to Predict Divvy Bike Rides

Authored by: 
Annie Liu, Consultant, TruQua Enterprises
JS Irick, Lead Developer and Principal Consultant, TruQua Enterprises
Daniel Settanni, Senior Cloud Architect, TruQua Enterprises
Geoffrey Tang, Consultant, TruQua Enterprises

In part one of “Key Business Factors in Machine Learning” (https://www.truqua.com/key-business-factors-machine-learning-part-1-predicting-employee-turnover/), we explored how Machine Learning can categorize data.  We also reviewed the business’s role in model development.  In this blog, we will look at creating Machine Learning algorithms to predict values.  In particular, we will be looking at Sales Demand for Bicycle rentals.  Divvy Bikes is Chicago’s own bike sharing system and, with over 10 million unique rides taken a year, the largest in North America.

The dataset used in this article combines all of Divvy Bike’s 10+ Million rides from 2015, along with hourly weather data and the Chicago Cubs schedule to observe the effect of external factors on rider traffic (for a different presentation on adapting Machine Learning models for discrete locations).

In this example, we are going to test three popular Machine Learning techniques (Logistic Regression, Support Vector Machines and Random Forests) to predict the number of bikes that will be in service for a given hour at a Divvy station.

Refining the dataset

The Divvy Bike/Weather/Cubs dataset in this article is much more complicated than the Employee Attrition dataset in part 1, featuring over 55 different factors.

Two of the factors in the model can be generalized into groups to help train the model more efficiently.  These factors express time as integers – Day of the Year and Day of the Week.  When they are expressed as integers, their meaning is actually obscured from the model.

Certain algorithmic techniques can work around this obfuscation, but it can be much more efficient to perform an initial grouping to accelerate the model development.

Day of the week and day of the year have obvious groupings; however, your business data may have groupings that are not immediately obvious to the data scientists.

Here we see the impact of creating a “Season” category for day of the year:

Similarly, we can see that there is a large effect on demand based on the day of the week.  Weekdays have a huge 5PM spike that is not seen on weekends.  Therefore, we can greatly increase the initial accuracy of our models by changing day of the week into a Weekday/Weekend grouping.
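The two groupings described above can be sketched in a few lines of Python. This is only an illustration: the article does not say how the seasons were defined, so the meteorological season boundaries below, along with the function names season_for and day_type_for, are assumptions.

```python
from datetime import date


def season_for(day: date) -> str:
    """Map a date to a coarse season (assumed: Northern Hemisphere meteorological seasons)."""
    month = day.month
    if month in (12, 1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    return "Fall"


def day_type_for(day: date) -> str:
    """Collapse day of the week into a Weekday/Weekend grouping."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return "Weekend" if day.weekday() >= 5 else "Weekday"


ride_date = date(2015, 7, 4)  # the Fourth of July fell on a Saturday in 2015
print(season_for(ride_date), day_type_for(ride_date))
```

Replacing the raw integers with these two labels gives the model the grouping directly, instead of leaving it to rediscover that days 152 through 243 or days 6 and 7 behave alike.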

Investigating the Data

Removing outliers can be a critical step in increasing model fit.  However, it is important to define just what an outlier is in the context of your business process.  For example, removing outliers from our bike ride dataset can help refine our model.  The Fourth of July causes a demand spike that is obvious to anyone familiar with the US, whereas the demand spike for the “Air and Water Show” would not be apparent to non-Chicagoans.  We don’t want our model to try to fit a factor that is not present in our dataset.

However, compare this to fraud detection algorithms, which exist only to detect outliers.  If we were to remove the outliers from that dataset, we would end up with a 100% accurate model, but only because all the fraudulent transactions had been removed from the dataset.
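One common way to surface candidate outliers for business review is the interquartile range (IQR) rule. The sketch below is a generic illustration, not the method used for the Divvy dataset; the hourly ride counts and the 1.5 multiplier are assumed values. Flagged values are returned for review rather than deleted, because whether a spike is noise or signal depends on the business process.

```python
from statistics import quantiles


def flag_outliers(counts, multiplier=1.5):
    """Return the values that fall outside the IQR fences."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = quantiles(counts, n=4)
    iqr = q3 - q1
    lower_fence = q1 - multiplier * iqr
    upper_fence = q3 + multiplier * iqr
    return [c for c in counts if c < lower_fence or c > upper_fence]


# Hypothetical ride counts for the same hour on nine different days.
hourly_rides = [120, 135, 128, 140, 131, 126, 138, 133, 900]
print(flag_outliers(hourly_rides))
```

A business user who recognizes the flagged day as a holiday or a local event can then decide whether to drop it or to add the event as a new factor.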

Modeling with the Dataset

Once the dataset has been prepared, it is time to develop, train and test with different Machine Learning algorithms.

In this example, we looked at three very different algorithms – Logistic Regressions, Support Vector Machines, and Random Forests.

Logistic Regression

Logistic Regression is a statistical method for analyzing a dataset where there are one or more independent variables that determine an outcome. The outcome is measured with a dichotomous variable (MEDCALC).

Support Vector Machines

Support vector machines (SVMs) are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis. Given a set of training examples, each marked as belonging to one or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other (Wikipedia).

Random Forest

Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks, that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random decision forests correct for decision trees’ habit of overfitting to their training set (Wikipedia).
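The aggregation step in that definition can be shown in isolation. The sketch below covers only how a forest combines the outputs of its trees, not how the trees are trained; the per-tree outputs and the function names are hypothetical.

```python
from statistics import mean, mode


def forest_classify(tree_votes):
    """Classification: the forest outputs the most common class among its trees."""
    return mode(tree_votes)


def forest_regress(tree_predictions):
    """Regression: the forest outputs the mean of the individual tree predictions."""
    return mean(tree_predictions)


# Hypothetical outputs from three already-trained trees.
print(forest_classify(["busy", "quiet", "busy"]))
print(forest_regress([10, 14, 12]))
```

Because each tree sees a different sample of the data, an individual tree's overfitting tends to be averaged out in the combined answer.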

Modeling Results

As can be seen, Logistic Regression was the clear winner for this scenario, but the optimal model typically isn’t obvious at the start, and may even be a surprise at the end. Further accuracy gains required separating out the data by departure station, as there are significant model differences between each station (for example, downtown station usage is very resistant to changes in weather and is almost entirely dependent on the weekday/weekend grouping).
For more information on how you can put Machine Learning to work at your own organization, contact us today at info@truqua.com. Our team of consultants and data scientists is on hand and ready to assist.  For companies with robust data science organizations, we offer several project accelerators to easily and securely combine your business data with your Data Scientists’ Machine Learning algorithms.

Key Business Factors in Machine Learning: Part 1- Predicting Employee Turnover

Authored by:
Annie Liu, Consultant, TruQua Enterprises
JS Irick, Lead Developer and Principal Consultant, TruQua Enterprises
Daniel Settanni, Senior Cloud Architect, TruQua Enterprises

As industry leaders ramp up their investments in Machine Learning, there is a growing need to communicate effectively with Data Scientists. Without a true understanding of both the technology and business factors involved in the Machine Learning scenario, it is impossible to create long-term solutions.

In Part 1 of this 2-part blog series, we will work through the first of two Machine Learning examples and describe the communication and collaboration necessary to successfully leverage Machine Learning for business scenarios.

Machine Learning algorithms are very good at predicting outcomes for many different types of scenarios by analyzing existing data and learning how it relates to the known outcomes (what you’re trying to predict).  Two of the most common types of machine learning algorithms are classification and regression.

With classification, the predicted values are fixed, meaning there are a limited number of outcomes, such as determining if a customer will make a purchase or not.  Regression, on the other hand, makes continuous numerical predictions, such as determining the lifetime value of a customer. In each case, it is critical that the Data Scientist understands both the inputs (the source of the individual factors and how they are created) and the business event you are trying to categorize or predict.

Categorizing Example: Employee Turnover

Understanding Machine Learning and business goals

First, let’s look at an example that demonstrates how to use Machine Learning to perform categorization. In this case, we are trying to better predict Employee Turnover. So, the goal of the machine learning algorithm is to categorize current employees as “Likely to Leave” or “Unlikely to Leave”. The categorization will be based on factors we have about each employee.

However, our goal is slightly different. Our business requirement is to identify the employees likely to leave so that actions can be taken to retain them. Before we continue, it is important to understand the cost of both a false positive and a false negative with regard to your business.

False Positive: An employee that is not going to leave is flagged as likely to leave.

False Negative: An employee leaves despite no indication from the machine learning algorithm.

In this case, False Negatives are costlier than False Positives. The algorithm with the best fit (overall performance) may not be the most effective for your business if it does not appropriately weigh the cost of the outcomes.
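A small worked example shows why the model with the fewest errors is not necessarily the best choice. All numbers below are hypothetical: the per-error costs and the error counts for the two models are assumed purely for illustration.

```python
def total_cost(false_positives, false_negatives, fp_cost, fn_cost):
    """Weigh each type of error by what it costs the business."""
    return false_positives * fp_cost + false_negatives * fn_cost


# Assumed costs: an unnecessary retention conversation vs. replacing an employee.
FP_COST = 1_000
FN_COST = 20_000

# Model A makes fewer errors overall (40) but misses more departures.
cost_a = total_cost(false_positives=10, false_negatives=30,
                    fp_cost=FP_COST, fn_cost=FN_COST)

# Model B makes more errors overall (55) but misses fewer departures.
cost_b = total_cost(false_positives=45, false_negatives=10,
                    fp_cost=FP_COST, fn_cost=FN_COST)

print(cost_a, cost_b)
```

Under these assumed costs, Model B is the cheaper option for the business even though Model A has the better overall fit.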

Communicating the available data with the Data Scientists

Machine Learning algorithms need to be developed and trained on historical data. For each historical employee, we therefore need the features that we believe are related to whether an employee stays or leaves, as well as the known outcome: whether they remained at the company.

When undertaking a Machine Learning project, it’s critical to work with a partner who will take the time to understand the various features that can be used within the model. If the data scientist does not understand the inputs into the model, it is likely to end up with models that perform well in testing, but poorly in production. This is called “overfitting”.

This communication with the Data Scientist can also lead to the inclusion of additional valuable external data that were initially missing from the model.

Let’s look at the factors in the Employee Turnover dataset.

There are three important items to note here:

1. Satisfaction level is self-reported and people are notoriously poor self-reporters.
2. The job role column is labeled “sales” in the input dataset. While descriptive column names are nice, they are no replacement for a good data dictionary.
3. Salary is a simple “High/Medium/Low” value, but is not normalized for job role.

Refining the dataset

Once we have reviewed the factors, as well as the business event we are trying to model, we need to better understand how they relate to each other. An analysis should occur on the relationships between factors and results, as well as between individual factors. Here we see a chart describing the correlations between our various factors, and whether the employee stayed with the company.

When looking at the relationships, we start to understand the correlations between our data. This step should reveal a number of data relationships which make intuitive sense, and may show some surprising results.

1. Number of current projects and number of hours worked are related. [Intuitive]
2. Employees with a longer tenure are less likely to leave. [Intuitive]
3. There is a slight negative relationship between satisfaction and retention. [Surprising]

When looking at the relationships between data, we can also find highly correlated associations. This can help determine factors to either combine or remove.
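As a rough sketch of how such a relationship can be measured, the Pearson correlation coefficient between two factors can be computed directly. The article does not state which correlation measure was used for its chart, so Pearson is an assumption here, and the project and hour values are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical values for six employees.
number_of_projects = [2, 3, 4, 5, 6, 7]
monthly_hours = [150, 160, 200, 210, 250, 270]

print(round(pearson(number_of_projects, monthly_hours), 3))
```

A coefficient close to 1 or -1 suggests two factors carry much of the same information, which makes them candidates to combine or remove.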

Additionally, it is necessary to look at the numerical data to determine if we should change certain values to ranges/buckets. For example, look at the relationship between monthly hours and employee retention.

Note the monthly hours for employees that were not retained. This should make intuitive sense, as the only thing worse than working too much is working too little. Rather than use monthly hours as a raw value, our model would be better served by defining categories for monthly hours.
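Defining categories for monthly hours can be as simple as the sketch below. The thresholds of 150 and 220 hours and the bucket names are assumptions for illustration; in practice they would be chosen by looking at where retention changes in the actual data.

```python
def hours_bucket(monthly_hours, low=150, high=220):
    """Turn a raw monthly-hours value into one of three categories."""
    if monthly_hours < low:
        return "Under-utilized"
    if monthly_hours > high:
        return "Overworked"
    return "Typical"


for hours in (120, 180, 260):
    print(hours, hours_bucket(hours))
```

With buckets, the model can treat both extremes as risk categories, which a single numeric value with a one-directional relationship would hide.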

 

Model Development

Once the data set has been analyzed, model development can begin. This is generally an iterative process, going through a number of different model types, as well as re-examining the initial data set.

While this iterative process is being performed, it is important to look at the output of the models, not just their fit. This is where the definition of your business goal, as well as communication with an experienced Data Scientist, is critical. For example, a fraud detection algorithm that never detects fraud is over 99% accurate. Fit is not enough.
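The fraud example can be made concrete with a few lines of Python. The transaction counts are hypothetical (5 fraudulent out of 1,000), and recall is used here as one common measure of how many real fraud cases a model catches.

```python
def accuracy(actual, predicted):
    """Share of predictions that match the actual label."""
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)


def recall(actual, predicted):
    """Share of actual positives (fraud) that the model flagged."""
    true_positives = sum(a and p for a, p in zip(actual, predicted))
    actual_positives = sum(actual)
    return true_positives / actual_positives if actual_positives else 0.0


# Hypothetical data: True means the transaction is fraudulent.
actual = [True] * 5 + [False] * 995
never_flags_fraud = [False] * 1000  # a "model" that always predicts non-fraud

print(accuracy(actual, never_flags_fraud))
print(recall(actual, never_flags_fraud))
```

The model scores 99.5% on accuracy while catching none of the fraud, which is why the output has to be judged against the business goal.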

For our employee retention example, we tested three popular machine learning algorithms. Below you can see the fit of each of the three models and, more importantly, the output for a subset of the testing data.

We have taken an abbreviated look at how a data scientist might approach this scenario, but in the real world this is only a part of the solution. There are still questions surrounding how the model is served, how it is consumed within the business process and how a strategy is devised in order to retrain the model with updated data.

If you have questions, we have the answers. TruQua’s team of consultants and data scientists merge theory and practice to help customers gain deeper insights into their data for more informed decision making. For more information or to schedule a complimentary workshop that identifies what Machine Learning scenarios make sense for your business, contact us today at info@truqua.com.

Machine Learning… Demystified

By Daniel Settanni, Senior Cloud Development Architect, TruQua Enterprises

Artificial Intelligence (AI), Machine Learning (ML), Predictive Analytics, Blockchain – with so many different buzzwords, it can be a challenge to understand how they are applicable to your business. Here’s a short primer to help customers make sense of Machine Learning in the Enterprise.

What is Machine Learning?
There are two definitions that seem to be the most commonly referenced when discussing the meaning of Machine Learning. Arthur Samuel coined the phrase Machine Learning in 1959 as a “Field of study that gives computers the ability to learn without being explicitly programmed.”

In 1997, Tom Mitchell added some clarity by stating: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”  Or in short – Machine Learning means that a computer’s performance improves with experience.

While both definitions are accurate, neither provides absolute clarity, so let’s explore the concept of Machine Learning through an example.

An Example of Machine Learning
A large credit organization is struggling to detect fraudulent transactions. They have years’ worth of historical transactions, including those that have been identified as fraudulent. Their goal is to detect suspicious transactions in real time. Expressed in Tom Mitchell’s terms, this example looks like the following:

E = the experience of analyzing historical transactions
T = the task of reviewing transactions
P = the performance of the program in identifying fraudulent transactions

This is an example of supervised learning, which is another way of saying that the algorithm will be taught by the historical data. This is possible because the data is labeled (i.e., historical fraudulent transactions are known).

When the data isn’t labeled, unsupervised learning is required. An example of unsupervised learning would be market segment analysis. Here, Machine Learning is learning from the data and making its own connections and insights, as opposed to being taught from past outcomes.

The response, or output, of Machine Learning can be described as regression or classification.

With regression, the response is a set of continuous values – think of a curve that predicts home prices based on size. A home price could be found for a home of any size.
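The home price curve can be illustrated with the simplest form of regression, a straight line fitted by ordinary least squares. The sizes and prices below are invented, and a straight line is only one of many possible curve shapes.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: home size in square feet and sale price.
sizes = [1000, 1500, 2000, 2500]
prices = [200_000, 275_000, 350_000, 425_000]

slope, intercept = fit_line(sizes, prices)

# The fitted line returns a price for a size that was never in the data.
predicted_price = slope * 1800 + intercept
print(round(predicted_price))
```

This is what "continuous" means in practice: the output is not limited to the prices seen during training.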

Classification identifies group membership. For example, Machine Learning could classify images based on their content (i.e., this image contains a car, this one contains a rooster, etc.). Our fraudulent transaction example above is an example of classification.  The Machine Learning algorithm is classifying transactions as either fraudulent, or non-fraudulent.

Machine Learning Implementation Process
Now that you have a high-level overview of what Machine Learning can do, you might be wondering what it takes to implement. The basic process includes the following steps:

1. Understanding the problem that needs to be solved
2. Analyzing and preparing the data
3. Identifying potential algorithm(s)
4. Training, testing and tweaking several Machine Learning models
5. Integrating Machine Learning with existing systems and processes

Another key question you’ll need to ask is, who can do all of this? In some cases, a software vendor can deliver Machine Learning capabilities out-of-the-box. This works best when a problem is well defined and common within a specific business or industry process.

SAP’s Cash Management Application, for instance, is a solution that can harness the full benefits of Machine Learning.

But what if an out-of-the-box solution doesn’t exist? This is where you’ll need to go a step further and employ the skills of a data scientist, and it is an area where TruQua can help.

Conclusion
A key thing to keep in mind with Machine Learning is that, as with most projects involving data analysis, if your data is inaccurate or full of discrepancies, you won’t achieve a positive end result. As with any project, it’s critical to pick the right partner and solution provider.

How TruQua can help
TruQua’s team of consultants and data scientists merge theory and practice to help customers gain deeper insights and better visibility into their data for more informed decision-making, utilizing the latest predictive analytic and Machine Learning capabilities from SAP. Contact us today and learn how TruQua can help:

  • Improve business processes
  • Enhance decision making
  • Direct, optimize, and automate decisions to meet defined business goals