Chapter 11:
Neural Networks
175
CHAPTER ELEVEN:
NEURAL NETWORKS
CONTEXT AND PERSPECTIVE
Juan is a statistical performance analyst for a major professional athletic team. His team has been
steadily improving over recent seasons, and heading into the coming season management believes
that by adding between two and four excellent players, the team will have an outstanding shot at
achieving the league championship. They have tasked Juan with identifying their best options
from among a list of 59 experienced players who will be available to them. All of these players
have experience: some have played professionally before, and some have many years of experience
as amateurs. None are to be ruled out without being assessed for their potential ability to add star
power and productivity to the existing team. The executives Juan works for are anxious to get
going on contacting the most promising prospects, so Juan needs to quickly evaluate these athletes’
past performance and make recommendations based on his analysis.
LEARNING OBJECTIVES
After completing the reading and exercises in this chapter, you should be able to:
Explain what a neural network is, how it is used and the benefits of using it.
Recognize the necessary format for data in order to perform neural network data mining.
Develop a neural network data mining model in RapidMiner using a training data set.
Interpret the model’s outputs and apply them to a scoring data set in order to deploy the
model.
ORGANIZATIONAL UNDERSTANDING
Juan faces high expectations and has a delivery deadline to meet. He is a professional who knows
his business and knows how important the intangibles are in assessing athletic talent. He also
knows that those intangibles are often manifested in athletes’ past performance. He wants to mine a
data set of all current players in the league in order to help find those prospects that can bring the
most excitement, scoring and defense to the team in order to reach the league championship.
While salary considerations are always a concern, management has indicated to Juan that their
desire is to push for the championship in the upcoming season, and they are willing to do all they
can financially to bring in the best two to four athletes Juan can identify. With his employers’
objectives made clear to him, Juan is prepared to evaluate each of the 59 prospects’ past statistical
performance in order to help him formulate what his recommendations will be.
DATA UNDERSTANDING
Juan knows the business of athletic statistical analysis. He has seen how performance in one area,
such as scoring, is often interconnected with other areas such as defense or fouls. The best
athletes generally have strong connections between two or more performance areas, while more
typical athletes may have a strength in one area but weaknesses in others. For example, good role
players are often good defenders, but can’t contribute much scoring to the team. Using league data
and his knowledge of and experience with the players in the league, Juan prepares a training data
set consisting of 263 observations and 19 attributes. The 59 prospective athletes Juan’s team
could acquire form the scoring data set, and he has the same attributes for each of these people.
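Because the scoring data set must carry the same attributes as the training data set, it can help to compare the two column lists before building a model. The following is a minimal sketch in Python rather than RapidMiner; the function name `attribute_mismatch` and the shortened column lists are hypothetical and use only a few of the attribute names described in this chapter.

```python
def attribute_mismatch(training_columns, scoring_columns):
    """Return (missing_from_scoring, extra_in_scoring) as sorted lists."""
    training = set(training_columns)
    scoring = set(scoring_columns)
    return sorted(training - scoring), sorted(scoring - training)


# Hypothetical, shortened headers for illustration only.
training_columns = ["Player_Name", "Position_ID", "Shots", "Makes"]
scoring_columns = ["Player_Name", "Position_ID", "Shots"]

missing, extra = attribute_mismatch(training_columns, scoring_columns)
print("Missing from scoring set:", missing)
print("Extra in scoring set:", extra)
```

In this invented example, the check reports that Makes is absent from the scoring columns, which would need to be resolved before the model could be applied.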
We will help Juan build a neural network, a data mining methodology that can predict
categories or classifications in much the same way that decision trees do. Neural networks,
however, are better at finding the strength of connections between attributes, and it is those
very connections that Juan is interested in. The attributes our neural network will evaluate are:
Player_Name: This is the player’s name. In our data preparation phase, we will set its
role to ‘id’, since it is not predictive in any way, but is important to keep in our data set
so that Juan can quickly make his recommendations without having to match the data
back to the players’ names later. (Note that the names in this chapter’s data sets were
created using a random name generator. They are fictitious and any similarity to real
persons is unintended and purely coincidental.)
Position_ID: For the sport Juan’s team plays, there are 12 possible positions. Each one
is represented as an integer from 0 to 11 in the data sets.
Shots: This is the total number of shots, or scoring opportunities, each player took in their
most recent season.
Makes: This is the number of times the athlete scored when shooting during the most
recent season.
Personal_Points: This is the number of points the athlete personally scored during the
most recent season.
Total_Points: This is the total number of points the athlete contributed to scoring in
the most recent season. In the sport Juan’s team plays, this statistic is recorded for each
point an athlete contributes to scoring. In other words, each time an athlete scores a
personal point, their total points increase by one, and every time an athlete contributes
to a teammate scoring, their total points increase by one as well.
Assists: This is a defensive statistic indicating the number of times the athlete helped his
team get the ball away from the opposing team during the most recent season.
Concessions: This is the number of times the athlete’s play directly caused the opposing
team to concede an offensive advantage during the most recent season.
Blocks: This is the number of times the athlete directly and independently blocked the
opposing team’s shot during the most recent season.
Block_Assists: This is the number of times an athlete collaborated with a teammate to
block the opposing team’s shot during the most recent season. If recorded as a block
assist, two or more players must have been involved. If only one player blocked the
shot, it is recorded as a block. Since the playing surface is large and the players are
spread out, it is much more likely for an athlete to record a block than for two or more
to record block assists.
Fouls: This is the number of times, in the most recent season, that the athlete
committed a foul. Since fouling the other team gives them an advantage, the lower this
number, the better the athlete’s performance for his own team.
Years_Pro: In the training data set, this is the number of years the athlete has played at
the professional level. In the scoring data set, this is the number of years of experience the
athlete has, including years as a professional if any, and years in organized, competitive
amateur leagues.
Career_Shots: This is the same as the Shots attribute, except it is cumulative for the
athlete’s entire career. All career attributes are an attempt to assess the person’s ability
to perform consistently over time.
Career_Makes: This is the same as the Makes attribute, except it is cumulative for the
athlete’s entire career.
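The attribute definitions so far imply a few consistency rules that can be checked before modeling. The sketch below is in Python rather than RapidMiner and rests on assumptions read from those definitions: that Makes cannot exceed Shots, that Personal_Points cannot exceed Total_Points, and that career totals include the most recent season. The function name `consistency_problems` and the sample values are invented for illustration, and a flagged record should be treated as a prompt for review, not as proof of an error.

```python
def consistency_problems(player):
    """Return a list of rule violations for one player record (a dict)."""
    problems = []
    # A make is a shot that scored, so makes should not exceed shots.
    if player["Makes"] > player["Shots"]:
        problems.append("Makes exceeds Shots")
    # Every personal point also counts as a total point.
    if player["Personal_Points"] > player["Total_Points"]:
        problems.append("Personal_Points exceeds Total_Points")
    # Assumption: career totals include the most recent season.
    if player["Career_Shots"] < player["Shots"]:
        problems.append("Career_Shots is less than Shots")
    if player["Career_Makes"] < player["Makes"]:
        problems.append("Career_Makes is less than Makes")
    return problems


# Invented record for illustration; not taken from the chapter's data sets.
sample = {
    "Shots": 300, "Makes": 90,
    "Personal_Points": 10, "Total_Points": 45,
    "Career_Shots": 1200, "Career_Makes": 80,
}
print(consistency_problems(sample))
```

Here the invented record is flagged only because its career makes are lower than its season makes.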