Data Mining for the Masses
are called neurons. The thicker and darker the neuron between two nodes, the stronger the affinity between those nodes. The graph begins on the left, with one node for each predictor attribute.
These can be clicked on to reveal the attribute name that each left-hand node represents. The
hidden layer performs the comparison between all attributes, and the column of nodes on the right represents the four possible values of our predicted (label) attribute: Role_Player, Contributor,
Franchise Player, or Superstar.
Figure 11-5. A graphical view of our neural network, showing neurons of different strengths and one node for each of the four possible Team_Value categories.
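To make the structure in Figure 11-5 concrete, here is a hypothetical sketch of that kind of network, not RapidMiner's internals. Three invented predictor attributes feed a hidden layer, which feeds four output nodes, one per Team_Value category; larger weight magnitudes play the role of the thicker, darker neurons in the graph.

```python
import numpy as np

# Hypothetical sketch of a feed-forward network like Figure 11-5's:
# 3 invented predictor attributes -> 5 hidden nodes -> 4 output nodes
# (one per Team_Value category). Weights are random for illustration.
rng = np.random.default_rng(0)

CLASSES = ["Role_Player", "Contributor", "Franchise Player", "Superstar"]

W_hidden = rng.normal(size=(3, 5))  # input layer -> hidden layer
b_hidden = np.zeros(5)
W_out = rng.normal(size=(5, 4))     # hidden layer -> output layer
b_out = np.zeros(4)

def predict_confidences(x):
    """One forward pass: returns a confidence per category (sums to 1)."""
    h = np.tanh(x @ W_hidden + b_hidden)   # hidden-layer activations
    logits = h @ W_out + b_out
    exp = np.exp(logits - logits.max())    # softmax turns scores into
    return exp / exp.sum()                 # confidence percentages

conf = predict_confidences(np.array([0.8, 0.1, 0.5]))
print(dict(zip(CLASSES, conf.round(3))))
```

A trained network differs only in that its weights are learned from the training data rather than drawn at random; the "affinity" the graph shows corresponds to those learned weights.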
Switch to the ExampleSet tab in results perspective. Again, as with past predictive models, we can
see that four new special attributes have been generated by RapidMiner. Each of our 59 athlete prospects has a prediction as to their Team_Value category, with accompanying confidence
percentages.
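The relationship between those special attributes can be illustrated with a small sketch: one confidence per Team_Value category, plus a prediction that is simply the category with the highest confidence. The confidence values below are invented for illustration.

```python
# Hypothetical illustration of the four special attributes generated for
# each scored prospect. The confidence values here are invented.
CLASSES = ["Role_Player", "Contributor", "Franchise Player", "Superstar"]

def score_row(confidences):
    """Build the special attributes for one scored prospect."""
    row = {"confidence(%s)" % c: p for c, p in zip(CLASSES, confidences)}
    # The prediction is the category whose confidence is highest.
    row["prediction(Team_Value)"] = max(zip(confidences, CLASSES))[1]
    return row

example = score_row([0.02, 0.05, 0.13, 0.80])
print(example["prediction(Team_Value)"])  # -> Superstar
```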
All 59 prospects are now predictively categorized. We know how confident RapidMiner is based on our training data, and Juan can now proceed to…
DEPLOYMENT
Juan wanted to quickly and easily assess these 59 prospects based on their past performance. He
can deploy his model by responding to management with several different outputs from our neural
network. First, he can click twice on the prediction(Team_Value) column heading to bring all of
the Superstars to the top. (Superstar is the last of our values in alphabetical order, so it is first in
reverse alphabetical order).
Figure 11-8. The scoring data set’s predicted values, with Superstars sorted to the top.
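The trick behind that double-click can be sketched in plain Python: because Superstar sorts last alphabetically, a reverse-alphabetical sort on the predicted category floats the Superstar rows to the top. The prospect names below are invented for illustration.

```python
# Hypothetical sketch of the Figure 11-8 sort. Names are invented.
scored = [
    {"name": "Prospect A", "prediction(Team_Value)": "Contributor"},
    {"name": "Prospect B", "prediction(Team_Value)": "Superstar"},
    {"name": "Prospect C", "prediction(Team_Value)": "Role_Player"},
    {"name": "Prospect D", "prediction(Team_Value)": "Franchise Player"},
]

# Reverse-alphabetical sort on the predicted category: "Superstar" > 
# "Role_Player" > "Franchise Player" > "Contributor".
top_first = sorted(scored,
                   key=lambda row: row["prediction(Team_Value)"],
                   reverse=True)
print([row["name"] for row in top_first])
```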
The ten athletes with superstar potential are now shown at the top. Furthermore, two of them, John Mcguire and Robert Holloway, have confidence(Superstar) percentages of
100%. Juan may want to go ahead and quickly recommend that management take a hard look at
these two athletes. Gerald Luna and Ian Tucker are extremely close as well, with only slight
probabilities of being Franchise Players instead of Superstars. Even Franchise Players are athletes
with huge potential upsides, so the risk of pursuing either of these two players is minimal. There
are a couple of others with predicted superstar status and confidences above 90%, so Juan has a
solid list of players to work from.
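Building that shortlist amounts to a simple filter: keep every prospect predicted to be a Superstar whose confidence(Superstar) clears a threshold such as 90%. The names and confidence values below are invented for illustration.

```python
# Hypothetical sketch of Juan's shortlist filter. All data is invented.
scored = [
    {"name": "Prospect A", "prediction": "Superstar",   "confidence(Superstar)": 1.00},
    {"name": "Prospect B", "prediction": "Superstar",   "confidence(Superstar)": 0.96},
    {"name": "Prospect C", "prediction": "Superstar",   "confidence(Superstar)": 0.74},
    {"name": "Prospect D", "prediction": "Contributor", "confidence(Superstar)": 0.10},
]

# Keep predicted Superstars whose confidence is at least 90%.
shortlist = [row["name"] for row in scored
             if row["prediction"] == "Superstar"
             and row["confidence(Superstar)"] >= 0.90]
print(shortlist)
```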
be an athlete worth taking a hard look at. He may just be the final piece to the puzzle of bringing
Juan’s franchise the championship at the end of next season.
Of course, Juan must continue to use his expertise, experience, and evaluation of other factors not represented in the data sets to make his final recommendations. For example, while all 59 prospects have some number of years' experience, what if their performance statistics have all been amassed against inferior competition? Those numbers may not be representative of their ability to perform at the professional level. While the model and its predictions have given Juan a lot to think about, he
must still use his experience to make good recommendations to management.
CHAPTER SUMMARY
Neural networks try to mimic the human brain by using artificial ‘neurons’ to compare attributes to
one another and look for strong connections. By taking in attribute values, processing them, and
generating nodes connected by neurons, this data mining model can offer predictions and
confidence percentages, even amid uncertainty in some data. Neural networks are not as limited
regarding value ranges as some other methodologies.
In their graphical representation, neural nets are drawn using nodes and neurons. The thicker or
darker the line between nodes, the stronger the connection represented by that neuron. Stronger
neurons equate to a stronger ability by that attribute to predict. Although the graphical view can be difficult to read, as often happens when there is a large number of attributes, the
computer is able to read the network and apply the model to scoring data in order to make
predictions. Confidence percentages can further inform the value of an observation’s prediction,
as was illustrated with our hypothetical athlete Lance Goodwin in this chapter. Between the
prediction and confidence percentages, we can use neural networks to find interesting observations
that may not be obvious, but still represent good opportunities to answer questions or solve
problems.