Data Mining for the Masses
are called neurons. The thicker and darker the neuron between two nodes, the stronger the affinity between those nodes. The graph begins on the left, with one node for each predictor attribute.
These can be clicked on to reveal the attribute name that each left-hand node represents. The
hidden layer performs the comparison between all attributes, and the column of nodes on the right represents the four possible values of our predicted (label) attribute: Role_Player, Contributor,
Franchise Player, or Superstar.
Figure 11-5. A graphical view of our neural network, showing neurons of different strengths and one node for each of the four possible Team_Value categories.
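To make the structure in Figure 11-5 concrete, here is a hypothetical sketch of that kind of network, not RapidMiner's internals. Three invented predictor attributes feed a hidden layer, which feeds four output nodes, one per Team_Value category; larger weight magnitudes play the role of the thicker, darker neurons in the graph.

```python
import numpy as np

# Hypothetical sketch of a feed-forward network like Figure 11-5's:
# 3 invented predictor attributes -> 5 hidden nodes -> 4 output nodes
# (one per Team_Value category). Weights are random for illustration.
rng = np.random.default_rng(0)

CLASSES = ["Role_Player", "Contributor", "Franchise Player", "Superstar"]

W_hidden = rng.normal(size=(3, 5))  # input layer -> hidden layer
b_hidden = np.zeros(5)
W_out = rng.normal(size=(5, 4))     # hidden layer -> output layer
b_out = np.zeros(4)

def predict_confidences(x):
    """One forward pass: returns a confidence per category (sums to 1)."""
    h = np.tanh(x @ W_hidden + b_hidden)   # hidden-layer activations
    logits = h @ W_out + b_out
    exp = np.exp(logits - logits.max())    # softmax turns scores into
    return exp / exp.sum()                 # confidence percentages

conf = predict_confidences(np.array([0.8, 0.1, 0.5]))
print(dict(zip(CLASSES, conf.round(3))))
```

A trained network differs only in that its weights are learned from the training data rather than drawn at random; the "affinity" the graph shows corresponds to those learned weights.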
Switch to the ExampleSet tab in results perspective. Again, as with past predictive models, we can
see that four new special attributes have been generated by RapidMiner. Each of our 59 athlete prospects has a prediction as to their Team_Value category, with accompanying confidence
percentages.
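The relationship between those special attributes can be illustrated with a small sketch: one confidence per Team_Value category, plus a prediction that is simply the category with the highest confidence. The confidence values below are invented for illustration.

```python
# Hypothetical illustration of the four special attributes generated for
# each scored prospect. The confidence values here are invented.
CLASSES = ["Role_Player", "Contributor", "Franchise Player", "Superstar"]

def score_row(confidences):
    """Build the special attributes for one scored prospect."""
    row = {"confidence(%s)" % c: p for c, p in zip(CLASSES, confidences)}
    # The prediction is the category whose confidence is highest.
    row["prediction(Team_Value)"] = max(zip(confidences, CLASSES))[1]
    return row

example = score_row([0.02, 0.05, 0.13, 0.80])
print(example["prediction(Team_Value)"])  # -> Superstar
```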
All 59 prospects are now predictively categorized. We know how confident RapidMiner is based on our training data, and Juan can now proceed to…
DEPLOYMENT
Juan wanted to quickly and easily assess these 59 prospects based on their past performance. He
can deploy his model by responding to management with several different outputs from our neural
network. First, he can click twice on the prediction(Team_Value) column heading to bring all of
the Superstars to the top. (Superstar is the last of our values in alphabetical order, so it is first in
reverse alphabetical order).
Figure 11-8. The scoring data set’s predicted values, with Superstars sorted to the top.
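The trick behind that double-click can be sketched in plain Python: because Superstar sorts last alphabetically, a reverse-alphabetical sort on the predicted category floats the Superstar rows to the top. The prospect names below are invented for illustration.

```python
# Hypothetical sketch of the Figure 11-8 sort. Names are invented.
scored = [
    {"name": "Prospect A", "prediction(Team_Value)": "Contributor"},
    {"name": "Prospect B", "prediction(Team_Value)": "Superstar"},
    {"name": "Prospect C", "prediction(Team_Value)": "Role_Player"},
    {"name": "Prospect D", "prediction(Team_Value)": "Franchise Player"},
]

# Reverse-alphabetical sort on the predicted category: "Superstar" > 
# "Role_Player" > "Franchise Player" > "Contributor".
top_first = sorted(scored,
                   key=lambda row: row["prediction(Team_Value)"],
                   reverse=True)
print([row["name"] for row in top_first])
```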
The ten athletes with superstar potential are now shown at the top. Furthermore, two of them, John Mcguire and Robert Holloway, have confidence(Superstar) percentages of
100%. Juan may want to go ahead and quickly recommend that management take a hard look at
these two athletes. Gerald Luna and Ian Tucker are extremely close as well, with only slight
probabilities of being Franchise Players instead of Superstars. Even Franchise Players are athletes
with huge potential upsides, so the risk of pursuing either of these two players is minimal. There
are a couple of others with predicted superstar status and confidences above 90%, so Juan has a
solid list of players to work from.
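Building that shortlist amounts to a simple filter: keep every prospect predicted to be a Superstar whose confidence(Superstar) clears a threshold such as 90%. The names and confidence values below are invented for illustration.

```python
# Hypothetical sketch of Juan's shortlist filter. All data is invented.
scored = [
    {"name": "Prospect A", "prediction": "Superstar",   "confidence(Superstar)": 1.00},
    {"name": "Prospect B", "prediction": "Superstar",   "confidence(Superstar)": 0.96},
    {"name": "Prospect C", "prediction": "Superstar",   "confidence(Superstar)": 0.74},
    {"name": "Prospect D", "prediction": "Contributor", "confidence(Superstar)": 0.10},
]

# Keep predicted Superstars whose confidence is at least 90%.
shortlist = [row["name"] for row in scored
             if row["prediction"] == "Superstar"
             and row["confidence(Superstar)"] >= 0.90]
print(shortlist)
```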
be an athlete worth taking a hard look at. He may just be the final piece to the puzzle of bringing
Juan’s franchise the championship at the end of next season.
Of course, Juan must continue to use his expertise, experience, and evaluation of other factors not represented in the data sets to make his final recommendations. For example, while all 59 prospects have some number of years' experience, what if their performance statistics have all been amassed against inferior competition? Those numbers may not be representative of their ability to perform at the professional level. While the model and its predictions have given Juan a lot to think about, he
must still use his experience to make good recommendations to management.
CHAPTER SUMMARY
Neural networks try to mimic the human brain by using artificial ‘neurons’ to compare attributes to
one another and look for strong connections. By taking in attribute values, processing them, and
generating nodes connected by neurons, this data mining model can offer predictions and
confidence percentages, even amid uncertainty in some data. Neural networks are not as limited
regarding value ranges as some other methodologies.
In their graphical representation, neural nets are drawn using nodes and neurons. The thicker or
darker the line between nodes, the stronger the connection represented by that neuron. Stronger
neurons equate to a stronger ability by that attribute to predict. Although the graphical view can be difficult to read, as often happens when there is a large number of attributes, the
computer is able to read the network and apply the model to scoring data in order to make
predictions. Confidence percentages can further inform the value of an observation’s prediction,
as was illustrated with our hypothetical athlete Lance Goodwin in this chapter. Between the
prediction and confidence percentages, we can use neural networks to find interesting observations
that may not be obvious, but still represent good opportunities to answer questions or solve
problems.