Scikit-Learn
As seen in the example above, the train_test_split() function of scikit-learn is used to split the dataset. This function has the following arguments:
X, y: Here, X is the feature matrix and y is the response vector, which need to be split.
test_size: This represents the ratio of test data to the total given data. In the above example, we set test_size = 0.3 for 150 rows of X, which produces test data of 150*0.3 = 45 rows.
random_state: This seeds the shuffling applied before the split, so that passing the same value always produces the same split. This is useful in situations where you want reproducible results.
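The effect of these arguments can be checked directly. The following is a minimal sketch, assuming the 150-row Iris dataset and an arbitrary seed of random_state=1 (the variable names X_test_again and same_split are illustrative only):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Iris has 150 rows and 4 feature columns
X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows as test data; the seed value 1 is an arbitrary choice
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)
print(X_train.shape, X_test.shape)  # (105, 4) (45, 4)

# Calling the function again with the same random_state reproduces the split
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.3, random_state=1)
same_split = bool((X_test == X_test_again).all())
print("Same split:", same_split)  # Same split: True
```

If random_state is left out, the rows are shuffled differently on each run, so the two test sets would generally differ.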
Train the Model
Next, we can use our dataset to train a prediction model. As discussed, scikit-learn has a wide range of Machine Learning (ML) algorithms, which share a consistent interface for fitting, predicting, and computing metrics such as accuracy and recall.
In the example below, we are going to use the KNN (K nearest neighbors) classifier. Don't worry about the details of the KNN algorithm, as there will be a separate chapter for that. This example is only meant to help you understand the implementation part.
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4,
random_state=1)
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
classifier_knn = KNeighborsClassifier(n_neighbors=3)
classifier_knn.fit(X_train, y_train)
y_pred = classifier_knn.predict(X_test)
# Finding accuracy by comparing actual response values (y_test) with predicted
# response values (y_pred)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
# Providing sample data and the model will make prediction out of that data
sample = [[5, 5, 3, 2], [2, 4, 3, 5]]
preds = classifier_knn.predict(sample)
pred_species = [iris.target_names[p] for p in preds]
print("Predictions:", pred_species)
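Because scikit-learn estimators share the same fit() and predict() methods, another classifier can be swapped in without changing the surrounding code. The sketch below illustrates this under a few assumptions that are not part of the example above: DecisionTreeClassifier is chosen only as a second estimator for comparison, recall is macro-averaged over the three classes, and the names models and scores are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1
)

# Two different algorithms, driven through the same interface
models = {
    "KNN (k=3)": KNeighborsClassifier(n_neighbors=3),
    "Decision tree": DecisionTreeClassifier(random_state=1),  # assumed second model
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)        # same training call for every estimator
    y_pred = model.predict(X_test)     # same prediction call for every estimator
    accuracy = metrics.accuracy_score(y_test, y_pred)
    # Iris has three classes, so recall needs an averaging strategy
    recall = metrics.recall_score(y_test, y_pred, average="macro")
    scores[name] = (accuracy, recall)
    print(f"{name}: accuracy={accuracy:.3f}, macro recall={recall:.3f}")
```

Only the entries in the models dictionary change from one algorithm to another; the training, prediction, and scoring steps stay the same.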