Data Mining
for the Masses
but that it would be unethical, we can write code to require usernames and
passwords, making it more difficult for users to get into each other’s personal
information. Further, we can write a code of conduct, usually referred to as an
Acceptable Use Policy, which dictates what users can and cannot do. The policy is
not a law; that is, it is not enacted or enforced by a government. It is, however, an
agreement to abide by certain rules or risk losing the privilege of using the site’s
services.
Social Norms: This form of determining what is ethical is based on what is
acceptable in our society. As we look around us, interact with our friends, family,
neighbors, and associates, ethical bounds can be established by what is acceptable
to these people. Often, if we would be embarrassed, humiliated, or otherwise
shamed by our behavior, or if we find ourselves wanting to hide what we’re doing
from others, we have a strong indication that our activity is not ethical. We can
also contribute to the establishment of social norms as ethical guides by making our
own expectations of what is acceptable clear to others.
Organizational Standard Operating Procedures: Ethical standards can often be
established by creating a set of acceptable practices for your organization. Such an effort
should be undertaken by company leadership, with input from a broad cross-section of
employees. These practices should be well documented, communicated to employees, and
reviewed regularly. Checks and balances can be built into work processes to help ensure
that workers are adhering to established procedures.
Professional Code of Conduct: Similar to organizational operating standards,
professional codes of conduct can help to establish boundaries of ethical conduct. The
aforementioned Association for Computing Machinery maintains a Code of Ethics and
Professional Conduct that is an excellent resource for computing professionals seeking
guidance (http://www.acm.org/about/code-of-ethics). Other organizations also have
codes of conduct that could be consulted in order to frame ethical decision making in data
mining.
Immanuel Kant’s Categorical Imperative: Immanuel Kant was a German
philosopher and anthropologist who lived in the 1700s. Among his extensive writings on
ethics, Kant’s Categorical Imperative is perhaps the most famous. This maxim
states that if a given action cannot ethically be taken by anyone in a certain situation, then it
should not be taken at all. In data mining, we could use this philosophy to determine:
Would it be ethical for any business to collect and mine these data? What would be the
outcome if every business mined data in this way? If the answers to such questions are
negative, or the outcome appears unethical, then we should not undertake the data mining project
either.
René Descartes’ Rule of Change: René Descartes was a French philosopher and
mathematician who, like Kant, wrote extensively about moral decision making. His rule of
change reflects his mathematical background. It states that if an act cannot be taken
repeatedly, it is not ethical to do that act even once. Again, to apply this to data mining, we
can ask: Can I collect and mine these data on an ongoing basis without causing problems
for myself, my organization, our customers, or others? If you cannot do it repeatedly,
according to Descartes, then you shouldn’t do it at all.
A few other, less specifically defined approaches can also help you seek out
ethical boundaries. There is the old adage known as the Golden Rule, which dictates that we
should treat others the way we hope they would treat us. There are also philosophies that help us
to consider how our actions might be perceived by others and how those actions might make others feel.
Some ethical frameworks are built around actions that will bring the greatest good to the largest
number of people.
CONCLUSION
We can protect privacy by aggregating data, anonymizing observations through removal of names
and personally identifiable information, and storing the data in secure and protected environments.
When you are busy working with numbers, attributes, and observations, it can be easy to forget
about the people behind the data. We should be cautious when data mining models might brand a
person as a certain risk. Be sensitive to people’s feelings and rights. When appropriate, ask for
their permission to gather and use data about them. Don’t rationalize a justification for your data
mining project; ensure that you’re doing fair and just work that will help and benefit others.
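
As a rough illustration of the aggregation and anonymization ideas mentioned in this conclusion, the following Python sketch removes identifying attributes from a small data set and then reports only group-level averages. The attribute names (name, email, region, spend) and the sample observations are hypothetical and chosen purely for illustration; this is a minimal sketch, not a complete privacy solution.

```python
from collections import defaultdict

# Hypothetical attribute names treated as personally identifiable (an assumption).
PII_FIELDS = {"name", "email", "phone"}

def anonymize(records, pii_fields=PII_FIELDS):
    """Return copies of the records with personally identifiable fields removed."""
    return [{k: v for k, v in r.items() if k not in pii_fields} for r in records]

def aggregate_mean(records, group_key, value_key):
    """Average value_key within each group so results describe groups, not rows."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [running sum, count]
    for r in records:
        totals[r[group_key]][0] += r[value_key]
        totals[r[group_key]][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}

# Made-up observations for illustration only.
customers = [
    {"name": "A. Example", "email": "a@example.com", "region": "North", "spend": 120.0},
    {"name": "B. Example", "email": "b@example.com", "region": "North", "spend": 80.0},
    {"name": "C. Example", "email": "c@example.com", "region": "South", "spend": 50.0},
]

safe_rows = anonymize(customers)
by_region = aggregate_mean(safe_rows, "region", "spend")
```

Note that removing names alone does not guarantee anonymity. Combinations of the remaining attributes can sometimes identify a person, and a group with only one member (such as the single South observation here) still reveals that individual's value, so very small groups may need to be suppressed or merged before results are shared.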