Introduction to Bioinformatics
A Course Material
András Budinszky, Péter Gál, Sándor Pongor
Pázmány Péter Catholic University
Faculty of Information Technology
Budapest, Hungary
budinszky@itk.ppke.hu, gal@enzim.hu, pongor@icgeb.org
Summary – In this paper we discuss the development of an introductory course in bioinformatics. We list the prerequisites, the competencies the course aims to develop, the topics covered by the chapters, and finally some considerations for teaching.
Keywords - molecular biology; bioinformatics; course
I. INTRODUCTION
Bioinformatics can be broadly defined as the management of information in the life sciences. As an applied science, it uses computer programs to process the data archived by modern molecular biology and thus to derive useful new information.
The importance of bioinformatics has grown enormously in the last decade owing to advances in high-throughput data acquisition methods, primarily sequencing. High-throughput sequencing techniques (e.g. next generation DNA sequencing) generate a flood of valuable sequence data, and processing it is a challenge for scientists.
The aim of this course material is to give students a basic knowledge of bioinformatics. The course is meant to strengthen the students' competency in solving bioinformatics problems as well as their ability to communicate with life science professionals, who are the ultimate users of bioinformatics. After taking this course, students should be able to determine the types of questions that the computer programs (“tools”) developed to work with genome and protein data archives can answer, and to use these tools to answer such questions.
In addition, our teaching material can be useful for biologists who want to understand the algorithms behind frequently used web-based applications (e.g. BLAST).
A. Prerequisites
Students are expected to have taken a course on molecular biology and to have some basic knowledge of biochemistry. Nevertheless, at the beginning of the course a biology primer summarizes the biological fundamentals needed, so that all students can start from an equal level.
They should also have completed an introductory database course, since the data processed by bioinformatics tools are stored in databases.
In addition, the students should be competent computer users. However, we do not require knowledge of any specific programming language: very few algorithmic details are discussed during the presentations, and where they are, they are given in pseudocode that can be understood without any programming background.
B. Some considerations
Bioinformatics is a relatively new area of science; consequently, it is also a new subject to teach.
In developing the teaching material we
used the latest editions of standard bioinfor-
matics textbooks, and the numerous websites
of universities, research institutes and public
databases (e.g. NCBI) related to this subject.
We also drew on the experience in teaching bioinformatics that we have accumulated over the last decade.
Our approach differs somewhat from the conventional way of teaching bioinformatics. As our referee wrote, “… this is the first
comprehensive bioinformatics course in Hun-
gary which is suitable for teaching students
who have only basic knowledge of biology.
For the first time the teaching material col-
lects the algorithms used in bioinformatics in
a way which is understandable not only for
mathematicians. After the course the students
will be able to understand, apply and even
further develop the most frequently used bio-
informatics algorithms.”
The choice of topics in bioinformatics is
very wide. Since this course is limited to one
semester, we had to restrict ourselves to an
essential core of material covering the most
standard bioinformatics tasks and had to leave
some areas untouched (e.g. drug discovery,
protein structure).
Another, smaller-scale problem is that bioinformatics terminology is not standardized: depending on the author, the meaning of a term may vary somewhat. In the associated terminology file we give the meanings that we found to be most commonly accepted.
II. RESULT
During the development of the course ma-
terial we created 12 chapters with 465 slides.
A number of the chapters had to be developed to provide background information: either biology and database knowledge, or detailed descriptions of the methods behind various biological data collections.
The first two chapters review molecular biology and databases. This helps bring all students to an equal level in the prerequisites.
Chapters 3 and 4 cover the most widely known areas of bioinformatics, namely the sequence alignment algorithms and the strategies of BLAST, in detail. The students are taught a couple of key points in these chapters:
• The notion of cost and the importance of expected execution time are introduced.
• The difference between exhaustive algorithms (Needleman-Wunsch and Smith-Waterman) and heuristic algorithms (FASTA and BLAST) is emphasized.
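As a minimal sketch of what an exhaustive algorithm does, the Python function below fills the full Needleman-Wunsch dynamic programming table for two sequences and returns the optimal global alignment score. The function name and the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions and are not taken from the course material. Every one of the (m+1)(n+1) table cells is computed, so the running time grows with the product of the two sequence lengths; this cost is what heuristic methods such as FASTA and BLAST try to avoid when searching large databases.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    Scoring values are illustrative assumptions: +1 for a match,
    -1 for a mismatch, -1 for each gap position (linear gap penalty).
    """
    m, n = len(a), len(b)
    # table[i][j] = best score for aligning a[:i] with b[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]

    # Aligning a prefix with the empty sequence costs one gap per residue.
    for i in range(1, m + 1):
        table[i][0] = i * gap
    for j in range(1, n + 1):
        table[0][j] = j * gap

    # Exhaustive fill: every cell is computed from its three neighbours.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            table[i][j] = max(
                table[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                table[i - 1][j] + gap,       # gap in b
                table[i][j - 1] + gap,       # gap in a
            )
    return table[m][n]


if __name__ == "__main__":
    # Three matches and one gap: 3 * (+1) + 1 * (-1)
    print(needleman_wunsch_score("ACGT", "AGT"))  # prints 2
```

Smith-Waterman, the local alignment counterpart, fills a table of the same size but additionally allows each cell to restart at zero, so its cost is of the same order.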
The fifth chapter deals with how DNA databases are generated: DNA cloning and sequencing. The students gain insight into the molecular biology methods most frequently used to manipulate DNA.
The sixth chapter summarizes our current knowledge of proteomics. Proteomics is a very new subject, since the high-throughput methods for analyzing proteomes have lagged behind the methods of DNA analysis. However, it is not difficult to predict that proteomics will be one of the most important areas of bioinformatics in the future.
Chapter 7 discusses the different DNA and
protein sequencing algorithms.
In Chapter 8 we give an overview of the methods suitable for analyzing gene expression. The most important method is the DNA microarray, which is discussed in detail. We also deal with the more conventional methods (e.g. EST databases) and with the applications of gene expression data.
The ninth chapter details different algo-
rithms that can be used for gene prediction in
DNA sequences.
Chapter 10 discusses how various data mining techniques are used to cluster genes according to their function.