Concept
of Database
Database
and database systems have become an essential part of our everyday life
nowadays. Example of database are preparation of census record, preparation of
result sheet, deposit and withdrawal from bank, airline reservation, accessing
computerized library, purchase item from supermarket etc. in all these case
databases are used.
Data:
Data is the known facts and figure about a person, place, event or things
which can be record in the form of number, text, picture, audio, video, etc.
Example: Manoj K.C., Data is
processed to produce meaningful and useful information.
Information: Information
is the processed value which we get after processing data by the computer.
Information is very important and useful to us to make right decision at right
time. Database system provides us the right information by processing the
collected data in the database.
SN
|
Name
|
Height
|
1
|
Manoj
|
5
|
2
|
Shankar
|
6
|
3
|
Niroj
|
7
|
Field:
A field is the property or attribute of a table. Example SN, Name, Weight are
fields.
Records:
A record is the collection of interrelated fields. Example there are three
records in above table.
Table:
A table is the arrangement of rows and columns. Each table must have unique
name and must be simple. A row defines a record and a column defines a field.
Data
Management
Two
types of data Management
1.
File
Management Systems 2.
Database Management System
Flat
file or file based system
Before
the database system has come in use, people used to keep records in file based
or flat file system. The flat file based system works well when the no of
records is small and there is not necessity for cross-referencing and
processing of information in the file.
Example Quark X, Excel
Limitation
of flat file system
I.
Inconsistent data
|
II.
Duplication of data.
|
III.
Data dependence
|
IV.
Incompatible file formats
|
V.
Fixed queries
|
VI.
Speed of operation is slow
|
Database: A database is an organized collection of
logically related data that contain information.
Database: A database
is an organized collection of related data where data can be known facts and figure about
person, place, event or things. For e.g. phone diary, result sheet. Database
contains only data not information so it should be analyzed to get required
information.
The
database is called is also called repository or container for a collection
of data file. Example: - University database maintains information
about students, courses, and grade.
So, database is a collection of tables,
views, index, other object to serve a specific purpose such as searching,
sorting, recombining data etc. in computer system
Database Management System: A general-purpose DBMS is a software system designed to
allow the definition, creation, querying, update, and administration of
databases. In other words, it is a set of computer software or program used to
control the reading and writing of data from and to a database.
Oracle,
Microsoft SQL Server, DB2, MySQL, dbase,MS-Access, etc. are DBMS software.
Database
Management System that maintains relationship between multiple data file is
called relational database Management System (RDBMS).
Advantages
of database over flat or file based system
1.
Reduction of data redundancies
|
2.
Shared data
|
3.
Support for concurrent transactions
|
4.
Data independent
|
5.
Improved integrity
|
6.
Unforeseen queries can be answered
|
7.
Efficient data access
|
8.
Multiple user interface
|
|
9.
Improved security
|
10.
Improved backup and recovery
|
Functions
of DBMS
The
major functions of DBMS are as follows:
1.
Creating database file: The
creation of database file involves in naming the database file and
determining the required table, field type, field width etc.
2.
Entering database record:
Database
application is used to insert records in the database file by checking the
redundancy and validation of data.
3.
Sorting the database
records: DBMS is used to sort the records on the basis of
numerical or alphabetical in ascending or descending order.
4.
Deleting records: DBMS
is used to delete the unnecessary records from the database.
5.
Updating records:
DBMS is used to modify the existing records according to the requirement of
the users.
6.
Searching records:
DBMS searches the specific record according to the users need.
7.
Merging database file: DBMS
is used to merge the records from two different files into a single database
file.
8.
Copying records:
It is used to make duplicate copy of the complete database or copying only
required records to the next database file.
9.
Printing reports: DBMS
is used to print the required records for reports.
10.
Backup database:
It is used to back up the database for the recovery of the database from
the accidental data loss or corruption.
11.
Provide security to data:
DBMS
protects the data in the database from unauthorized access or modification.
Objective
of DBMS
I.
Making access to the data easy for the
user.
II.
Provide quick response to the user’s
request for the data.
III.
Making the latest modification to the
database available immediately.
IV.
Eliminate redundant (duplicate) data from
the database.
V.
Allows multiple users to share the
database at one time
VI.
Allows the growth of database system.
VII.
Provide data security by protection the
data from physical harm and unauthorized access.
Disadvantages
of DBMS
·
Expensive: Database software is very expensive for
large computer systems. It also requires overhead costs for maintain and
integrity functions.
·
Changing technology:
It is fast changing technology.
·
Needs technical training:
It is complex to understand and implements. So, proper training is required for
stagg to worj properly in the database system.
·
Backup is needed:
It needs to explicit back up. This adds costs as new storage space are needed
to hold the data.
Database
Model / Data Model
The
process of designing database schema is called database modeling. The database
modeling describes the structure of database such as no. of tables, data
types, relationship, constraints etc. Hence database modeling is called
data modeling.
Purpose
of data modeling
1.
To design the fully normalized database.
|
2.
To improve the performance of database system.
|
3.
To reduce the data redundancy.
|
4.
To established proper relationship between the
tables in the database.
|
5.
To improve faster sorting and indexing.
|
6.
To simplify the design of structure of table.
|
Database
design principles: The main goal of database design is to
manage data in such a way to minimize data redundancy, anomalies and efficient
in executing without losing any information.
The
database designing should consider following points
i.
Reduce redundancy: For a good database design, the
redundancy or repetition of data must be omitted. Repetition of the data in two
or more tables can cause the loss of data and decrease the efficient of database
design.
ii.
Normal forms: To make database efficient, database
design has identified a number of special forms, properties and constraints
that a table may possesses to achieve certain goal such as minimizing
redundancy. These forms are called normal forms. There are 7 normal forms: First
normal form (1NF), Second normal form (2NF), Third normal form (3NF),
Boyce-Codd normal form (BCNF) Fourth normal form (4NF), Fifth normal form (5NF)
and Sixth Normal form (6NF).
iii.
Reducing functional dependency: The
relation with a set of attribute is said to functionally determine another
value. This process is written as XàY if and only if
each value is associate with precisely to one Y value. Here, X is determinant
attribute and Y of dependent. In simple word, if X value is known, Y value is
certainly known.
Example: Customer table
Customer_number
|
Last_name
|
First_name
|
Phone_number
|
01
|
K.C.
|
Manoj
|
9847678700
|
02
|
Sharma
|
Shankar
|
9867623602
|
03
|
Kadel
|
Sujata
|
9867146209
|
Here,
In customer table, last name (B) is functionally dependent on customer number
(A).
So,
B is functionally dependent on A. often written as A®B
If
B is functionally dependent on A, So that A functionally determines B
iv.
Normalization: In
a design of RDBMS, the process of organizing data to minimize redundancy is
called Normalization. Normalization involves dividing
large table into small tables and defining relationship between them. So
that additions, deletion and modification of a field can be made in just one
table and propagate through a rest of the database via the define relationship.
Example: Not normalized
table
Roll no
|
Name
|
Address
|
Phone
|
Cell number
|
1
|
Manoj
|
Sitalchaya
|
068520145
|
9867633008
9805238046
|
2
|
Shankar
|
Srijan tole
|
069520147
|
9087466872
9885265456
|
3
|
Swostika
|
Ramrekha
|
068520149
|
9080676498
9833252542
|
Example of 1NF
Roll No
|
Name
|
Address
|
Phone
|
Cell number
|
1
|
Manoj
|
Sitalchaya
|
068520145
|
9867633008
|
1
|
Manoj
|
Sitalchaya
|
068520145
|
9805238046
|
2
|
Shankar
|
Srijan tole
|
069520147
|
9087466872
|
2
|
Shankar
|
Srijan tole
|
069520147
|
9885265456
|
3
|
Swostika
|
Ramrekha
|
068520149
|
9080676498
|
3
|
Swostika
|
Ramrekha
|
068520149
|
9833252542
|
Types of Database Model: Various
types of database models are as follows
A) Hierarchical
Model: It is one of the oldest database models. This model
arranges the files used in the database in top-down structure which is similar
to an upside-downside tree.
Advantages
a.
It is easiest model of database.
b.
Searching is fast and easy if parent
is known.
c.
It has one or more attributes.
d.
This model is very efficient in handling
' one-to-many 'relationship.
Disadvantages
- It is old fashioned and outdated database model.
- Modification and addition of the child node is very hard. Hence, it is non-flexible database model.
3.
It can't handle 'many-to- many' relationship.
4.
It increases redundancy because same
data is to be repeated in different places.
Fig.
Hierarchical Model
B) Network
Model: In this model, each child can be linked with more than
one parent. So the records can be accessed from more than one parent, which are
linked. This model is more flexible and has multidimensional connection.
Fig. Network Model
Advantages
1.
More flexible than hierarchical model.
2.
Reduces data redundancy because
similar data is not stored in more than one file.
3.
Searching is faster because of
multidimensional pointers.
4.
It accepts many-to-many relations, so
it is more flexible.
5.
The network model is simple and easy
to design.
Disadvantages
1.
It is very complex to design
2.
Needs long program to handle the
relationship.
3.
Pointer, need in the database, model
increase overheard of database storages.
4.
Less Security model because data can
be accessed from any parent.
C) Relational
Model: The relational model uses a collection of table to
represent both data and relationships among these data. In relational model,
data are organized into table (i.e. rows and columns). These tables are
called relations. Rows of relations are generally referred as tuples and
columns are referred as attribute. A row in a table represents a
relationship among set of data values.
Advantages
1.
Since one table is linked to other
tables with some common fields, rules implemented on one table can be easily
implemented to another table.
2.
Referential integrity can be easily
implemented.
3.
The database has very less data
redundancy.
4.
Normalization of database is possible.
5.
Rapid data processing and searching is
possible.
Disadvantages
1.
It is more complex than other model
due to relationship (link) with other tables.
2.
Too many rules make the database not
very user-friendly.
3.
It needs more powerful computers and
data storage devices.
Key terms used in Relational Database Model
I.RDBMS: It
stands for Relational Database Management System. The database system which is
used to manage and manipulate the data in the relational database model is
called RDBMS. RDBMS stores the data in different tables and relationship
between the tables can be formed by using common field (Column).
Advantages:
Reduces data redundancy.
It
supports client-server technology where data is stored in the server and it is
shared with multiple client at same time.
Example:
SQL Server, MySQL, Oracle
II. Domain:
A domain is a set of permissible value that can be given to an attribute or
field of entity. Each column or field has specific domain and value to these
attributes or field will not accept outside of their domain.
Example:
Roll_no
|
Name
|
Class
|
Age
|
Cell_number
|
01
|
Manoj
|
12
|
18
|
9847678720
|
02
|
Shankar
|
12
|
19
|
9867623672
|
03
|
Swostika
|
12
|
20
|
9867146299
|
If
domain of Age attribute is set between 17-24, the value outside the
range are invalid.
III. Attribute:
Attributes are the properties which describe the entity. Attribute are also
called elements, property or field of tables.Example: A student entity may be described
by roll no, address, name etc.
Attribute
to an entity is of various types
Simple
and Composite: The attribute that cannot be further
divided into smaller parts and represents the basic meaning is called simple
attribute. Example: roll no
Composite:
Attribute that can be further divide into smaller unit and each individual
contain specific meaning
Example
name which can be further divided into last name, first name, middle name.
Single
value and Multivalued: Attributes having a single value for a particular entity is
called single value attribute.
Example: age
Attributes that consists more than one values for a particular
entity is called multivalued attribute. Example: phone number.
Tuple: Each row in the table represent a
record is called a tuple. Each tuple in a relation or table represents a
particular entity instance. i.e horizontal row is called tuple
D)
Entity Relationship Model (E-R) Model:
It is a graphical representation of entities and relationship in a database.ER
model is a logical structure developed to facilities database design.
ER
model has 4 components
Entity:
An entity is an object or things in real world such as person, car, house, employee,
university etc. Example: The student id 1V7001 uniquely identifies a particular
student in such organization. The complete set of fields or columns is called
entity instance.
Entity
identifier:
An entity has an attribute whose values are distinct for each individual
entity. It is done using primary key or unique constraints in a table.
Example Student ID
Relationship: A relation is a link or association
between two or more entities in a database. These should be a common field
between the two entities for relationship and is established by linking the
foreign key in a child table with primary key in a master table.
Example: relation between teacher and student entity
There are three types of relationship
I.
One –To- one relationship: If one instance of an entity is
related with one instance of another entity.
Example: The relationship between
college entity and principal entity.
II.
One- To-Many Relationship: If one instance of an entity is
related with many instance of other entity. Example: The relationship between
college and student entity as one college has many students.
III.
Many -To- Many Relationship: If many instance of one entity are
related with many instance of another entity.
Example: Teacher entity and student
entity consider as many to many relationship.
Normalization:
Normalization is the process of
breaking down a big table into many smaller tables with few fields. It is the
process of organizing data in a database to reduce data redundancy and
inconsistency and to make data model more flexible and easier to maintain.
Rules used for normalizing the
database are known as Normal Form. Normal forms are used to ensure that tables
in database are fully normalized.
Many normal forms are defines, The
1NF, 2NF,3NF, Boyce-Codd Normal form, 4NF, 5NF, Domain Key Normal Form (DKNF)
and 6NF. If a database is in 6NF, then the database is fully normalized.
The most used are
·
First
Normal Form (1NF)
·
Second
Normal Form (2NF)
·
Third
Normal Form (3NF)
Advantages
1.
It
reduces the data redundancy.
2.
It
improves faster sorting and index creation
3.
It
improves the performance of the database
4.
It
simplifies the structure of tables
5.
It
avoids the loss of information
Disadvantages
1. It is complex to design due to the
relationship between different tables.
2. Requires more CPU cycle, large
memory and input / output device to data process software.
3. Requires more joins to get the
desire result. A poorly-written query can bring the database down.
4. Maintenance overhead. At the higher
the level of normalization, a large number of tables are created in the
database.
Types
of normalization
The
following table is in denormalized form:
Consider
the schema and its tabular format with data
Student_Teacher(
Roll_no, Name, Address, Phone, DOB, Tid, Dept, Tname, Cell_number)
Roll_no
|
Name
|
Address
|
Phone
|
DOB
|
Tid
|
Dept
|
Tname
|
Cell_number
|
1
|
Manoj
|
Sijalchaya
|
068520001
|
2052/01/01
|
12
|
Management
|
Santosh
|
9849477735
9849422200
|
2
|
Shankar
|
Srijan tole
|
069552002
|
2052/02/02
|
11
|
Science
|
Narayan
|
9832456789
9823645478
|
3
|
Swostika
|
Ramrekha
|
068520001
|
2053/03/03
|
10
|
Humanities
|
Balkrishna
|
9898456200
|
This
table is not in Normal form because data field Cell_number consists multiple
data on it.
First
Normal Form: In the above table, field Cell_number
holds multiple values in one cell. The purpose of 1NF is to eliminate repeating
groups of attribute in an entity.
After
applying the rule of 1NF
Student_Teacher
(Roll_no, Name, Address, Phone, DOB, Tid, Dept, Tname, Cell_number)
Roll_no
|
Name
|
Address
|
Phone
|
DOB
|
Tid
|
Dept
|
Tname
|
Cell_number
|
1
|
Manoj
|
Sitalchaya
|
068520001
|
2052/01/01
|
12 |
Management
|
Santosh
|
9849477735
|
1
|
Manoj
|
Sitalchaya
|
068520001
|
2052/01/01
|
12 |
Management
|
Santosh
|
9849422200
|
2
|
Shankar
|
Srijan tole
|
069552002
|
2052/02/02
|
11
|
Science
|
Narayan
|
9832456789
|
2
|
Shankar
|
Srijan tole
|
069552002
|
2052/02/02
|
11
|
Science
|
Narayan
|
9823645478
|
3
|
Swostika
|
Ramrekha
|
068520004
|
2053/03/03
|
10
|
Humanities
|
Balkrishna
|
9898456200
|
This
table is in 1NF because every field of the table is atomic.
Second
Normal Form: A relation is said to be in 2NF if
the relation must first fulfill the requirements to be in First Normal Form
(1NF) and each non-key attribute in the relation must be functionally dependent
upon the primary key.
Student(Roll_no,Name,Address,Phone,DOB)
Teacher(Tid,Tname,
Cell_number)
Std_Tea(Roll_no,Tid)
Roll_no
|
Name
|
Address
|
Phone
|
DOB
|
1
|
Manoj
|
Sitalchaya
|
068520001
|
2052/01/01
|
2
|
Shankar
|
Srijan tole
|
069552002
|
2052/02/02
|
3
|
Swostika
|
Ramrekha
|
068520004
|
2053/03/03
|
Tid
|
Tname
|
Cell_number
|
12
|
Santosh
|
9849477735
|
12
|
Santosh
|
9849422200
|
11
|
Narayan
|
9832456789
|
11
|
Narayan
|
9823645478
|
10
|
Balkrishna
|
9898456200
|
Roll_no
|
Tid
|
1
|
12
|
2
|
11
|
3
|
10
|
Third
Normal Form: A relation is said to be in 1NF, a
table must be in 2NF and all attribute that are not dependent upon the primary
key must be eliminated.
Student1 (Roll_no, Name, Address,
Phone)
Student2 (Roll_no, DOB)
Roll_no
|
Name
|
Address
|
Phone
|
1
|
Manoj
|
Sitalchaya
|
068520001
|
2
|
Shankar
|
Srijan tole
|
069552002
|
3
|
Swostika
|
Ramrekha
|
068520004
|
Roll_no
|
DOB
|
1
|
2052/01/01
|
2
|
2052/02/02
|
3
|
2053/03/03
|
Denormalization: Denormalization
is the reverse process of normalization. Denormalization is the process of
designing fewer tables with more fields in a database. Denormalization means
combining many tables into single table
SQL Structured
Query Language is a special-purpose
programming language designed for managing data held in
a relational
database management system (RDBMS).
SQL
consists of a data
definition language and a data manipulation
language. The scope of SQL includes data
insert, query, update and delete, schema creation and modification, and data access control.
Syntax:
select <column1,column2> from <table_name> where <condition>
Example:
select rollno, name, phone from student where class=12.
SQL
has different statement for DDL, DCL, DML
Features
of SQL
1.
It
is a non procedural language.
2.
It
is English like language
3.
It
can process a single record as well as sets of records at a time.
4.
It
is 4th GL as all SQL statements define what is to be done rather than how it is
to be done.
5.
SQL
is a data sub-language consisting of their built in language: Data definition
language(DDL), Data Manipulation language (DML) and data Control language (DCL)
6.
It
insulates the user from underlying structure and algorithm.
7.
It
has the facilities for defining database tables, views, security, integrity,
transaction control, index etc.
DCL:
Data Control Language statements are used to change the permission associated
with the database user. DCL commands are normally available to Database
Administrator. DCL statements are GRANT, DENY and REVOKE.
DDL:
Data Definition Language statements are used to create, modify and drop
database and database object like table, index, views etc.
DDL
statements are CREATE, ALTER and DROP.
DM:L
Data Manipulation Language statements are used to retrieve required data from
database, insert (data) into database, modify existing data and delete
unnecessary data from database.
Centralized versus Distributes database
Centralized
database: The centralized database has one central computer
called database server to store all the data and files and it provides services
to the entire client in the network. Central database services are responsible
for processing data. It is suitable for small organization which has different
department.
All
clients are responsible for presenting the required records to the users in the
format they desired.
fig: Centralized database
Advantages
i.
Low
cost to set up.
ii.
High
performance.
iii.
Centralization
of all data in a single computer called server.
iv.
Easier
to manage and manipulate data and database.
v.
High
security as a single Database Administrator can control the whole database system.
vi.
Suitable
for small organization with different departments.
vii.
It
is easier to manage and manipulate the data as data is stored only in a server
computer.
viii.
Easier
data access.
Disadvantages
1. Cannot cover large area and not
suitable for lager organization.
2. Database is location dependent,
cannot be access from other places.
3. It does not support globalized
connection.
Distributed database: It is a set of database
stored on multiple computers that appears to applications as a single database.
The users can simultaneously access and modify data in several databases in a
network. The computer in a distributed system communicates with each other
through various communication media such as high speed buses or telephones.
The
main difference between centralized database and distributed database is that,
in the centralized database, the data resides in one single centralized
computer while in distributed data is stored in several sets under the control
of local distributed DBMS components which are under the control of distributed
database system.
·
Useful
for large organizations which spread all around the world
·
Data security is very important because of
hacked or damaged during data transmission
·
Large number of users difficult to set
appropriate permission to them.
Advantages
1. Data sharing and distribution
controlled all over the world.
2. Improved reliability for users.
3. Improved availability of data.
4. Economy on operation and data
sharing.
5. Modular growth can support
Disadvantages
1. Higher software development cost.
2. Greater potential for bugs and
hacked.
3. Increased processing overhead for
client and server computers.
4. More complex in database design
5. Less security model because data may
travel continent to continent
6.
Internet
|
Difference
between Centralized and distributed database system
Centralized
Database
|
Distributed System
|
1.
Centralized database system is simple type
|
1.
Distributed database system is complex
|
2.
They are located on particular location
|
2.
They are located in many geographical locations
|
3.
It consists of only one sever
|
3.
It contains serves in several location
|
4.
It is only suitable for small organizations and
small scale operation
|
4.
It is suitable for large organizations.
|
5.
There is less chance of data lost
|
5.
More chances of data hacking , theft ans lost.
|
6.
Maintenance s easy and security is high.
|
6.
Maintenance is not easy and security is low as
compared to centralized database system
|
7.
Failure of server makes the whole system down.
|
7.
Failure of one server does not make the whole system
dowm
|
8.
There is no features of loads balancing
|
8.
There is feature of load balancing.
|
9.
Data traffic rate is high
|
9.
Data traffic rate is low
|
10.
Cost of centralized database system is low
|
10.
Cost of distributed system is high.
|
Data
Security: Data security is the method of keeping data protected
from corrupted, modification by unauthorized user, leakage and physical harms.
Data plays very important roles to give right information, decision and
planning at right time. So, data must remain correct and unchanged and for this
privacy must be adopted to make the data secure.
Method
of data securities are as follows
1.
Regular
backup of database in disk, tapes, optical disks etc. in order to prevent from
accidental loss.
2. Using
password to login in system to prevent data from unauthorized access to the
database.
3. Specifying
the specific roles to every user of the database for granting the appropriate
permission to them.
4. Making
physical prevention by using stabilizer and UPS to supply a regular power
through which we can prevent hardware and software from high electricity
voltage and irregular power supply.
5. Keeping
the system in safe room or place with lock and key or under the supervision of
watchman to prevent from theft.
6. Implementing
software protection like antivirus, firewalls, antispyware, etc.
Data
integrity: Data integrity refers to validity of data
contained in a database. Data integrity refers to the consistency and accuracy
of data that is stored in a database. Collected data should be consistent and
accurate as it it the basic elements of database to provide right information
at right time, to right decision and make right planning for future.
Types
of data integrity: There are 3 types of data integrity
Domain
Integrity: Domain or field integrity refers to the set of data
values that are valid for a column or field. For example: Age of the student
must be in a range of 0 to 25.
RN
|
Name
|
Address
|
Age
|
1
|
Manoj
|
Srijana
tole
|
19
|
2
|
Shankar
|
Srinagar
|
18
|
Entity
Integrity: It specifies that all rows in a table have
a unique identifies as the primary key value.
Entity Integrity is maintained in a database through
primary key and unique constraints.
RN
|
Name
|
Address
|
Age
|
Class
|
1
|
Sujata
|
Srijana
tole
|
19
|
12
|
2
|
Purnima
|
Srinagar
|
18
|
11
|
Referential Integrity: It
related between two tables in a database. It ensures that the relationship
between the primary key in the master table and foreign key in a child table
are always maintained.
Data
dictionary: It is a system file that contain
information of system or data of data i.e metadata. Data dictionary is used to
collect, document and organized specific facts about the system including the
content of data flow, data store, external entities and process.
It
describes the type of data that is stored. All database system to keep track of
data help user to find data when they needed.
Database
administrator: DBA is a person who is responsible for
maintaining the RDBMS in an organization because the same data resources of the
database are shared by the staffs, and users. Such a chief person who is
assigned to take care of database and DBMS software is called Database
Administrator.
Responsibilities
of DBA
1.
Installing and upgrading database server.
|
2.
Working with program developers.
|
3.
Monitoring the database server and tuning
accordingly.
|
4.
Transferring data from one computer to another
|
5.
Using the database storage properly.
|
6.
Scheduling jobs and events.
|
7.
Performing backup and recovery of data.
|
8.
Replicating data in multiple servers.
|
9.
Managing database users and security.
|
|
Qualities
of a good DBA
1.
The
DBA should have depth knowledge of OS in which data server is running.
2.
She/he
should have sound knowledge of SQL.
3.
She/he
should have sound knowledge of good database design.
4.
She/he
should have understanding of network architecture.
5.
Good
knowledge of database server.