santoshspoudel: August 2019

Concept of Database

Database and database systems have become an essential part of our everyday life nowadays. Example of database are preparation of census record, preparation of result sheet, deposit and withdrawal from bank, airline reservation, accessing computerized library, purchase item from supermarket etc. in all these case databases are used.

Data: Data is the known facts and figure about a person, place, event or things which can be record in the form of number, text, picture, audio, video, etc. Example: Manoj K.C., Data is processed to produce meaningful and useful information.

Information: Information is the processed value which we get after processing data by the computer. Information is very important and useful to us to make right decision at right time. Database system provides us the right information by processing the collected data in the database.

SN	Name	Height
1	Manoj	5
2	Shankar	6
3	Niroj	7

Field: A field is the property or attribute of a table. Example SN, Name, Weight are fields.

Records: A record is the collection of interrelated fields. Example there are three records in above table.

Table: A table is the arrangement of rows and columns. Each table must have unique name and must be simple. A row defines a record and a column defines a field.

Data Management

Two types of data Management

1. File Management Systems 2. Database Management System

Flat file or file based system

Before the database system has come in use, people used to keep records in file based or flat file system. The flat file based system works well when the no of records is small and there is not necessity for cross-referencing and processing of information in the file.

Example Quark X, Excel

Limitation of flat file system

I. Inconsistent data	II. Duplication of data.
III. Data dependence	IV. Incompatible file formats
V. Fixed queries	VI. Speed of operation is slow

Database: A database is an organized collection of logically related data that contain information.

Database: A database is an organized collection of related data where data can be known facts and figure about person, place, event or things. For e.g. phone diary, result sheet. Database contains only data not information so it should be analyzed to get required information.

The database is called is also called repository or container for a collection of data file. Example: - University database maintains information about students, courses, and grade.

So, database is a collection of tables, views, index, other object to serve a specific purpose such as searching, sorting, recombining data etc. in computer system

Database Management System: A general-purpose DBMS is a software system designed to allow the definition, creation, querying, update, and administration of databases. In other words, it is a set of computer software or program used to control the reading and writing of data from and to a database.

Oracle, Microsoft SQL Server, DB2, MySQL, dbase,MS-Access, etc. are DBMS software.

Database Management System that maintains relationship between multiple data file is called relational database Management System (RDBMS).

Advantages of database over flat or file based system

1. Reduction of data redundancies	2. Shared data	3. Support for concurrent transactions
4. Data independent	5. Improved integrity	6. Unforeseen queries can be answered
7. Efficient data access	8. Multiple user interface
9. Improved security	10. Improved backup and recovery

Functions of DBMS

The major functions of DBMS are as follows:

1. Creating database file: The creation of database file involves in naming the database file and determining the required table, field type, field width etc.

2. Entering database record: Database application is used to insert records in the database file by checking the redundancy and validation of data.

3. Sorting the database records: DBMS is used to sort the records on the basis of numerical or alphabetical in ascending or descending order.

4. Deleting records: DBMS is used to delete the unnecessary records from the database.

5. Updating records: DBMS is used to modify the existing records according to the requirement of the users.

6. Searching records: DBMS searches the specific record according to the users need.

7. Merging database file: DBMS is used to merge the records from two different files into a single database file.

8. Copying records: It is used to make duplicate copy of the complete database or copying only required records to the next database file.

9. Printing reports: DBMS is used to print the required records for reports.

10. Backup database: It is used to back up the database for the recovery of the database from the accidental data loss or corruption.

11. Provide security to data: DBMS protects the data in the database from unauthorized access or modification.

Objective of DBMS

I. Making access to the data easy for the user.

II. Provide quick response to the user’s request for the data.

III. Making the latest modification to the database available immediately.

IV. Eliminate redundant (duplicate) data from the database.

V. Allows multiple users to share the database at one time

VI. Allows the growth of database system.

VII. Provide data security by protection the data from physical harm and unauthorized access.

Disadvantages of DBMS

· Expensive: Database software is very expensive for large computer systems. It also requires overhead costs for maintain and integrity functions.

· Changing technology: It is fast changing technology.

· Needs technical training: It is complex to understand and implements. So, proper training is required for stagg to worj properly in the database system.

· Backup is needed: It needs to explicit back up. This adds costs as new storage space are needed to hold the data.

Database Model / Data Model

The process of designing database schema is called database modeling. The database modeling describes the structure of database such as no. of tables, data types, relationship, constraints etc. Hence database modeling is called data modeling.

Purpose of data modeling

1. To design the fully normalized database.	2. To improve the performance of database system.
3. To reduce the data redundancy.	4. To established proper relationship between the tables in the database.
5. To improve faster sorting and indexing.	6. To simplify the design of structure of table.

Database design principles: The main goal of database design is to manage data in such a way to minimize data redundancy, anomalies and efficient in executing without losing any information.

The database designing should consider following points

i. Reduce redundancy: For a good database design, the redundancy or repetition of data must be omitted. Repetition of the data in two or more tables can cause the loss of data and decrease the efficient of database design.

ii. Normal forms: To make database efficient, database design has identified a number of special forms, properties and constraints that a table may possesses to achieve certain goal such as minimizing redundancy. These forms are called normal forms. There are 7 normal forms: First normal form (1NF), Second normal form (2NF), Third normal form (3NF), Boyce-Codd normal form (BCNF) Fourth normal form (4NF), Fifth normal form (5NF) and Sixth Normal form (6NF).

iii. Reducing functional dependency: The relation with a set of attribute is said to functionally determine another value. This process is written as XàY if and only if each value is associate with precisely to one Y value. Here, X is determinant attribute and Y of dependent. In simple word, if X value is known, Y value is certainly known.

Example: Customer table

Customer_number	Last_name	First_name	Phone_number
01	K.C.	Manoj	9847678700
02	Sharma	Shankar	9867623602
03	Kadel	Sujata	9867146209

Here, In customer table, last name (B) is functionally dependent on customer number (A).

So, B is functionally dependent on A. often written as A®B

If B is functionally dependent on A, So that A functionally determines B

iv. Normalization: In a design of RDBMS, the process of organizing data to minimize redundancy is called Normalization. Normalization involves dividing large table into small tables and defining relationship between them. So that additions, deletion and modification of a field can be made in just one table and propagate through a rest of the database via the define relationship.

Example: Not normalized table

Roll no	Name	Address	Phone	Cell number
1	Manoj	Sitalchaya	068520145	9867633008 9805238046
2	Shankar	Srijan tole	069520147	9087466872 9885265456
3	Swostika	Ramrekha	068520149	9080676498 9833252542

Example of 1NF

Roll No	Name	Address	Phone	Cell number
1	Manoj	Sitalchaya	068520145	9867633008
1	Manoj	Sitalchaya	068520145	9805238046
2	Shankar	Srijan tole	069520147	9087466872
2	Shankar	Srijan tole	069520147	9885265456
3	Swostika	Ramrekha	068520149	9080676498
3	Swostika	Ramrekha	068520149	9833252542

Types of Database Model: Various types of database models are as follows

A) Hierarchical Model: It is one of the oldest database models. This model arranges the files used in the database in top-down structure which is similar to an upside-downside tree.

Advantages

a. It is easiest model of database.

b. Searching is fast and easy if parent is known.

c. It has one or more attributes.

d. This model is very efficient in handling ' one-to-many 'relationship.

Disadvantages

It is old fashioned and outdated database model.
Modification and addition of the child node is very hard. Hence, it is non-flexible database model.

3. It can't handle 'many-to- many' relationship.

4. It increases redundancy because same data is to be repeated in different places.

Fig. Hierarchical Model

B) Network Model: In this model, each child can be linked with more than one parent. So the records can be accessed from more than one parent, which are linked. This model is more flexible and has multidimensional connection.

Fig. Network Model

Advantages

1. More flexible than hierarchical model.

2. Reduces data redundancy because similar data is not stored in more than one file.

3. Searching is faster because of multidimensional pointers.

4. It accepts many-to-many relations, so it is more flexible.

5. The network model is simple and easy to design.

Disadvantages

1. It is very complex to design

2. Needs long program to handle the relationship.

3. Pointer, need in the database, model increase overheard of database storages.

4. Less Security model because data can be accessed from any parent.

C) Relational Model: The relational model uses a collection of table to represent both data and relationships among these data. In relational model, data are organized into table (i.e. rows and columns). These tables are called relations. Rows of relations are generally referred as tuples and columns are referred as attribute. A row in a table represents a relationship among set of data values.

Advantages

1. Since one table is linked to other tables with some common fields, rules implemented on one table can be easily implemented to another table.

2. Referential integrity can be easily implemented.

3. The database has very less data redundancy.

4. Normalization of database is possible.

5. Rapid data processing and searching is possible.

Disadvantages

1. It is more complex than other model due to relationship (link) with other tables.

2. Too many rules make the database not very user-friendly.

3. It needs more powerful computers and data storage devices.

Key terms used in Relational Database Model

I.RDBMS: It stands for Relational Database Management System. The database system which is used to manage and manipulate the data in the relational database model is called RDBMS. RDBMS stores the data in different tables and relationship between the tables can be formed by using common field (Column).

Advantages: Reduces data redundancy.

It supports client-server technology where data is stored in the server and it is shared with multiple client at same time.

Example: SQL Server, MySQL, Oracle

II. Domain: A domain is a set of permissible value that can be given to an attribute or field of entity. Each column or field has specific domain and value to these attributes or field will not accept outside of their domain.

Example:

Roll_no	Name	Class	Age	Cell_number
01	Manoj	12	18	9847678720
02	Shankar	12	19	9867623672
03	Swostika	12	20	9867146299

If domain of Age attribute is set between 17-24, the value outside the range are invalid.

III. Attribute: Attributes are the properties which describe the entity. Attribute are also called elements, property or field of tables.Example: A student entity may be described by roll no, address, name etc.

Attribute to an entity is of various types

Simple and Composite: The attribute that cannot be further divided into smaller parts and represents the basic meaning is called simple attribute. Example: roll no

Composite: Attribute that can be further divide into smaller unit and each individual contain specific meaning

Example name which can be further divided into last name, first name, middle name.

Single value and Multivalued: Attributes having a single value for a particular entity is called single value attribute.

Example: age

Attributes that consists more than one values for a particular entity is called multivalued attribute. Example: phone number.

Tuple: Each row in the table represent a record is called a tuple. Each tuple in a relation or table represents a particular entity instance. i.e horizontal row is called tuple

D) Entity Relationship Model (E-R) Model: It is a graphical representation of entities and relationship in a database.ER model is a logical structure developed to facilities database design.

ER model has 4 components

Entity: An entity is an object or things in real world such as person, car, house, employee, university etc. Example: The student id 1V7001 uniquely identifies a particular student in such organization. The complete set of fields or columns is called entity instance.

Entity identifier: An entity has an attribute whose values are distinct for each individual entity. It is done using primary key or unique constraints in a table.

Example Student ID

Relationship: A relation is a link or association between two or more entities in a database. These should be a common field between the two entities for relationship and is established by linking the foreign key in a child table with primary key in a master table.

Example: relation between teacher and student entity

There are three types of relationship

I. One –To- one relationship: If one instance of an entity is related with one instance of another entity.

Example: The relationship between college entity and principal entity.

II. One- To-Many Relationship: If one instance of an entity is related with many instance of other entity. Example: The relationship between college and student entity as one college has many students.

III. Many -To- Many Relationship: If many instance of one entity are related with many instance of another entity.

Example: Teacher entity and student entity consider as many to many relationship.

Normalization: Normalization is the process of breaking down a big table into many smaller tables with few fields. It is the process of organizing data in a database to reduce data redundancy and inconsistency and to make data model more flexible and easier to maintain.

Rules used for normalizing the database are known as Normal Form. Normal forms are used to ensure that tables in database are fully normalized.

Many normal forms are defines, The 1NF, 2NF,3NF, Boyce-Codd Normal form, 4NF, 5NF, Domain Key Normal Form (DKNF) and 6NF. If a database is in 6NF, then the database is fully normalized. The most used are

· First Normal Form (1NF)

· Second Normal Form (2NF)

· Third Normal Form (3NF)

Advantages

1. It reduces the data redundancy.

2. It improves faster sorting and index creation

3. It improves the performance of the database

4. It simplifies the structure of tables

5. It avoids the loss of information

Disadvantages

1. It is complex to design due to the relationship between different tables.

2. Requires more CPU cycle, large memory and input / output device to data process software.

3. Requires more joins to get the desire result. A poorly-written query can bring the database down.

4. Maintenance overhead. At the higher the level of normalization, a large number of tables are created in the database.

Types of normalization

The following table is in denormalized form:

Consider the schema and its tabular format with data

Student_Teacher( Roll_no, Name, Address, Phone, DOB, Tid, Dept, Tname, Cell_number)

Roll_no	Name	Address	Phone	DOB	Tid	Dept	Tname	Cell_number
1	Manoj	Sijalchaya	068520001	2052/01/01	12	Management	Santosh	9849477735 9849422200
2	Shankar	Srijan tole	069552002	2052/02/02	11	Science	Narayan	9832456789 9823645478
3	Swostika	Ramrekha	068520001	2053/03/03	10	Humanities	Balkrishna	9898456200

This table is not in Normal form because data field Cell_number consists multiple data on it.

First Normal Form: In the above table, field Cell_number holds multiple values in one cell. The purpose of 1NF is to eliminate repeating groups of attribute in an entity.

After applying the rule of 1NF

Student_Teacher (Roll_no, Name, Address, Phone, DOB, Tid, Dept, Tname, Cell_number)

Roll_no	Name	Address	Phone	DOB	Tid	Dept	Tname	Cell_number
1	Manoj	Sitalchaya	068520001	2052/01/01	12	Management	Santosh	9849477735
1	Manoj	Sitalchaya	068520001	2052/01/01	12	Management	Santosh	9849422200
2	Shankar	Srijan tole	069552002	2052/02/02	11	Science	Narayan	9832456789
2	Shankar	Srijan tole	069552002	2052/02/02	11	Science	Narayan	9823645478
3	Swostika	Ramrekha	068520004	2053/03/03	10	Humanities	Balkrishna	9898456200

This table is in 1NF because every field of the table is atomic.

Second Normal Form: A relation is said to be in 2NF if the relation must first fulfill the requirements to be in First Normal Form (1NF) and each non-key attribute in the relation must be functionally dependent upon the primary key.

Student(Roll_no,Name,Address,Phone,DOB)

Teacher(Tid,Tname, Cell_number)

Std_Tea(Roll_no,Tid)

Roll_no	Name	Address	Phone	DOB
1	Manoj	Sitalchaya	068520001	2052/01/01
2	Shankar	Srijan tole	069552002	2052/02/02
3	Swostika	Ramrekha	068520004	2053/03/03

Tid	Tname	Cell_number
12	Santosh	9849477735
12	Santosh	9849422200
11	Narayan	9832456789
11	Narayan	9823645478
10	Balkrishna	9898456200

Roll_no	Tid
1	12
2	11
3	10

Third Normal Form: A relation is said to be in 1NF, a table must be in 2NF and all attribute that are not dependent upon the primary key must be eliminated.

Student1 (Roll_no, Name, Address, Phone)

Student2 (Roll_no, DOB)

Roll_no	Name	Address	Phone
1	Manoj	Sitalchaya	068520001
2	Shankar	Srijan tole	069552002
3	Swostika	Ramrekha	068520004

Roll_no	DOB
1	2052/01/01
2	2052/02/02
3	2053/03/03

Denormalization: Denormalization is the reverse process of normalization. Denormalization is the process of designing fewer tables with more fields in a database. Denormalization means combining many tables into single table

SQL Structured Query Language is a special-purpose programming language designed for managing data held in a relational database management system (RDBMS).

SQL consists of a data definition language and a data manipulation language. The scope of SQL includes data insert, query, update and delete, schema creation and modification, and data access control.

Syntax: select <column1,column2> from <table_name> where <condition>

Example: select rollno, name, phone from student where class=12.

SQL has different statement for DDL, DCL, DML

Features of SQL

1. It is a non procedural language.

2. It is English like language

3. It can process a single record as well as sets of records at a time.

4. It is 4th GL as all SQL statements define what is to be done rather than how it is to be done.

5. SQL is a data sub-language consisting of their built in language: Data definition language(DDL), Data Manipulation language (DML) and data Control language (DCL)

6. It insulates the user from underlying structure and algorithm.

7. It has the facilities for defining database tables, views, security, integrity, transaction control, index etc.

DCL: Data Control Language statements are used to change the permission associated with the database user. DCL commands are normally available to Database Administrator. DCL statements are GRANT, DENY and REVOKE.

DDL: Data Definition Language statements are used to create, modify and drop database and database object like table, index, views etc.

DDL statements are CREATE, ALTER and DROP.

DM:L Data Manipulation Language statements are used to retrieve required data from database, insert (data) into database, modify existing data and delete unnecessary data from database.

Centralized versus Distributes database

Centralized database: The centralized database has one central computer called database server to store all the data and files and it provides services to the entire client in the network. Central database services are responsible for processing data. It is suitable for small organization which has different department.

All clients are responsible for presenting the required records to the users in the format they desired.

fig: Centralized database

Advantages

i. Low cost to set up.

ii. High performance.

iii. Centralization of all data in a single computer called server.

iv. Easier to manage and manipulate data and database.

v. High security as a single Database Administrator can control the whole database system.

vi. Suitable for small organization with different departments.

vii. It is easier to manage and manipulate the data as data is stored only in a server computer.

viii. Easier data access.

Disadvantages

1. Cannot cover large area and not suitable for lager organization.

2. Database is location dependent, cannot be access from other places.

3. It does not support globalized connection.

Distributed database: It is a set of database stored on multiple computers that appears to applications as a single database. The users can simultaneously access and modify data in several databases in a network. The computer in a distributed system communicates with each other through various communication media such as high speed buses or telephones.

The main difference between centralized database and distributed database is that, in the centralized database, the data resides in one single centralized computer while in distributed data is stored in several sets under the control of local distributed DBMS components which are under the control of distributed database system.

· Useful for large organizations which spread all around the world

· Data security is very important because of hacked or damaged during data transmission

· Large number of users difficult to set appropriate permission to them.

Advantages

1. Data sharing and distribution controlled all over the world.

2. Improved reliability for users.

3. Improved availability of data.

4. Economy on operation and data sharing.

5. Modular growth can support

Disadvantages

1. Higher software development cost.

2. Greater potential for bugs and hacked.

3. Increased processing overhead for client and server computers.

4. More complex in database design

5. Less security model because data may travel continent to continent

Internet

More difficult for general integrity

Difference between Centralized and distributed database system

Centralized Database	Distributed System
1. Centralized database system is simple type	1. Distributed database system is complex
2. They are located on particular location	2. They are located in many geographical locations
3. It consists of only one sever	3. It contains serves in several location
4. It is only suitable for small organizations and small scale operation	4. It is suitable for large organizations.
5. There is less chance of data lost	5. More chances of data hacking , theft ans lost.
6. Maintenance s easy and security is high.	6. Maintenance is not easy and security is low as compared to centralized database system
7. Failure of server makes the whole system down.	7. Failure of one server does not make the whole system dowm
8. There is no features of loads balancing	8. There is feature of load balancing.
9. Data traffic rate is high	9. Data traffic rate is low
10. Cost of centralized database system is low	10. Cost of distributed system is high.

Data Security: Data security is the method of keeping data protected from corrupted, modification by unauthorized user, leakage and physical harms. Data plays very important roles to give right information, decision and planning at right time. So, data must remain correct and unchanged and for this privacy must be adopted to make the data secure.

Method of data securities are as follows

1. Regular backup of database in disk, tapes, optical disks etc. in order to prevent from accidental loss.

2. Using password to login in system to prevent data from unauthorized access to the database.

3. Specifying the specific roles to every user of the database for granting the appropriate permission to them.

4. Making physical prevention by using stabilizer and UPS to supply a regular power through which we can prevent hardware and software from high electricity voltage and irregular power supply.

5. Keeping the system in safe room or place with lock and key or under the supervision of watchman to prevent from theft.

6. Implementing software protection like antivirus, firewalls, antispyware, etc.

Data integrity: Data integrity refers to validity of data contained in a database. Data integrity refers to the consistency and accuracy of data that is stored in a database. Collected data should be consistent and accurate as it it the basic elements of database to provide right information at right time, to right decision and make right planning for future.

Types of data integrity: There are 3 types of data integrity

Domain Integrity: Domain or field integrity refers to the set of data values that are valid for a column or field. For example: Age of the student must be in a range of 0 to 25.

RN	Name	Address	Age
1	Manoj	Srijana tole	19
2	Shankar	Srinagar	18

Entity Integrity: It specifies that all rows in a table have a unique identifies as the primary key value.

Entity Integrity is maintained in a database through primary key and unique constraints.

RN	Name	Address	Age	Class
1	Sujata	Srijana tole	19	12
2	Purnima	Srinagar	18	11

Referential Integrity: It related between two tables in a database. It ensures that the relationship between the primary key in the master table and foreign key in a child table are always maintained.

Data dictionary: It is a system file that contain information of system or data of data i.e metadata. Data dictionary is used to collect, document and organized specific facts about the system including the content of data flow, data store, external entities and process.

It describes the type of data that is stored. All database system to keep track of data help user to find data when they needed.

Database administrator: DBA is a person who is responsible for maintaining the RDBMS in an organization because the same data resources of the database are shared by the staffs, and users. Such a chief person who is assigned to take care of database and DBMS software is called Database Administrator.

Responsibilities of DBA

1. Installing and upgrading database server.	2. Working with program developers.
3. Monitoring the database server and tuning accordingly.	4. Transferring data from one computer to another
5. Using the database storage properly.	6. Scheduling jobs and events.
7. Performing backup and recovery of data.	8. Replicating data in multiple servers.
9. Managing database users and security.

Qualities of a good DBA

1. The DBA should have depth knowledge of OS in which data server is running.

2. She/he should have sound knowledge of SQL.

3. She/he should have sound knowledge of good database design.

4. She/he should have understanding of network architecture.

5. Good knowledge of database server.

santoshspoudel

Sunday, August 11, 2019

Database _class XII