santoshspoudel: Database

Data and Database Representation

A computer system organizes data in a hierarchy that begins with bits and proceeds to characters (bytes), fields, records, files and databases.

A bit is the smallest unit of data that a computer can process. It is either 0 or 1.

A group of 8 bits is called a byte, and it represents a single character (a letter, a digit or a symbol).

A logical grouping of characters, or a complete number, forms a field.

A logical grouping of related fields is called a record. Example: id, name, salary, phone number etc. form a record.

A logical grouping of related records is called a file.

Example: the records of all employees in an organization.

A logical grouping of related files is called a database.

Example: a set of related files such as employee and department.
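The hierarchy above can be pictured with ordinary Python objects. This is only an illustrative sketch: the class name Employee and the variable names are assumptions made for the example, and a real system stores these levels on disk rather than in memory.

```python
from dataclasses import dataclass

# A bit is 0 or 1; eight bits make one byte, which can hold one character.
bits = format(ord("A"), "08b")   # the character "A" written as 8 bits


@dataclass
class Employee:                  # a record: related fields grouped together
    eid: str                     # each attribute is one field
    name: str
    salary: int


# A file: related records grouped together.
employee_file = [
    Employee("001", "Shashi", 20000),
    Employee("002", "Milan", 30000),
]

# A database: related files grouped together.
database = {"employee": employee_file, "department": []}

print(bits)
print(database["employee"][0])
```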

Data Management

There are two types of data management:

1. File Management Systems

2. Database Management System

Database: A database is an organized collection of logically related data that contains information. A database is also called a repository or container for a collection of data files.

Example: a university database maintains information about students, courses and grades.

Database Management System: A general-purpose DBMS is a software system designed to allow the definition, creation, querying, update, and administration of databases. In other words, it is a set of computer software or program used to control the reading and writing of data from and to a database.

Oracle, Microsoft SQL Server, DB2, MySQL, dBASE, MS Access etc. are DBMS software.

A database management system that maintains relationships between multiple data files is called a relational database management system (RDBMS).

Database System: It consists of a database, a database management system and application programs. Application software that uses a DBMS for data management is called a database system. Example: a library management system.

File Management System (FMS): It is also called a flat file system. It stores data in plain files, such as a Notepad text file or a Word file. A file management system is a type of software that manages data files in a computer system. It has limited capabilities and is designed to manage individual files or groups of files, such as special office documents and records. It may display report details such as owner, creation date, state of completion and similar features useful in an office environment.

Advantages of DBMS

1. Data redundancy: Data redundancy means duplication of the same data. A flat file system suffers from high data redundancy. Example: the record of a student may appear in the library data files as well as in the examination data files. This redundancy leads to higher storage and access costs. A database system reduces the problem of data redundancy.

2. Data inconsistency: Data inconsistency occurs when changed data is reflected in the data files in one place but not elsewhere in the system. Example: if the library data file stores the cell number of a student as 9856700001 and the examination data file stores it as 985670002, the data is inconsistent. Flat file systems suffer from data inconsistency. A database system can remove the problem by automatically propagating a data update to every place in the database where that data appears.

3. Data isolation: Because the data is stored in various files, and the files may be in different formats, writing a new application program to retrieve the data is difficult in a flat file system.

Example:

Student Data file

D01,Divya,BSC,Baglung

D02,Manish ,BSC,Kathmandu

Book Data file

D01 Divya BSC Baglung

D02 Manish BSC Kathmandu
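A small sketch of why the two formats above are a problem: each file needs its own parsing routine before a program can use the data. The function names are hypothetical, chosen only for this illustration.

```python
student_file = "D01,Divya,BSC,Baglung\nD02,Manish ,BSC,Kathmandu"
book_file = "D01 Divya BSC Baglung\nD02 Manish BSC Kathmandu"


def parse_comma_separated(text):
    # Fields are separated by commas; strip stray spaces around each field.
    return [[field.strip() for field in line.split(",")]
            for line in text.splitlines()]


def parse_space_separated(text):
    # Fields are separated by whitespace, so a different parser is needed.
    return [line.split() for line in text.splitlines()]


students = parse_comma_separated(student_file)
books = parse_space_separated(book_file)
print(students == books)
```

Both parsers yield the same records, but only because each one was written for its own file format. Every additional format would need yet another routine.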

4. Difficulty in accessing data: File processing systems do not allow the required data to be retrieved in an efficient and convenient way. Example: suppose we want to see the records of all customers who have a balance of less than 500. We must either extract the data from the file manually or write a program to retrieve it. The first option is time consuming and tedious. The second is costly, because if we later need the records of customers who have a balance of less than 1000, we need a new program. In a database system, it is easy to write a general program that generates different lists on the basis of different criteria.
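As a rough sketch of such a general program, the example below uses Python's built-in sqlite3 module with an in-memory database. The table layout, the customer names and the helper customers_below are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Asha", 300), ("Bina", 800), ("Chandra", 1500)],  # hypothetical data
)


def customers_below(limit):
    # One general query; only the parameter changes between reports.
    rows = conn.execute(
        "SELECT name FROM customer WHERE balance < ? ORDER BY name", (limit,)
    )
    return [name for (name,) in rows]


print(customers_below(500))
print(customers_below(1000))
```

The same function serves both the 500 and the 1000 report, so no new program is needed when the criterion changes.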

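5. Integrity problem: Integrity means correctness of the data before and after the execution of a transaction. Example: if the maximum salary is 15000, we have the integrity constraint salary <= 15000. Integrity constraints maintain the correctness of data. In a flat file system such constraints have to be enforced by every application program, while a database system can enforce them centrally.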
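The salary rule can be declared once in the database itself. A minimal sketch using Python's sqlite3 module follows; the employee table and its values are assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "eid TEXT PRIMARY KEY, "
    "salary INTEGER CHECK (salary <= 15000))"   # the integrity constraint
)
conn.execute("INSERT INTO employee VALUES ('001', 12000)")  # accepted

try:
    # This row violates salary <= 15000, so the DBMS refuses it.
    conn.execute("INSERT INTO employee VALUES ('002', 20000)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(rejected, row_count)
```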

6. Atomicity problem: A transaction must execute in its entirety or not at all. If the execution is not atomic, it leaves the database in an incorrect state. Example: a transaction that transfers 500 from account A to account B. If the transaction fails at a certain point, 500 may be deducted from account A without being deposited in account B, which leaves the data in an inconsistent state. A file processing system does not guarantee atomic execution of transactions, so this can occur. A database system guarantees atomicity of transaction execution.
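The transfer can be sketched with Python's sqlite3 module, where a connection used as a context manager commits when the block succeeds and rolls back when it raises. The starting balances and the fail_midway switch are assumptions added to simulate the failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 200)])
conn.commit()


def transfer(amount, fail_midway=False):
    try:
        with conn:  # commit on success, roll back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'A'",
                (amount,),
            )
            if fail_midway:
                raise RuntimeError("simulated failure after the deduction")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'B'",
                (amount,),
            )
    except RuntimeError:
        pass  # the deduction from A has already been rolled back


def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))
```

Calling transfer(500, fail_midway=True) leaves both balances unchanged, while transfer(500) moves the amount from A to B as one unit.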


7. Concurrent-access anomalies: Many systems allow multiple users to update data simultaneously. Consider a bank account containing 500. If two customers withdraw 100 and 50 at the same time, both transactions may read the old balance and withdraw from it, giving 400 and 450 respectively. Whichever value is written last, the result is incorrect (inconsistent), because the correct balance is 350. A database system supports concurrent execution of transactions on the same data without leaving it inconsistent.
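The anomaly can be replayed step by step. The sketch below is a sequential simulation, not real concurrency: the first half imitates two withdrawals that both read the old balance, and the second half lets the DBMS apply each withdrawal to the current balance. The account number A01 is assumed for illustration.

```python
import sqlite3

# Without concurrency control: both withdrawals read the same old balance.
balance = 500
read_by_first = balance         # first customer reads 500
read_by_second = balance        # second customer also reads 500
balance = read_by_first - 100   # first writes 400
balance = read_by_second - 50   # second overwrites with 450; the 100 is lost
lost_update_balance = balance

# With a DBMS: each withdrawal is one UPDATE applied to the current balance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A01', 500)")
for amount in (100, 50):
    with conn:  # each withdrawal runs as its own transaction
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE acc_no = 'A01'",
            (amount,),
        )
correct_balance = conn.execute("SELECT balance FROM account").fetchone()[0]

print(lost_update_balance, correct_balance)
```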

8. Security problem: In a database system, we create different user accounts and grant different authorizations to different users. Thus we are able to hide certain information from some users. Example: in a banking system, payroll personnel need to see only the information about bank employees. They do not need access to information about customer accounts. A file processing system does not allow us to create user accounts, so all users have equal access to the data. This makes it difficult to maintain security in a file system.

Drawbacks

1. Initial investment: To use a database we need to purchase a database management system as well as a powerful computer to run it as a database server, so the cost is high.

2. Dedicated staff: To manage a database system efficiently we need to hire technically sound dedicated staff, called database administrators.

3. Overhead: We need to update and maintain the database system from time to time to keep it effective with new technology. Thus the overhead cost of a DBMS is high.

A flat file system is suitable when

1. The system to be developed is simple and small.

2. Only a small amount of data has to be managed.

3. Security is not a major concern.

4. Concurrent access is not needed.

Application of DBMS

1. Airlines and railways: Airlines and railways use online databases for reservation and for displaying the schedule information.

2. Banking: Banks use databases for customer inquiry, account, loans, and other transactions.

3. Education: Schools and colleges use databases for course registration, results and other information.

4. Telecommunications: Telecommunication departments use databases to store information about the communication network, telephone numbers and call records, and to generate monthly bills.

5. Credit card transactions: Databases are used to keep track of purchases on credit cards in order to generate monthly statements.

6. E-commerce: Databases are used to integrate heterogeneous information sources.

7. Health care information systems and electronic patient records: Databases are used to maintain patient health care details.

8. Digital libraries and digital publishing: Databases are used for management and delivery of large bodies of textual and multimedia data.

9. Finance: Databases are used for storing information such as sales and purchases of stocks and bonds, or data useful for online trading.

10. Sales: Databases are used to store product, customer and transaction details.

11. Human resources: Organizations use databases for storing information about their employees, salaries, benefits and taxes, and for generating salary checks.

Database Instance and schemas

The overall structure of the database is called the database schema. In the relational model, the schema of a relation specifies its name, the name of each field (attribute or column) and the type of each field.

Example:

Employee(eid:string, ename:string, address:string, salary:integer, age:integer)

The collection of information stored in the database at a particular moment is called an instance of the database. It is the actual content of the database at a particular point in time. The database instance changes frequently, with every insertion, deletion and update operation performed on the data stored in the database.

Example:

eid	ename	address	salary	age
001	Shashi	baglung	20000	22
002	Milan	Pokhara	30000	21
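The difference between schema and instance can be observed directly. In this sketch, which uses Python's sqlite3 module, the schema is read with SQLite's PRAGMA table_info and the instance with a plain SELECT. The helper names schema and instance are assumptions, and SQLite's TEXT type stands in for string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The schema: table name, field names and field types.
conn.execute(
    "CREATE TABLE employee "
    "(eid TEXT, ename TEXT, address TEXT, salary INTEGER, age INTEGER)"
)


def schema():
    # PRAGMA table_info returns one row per column: (cid, name, type, ...)
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employee)")]


def instance():
    # The instance: the rows stored at this moment.
    return conn.execute("SELECT * FROM employee ORDER BY eid").fetchall()


schema_before = schema()
conn.execute("INSERT INTO employee VALUES ('001', 'Shashi', 'Baglung', 20000, 22)")
conn.execute("INSERT INTO employee VALUES ('002', 'Milan', 'Pokhara', 30000, 21)")
instance_after_inserts = instance()
conn.execute("DELETE FROM employee WHERE eid = '001'")
instance_after_delete = instance()
schema_after = schema()
```

The instance changes with every insert and delete, while the schema stays the same throughout.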

Database Model: Various types of database models are as follows

A) Hierarchical Model: It is one of the oldest database models. This model arranges the files used in the database in a top-down structure that is similar to an upside-down tree.

B) Network Model: In this model, each child can be linked to more than one parent, so a record can be accessed through any of the parents it is linked to. This model is more flexible and allows multidimensional connections.

C) E-R Model: It is a graphical representation of the entities and relationships in a database. The ER model is a logical structure developed to facilitate database design.

ER model has 4 components

D) Relational Model: All data is maintained in the form of tables, known as relations, consisting of rows and columns. Each row (record) represents an entity and each column (field) represents an attribute of the entity. The relationship between two tables is implemented through a common attribute in the tables.

Table: Customer

Customer Name	Address	Acc_no
Bijula	Sitalchaya	A01
Puman	Srijan tole	A01
Bikram	Ramrekha	A03

Table: Account

Acc_no	Balance
A01	300
A01	400
A03	500
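The common attribute Acc_no is what lets the two tables be combined. The following sketch uses Python's sqlite3 module. It assumes every account number is unique (A01, A02, A03), which is an assumption of this example, and the SQL column names are also chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_name TEXT, address TEXT, acc_no TEXT)")
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        ("Bijula", "Sitalchaya", "A01"),
        ("Puman", "Srijan tole", "A02"),  # account number assumed for this sketch
        ("Bikram", "Ramrekha", "A03"),
    ],
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A01", 300), ("A02", 400), ("A03", 500)],
)

# The relationship is followed by matching the common attribute acc_no.
rows = conn.execute(
    "SELECT c.customer_name, a.balance "
    "FROM customer AS c JOIN account AS a ON c.acc_no = a.acc_no "
    "ORDER BY a.acc_no"
).fetchall()
print(rows)
```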

E) Object-Oriented Data Model: It is based on the object-oriented programming paradigm. A core object-oriented data model consists of

· Object and object identifier: Any real-world entity is uniformly modeled as an object.

· Attributes and methods: Every object has a state and a behavior. The state and behavior encapsulated in an object are accessed or invoked from outside only through explicit message passing.

· Class: A grouping of all objects that share the same set of attributes and methods.

· Class hierarchy and inheritance: A new class (subclass) is derived from an existing class (superclass).
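These four ideas map closely onto classes in an object-oriented language. The sketch below uses plain Python classes, not an object-oriented DBMS, and the class names Account and SavingsAccount are assumptions for illustration.

```python
class Account:                      # class: objects sharing attributes and methods
    def __init__(self, acc_no, balance):
        self.acc_no = acc_no        # attributes hold the object's state
        self.balance = balance

    def deposit(self, amount):      # methods define the object's behavior
        self.balance += amount
        return self.balance


class SavingsAccount(Account):      # subclass derived from the superclass
    def add_interest(self, rate):
        # Reuses the inherited deposit method.
        return self.deposit(self.balance * rate)


# Two objects with identical state are still distinct objects,
# each with its own identity.
first = SavingsAccount("A01", 300)
second = SavingsAccount("A01", 300)
print(first is second)
```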

Database Languages: Languages that are used to interact with database management systems are called database languages. A DBMS provides two languages:

1. Data Definition Language (DDL): DDL statements are used to create, modify and drop databases and database objects such as tables, indexes and views. DDL statements are CREATE, ALTER and DROP.

2. Data Manipulation Language (DML): DML statements are used to retrieve required data from a database, insert data into it, modify existing data and delete unnecessary data. DML statements are INSERT, DELETE, UPDATE and SELECT.
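A short sketch showing both kinds of statement in one session, run through Python's sqlite3 module. The account table and the branch column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and change the structure.
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("ALTER TABLE account ADD COLUMN branch TEXT")

# DML: work with the data stored in that structure.
conn.execute("INSERT INTO account (acc_no, balance) VALUES ('A01', 300)")
conn.execute("INSERT INTO account (acc_no, balance) VALUES ('A03', 500)")
conn.execute("UPDATE account SET balance = balance + 100 WHERE acc_no = 'A01'")
conn.execute("DELETE FROM account WHERE acc_no = 'A03'")
rows_after_dml = conn.execute(
    "SELECT acc_no, balance, branch FROM account"
).fetchall()

# DDL again: remove the table itself, not just its rows.
conn.execute("DROP TABLE account")
tables_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```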

Two classes of DML:

1. Procedural DML: The user specifies what data are required and how to get those data. These are low-level data manipulation languages. Example: relational algebra.

2. Non-procedural DML: The user specifies what data are needed without specifying how to get those data. These are high-level data manipulation languages. Example: Structured Query Language (SQL).

SELECT * FROM account WHERE balance < 1000;
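To make the contrast concrete, the sketch below answers the same question in two ways. The explicit Python loop only stands in for the procedural style (it is not relational algebra), and the account data is assumed for illustration.

```python
import sqlite3

accounts = [("A01", 300), ("A02", 1400), ("A03", 500)]  # hypothetical data

# Procedural style: spell out HOW to find the rows, step by step.
procedural_result = []
for acc_no, balance in accounts:
    if balance < 1000:
        procedural_result.append((acc_no, balance))

# Non-procedural style: state WHAT is wanted and let the DBMS decide how.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", accounts)
declarative_result = conn.execute(
    "SELECT * FROM account WHERE balance < 1000 ORDER BY acc_no"
).fetchall()

print(procedural_result == declarative_result)
```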

Database Architecture:

1. Centralized Architecture: The database system, application programs and user interface are all executed on a single system. The dumb terminals connected to it are used only to display information.

2. Client/Server Database System: A client/server database system has one central computer, called the database server, that stores all the data and files and provides services to all the clients in the network. The database server is responsible for processing data. It is suitable for a small organization that has different departments.

The clients are responsible for presenting the required records to the users in the format they desire.

There are two approaches to implementing client/server architecture:

a. Two-tier architecture: The user interface and application programs are placed on the client side and the database system (DBMS) on the server side. The application programs connect to the DBMS through an interface such as Open Database Connectivity (ODBC) or Java Database Connectivity (JDBC).

b. Three-tier architecture: It adds an intermediate layer, the application server, between the client and the database server. The client communicates with the application server, which in turn communicates with the database server.
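The layering of the three-tier approach can be sketched with three groups of functions in one Python program. In a real deployment each tier would run as a separate process, usually on a separate machine. All function names here are hypothetical.

```python
import sqlite3

# Database tier: only this layer talks to the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A01', 300)")


def db_get_balance(acc_no):
    row = conn.execute(
        "SELECT balance FROM account WHERE acc_no = ?", (acc_no,)
    ).fetchone()
    return None if row is None else row[0]


# Application tier: business rules sit between the client and the database.
def app_balance_request(acc_no):
    balance = db_get_balance(acc_no)
    if balance is None:
        return {"ok": False, "error": "unknown account"}
    return {"ok": True, "acc_no": acc_no, "balance": balance}


# Client tier: presents the result to the user and never issues SQL itself.
def client_show_balance(acc_no):
    reply = app_balance_request(acc_no)
    if reply["ok"]:
        return f"Balance of {acc_no}: {reply['balance']}"
    return f"Error: {reply['error']}"


print(client_show_balance("A01"))
```

In a two-tier design, the client would call the database function directly and the middle layer would not exist.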

Advantages

i. Low cost to set up.

ii. High performance.

iii. Centralization of all data in a single computer called server.

iv. Easier to manage and manipulate data and database.

v. High security as a single Database Administrator can control the whole database system.

vi. Suitable for small organizations with different departments.

vii. It is easier to manage and manipulate the data as data is stored only in a server computer.

viii. Easier data access.

Disadvantages

1. It cannot cover a large area and is not suitable for larger organizations.

2. The database is location dependent and cannot be accessed from other places.

It does not support globalized connections.

3. Distributed database: It is a set of databases stored on multiple computers that appears to applications as a single database. Users can simultaneously access and modify data in several databases in a network. The computers in a distributed system communicate with each other through various communication media such as high-speed buses or telephone lines.

The main difference between a centralized database and a distributed database is that in a centralized database the data resides on one single centralized computer, while in a distributed database the data is stored at several sites under the control of local DBMS components, which are in turn under the control of the distributed database system.

· Useful for large organizations that are spread all around the world.

· Data security is very important, because data may be hacked or damaged during transmission.

· With a large number of users, it is difficult to set appropriate permissions for each of them.

Advantages

1. Data sharing and distribution can be controlled all over the world.

2. Improved reliability for users.

3. Improved availability of data.

4. Economy in operation and data sharing.

5. Modular growth can be supported.

Disadvantages

1. Higher software development cost.

2. Greater potential for bugs and hacking.

3. Increased processing overhead for client and server computers.

4. More complex database design.

5. Weaker security, because data may travel from continent to continent.

6. Data integrity is more difficult to maintain.

Data warehousing: It is the process of constructing and using a data warehouse. A data warehouse is constructed by integrating data from multiple heterogeneous sources, and it supports analytical reporting, structured queries and decision making. It involves

o Data integration is the process of standardizing the data definitions and data structures of multiple data sources by using a common structure, providing a unified view of the data for enterprise-level planning and decision making.

o Data cleaning is the process of detecting and correcting incorrect, irrelevant, out-of-date, corrupt, redundant and inconsistent records in a record set, table or database.

o Data consolidation refers to the collection and integration of data from multiple sources into a single destination. During this process, different data sources are put together into a single data store.
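The three steps can be sketched on a toy scale. Everything in this example is assumed for illustration: the two sources, their field names, the extra student and the cleaning rule that drops records without a phone number. Real warehouses use dedicated tools and far larger data sets.

```python
# Two hypothetical sources that describe students with different field names.
library_source = [
    {"sid": "D01", "student_name": "Divya", "phone": "9856700001"},
    {"sid": "D02", "student_name": "Manish ", "phone": None},
]
exam_source = [
    {"id": "D01", "name": "Divya", "cell": "9856700001"},
    {"id": "D03", "name": "Sita", "cell": "9856700003"},
]


def integrate(record, id_key, name_key, phone_key):
    # Data integration: map each source onto one common structure.
    return {"id": record[id_key], "name": record[name_key], "phone": record[phone_key]}


def clean(record):
    # Data cleaning: trim stray whitespace; drop records with no phone number.
    if record["phone"] is None:
        return None
    return {**record, "name": record["name"].strip()}


def consolidate(*sources):
    # Data consolidation: merge all sources into one store, one row per id.
    store = {}
    for source in sources:
        for record in source:
            cleaned = clean(record)
            if cleaned is not None:
                store[cleaned["id"]] = cleaned
    return store


warehouse = consolidate(
    [integrate(r, "sid", "student_name", "phone") for r in library_source],
    [integrate(r, "id", "name", "cell") for r in exam_source],
)
print(sorted(warehouse))
```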

Data Mining: It is defined as extracting information from huge sets of data, i.e. data mining is the procedure of mining knowledge from data. The information or knowledge extracted can be used for market analysis, fraud detection, customer retention, production control, science exploration etc.

Application: Data mining is widely used in different fields:

1. Financial Data Analysis: Banks and financial institutions use data mining for loan payment prediction, customer credit policy analysis, detection of money laundering, financial crimes etc.

2. Retail Industry: Data mining in the retail industry helps in identifying customer buying patterns and trends, which leads to improved quality of customer service, customer retention and satisfaction.

3. Telecom Industry: Data mining in the telecom industry helps to identify telecommunication patterns, catch fraudulent activities, make better use of resources and improve quality of service.

4. Biological Data Analysis: Data mining helps in bioinformatics. It can be used in alignment, indexing, similarity search and comparative analysis of multiple nucleotide sequences, discovery of structural patterns, and analysis of genetic networks and protein pathways.

5. Scientific Application: Large data sets are being generated by fast numerical simulations in various fields such as climate and ecosystem modeling, chemical engineering and fluid dynamics, where data mining is used.

6. Intrusion Detection: Intrusion refers to any kind of action that threatens the integrity, confidentiality or availability of network resources. Data mining techniques are used to detect such intrusions.

Computational Biology: It is the science of using biological data to develop algorithms and to find relations among various biological systems. It involves the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral and social systems. It applies concepts of computer science, statistics and mathematics to problems in biology. The main goal is to discover new biology and knowledge about living systems.

Computational Nanoscience: Nanoscience is the study of structures and materials on the scale of nanometers. Nanotechnology aims to gain control of structures and devices at the atomic, molecular and supramolecular levels.

Software Tools for Computational Nanoscience: Molecular Workbench (MW). It is a modeling tool that contains many ready-to-use models. It also allows us to create our own simulations.

Space Data: Data collected from space with the help of satellites is called space data. Satellites collect data and send it to different stations on the earth's surface. The data includes information about weather conditions, other planets, forests, oceans, the Himalayas etc. Space Data Routers (SDR) will allow space agencies, academic institutions and research centers to share space data generated by a single mission or by multiple missions.

santoshspoudel

Tuesday, January 29, 2019

Database_BSC
