Data and Database Representation
A computer system organizes data in a hierarchy that begins with bits and proceeds to characters (bytes), fields, records, files, and databases.
A bit represents the smallest unit of data that can be processed by a computer. It is either 0 or 1.
A group of 8 bits is called a byte, and it represents a single character (a letter, a digit, or a symbol).
A logical grouping of characters or a complete number forms a field.
A logical grouping of related fields is called a record. Example: id, name, salary, and phone number together form a record.
A logical grouping of related records is called a file. Example: the records of all employees in an organization.
A logical grouping of related files is called a database. Example: a set of related files such as the employee and department files.
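The hierarchy above can be sketched in a few lines of Python. This is only an illustration: the variable names and sample values are hypothetical, and ordinary Python lists and dictionaries stand in for files and a database.

```python
# Illustrative sketch of the data hierarchy:
# bit -> byte (character) -> field -> record -> file -> database.
# All names and sample values are hypothetical.

# A byte is a group of 8 bits; the pattern 01000001 encodes "A" in ASCII.
bits = "01000001"
byte_value = int(bits, 2)
character = chr(byte_value)

# A field is a logical grouping of characters (or a complete number).
name_field = "Asha"
salary_field = 25000

# A record is a logical grouping of related fields.
record = {"id": "E01", "name": name_field, "salary": salary_field, "phone": "9800000000"}

# A file is a logical grouping of related records.
employee_file = [
    record,
    {"id": "E02", "name": "Bimal", "salary": 30000, "phone": "9800000001"},
]
department_file = [{"dept_id": "D01", "dept_name": "Accounts"}]

# A database is a logical grouping of related files.
database = {"employee": employee_file, "department": department_file}

print(character, len(bits), len(database["employee"]))
```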
Data Management
There are two types of data management:
1. File Management System
2. Database Management System
Database: A database is an organized collection of logically related data that contains information. A database is also called a repository or container for a collection of data files.
Example: a university database maintains information about students, courses, and grades.
Database Management System: A general-purpose DBMS is a software system designed to allow the definition, creation, querying, update, and administration of databases. In other words, it is a set of computer programs used to control the reading and writing of data from and to a database.
Oracle, Microsoft SQL Server, DB2, MySQL, dBase, and MS Access are examples of DBMS software.
A Database Management System that maintains relationships between multiple data files is called a Relational Database Management System (RDBMS).
Database System: It consists of a database, a Database Management System, and application programs. Application software that uses a DBMS for data management is called a database system. Example: a library management system.
File Management System (FMS): It is also called a flat file system. It stores data in plain files, such as text files created in Notepad or Word documents. A file management system is a type of software that manages data files in a computer system. It has limited capabilities and is designed to manage individual files or groups of files, such as special office documents and records. It may display report details such as owner, creation date, and state of completion, and similar features useful in an office environment.
Advantages of DBMS
1. Data redundancy: Data redundancy means duplication of the same data. Flat file systems suffer from high data redundancy. Example: the record of a student may appear in the library data files as well as in the examination data files. This redundancy leads to higher storage and access costs. A database system reduces the problem of data redundancy.
2. Data inconsistency: Data inconsistency occurs when changed data is reflected in one place but not elsewhere in the system. Example: if the library data file contains the cell number of a student as 9856700001 and the examination data file stores it as 985670002, then the data is inconsistent. Flat file systems suffer from data inconsistency. A database system can remove the problem by automatically propagating an update made in one place to every place where that data is used.
3. Data isolation: As the data is stored in various files, and the files may be in different formats, writing new application programs to retrieve the data is difficult in a flat file system.
Example:
Student data file (comma-separated):
D01,Divya,BSC,Baglung
D02,Manish,BSC,Kathmandu
Book data file (space-separated):
D01 Divya BSC Baglung
D02 Manish BSC Kathmandu
4. Difficulty in accessing data: File processing systems do not allow required data to be retrieved in an efficient and convenient way. Example: suppose we want to see the records of all customers who have a balance of less than 500. We must either extract the data from the file manually or write a program to retrieve it. The first option is time-consuming and tedious, and the second is costly, because if we later need the records of customers who have a balance of less than 1000, we need a new program. In a database system, it is easy to write general queries that generate different lists on the basis of different criteria.
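5. Integrity problem: Integrity means correctness of data before and after the execution of a transaction. Example: if the maximum salary is 15000, then we have the integrity constraint salary <= 15000. A database system enforces such constraints to maintain the correctness of data, whereas in a file processing system they must be coded into every application program.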
6. Atomicity problem: A transaction must execute in its entirety or not at all. If the execution is not atomic, it leaves the database in an incorrect state. Example: consider a transaction that transfers 500 from account A to account B. If the transaction fails at a certain point, 500 may be deducted from account A without being deposited in account B, which leads to an inconsistent state. File processing systems do not guarantee atomic execution of transactions, so this can happen. A database system guarantees the atomicity of transactions.
7. Concurrent-access anomalies: Many systems allow multiple users to update data simultaneously. Consider a bank account containing 500. If two customers withdraw 100 and 50 at the same time, both transactions may read the old balance and withdraw from it, writing back 400 and 450 respectively. Either result is incorrect (inconsistent), because the final balance should be 350. A database system supports concurrent execution of transactions on the same data without leaving it in an inconsistent state.
8. Security problem: In a database system, we can create different user accounts and grant different authorizations to different users. Thus we are able to hide certain information from certain users. Example: in a banking system, payroll personnel need to see only information about bank employees; they do not need access to information about customer accounts. A file processing system does not allow us to create user accounts, so all users have equal access to the data. This makes it difficult to maintain security in a file system.
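Two of the points in the list above, integrity constraints and atomic transactions, can be tried out with Python's built-in sqlite3 module. The following is a minimal sketch under stated assumptions: the table names, column names, starting balances, and the `transfer` helper are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical schema and values, chosen only to mirror the examples in the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (eid TEXT PRIMARY KEY, salary INTEGER CHECK (salary <= 15000))"
)
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 200)])
conn.commit()

# Integrity: the DBMS itself rejects a row that violates salary <= 15000.
try:
    with conn:
        conn.execute("INSERT INTO employee VALUES ('E01', 20000)")
    salary_rejected = False
except sqlite3.IntegrityError:
    salary_rejected = True


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst as one transaction: all or nothing."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?", (amount, src)
            )
            if fail_midway:
                raise RuntimeError("simulated failure after the debit")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_no = ?", (amount, dst)
            )
        return True
    except RuntimeError:
        return False


def balances(conn):
    """Return the current balances as a dictionary keyed by account number."""
    return dict(conn.execute("SELECT acc_no, balance FROM account ORDER BY acc_no"))


failed = transfer(conn, "A", "B", 500, fail_midway=True)
after_failure = balances(conn)   # the debit was rolled back
succeeded = transfer(conn, "A", "B", 500)
after_success = balances(conn)   # both updates were applied together
print(salary_rejected, after_failure, after_success)
```

In the failed transfer, the debit from account A is undone by the rollback, so the 500 is never lost between the two accounts.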
Drawbacks
1. Initial investment: To use a database, we need to purchase a database management system as well as a powerful computer to run it as a database server, so the initial cost is high.
2. Dedicated staff: To manage a database system efficiently, we need to hire technically sound, dedicated staff, called database administrators.
3. Overhead: We need to update and maintain the database system from time to time to keep it effective with new technology. Thus the overhead cost of a DBMS is high.
A flat file system is suitable when:
1. The system to be developed is simple and small.
2. Only a small amount of data has to be managed.
3. Security is not a major concern.
4. Concurrent access is not needed.
Applications of DBMS
1. Airlines and railways: Airlines and railways use online databases for reservations and for displaying schedule information.
2. Banking: Banks use databases for customer inquiries, accounts, loans, and other transactions.
3. Education: Schools and colleges use databases for course registration, results, and other information.
4. Telecommunications: Telecommunication departments use databases to store information about the communication network, telephone numbers, and records of calls, and to generate monthly bills.
5. Credit card transactions: Databases are used for keeping track of purchases on credit cards in order to generate monthly statements.
6. E-commerce: Databases are used for the integration of heterogeneous information sources.
7. Health care information systems and electronic patient records: Databases are used for maintaining patient health care details.
8. Digital libraries and digital publishing: Databases are used for the management and delivery of large bodies of textual and multimedia data.
9. Finance: Databases are used for storing information such as sales and purchases of stocks and bonds, or data useful for online trading.
10. Sales: Databases are used to store product, customer, and transaction details.
11. Human resources: Organizations use databases for storing information about their employees, salaries, benefits, and taxes, and for generating salary checks.
Database Instances and Schemas
The overall structure of the database is called the database schema. In the relational model, the schema of a relation specifies its name, the name of each field (attribute or column), and the type of each field.
Example:
Employee(eid: string, ename: string, address: string, salary: integer, age: integer)
The collection of information stored in the database at a particular moment is called an instance of the database. It is the actual content of the database at a particular point in time. A database instance changes frequently, with every insertion, deletion, and update operation performed on the data stored in the database.
Example:
eid | ename | address | salary | age
001 | Shashi | Baglung | 20000 | 22
002 | Milan | Pokhara | 30000 | 21
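The difference between a schema and an instance can be seen with Python's built-in sqlite3 module. This is a minimal sketch: the SQLite column types (TEXT, INTEGER) stand in for the string and integer types of the example schema, and the salary update is a hypothetical change made only to show the instance moving while the schema stays fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: the structure of the relation (its name, fields, and field types).
conn.execute(
    "CREATE TABLE Employee (eid TEXT, ename TEXT, address TEXT, salary INTEGER, age INTEGER)"
)
schema = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Employee)")]

# Instance: the rows stored at a particular moment.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [("001", "Shashi", "Baglung", 20000, 22), ("002", "Milan", "Pokhara", 30000, 21)],
)
instance_before = conn.execute("SELECT * FROM Employee ORDER BY eid").fetchall()

# An update changes the instance, but the schema stays the same.
conn.execute("UPDATE Employee SET salary = 32000 WHERE eid = '002'")
instance_after = conn.execute("SELECT * FROM Employee ORDER BY eid").fetchall()
schema_after = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Employee)")]

print(schema)
print(instance_before)
print(instance_after)
```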
Database Models: The various types of database models are as follows:
A) Hierarchical Model: It is one of the oldest database models. This model arranges the files used in the database in a top-down structure that resembles an upside-down tree.
B) Network Model: In this model, each child can be linked with more than one parent, so a record can be accessed through any of the parents it is linked to. This model is more flexible and has multidimensional connections.
C) E-R Model: It is a graphical representation of the entities and relationships in a database. The E-R model is a logical structure developed to facilitate database design.
The E-R model has 4 components.
D) Relational Model: All data is maintained in the form of tables, known as relations, consisting of rows and columns. Each row (record) represents an entity, and each column (field) represents an attribute of an entity. The relationship between two tables is implemented through a common attribute in the tables.
Table: Customer
Customer Name | Address | Acc_no
Bijula | Sitalchaya | A01
Puman | Srijan tole | A02
Bikram | Ramrekha | A03
Table: Account
Acc_no | Balance
A01 | 300
A02 | 400
A03 | 500
E) Object-Oriented Data Model: It is based on the object-oriented programming paradigm. A core object-oriented data model consists of:
· Object and object identifier: Any real-world entity is uniformly modeled as an object.
· Attributes and methods: Every object has a state and a behavior. The state and behavior encapsulated in an object are accessed or invoked from outside only through explicit message passing.
· Class: A grouping of all objects that share the same set of attributes and methods.
· Class hierarchy and inheritance: A new class (subclass) is derived from an existing class (superclass).
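The four components of the object-oriented data model can be illustrated with ordinary Python classes. This is only a sketch of the concepts, not of how an object-oriented DBMS stores data; the class names, the `oid` counter, and the sample values are all hypothetical.

```python
import itertools

# Hypothetical generator of object identifiers (OIDs).
_next_oid = itertools.count(1)


class Person:
    """Class: groups objects that share the same attributes and methods."""

    def __init__(self, name, address):
        self.oid = next(_next_oid)  # object identifier, independent of attribute values
        self._name = name           # state (attributes)
        self._address = address

    def describe(self):
        """Behavior (method): invoked from outside by sending the object a message."""
        return f"{self._name} from {self._address}"


class Employee(Person):
    """Subclass: inherits the attributes and methods of Person and adds a salary."""

    def __init__(self, name, address, salary):
        super().__init__(name, address)
        self._salary = salary

    def describe(self):
        return f"{super().describe()}, salary {self._salary}"


p = Person("Shashi", "Baglung")
e = Employee("Milan", "Pokhara", 30000)
print(p.oid, p.describe())
print(e.oid, e.describe())
```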
Database Languages: Languages that are used to interact with database management systems are called database languages. A DBMS provides two languages:
1. Data Definition Language (DDL): Data Definition Language statements are used to create, modify, and drop databases and database objects such as tables, indexes, and views. DDL statements include CREATE, ALTER, and DROP.
2. Data Manipulation Language (DML): Data Manipulation Language statements are used to retrieve required data from the database, insert data into the database, modify existing data, and delete unnecessary data from the database. DML statements are INSERT, DELETE, UPDATE, and SELECT.
There are two classes of DML:
1. Procedural DML: The user specifies what data are required and how to get those data. These are low-level data manipulation languages. Example: relational algebra.
2. Non-procedural DML: The user specifies what data are needed without specifying how to get those data. These are high-level data manipulation languages. Example: Structured Query Language (SQL).
SELECT * FROM account WHERE balance < 1000;
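The DDL and DML statements described above can be run from Python's built-in sqlite3 module. The sketch below is illustrative: the account numbers, balances, the `branch` column, and both helper functions are hypothetical. The Python loop is only an analogy for the procedural style (it is not relational algebra); it is there to contrast spelling out how to find rows with stating what rows are wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: CREATE and ALTER define and change the structure of database objects.
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("ALTER TABLE account ADD COLUMN branch TEXT")

# DML: INSERT, UPDATE, and DELETE change the data itself.
conn.executemany(
    "INSERT INTO account (acc_no, balance) VALUES (?, ?)",
    [("A01", 300), ("A02", 700), ("A03", 1500), ("A04", 2500)],
)
conn.execute("UPDATE account SET balance = balance + 100 WHERE acc_no = 'A01'")
conn.execute("DELETE FROM account WHERE acc_no = 'A04'")
conn.commit()


def accounts_below(conn, limit):
    """Non-procedural retrieval: state what is wanted; the DBMS decides how to find it."""
    cursor = conn.execute(
        "SELECT acc_no, balance FROM account WHERE balance < ? ORDER BY acc_no", (limit,)
    )
    return cursor.fetchall()


def accounts_below_procedural(rows, limit):
    """Procedural style for comparison: the program spells out how to scan and filter."""
    result = []
    for acc_no, balance in rows:
        if balance < limit:
            result.append((acc_no, balance))
    return result


all_rows = conn.execute("SELECT acc_no, balance FROM account ORDER BY acc_no").fetchall()
below_500 = accounts_below(conn, 500)
below_1000 = accounts_below(conn, 1000)
print(below_500)
print(below_1000)

# DDL again: DROP removes the table together with its data.
conn.execute("DROP TABLE account")
```

The same query serves both the "balance less than 500" and the "balance less than 1000" requests by changing one parameter, which is the kind of general retrieval that a flat file system would need a new program for.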
Database Architecture:
1. Centralized Architecture: The database system, application programs, and user interface are all executed on a single system. Dumb terminals connected to it are used only to display information.
2. Client/Server Database System: A client/server database system has one central computer, called the database server, that stores all the data and files and provides services to all the clients in the network. The central database server is responsible for processing data, and the clients are responsible for presenting the required records to users in the format they desire. It is suitable for a small organization that has different departments.
There are two approaches to implementing client/server architecture:
a. Two-tier architecture: The user interface and application programs are placed on the client side and the database system (DBMS) on the server side. The application programs communicate with the DBMS through an interface such as Open Database Connectivity (ODBC) or Java Database Connectivity (JDBC).
b. Three-tier architecture: It adds an intermediate layer, the application server, between the client and the database server. The client communicates with the application server, which in turn communicates with the database server.
Advantages
i. Low cost to set up.
ii. High performance.
iii. Centralization of all data in a single computer called the server.
iv. Easier to manage and manipulate data and the database, as the data is stored only on the server computer.
v. High security, as a single database administrator can control the whole database system.
vi. Suitable for small organizations with different departments.
vii. Easier data access.
Disadvantages
1. It cannot cover a large area and is not suitable for larger organizations.
2. The database is location dependent and cannot be accessed from other places.
3. It does not support globalized connections.
3. Distributed database: It is a set of databases stored on multiple computers that appears to applications as a single database. Users can simultaneously access and modify data in several databases in a network. The computers in a distributed system communicate with each other through various communication media, such as high-speed buses or telephone lines.
The main difference between a centralized database and a distributed database is that in a centralized database the data resides on one single centralized computer, while in a distributed database the data is stored at several sites under the control of local DBMS components, which are in turn under the control of the distributed database system.
· It is useful for large organizations that are spread all around the world.
· Data security is very important because data may be hacked or damaged during transmission.
· With a large number of users, it is difficult to set appropriate permissions for them.
Advantages
1. Data sharing and distribution can be controlled all over the world.
2. Improved reliability for users.
3. Improved availability of data.
4. Economy of operation and data sharing.
5. Modular growth can be supported.
Disadvantages
1. Higher software development cost.
2. Greater potential for bugs and hacking.
3. Increased processing overhead for client and server computers.
4. More complex database design.
5. Weaker security, because data may travel from continent to continent.
6. Integrity is more difficult to maintain in general.
Data Warehousing: It is the process of constructing and using a data warehouse. A data warehouse is constructed by integrating data from multiple heterogeneous sources to support analytical reporting, structured queries, and decision making. It involves:
o Data integration: the process of standardizing the data definitions and data structures of multiple data sources by using a common structure, providing a unified view of the data for enterprise-level planning and decision making.
o Data cleaning: the process of detecting and correcting incorrect, irrelevant, out-of-date, corrupt, redundant, and inconsistent records from a record set, table, or database.
o Data consolidation: the collection and integration of data from multiple sources into a single destination. During this process, different data sources are put together into a single data store.
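The three steps above can be sketched in plain Python. This is a toy illustration under stated assumptions, not a description of any warehousing product: the two source lists, their field names, and the cleaning rules (trim whitespace, drop records with an empty name, drop repeated ids) are all hypothetical.

```python
# Hypothetical source data: two systems describe students with different field names.
library_rows = [
    {"sid": "D01", "student_name": " Divya ", "city": "Baglung"},
    {"sid": "D02", "student_name": "Manish", "city": "Kathmandu"},
    {"sid": "D02", "student_name": "Manish", "city": "Kathmandu"},  # redundant copy
]
exam_rows = [
    {"id": "D01", "name": "Divya", "address": "Baglung"},
    {"id": "D03", "name": "", "address": "Pokhara"},  # incomplete record
]


def integrate(row, mapping):
    """Data integration: rename source fields to one common structure."""
    return {common: row[source] for common, source in mapping.items()}


def consolidate(*sources):
    """Data consolidation: put every source into a single data store."""
    return [row for source in sources for row in source]


def clean(rows):
    """Data cleaning: trim whitespace, drop records without a name, drop duplicates."""
    seen, result = set(), []
    for row in rows:
        row = {key: value.strip() for key, value in row.items()}
        if not row["name"] or row["id"] in seen:
            continue
        seen.add(row["id"])
        result.append(row)
    return result


library_map = {"id": "sid", "name": "student_name", "address": "city"}
exam_map = {"id": "id", "name": "name", "address": "address"}

warehouse = clean(
    consolidate(
        [integrate(row, library_map) for row in library_rows],
        [integrate(row, exam_map) for row in exam_rows],
    )
)
print(warehouse)
```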
Data Mining: It is defined as extracting information from a huge set of data; that is, data mining is the procedure of mining knowledge from data. The information or knowledge extracted can be used for market analysis, fraud detection, customer retention, production control, science exploration, etc.
Applications: Data mining is widely used in different fields:
1. Financial data analysis: Banks and financial institutions use data mining for loan payment prediction, customer credit policy analysis, and detection of money laundering and other financial crimes.
2. Retail industry: Data mining in the retail industry helps in identifying customer buying patterns and trends, which leads to improved quality of customer service and better customer retention and satisfaction.
3. Telecom industry: Data mining in the telecom industry helps in identifying telecommunication patterns, catching fraudulent activities, making better use of resources, and improving quality of service.
4. Biological data analysis: Data mining helps in bioinformatics. It can be used in the alignment, indexing, similarity search, and comparative analysis of multiple nucleotide sequences, the discovery of structural patterns, and the analysis of genetic networks and protein pathways.
5. Scientific applications: Large data sets are being generated by fast numerical simulations in various fields, such as climate and ecosystem modeling, chemical engineering, and fluid dynamics, where data mining is used.
6. Intrusion detection: Intrusion refers to any kind of action that threatens the integrity, confidentiality, or availability of network resources. Data mining can be applied to network data to help detect such intrusions.
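As a small illustration of the retail use case above, the sketch below counts which pairs of items are bought together most often. It is one simple way to look for buying patterns, chosen here only as an example; the baskets, the `frequent_pairs` function, and the `min_support` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; each inner list is one customer's purchase.
baskets = [
    ["bread", "milk", "butter"],
    ["bread", "butter"],
    ["milk", "tea"],
    ["bread", "milk", "butter", "tea"],
]


def frequent_pairs(baskets, min_support):
    """Count how many baskets contain each pair of items and keep the frequent ones."""
    counts = Counter()
    for basket in baskets:
        # Sort the distinct items so that each pair is always counted under one key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


patterns = frequent_pairs(baskets, min_support=2)
print(patterns)
```

Here a pair is reported only if it appears in at least `min_support` baskets, so rare combinations are filtered out.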
Computational Biology: It is the science of using biological data to develop algorithms and find relations among various biological systems. It involves the development and application of data-analytical and theoretical methods, mathematical modeling, and computational simulation techniques to the study of biological, behavioral, and social systems. It applies concepts of computer science, statistics, and mathematics to problems in biology. The main goal is to discover new biology and knowledge about living systems.
Computational Nanoscience: Nanoscience is the study of structures and materials on the scale of nanometers. Nanotechnology aims to gain control of structures and devices at the atomic, molecular, and supramolecular levels.
Software Tools for Computational Nanoscience: Molecular Workbench (MW) is a modeling tool that contains many ready-to-use models. It also allows us to create our own simulations.
Space Data: Data collected from space with the help of satellites is called space data. Satellites collect data and send it to different stations on the Earth's surface. The data includes information about weather conditions, other planets, forests, oceans, the Himalayas, etc. Space-Data Routers (SDR) will allow space agencies, academic institutions, and research centers to share space data generated by single or multiple missions.