u02.Part1.InformationSystems.Files.notebook
October 10, 2017
Unit 2. Part 1 Information Systems. Files
Index
1. Information Systems
  1.1 Software categories
  1.2 Definition
  1.3 Enterprise Information Systems
  1.4 Human Resources
2. Persistent memory
  2.1 Files
    2.1.1 Sequential File
    2.1.2 Random Access File
    2.1.3 Sequential Indexed File
  2.2 Physical Level
    2.2.1 Access Time
    2.2.2 Input/Output Flow
3. Types of Information Systems
  3.1 Process-oriented IS
  3.2 Data-oriented IS
1. Information Systems (IS)
1.1 Software categories
• System Software. This software is not intended for the end user; instead, it provides services to other applications. E.g.: OS, plugins, components, ...
• Application Software. This software solves specific needs, usually in companies. E.g.: Sales Management, Human Resources Management, ...
• Scientific and Engineering Software. It focuses on intensive computation and simulation. E.g.: building structures, weather forecasting, ...
• Embedded Software. It usually runs inside a machine component: in cars, appliances, ... It has very few resources, so it must fit within strict limitations.
1.1 Software categories (continued)
• Product Line Software. This software aims to support the creation of new products through intensive reuse of core assets; the Eclipse platform is an example of this type of software.
• Web Applications. This software has its own characteristics: the network is used intensively, passing data back and forth between sources and end users.
• AI Software. These programs use tools and algorithms in a very different way from other software. E.g.: speech recognition, neural networks, ...
These categories are not mutually exclusive!
We will focus on custom-made (WEB) APPLICATION SOFTWARE (we cannot study every category).
1.2 Information Systems. Definition
An information system is any combination of information technology and the human activities that use this technology, aimed at supporting operations, management or decision-making.
The software for Information Systems is a type of software that manages specific information through databases and supports a series of human activities; the context is usually a company.
Watch video "Information Systems" https://www.youtube.com/watch?v=Qujsd4vkqFI
Watch video "The 5 Components of an Information System" https://www.youtube.com/watch?v=XlcolUHMnh0
1.3 Enterprise Information Systems
An information system includes only the information relevant to the company and the tools needed to manage that information.
An enterprise IS consists of:
• Material Resources: documents, equipment, disks, ...
• Human Resources: personnel using the information
• Protocols: rules that the information must follow (format, model, etc.)
1.4 Human Resources
Different roles, tasks and responsibilities:
• Project Manager: planning, scope and viability
• Domain Expert: knows the requirements
• Developers:
  • Architect: outlines the architecture and the technology
  • Technical Analyst: designs a subset of the architecture
  • Programmer: expert in the technology, implementation
• Quality Expert: checks that the requirements have been met
2. Persistent Memory
Traditionally, data have been stored in files on magnetic media.
The term file is used in operating systems (OS) in a more generic sense. This subject does not deal with program files, nor with text files, graphics files, etc. It deals only with files that store structured data records, like those in databases, which are the ones usually used in an IS.
2.1 Files
2.1.1 Sequential File
In these files, data is organized sequentially, in the order it was recorded. To read the latest information you have to read all the previous records. That is, reading record number nine involves first reading the previous eight records.
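A minimal Python sketch of sequential access, assuming a hypothetical text file with one record per line and fields separated by ";". The function can only move forward, so it counts how many records it had to read before finding the one it wants.

```python
import io

def find_sequential(f, name):
    """Scan the file from the start, one record per line, until the first
    field equals `name`. Returns (record fields or None, records read)."""
    reads = 0
    for line in f:              # each iteration reads the next record in order
        reads += 1
        record = line.rstrip("\n").split(";")
        if record[0] == name:
            return record, reads
    return None, reads          # reached the end without finding it

# Hypothetical data: nine employee records, Emp1 ... Emp9.
data = io.StringIO("".join(f"Emp{i};dept{i}\n" for i in range(1, 10)))
record, reads = find_sequential(data, "Emp9")
print(record, reads)   # ['Emp9', 'dept9'] 9
```

Getting record nine costs nine reads: the eight previous records plus the one we want.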
2.1.2 Direct / Random Access File
You can read a particular record directly if you know its position in the file (usually in bytes). All we need to know is the record size, which must be the same for every record in this type of file. Assuming each record occupies 100 bytes, the fifth record starts at byte position 400 (the first one starts at position 0). By placing the file pointer at that position we can start reading.
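The position arithmetic above can be sketched in Python with `seek`: record n (counting from 1) starts at byte (n - 1) × record size. The in-memory file and its contents are hypothetical; each record is padded with spaces to exactly 100 bytes.

```python
import io

RECORD_SIZE = 100  # bytes per record, as in the example above

def read_record(f, n):
    """Return record number n (1-based) from a file of fixed-size records."""
    f.seek((n - 1) * RECORD_SIZE)   # jump straight to the record's byte offset
    return f.read(RECORD_SIZE)

# Hypothetical binary file with 10 records, each padded to 100 bytes.
buf = io.BytesIO(b"".join(f"record {i}".encode().ljust(RECORD_SIZE)
                          for i in range(1, 11)))
fifth = read_record(buf, 5)
print((5 - 1) * RECORD_SIZE, fifth.rstrip())   # 400 b'record 5'
```

Only one read is needed, whatever the record number, because no previous record has to be read to find where the wanted one starts.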
Problem 1
The figure aside depicts a random access file; reading a single record takes 1 ms.
a) How many "readings" do we need to get the "Moriarity" record?
b) Could you design an algorithm to speed up the access time from paragraph a)? How many accesses does it take? (Clue: the file is ordered by name.)
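One possible approach to b), sketched in Python under stated assumptions: because the records have a fixed size and are ordered by name, a binary search can jump to the middle record, compare it with the key, and discard half of the remaining records on every read. The 20-byte record length and the list of names are hypothetical, since the figure's data is not reproduced here.

```python
import io

RECORD_SIZE = 20  # hypothetical fixed record length in bytes

def make_file(names):
    """Build an in-memory random access file of fixed-size name records."""
    return io.BytesIO(b"".join(n.encode().ljust(RECORD_SIZE) for n in names))

def binary_search(f, count, key):
    """Binary search over `count` records sorted by name.
    Returns (0-based record number or None, number of record reads)."""
    lo, hi, reads = 0, count - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        f.seek(mid * RECORD_SIZE)            # random access to the middle record
        name = f.read(RECORD_SIZE).rstrip().decode()
        reads += 1
        if name == key:
            return mid, reads
        if name < key:
            lo = mid + 1                      # key can only be in the upper half
        else:
            hi = mid - 1                      # key can only be in the lower half
    return None, reads

# Hypothetical sorted file with 15 names.
names = ["Adams", "Baker", "Clark", "Davis", "Evans", "Garson", "Holmes",
         "Irving", "Jones", "King", "Lopez", "Moriarity", "Nolan", "Owens",
         "Parker"]
f = make_file(names)
print(binary_search(f, len(names), "Moriarity"))   # (11, 2)
```

In this hypothetical file a sequential scan would need 12 reads to reach "Moriarity". With binary search no lookup among 15 records needs more than 4 reads, since each read halves the candidates (2^4 = 16 > 15).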
Problem 2
We have seen so far how to access data with two strategies:
1) Random access, very useful for accessing records by position even if you do not know the content of the record. We could think of random access as a pointer to the desired record.
2) Sequential access, very useful for reading the content of records. We could think of sequential access as an access to the desired value.
Consider the case of a sequential file such as the one in the figure aside (not all the records have the same length). We want to speed up the time it takes to read only a few records that meet a specific criterion. For instance, we want to access the information about an employee named "Garson". Also suppose the cost of reading a single record is 1 ms.
a) How much time does it take to get the "Garson" record?
b) Design an architecture for speeding up the access time. If you can reduce the time from paragraph a), you're done! You can use as many files as you need, and both sequential and random access.
2.1.3 Indexed File
Two files are used for data. The first one stores the records sequentially. The second one holds pointers to several positions in the first one. This second file is the index: a table with some attribute values and their positions within the data file.
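A minimal Python sketch of an index over a sequential file with variable-length records. For brevity the index is kept in memory as a `dict` mapping each key to its byte offset; in the scheme described above it would be stored in a second file. The record layout (fields separated by ";", one record per line) and the data are hypothetical.

```python
import io

def build_index(f):
    """Scan the data file once and map each record's key to its byte offset."""
    index = {}
    f.seek(0)
    while True:
        offset = f.tell()                 # position where this record starts
        line = f.readline()
        if not line:
            break
        key = line.split(b";")[0].decode()
        index[key] = offset
    return index

def lookup(f, index, key):
    """Use the index to jump directly to the record: one data-file read."""
    offset = index.get(key)
    if offset is None:
        return None
    f.seek(offset)                        # random access into the sequential file
    return f.readline().rstrip(b"\n").decode()

# Hypothetical data file: records of different lengths.
data = io.BytesIO(b"Holmes;Detective;221B Baker St\nGarson;Clerk\n"
                  b"Watson;Doctor;London\n")
index = build_index(data)
print(lookup(data, index, "Garson"))      # Garson;Clerk
```

The data file is read sequentially only once, to build the index; after that each lookup combines an index search with a single random access.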
2.1.4 Indexed File. A more realistic approach
The index file has only a few entries, each indicating the position of a certain attribute value in the data file (an entry is added to the index every N records of the file, e.g. every 10, 15 or 20). The main file must always be ordered by that attribute (in practice it is re-sorted from time to time). When looking up a record, the key value is sought in the index table, which gives the start position for the search. From that position, the file is searched sequentially until the record is found.
2.2 Physical Level (Read, Write, Seek)
2.2.1 Access Time
Access time = seek time + (rotational) latency + transfer time
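A worked example of the formula, as a small Python function. The disk figures are hypothetical, chosen only for illustration: 9 ms average seek time, 7200 rpm, a 100 MB/s transfer rate and a 4096-byte block. Average rotational latency is taken as half a revolution.

```python
def access_time_ms(seek_ms, rpm, block_bytes, transfer_mb_per_s):
    """Access time = seek + rotational latency + transfer, all in ms."""
    latency_ms = 0.5 * 60_000 / rpm                 # average: half a revolution
    transfer_ms = block_bytes / (transfer_mb_per_s * 1_000_000) * 1000
    return seek_ms + latency_ms + transfer_ms

# Hypothetical disk: 9 ms seek, 7200 rpm, 100 MB/s, 4 KB block.
t = access_time_ms(9, 7200, 4096, 100)
print(round(t, 2))   # 13.21
```

With these numbers the mechanical parts dominate: about 9 ms of seek plus about 4.17 ms of latency, against roughly 0.04 ms to transfer the block itself.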
Watch video "Storage Devices" https://www.youtube.com/watch?v=ZDITqacAkFQ
https://www.youtube.com/watch?v=G2EfxglM_mQ
2.2.2 Input/Output Flow Schema
Consider blocks of 100 records, accessed sequentially.
For every 100 records the application reads or writes, the OS reads or writes only 1 block.
The OS performs the necessary translations between physical and logical addresses.
The use of buffers dramatically improves transfer rates.
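A simplified Python simulation of the schema above, assuming blocks of 100 records and sequential reading. `BlockDevice` and `BufferedReader` are hypothetical names: the first stands in for the disk and counts physical block reads; the second plays the role of the OS buffer, translating each logical record read into a position inside the current block.

```python
BLOCK_RECORDS = 100  # records per block, as in the schema above

class BlockDevice:
    """Hypothetical disk holding records; counts physical block reads."""
    def __init__(self, records):
        self.records = records
        self.block_reads = 0

    def read_block(self, block_no):
        self.block_reads += 1
        start = block_no * BLOCK_RECORDS             # block number -> first record
        return self.records[start:start + BLOCK_RECORDS]

class BufferedReader:
    """Serves logical record reads from a one-block buffer, going to the
    device only when the buffer has been used up."""
    def __init__(self, device):
        self.device = device
        self.buffer = []
        self.pos = 0
        self.next_block = 0

    def read_record(self):
        if self.pos == len(self.buffer):             # buffer used up: fetch next block
            self.buffer = self.device.read_block(self.next_block)
            self.next_block += 1
            self.pos = 0
            if not self.buffer:
                return None                          # end of file
        record = self.buffer[self.pos]
        self.pos += 1
        return record

disk = BlockDevice([f"record {i}" for i in range(1000)])
reader = BufferedReader(disk)
records = [reader.read_record() for _ in range(1000)]
print(len(records), disk.block_reads)   # 1000 10
```

The application performs 1000 logical reads, but the device is accessed only 10 times, one block per 100 records.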
3. Types of Information Systems
3.1 Process-Oriented
These systems are called "File Management Systems". Systems built with office software such as Word, Excel, etc., fall into this category. Many companies use this schema and run into significant problems.
Drawbacks:
• Redundancy
• Inconsistency
• Difficulty querying data (which is shared among the files)
• Application data depends on the physical level
• Storage cannot be optimized
• Concurrency issues
• Security management is almost impossible
3.2 Data-Oriented
In these systems, data are centralized in a common database; that is, there is only one logical structure that supports all the data and to which the applications connect. These are the systems we are going to study.
Advantages:
• Redundancy is minimized
• Data integrity (inconsistency is avoided)
• Easy to query data
• Application data is independent of the physical level
• Storage can be optimized
• Concurrency is handled efficiently
• Security management is enhanced