File Organisation

A file is organised to ensure that records are available for processing. Before a file is created, the application for which the file will be used must be carefully examined. Clearly, a fundamental consideration in this examination will concern the data to be recorded on the file. But an equally important and less obvious consideration concerns how the data are to be placed on the file.

(A) Sequential : This is the simplest method of storing and retrieving data from a file. Sequential organisation simply means storing the records one after another in physical order on tape or disk. In a sequential organisation a record can be added only at the end of the file. That is, in a sequential file, records are stored one after the other without concern for the actual value of the data in the records. It is not possible to insert a record in the middle of the file without re-writing the file. In a sequential file update, the transaction records must be in the same sequence as the records in the master file. Records from both files are matched, one record at a time, resulting in an updated master file.
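The matching of master and transaction records can be sketched in Python as below. This is only an illustration under stated assumptions: both files are modelled as in-memory lists of (key, value) pairs already sorted on the key, each key appears at most once per file, and a transaction value of None is a hypothetical convention for "delete this record". The function name update_master is likewise invented for the example.

```python
def update_master(master, transactions):
    """Merge sorted transactions into a sorted master, returning a new master.

    Both arguments are lists of (key, value) pairs in ascending key order.
    Assumption of this sketch: a transaction value of None means "delete".
    """
    updated = []
    i = j = 0
    while i < len(master) and j < len(transactions):
        m_key, m_val = master[i]
        t_key, t_val = transactions[j]
        if m_key < t_key:
            # No transaction for this master record: copy it unchanged.
            updated.append((m_key, m_val))
            i += 1
        elif m_key > t_key:
            # Transaction key is not on the master: treat it as a new record.
            if t_val is not None:
                updated.append((t_key, t_val))
            j += 1
        else:
            # Keys match: the transaction replaces (or deletes) the record.
            if t_val is not None:
                updated.append((t_key, t_val))
            i += 1
            j += 1
    # Copy whatever remains on either file.
    updated.extend(master[i:])
    updated.extend((k, v) for k, v in transactions[j:] if v is not None)
    return updated


old_master = [(1, "a"), (3, "c"), (5, "e")]
batch = [(2, "b"), (3, "C"), (5, None), (7, "g")]
new_master = update_master(old_master, batch)
```

Note that the old master is never modified in place; a complete new master is written, which is why the whole file is read even when only a few records change.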

It is a characteristic of sequential files that all records are stored by position: the first record is at the first position, the second occupies the second position, the third is at the third position, and so on. There are no addresses or location assignments in sequential files.

To read a sequential file, the system always starts at the beginning of the file. If the record sought is somewhere in the file, the system reads its way up to it, one record at a time.
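A minimal Python sketch of this behaviour follows. The file is modelled as a list of dictionaries with a "key" entry, and the function name sequential_read is made up for the illustration; a real tape or disk file would be read record by record in the same way.

```python
def sequential_read(records, wanted_key):
    """Scan from the first record until wanted_key is found.

    Returns (record, reads), where reads is the number of records
    that had to be read. record is None if the key is not on the file.
    """
    reads = 0
    for record in records:
        reads += 1
        if record["key"] == wanted_key:
            return record, reads
    return None, reads


payroll = [
    {"key": 101, "name": "Asha"},
    {"key": 102, "name": "Bala"},
    {"key": 103, "name": "Chitra"},
]
found, reads = sequential_read(payroll, 103)
```

Here the third record is reached only after the first two have been read, and a search for a key that is not on the file reads every record.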

In a sequential file the records are arranged in ascending or descending order according to a key field. This key field may be numeric, alphabetic, or a combination of both, but it must occupy the same place in each record, as it forms the basis for determining the order in which the records will appear on the file. The advantages and disadvantages of the sequential file organisation are given below :

Advantages

  1. This approach is simple to understand.
  2. Locating a record requires only the record key.
  3. Efficient and economical if the activity rate is high.
  4. Relatively inexpensive I/O media and devices may be used.
  5. Files may be relatively easy to reconstruct since a good measure of built-in backup is usually available.

Disadvantages

  1. Entire file must be processed even when the activity rate is low.
  2. Transactions must be sorted and placed in sequence prior to processing.
  3. Timeliness of data in file deteriorates while batches are being accumulated.
  4. Data redundancy is typically high since the same data may be stored in several files sequenced on different keys.

(B) Random or Direct

For a proposed system in which the drawbacks of sequential files are unacceptable, another file organisation called direct organisation is used. As with a sequential file, each record in a direct file must contain a key field. However, the records need not appear on the file in key field sequence. In addition, any record stored on a direct file can be accessed if its location or address is known; the previous records need not be accessed. The problem, however, is to determine how to store the data records so that, given the key field of the desired record, its storage location on the file can be determined. In other words, if the program knows the record key, it can determine the location address of a record and retrieve it independently of any other records in the file.

It would be ideal if the key field could also be the location of the record on the file. This method is known as the direct addressing method. It is quite a simple method, but its requirements often prevent its use.

Therefore, before a direct organised file can be created, a formula or method must be devised to convert the key field value for a record to the address or location of the record on the file. This formula or method is generally called an algorithm, and the technique is also known as hashing. Hashing refers to the process of deriving a storage address from a record key. There are many algorithms for determining the storage location from the key field. One of these algorithms is :

Division by Prime

In this procedure, the actual key is divided by a prime number. Modular division is used, that is, the quotient is discarded and the remainder signifies the storage location. If the key field consists of a large number of digits, for instance 10 digits (e.g., 2345632278), then strip off the first or last 4 digits and then apply the division by prime method.
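The procedure can be illustrated with a short Python sketch. The prime 997, the 6-digit limit, and the choice to strip the first (rather than the last) 4 digits are assumptions made for the example, as is the function name storage_location; the text above does not fix any of them.

```python
PRIME = 997        # hypothetical prime divisor chosen for this example
MAX_DIGITS = 6     # assumed limit: longer keys are shortened first


def storage_location(key, prime=PRIME, max_digits=MAX_DIGITS):
    """Derive a storage location from a numeric key by division by a prime."""
    digits = str(key)
    if len(digits) > max_digits:
        # Long key: strip off the first 4 digits before dividing.
        digits = digits[4:]
    # Modular division: discard the quotient, keep the remainder.
    return int(digits) % prime


location = storage_location(2345632278)   # uses 632278 % 997
```

For the 10-digit key 2345632278, the first 4 digits are dropped to leave 632278, and 632278 divided by 997 gives a quotient of 634 (discarded) and a remainder of 180, so the record is assigned location 180. With this divisor every location falls in the range 0 to 996.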

The advantages and disadvantages of direct file organisation are as follows :

Advantages

  1. Immediate access to records for enquiry and updating purposes is possible.
  2. Immediate updating of several files as a result of a single transaction is possible.
  3. Time taken for sorting the transactions can be saved.

Disadvantages

  1. Records in the on-line file may be exposed to the risk of a loss of accuracy, and a special procedure for backup and reconstruction is required.
  2. Compared with sequentially organised files, this organisation may be less efficient in its use of storage space.
  3. Adding and deleting records is more difficult than with sequential files.
  4. Relatively expensive hardware and software resources are required.

(C) Indexed

The third way of accessing records stored in the system is through an index. The basic form of an index includes a record key and the storage address for a record. To find a record when the storage address is unknown, it is necessary to scan the records. However, if an index is used, the search will be faster, since it takes less time to search an index than an entire file of data.

To find a specific record when the file is stored under an indexed organisation, the index is searched first to find the key of the record wanted. When it is found, the corresponding storage address is noted and then the program can access the record directly. This method uses a sequential scan of the index, followed by direct access to the appropriate record. The index helps to speed up the search compared with a sequential file, but it is slower than direct addressing.
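The two steps, a sequential scan of the index followed by direct access to the record, can be sketched in Python as follows. This is an illustration only: the data file is simulated with an in-memory io.BytesIO object, records are assumed to be ASCII text padded to a fixed length of 16 bytes, and the names RECORD_SIZE, build_file and fetch are hypothetical.

```python
import io

RECORD_SIZE = 16  # assumed fixed record length in bytes


def build_file(records):
    """Write fixed-length records and return (data_file, index).

    The index is a list of (key, byte_offset) pairs, one per record.
    Each text must fit within RECORD_SIZE bytes.
    """
    data = io.BytesIO()
    index = []
    for key, text in records:
        index.append((key, data.tell()))            # note the storage address
        data.write(text.encode("ascii").ljust(RECORD_SIZE))
    return data, index


def fetch(data, index, wanted_key):
    """Search the index sequentially, then read the record directly."""
    for key, offset in index:                        # sequential scan of the index
        if key == wanted_key:
            data.seek(offset)                        # direct access to the record
            return data.read(RECORD_SIZE).decode("ascii").rstrip()
    return None                                      # key is not in the index


data_file, index = build_file([(10, "alpha"), (20, "beta"), (30, "gamma")])
record = fetch(data_file, index, 20)
```

Only the small (key, offset) pairs are scanned; the data file itself is touched once, at the offset taken from the index.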

Indexed files are generally maintained on magnetic disk or on a mass storage system. The primary difference between direct and indexed organised files is as follows :

Direct organised files utilise an algorithm to determine the location of a record, whereas indexed organised files utilise an index to locate a record to be randomly accessed. The advantages and disadvantages of indexed sequential file organisation are as follows :

Advantages

  1. Permits the efficient and economical use of sequential processing techniques when the activity rate is high.
  2. Permits quick access to records in a relatively efficient way when this activity is a small fraction of the total workload.

Disadvantages

  1. Less efficient in the use of storage space than some other alternatives.
  2. Access to records may be slower using indices than when transform algorithms are used.
  3. Relatively expensive hardware and software resources are required.
