Using a random (hashed) file allows access a specific master entry according to the value of the entry’s search field (or master key) directly regardless of the location in the data. Hashes usually use a division-remainder algorithm to determine a record’s position (records with same position are grouped in lists).
Using an index file is a popular way to gain access to records in a file where the requests are unpredictable (i.e. not in sequential order, according to the data order). Essentially the index file structure (which may or may not be ordered itself) is associated with a particular search key and there may be many indexes, based on different search keys, to allow for fast access to different file queries. Index files require maintenance (although primary keys may be automatically indexed) and can be compared to directory structures of modern file storage systems included in operating systems.
The advantages of one or other are largely classified in terms of storage space required, speed of retrieval and ease of record management.
Hashing requires no storage space (except possibly for record management) and is faster than indexing for record retrieval and management. Essentially, hashing, over indexing, increases speed, makes transfers easier, improves retrieval, optimises data searching and reduces resource overhead.
Whilst it may seem that using the random (hashed) file approach is mostly desirable as it usually requires less storage, is faster on retrieval and largely easier to use than indexing, indexing does have the advantage that, unlike hashing, it can be relied upon to produce unique keys. Hashing relies on keys that, given different search terms, can point to the same location and solving this problem by the development of an algorithm that cannot do this is still the subject of research (Rego, A. (n.d.)). In addition, where multiple or sequential key retrieval is required, the hashed approach is either impractical or cannot be used (unless a hash index is used), as it uses a single algorithm, whereas the index approach can handle these quickly and easily.
Hence my criteria for choosing hashing or indexing for different applications would be:
- What type of retrieval is the user looking for? Single v Sequential v Multiple key retrieval
- Is space or performance more important? Subject to other considerations it may be worth sacrificing space to gain performance or flexibility in record retrieval.
- Will the application be focussed on record insertion, editing or deleting or on information retrieval? Is it more important that the average response time is improved or the worst-case response time.
- Can the application’s performance afford the periodic maintenance of indexes?
Glenn, J. (2009) Computer Science: An Overview. 10th ed. Boston: Pearson Education.
Hoffer, J. Prescott, M. & McFadden, F. (2004) Modern Database Management. 7th Edition. Prentice Hall.
Rego, A. (n.d.). Do migrating secondaries give you migraines? [Online]. Available at: http://www.adager.com/TechnicalPapersHTML/Migraines.html (Accessed: 20 Dec 2009).