FAULT-TOLERANT DISTRIBUTED COMPUTING ON NETWORKS OF WORKSTATIONS

Period of Performance: 01/01/1993 - 12/31/1993

$50K

Phase 1 SBIR

Recipient Firm

Scientific Comput Assoc
One Century Tower, 265 Church Street
New Haven, CT 99999

Principal Investigator

Abstract

A "hypercomputer" is the parallel computer that emerges when one sums the unused cycles over all the computer nodes on a local area network. However, fault tolerance is a critical issue for longer-lived applications running on typical LANs, where node failures are not uncommon. Process checkpoint and restart mechanisms that allow parallel network computations to proceed even as their constituent processes fail are being developed. The resulting system is a fault-tolerant, extensible, cost-effective supercomputer based on low-cost, high-performance scientific workstations connected via a local area network.
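
The checkpoint-and-restart idea named in the abstract can be illustrated with a minimal single-process sketch. This is not the system described in the award: the function names (save_checkpoint, load_checkpoint, run_worker, run_with_restart), the pickle file format, and the simulated failure are all assumptions made for illustration. A real network implementation would also have to coordinate checkpoints across processes, which this sketch does not attempt.

```python
import os
import pickle
import tempfile


class SimulatedNodeFailure(Exception):
    """Stands in for a workstation crashing mid-computation (hypothetical)."""


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the old checkpoint.

    The rename is atomic, so a crash during the write leaves the previous
    checkpoint intact instead of a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if none exists."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return {"next_i": 0, "total": 0}


def run_worker(path, n, checkpoint_every=10, fail_at=None):
    """Sum i*i for i in range(n), resuming from the last checkpoint."""
    state = load_checkpoint(path)
    for i in range(state["next_i"], n):
        if fail_at is not None and i == fail_at:
            raise SimulatedNodeFailure(f"node died at step {i}")
        state["total"] += i * i
        state["next_i"] = i + 1
        if state["next_i"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]


def run_with_restart(path, n, fail_at=None):
    """Run the worker; if it fails, restart it from the checkpoint."""
    try:
        return run_worker(path, n, fail_at=fail_at)
    except SimulatedNodeFailure:
        # In-memory state is lost; the restarted worker reloads from disk
        # and repeats only the steps taken since the last checkpoint.
        return run_worker(path, n)
```

Calling run_with_restart with a fail_at step between two checkpoints shows the trade-off: work done since the last checkpoint is repeated, but the final result matches an uninterrupted run. A smaller checkpoint_every reduces the repeated work at the cost of more frequent writes.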