01 — From Batch Systems to Time-Sharing: The Invention of the Operating System

Technical Overview

Operating systems were not planned. They emerged from economic necessity: expensive hardware demanded maximum utilization, and maximizing utilization required software to manage the hardware between programs. The evolution from batch systems to time-sharing describes a progression of ideas — spooling, multiprogramming, virtual memory, interactive use — each solving the previous approach's dominant inefficiency. By 1969, with CTSS and Multics, all the fundamental OS concepts had been discovered: memory protection, virtual memory, processes, scheduling, filesystems, user authentication, and interactive command interfaces. Everything that followed was refinement, performance engineering, and scaling. This file traces the ideas in their historical order so you can understand why each concept exists.

Prerequisites

Basic understanding of what a CPU, memory, and I/O devices are
Awareness of what a program is and what "executing" a program means
Familiarity with the mainframe era (IBM System/360, OS/360 concepts)

Phase 1: No Operating System (1945–1952)

The first computers had no OS. The programmer was the operator. At ENIAC, a computation began by the programmer rewiring the machine's plugboard connections — the program was in the wiring. On later stored-program machines like the IBM 701, the programmer:

Reserved the machine (scheduled time on a shared calendar)
Loaded the program by toggling front-panel switches or loading a paper tape
Started execution
Watched the machine run (or not — debugging meant observing the console lights)
Collected output from the printer
Freed the machine for the next user

The problem: Machines cost thousands of dollars per hour. Most of that time was spent in steps 1, 3, 5 — not computation. A machine might spend 80% of its time idle while programmers debugged, loaded tapes, or waited for printouts.

Timeline of one "computation" in 1950:

0:00  Programmer arrives with card deck
0:05  Previous job finishes; programmer walks to machine room
0:10  Loads cards (drops deck, repunches two cards)
0:25  Starts execution
0:28  Program crashes — halts with error lights
0:35  Programmer reads core dump printout
1:15  Programmer corrects punch card, reloads
1:20  Program runs to completion
1:35  Printer finishes output
1:45  Programmer leaves with results

Machine was actually computing: ~35 minutes
Machine was idle/waiting: ~70 minutes
CPU utilization: ~33%

Phase 2: Batch Processing and Monitor Programs (1952–1958)

The Closed Shop

The solution to idle machine time was the closed shop: programmers no longer had direct access to the machine. They submitted card decks to professional operators who ran jobs in sequence. The machine was never idle between jobs — an operator kept the next job ready as the previous one finished.

But even this left inefficiency: loading a new job still took minutes (mounting tapes, loading cards). Enter the monitor program.

Monitor Programs: The First OS

A monitor was a small resident program that: 1. Automatically loaded the next job when the current one finished 2. Provided simple I/O routines (read card, write printer) that jobs called 3. Recovered from simple job failures without operator intervention

Early Monitor Memory Layout:

+------------------+  High memory
|                  |
|   User job area  |  <- Current job loaded here
|   (variable)     |
|                  |
+------------------+
|  Monitor         |  <- Always resident; job cannot overwrite it
|  (resident OS)   |  <- System calls: read card, write line, end job
|  (few KB)        |
+------------------+  Low memory

Job transition (without operator):
1. Job completes, transfers control to monitor (CALL END)
2. Monitor reads next job from card reader
3. Monitor loads job into user area
4. Monitor transfers control to job (JUMP user_start)

The IBM FMS (FORTRAN Monitor System, 1956) was one of the first practical monitors. It ran on the IBM 704, supporting FORTRAN programs. A job consisted of: a control card identifying the job, FORTRAN source cards, a compile step (FORTRAN → machine code), and execution. FMS automated this sequence without operator intervention between steps.

IBM IBSYS (1960) was the more sophisticated monitor for the IBM 7090/7094, supporting FORTRAN, COBOL, and assembler jobs, with a library system for standard routines.

Limitations of Simple Batch Processing

No I/O overlap: The CPU sat idle while printing, or while reading cards for the next job. I/O devices were 100–1000x slower than the CPU.
No protection: A runaway job could overwrite the monitor and crash the machine, requiring operator intervention.
Sequential jobs: Only one job in memory; the CPU ran at full speed only during actual computation, which was often less than 50% of job time.

Phase 3: Spooling — Overlapping I/O and Computation (1957–1962)

SPOOL: Simultaneous Peripheral Operations On-Line.

The key insight: if you have a fast disk (or later, magnetic tape), you can decouple slow I/O devices from the CPU: - A reader program runs on a separate small computer (or later in a separate CPU context) and reads card input to disk - The main CPU processes jobs by reading from and writing to disk - A writer program prints from disk to the line printer

Spooling Architecture:

    [Card Reader]                        [Line Printer]
    slow (1000 cpm)                      slow (1200 lpm)
         |                                     |
         v                                     ^
    +--------+                          +--------+
    | Reader |                          | Writer |
    | task   |                          | task   |
    +--------+                          +--------+
         |                                     |
         v                                     ^
    +-------------------------------------------+
    |              Spool Disk                   |
    |  [Job queue]      [Output queue]          |
    |  - Job A cards    - Job A printout        |
    |  - Job B cards    - Job B printout        |
    |  - Job C cards                            |
    +-------------------------------------------+
              |                   ^
              v                   |
    +-------------------------------------------+
    |              Main CPU                     |
    |  Reads jobs from disk  |  Writes output   |
    |  (fast disk I/O)       |  to disk (fast)  |
    +-------------------------------------------+

Result: CPU sees fast I/O (disk vs physical card/printer)
        Card reader and printer run at full speed in parallel
        CPU utilization dramatically improved

The first spooling system was implemented on the UNIVAC 1103 in 1957 at General Motors and North American Aviation, using a small satellite computer to handle I/O. IBM's HASP (Houston Automatic Spooling Priority) system, written by users at NASA's Manned Spacecraft Center (Houston) in 1967 for the IBM 360, became the most widely used spooling system on mainframes and was eventually incorporated into IBM's own software.

Phase 4: Multiprogramming — Multiple Jobs in Memory (1960–1965)

The CPU Utilization Problem

Even with spooling, a job that was waiting for I/O (reading a tape, waiting for a disk seek) kept the CPU idle. A typical scientific computation might be 70% compute, 30% I/O — so CPU utilization was 70% at best. A typical business data processing job might be 10% compute, 90% I/O — CPU utilization of only 10%.

Multiprogramming: keep multiple jobs in memory simultaneously. When Job A waits for I/O, run Job B. When Job B finishes, switch back to Job A.

Single programming vs Multiprogramming (time-space diagram):

Single programming:
Time: 0   1   2   3   4   5   6   7   8
      [--CPU--][I/O---][--CPU--][I/O---]
      CPU busy: 50%

Multiprogramming (2 jobs):
Time: 0   1   2   3   4   5   6   7   8
Job A [--CPU--][I/O---][--CPU--][I/O---]
Job B          [--CPU--][I/O---][--CPU--]
      CPU busy: nearly 100% (both I/Os don't always overlap, but close)

With N jobs, if each is I/O bound p fraction of the time:
CPU utilization ≈ 1 - p^N

For p=0.8, N=2: 1 - 0.64 = 0.36  (modest improvement)
For p=0.8, N=4: 1 - 0.41 = 0.59
For p=0.8, N=8: 1 - 0.17 = 0.83  (significant improvement)

Memory Protection

Multiprogramming required memory protection: if Job A's address space overlapped with Job B's, a bug in Job A could corrupt Job B's memory (or the OS). Hardware protection was required.

The IBM System/360 implemented storage protection keys: each 2KB block of memory had a 4-bit protection key. The CPU's Program Status Word (PSW) had a protection key. An access to a block with a non-matching key (except from supervisor mode, key 0) caused a protection exception. This was sufficient to isolate jobs from each other and from the OS.

OS/MFT and OS/MVT

IBM's OS/360 offered two multiprogramming models:

OS/MFT (Multiprogramming with Fixed number of Tasks): Memory was divided into fixed partitions at system generation time (e.g., 2 large partitions, 4 medium, 4 small). Each partition held one job. Simple but wasteful if job sizes didn't match partition sizes.

OS/MVT (Multiprogramming with Variable number of Tasks): Jobs were allocated exactly the memory they requested. More efficient but caused external fragmentation — free memory became split into small, non-contiguous pieces.

OS/MVT Fragmentation:

Initial state:          After 3 jobs complete:   Problem:
+---------------+       +---------------+         +---------------+
| OS (128KB)    |       | OS (128KB)    |         | OS (128KB)    |
+---------------+       +---------------+         +---------------+
| Job A (200KB) |       |  Free (200KB) |         |               |
+---------------+       +---------------+  Next   |               |
| Job B (300KB) |       | Job D (300KB) |  job    | Free (560KB)  |
+---------------+       +---------------+  wants  | total but in  |
| Job C (400KB) |       |  Free (400KB) |  500KB  | 3 fragments!  |
+---------------+       +---------------+         +---------------+
| Free (100KB)  |       | Free (100KB)  |
+---------------+       +---------------+

Solution 1: Memory compaction (move jobs to consolidate free space)
            -- requires relocation hardware or pausing jobs
Solution 2: Virtual memory with paging (each job gets a logical
            address space; physical pages can be non-contiguous)

Phase 5: Atlas and the Invention of Virtual Memory (1962)

The Atlas computer, built at the University of Manchester (UK), went operational in 1962. It was designed by Tom Kilburn's team and was arguably the most innovative computer system of its era.

Atlas innovations:

Paging: Atlas divided memory into fixed-size pages (512 words). The address space was larger than physical memory. When a program accessed a page not in physical memory, a page fault occurred and the OS loaded the page from drum (a fast rotating storage device).
One-level store: Programs used a single address space. The OS transparently moved pages between drum and core memory as needed. From a program's view, all addresses were equally accessible — there was no "memory" vs "disk."
Dynamic address translation: A set of associative registers (we now call them a TLB — Translation Lookaside Buffer) mapped virtual page numbers to physical frame numbers. Atlas had 32 associative registers — essentially the first hardware TLB.
Page replacement: When physical memory was full and a new page was needed, Atlas selected a page to evict using a policy resembling LRU (Least Recently Used).

Atlas Virtual Memory:

Virtual address: [page number (11 bits)][offset (9 bits)]
                  = 2048 pages of 512 words each
                  = 1M word virtual address space

Physical memory: 98 pages of 512 words (50KB core)
Drum storage:   Up to 2048 pages total

Address translation:
Virtual page# ---> [Associative registers (TLB)] ---> Physical frame#
                            |
                    If not found: PAGE FAULT
                            |
                    OS loads page from drum
                    OS evicts least-recently-used page
                    Updates associative registers
                    Restarts instruction

Atlas demonstrated that virtual memory was practical and beneficial. The concept spread rapidly: IBM 370 added virtual memory in 1972, DEC VAX added it in 1977, and today every CPU supports paged virtual memory.

John McCarthy's Proposal (1959)

John McCarthy, then at MIT, circulated a memo in 1959 proposing that computers could be shared interactively, like telephone service: many users, each with a terminal, sharing one computer, each getting a fraction of CPU time. The computer would be fast enough that each user experienced interactive response times.

This was controversial. Batch advocates argued that interactive use was wasteful — having a user sit at a terminal thinking between commands wasted expensive CPU time. McCarthy's insight was that human think time cost nothing while CPU time was freed by switching to another user. The economics of time-sharing were positive if the machine was fast enough.

Fernando Corbató at MIT implemented CTSS on a modified IBM 7094. It ran up to 30 simultaneous users. The key mechanisms:

Time slicing: Each user received a 200ms time slice. At the end of the slice, a timer interrupt fired, the OS saved the user's CPU state, and switched to the next user.

CTSS Time-Slicing:

Time: 0    200ms  400ms  600ms  800ms  1000ms
      [U1] [ U2 ] [ U3 ] [ U1 ] [ U2 ] [ U3 ]

Each user types a command (takes ~5 seconds of think time)
CPU processes the command in ~100ms
So 30 users share one CPU and each sees <1 second response time
(because they're mostly thinking, not computing)

Human response time requirement: < 2 seconds feels "interactive"
Command execution time: typical command < 100ms
Users: 30
-> Each user's slice every 30 * 200ms = 6 seconds
-> Within 6 seconds, user usually hasn't finished typing
-> Effective: users don't notice sharing

Memory swapping: CTSS had 2MB of physical memory (32K words). It kept the OS and a few active jobs in memory; other jobs were swapped to disk. When a job's time slice arrived, it was swapped into memory if not already there.

Multi-level scheduling: CTSS used a multi-level feedback queue: new jobs started at a high priority (short time slices), and jobs that repeatedly consumed their full slice were demoted to lower priority (longer slices, run less often). This favored interactive (bursty, short) jobs over CPU-bound (long) jobs.

CTSS incubated several important innovations: - Email (1965): Tom Van Vleck and Noel Morris implemented a MAIL command allowing CTSS users to leave messages in each other's directories - Runoff (1964): Jerome Saltzer's text formatting program — ancestor of Unix roff, nroff, troff, and eventually TeX and LaTeX - Compatible: The name reflects that CTSS could also run batch jobs — it was backward compatible with FMS, meaning existing jobs needed no modification

Phase 7: Multics — The Summit of Ambition (1965–1969)

Multics (Multiplexed Information and Computing Service) was designed by MIT, Bell Labs, and GE as the time-sharing system to replace CTSS and define the future of computing. Its goals were breathtakingly ambitious:

Multics design goals: - Utility computing: computing like electricity — you plug in and get as much as you need - 1,000 simultaneous users on one GE-645 computer - A persistent, shared information store — files as a single unified address space - Strong security with multiple protection rings - High availability: system could be maintained while running

Multics Technical Innovations:

Multics Architecture:

Hardware: GE-645 with Multics-specific modifications
  - 36-bit word length
  - Segmented + paged virtual memory
  - Hardware ring protection (8 rings)
  - Tag bits on every memory word (identifying type)

Memory Architecture:
  [Process descriptor segment (PDS)]
         |
         v
  Segment Descriptor Words (SDW) table
  [Seg 0]----> Physical pages (or drum pages)
  [Seg 1]----> Physical pages (or drum pages)
  ...
  [Seg n]----> Physical pages

  Each segment has a name (e.g., "source_file>example.pl1")
  Segments can be shared across processes
  File and memory are unified: editing a file = writing to a segment

Ring Protection:
Ring 0: Multics nucleus (most privileged)
Ring 1: Database management
Ring 2: Trusted application code
Ring 3: Less trusted code
...
Ring 7: User programs (least privileged)

Crossing inward requires a "gate" (controlled entry point)
Hardware enforces: no calling ring-0 code from ring-3 directly

What Multics Got Right: - Hierarchical filesystem (still the model in use) - Dynamic linking of shared libraries - Controlled privilege escalation (rings, later simplified to OS/user mode) - Per-user authentication and per-file access control

Why Multics Was Called a Failure: - Late: CTSS replacement was expected by 1967; first external customer site in 1973 - Expensive: The GE-645 and then Honeywell 6180 were costly; few could afford it - Complex: The system was enormous and hard to understand or maintain - Never approached 1,000 users simultaneously in practice

Why Calling Multics a Failure Is Misleading:

Multics ran production workloads at MIT until 1988 and at a Canadian defense lab until October 2000. Ken Thompson and Dennis Ritchie worked on Multics at Bell Labs, absorbed its concepts, and built Unix as a deliberate simplification. Every Unix filesystem, every Unix process, every modern OS ring protection scheme is influenced by Multics.

Multics was not a failed system — it was an expensive proof of concept that incubated the ideas the industry has used ever since.

OS Evolution Timeline

1952  Open shop -- programmers run their own jobs
  |
1953  Closed shop / batch -- operators manage job queue
  |
1955  Monitor programs -- automatic job-to-job transition
  |
1956  IBM FMS -- first practical monitor for 704
  |
1957  Spooling (GM/NAA) -- overlap I/O with computation
  |
1959  McCarthy memo -- time-sharing concept proposed
  |
1960  IBM OS/MFT -- multiprogramming with fixed partitions
  |
1961  CTSS (MIT) -- first practical time-sharing system
  |
1962  Atlas (Manchester) -- paging, virtual memory, TLB, page faults
  |
1963  CTSS: email, runoff, multi-level scheduling
  |
1964  IBM OS/360 -- storage keys, multiprogramming, JCL
  |
1965  Multics design begins -- rings, segments, dynamic linking
  |
1969  Multics first runs -- ambitious but over budget and late
  |   Bell Labs withdraws from Multics -- Thompson starts Unix
  |
1972  IBM 370 virtual memory -- hardware DAT, paging goes mainstream
  |
1973  Unix v4 (in C) -- simple, portable, influential
  |
1974  OS/VS on IBM 370 -- virtual storage for every job
  |
1977  BSD 1.0 -- Unix variants begin spreading through universities

Key Mechanisms Summary

Mechanism	Problem Solved	Year Invented	Where Used Today
Monitor program	Manual job loading	~1953	Every OS kernel
Spooling	Slow I/O blocking CPU	1957	Print queues, message queues
Memory protection	Job corruption	~1960	CPU rings/modes everywhere
Multiprogramming	CPU idle during I/O	~1961	Multitasking in every OS
Time slicing	Interactive response	1961 (CTSS)	Every preemptive scheduler
Virtual memory / paging	Memory > RAM, isolation	1962 (Atlas)	Every modern CPU/OS
Hierarchical filesystem	Flat namespace chaos	~1965 (Multics)	Every modern filesystem
Ring protection	OS integrity	1965 (Multics)	x86 rings 0/3, ARM EL0-EL3

Production Relevance

The scheduler in your Linux kernel is a direct descendant of CTSS's multilevel feedback queue: SCHED_NORMAL (CFS) favors interactive tasks, SCHED_BATCH and SCHED_IDLE deprioritize background tasks. The same tradeoff McCarthy identified in 1959 is in the scheduler code you're running.
Demand paging (Atlas, 1962) is how your OS handles memory. When a process accesses a virtual address not in physical RAM, a page fault fires, the OS loads the page from swap or a file, and execution continues. This happens millions of times per day on any running machine.
Spooling is the design pattern behind every message queue, every print queue, and every batch job system. AWS SQS, RabbitMQ, Kafka — all implement the same decouple-producer-from-consumer pattern that GM/NAA invented in 1957.
The shell is a direct descendant of CTSS's command interface. The interactive prompt, command parsing, I/O redirection — all of these were first seen on CTSS in 1961.
Memory protection and privilege separation are why the OS doesn't crash when your program segfaults. Storage protection keys (IBM 360) → segmentation (x86) → paging with kernel/user mode → today's CPU rings. Every OS security model rests on this hardware foundation.

Lessons Learned

Every OS concept has a clear economic motivation. Monitor programs: idle machines cost money. Spooling: slow I/O wastes CPU time. Multiprogramming: waiting for I/O wastes CPU time. Time-sharing: interactive use requires fast response. Economics drove every invention. When studying an OS mechanism, ask: what inefficiency was it solving?
Multics taught more by failing than most systems teach by succeeding. Every major OS concept was in Multics: virtual memory, rings, hierarchical FS, dynamic linking, user authentication. Multics's "failure" was commercial and schedule-based, not conceptual. Unix's success came from simplifying Multics ideas, not replacing them.
Simplicity scales where complexity doesn't. Multics used 8 protection rings; Unix collapsed them to 2 (kernel and user). Multics used named segments; Unix used a simple hierarchical filesystem with inodes. The simpler model was implemented on cheaper hardware, spread to universities, and dominated.
Hardware and OS design are inseparable. Atlas's virtual memory required new hardware (TLB). IBM 360's multiprogramming required storage protection keys. Every OS capability requires hardware support. Modern OS design is as much about using hardware features (x86 rings, ARM TrustZone, Intel VT-x) as designing software abstractions.
User-facing simplicity hides enormous internal complexity. CTSS users typed commands and got responses, unaware of time slicing, memory swapping, and multilevel queuing. Good OS design makes complexity invisible to users.

Exercises

Implement a simple round-robin scheduler in Python: have 5 "processes" with random CPU burst lengths and I/O wait times. Simulate time slicing. Plot CPU utilization vs number of processes.
Simulate memory fragmentation in an OS/MVT-style allocator: implement first-fit, best-fit, and worst-fit allocation. Run a trace of job arrivals and departures. Plot external fragmentation over time. Which policy fragments least?
Implement a simple spooling simulation: separate threads for "card reader" (produces jobs), "CPU processor" (consumes jobs, produces output), and "printer" (consumes output). Use queues for the spool. Measure throughput with and without spooling.
Read the CTSS manual (available online at MIT). What were the CTSS multi-level scheduling parameters? How did they prevent CPU-bound jobs from monopolizing the machine?
Compare Atlas's page replacement algorithm with LRU. What information did Atlas have available? What did it use? How does Linux's current page replacement (PFRA, clock algorithm) compare?

References

Corbató, F.J., Merwin-Daggett, M., and Daley, R.C. (1962). "An Experimental Time-Sharing System." AFIPS Spring Joint Computer Conference.
Kilburn, T. et al. (1962). "One-Level Storage System." IRE Transactions on Electronic Computers.
Organick, E.I. (1972). The Multics System: An Examination of Its Structure. MIT Press.
Arden, B.W. (ed.) (1980). What Can Be Automated?: The Computer Science and Engineering Research Study. MIT Press.
Denning, P.J. (1970). "Virtual Memory." ACM Computing Surveys.
Tanenbaum, A.S. and Bos, H. (2014). Modern Operating Systems, 4th ed. Prentice Hall.
Brinch Hansen, P. (1973). Operating System Principles. Prentice Hall.
McCarthy, J. (1962). "Time-Sharing Computer Systems." In Management and the Computer of the Future. MIT Press.
Ross, D.T. (1967). "The AED Free Storage Package." CACM.