Project Gutenberg

A portrait of literary history through 79,491 digitized works

The Project Gutenberg collection contains 79,491 works by 25,942 authors, written in 119 languages and spanning subjects from ancient philosophy to pulp science fiction. This analysis traces the contours of this remarkable archive.

The Shape of Literary Production

The archive reveals a striking pattern: authors born in the 1860s dominate the collection, contributing 9,430 works—more than any other decade. This peak reflects the intersection of Victorian-era productivity, the public domain threshold, and Gutenberg's digitization priorities.

Figure: Works by author birth decade, 1700–1900. Peak: 1860s with 9,430 works.
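
The decade grouping behind this chart can be sketched in a few lines. This is a minimal illustration, not the analysis code itself: the function name `birth_decade` is hypothetical, and the sample is limited to six authors whose birth years and work counts appear elsewhere in this article, so the totals cover only those six authors.

```python
from collections import Counter


def birth_decade(year: int) -> str:
    """Label a birth year with its decade, e.g. 1864 -> '1860s'."""
    return f"{year // 10 * 10}s"


# (birth year, works in the collection) for six authors named in this article.
# This is a small illustrative sample, not the full archive.
authors = {
    "Dickens": (1812, 197),
    "Twain": (1835, 251),
    "Ebers": (1837, 177),
    "Stevenson": (1850, 114),
    "Crane": (1871, 19),
    "London": (1876, 114),
}

# Sum each author's works into the decade of the author's birth.
works_by_decade = Counter()
for birth_year, works in authors.values():
    works_by_decade[birth_decade(birth_year)] += works

for decade, works in sorted(works_by_decade.items()):
    print(decade, works)
```

Counting works rather than authors matters here: a single prolific author can lift a whole decade.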

The Dominance of English

English accounts for 76% of all works—an overwhelming majority that reflects both the project's American origins and the global reach of English-language publishing. Yet beneath this dominance lies unexpected variety.

Language      Works
English       60,693
French         3,973
Finnish        3,313
German         2,324
Italian        1,056
Dutch          1,046
Spanish          885
Portuguese       647
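
The 76% figure can be checked against these counts. The sketch below assumes the denominator is the full collection of 79,491 works. That is an assumption, since a work with more than one language assignment could be counted differently in the underlying data.

```python
# Work counts per language, as listed in this article.
works_by_language = {
    "English": 60_693,
    "French": 3_973,
    "Finnish": 3_313,
    "German": 2_324,
    "Italian": 1_056,
    "Dutch": 1_046,
    "Spanish": 885,
    "Portuguese": 647,
}

# Assumed denominator: the total number of works in the collection.
TOTAL_WORKS = 79_491

# Share of the collection held by each language, in percent.
shares = {lang: 100 * n / TOTAL_WORKS for lang, n in works_by_language.items()}

for lang, pct in shares.items():
    print(f"{lang:<12}{pct:5.1f}%")
```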

Finnish ranks third—a striking overrepresentation for a language spoken by 5 million people, explained by Finland's active digitization community and the richness of its 19th-century literary tradition.

Fiction and Non-Fiction by Language

Languages differ markedly in their fiction-to-nonfiction ratios. Finnish and Hungarian collections are predominantly fiction (61% and 60% respectively), while Portuguese leans heavily toward non-fiction (only 22% fiction).

Language      Fiction share
Finnish       60.8%
Hungarian     60.1%
Dutch         44.1%
Spanish       42.7%
English       39.6%
French        38.6%
German        37.2%
Italian       35.8%
Portuguese    22.2%

The Emergence of Genres

Genres can be dated by the average birth year of their authors. Historical fiction (avg. author born 1833) represents the oldest tradition; science fiction (1906) is distinctly modern, its authors born about 73 years later on average.

Figure: Genres by average author birth year (axis: 1830–1910). Work counts in parentheses:
Historical fiction (993)
Domestic fiction (380)
Sea stories (360)
Love stories (894)
Fantasy fiction (438)
Adventure stories (1,352)
War stories (314)
Detective fiction (818)
Western stories (602)
Science fiction (2,641)

The Science Fiction Explosion

Science fiction's growth is dramatic. Authors born in the 1910s contributed 816 works—up from just 30 by those born in the 1850s. The genre essentially crystallized in the early 20th century.

Figure: SF authors by birth decade. Peak: 1910s.

Literary Productivity

Shakespeare leads in absolute output (334 works), but measuring works per year of life reveals different patterns of productivity. Jack London, dead at 40, produced 2.85 works per year—among the highest sustained outputs of any major author.

Author          Works   Life        Per year
Shakespeare     334     1564–1616   6.42
Dickens         197     1812–1870   3.40
Twain           251     1835–1910   3.35
Bulwer-Lytton   226     1803–1873   3.23
Balzac          159     1799–1850   3.12
Ebers           177     1837–1898   2.90
London          114     1876–1916   2.85
Stevenson       114     1850–1894   2.59
Dumas           165     1802–1870   2.43
Verne           177     1828–1905   2.30
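
The per-year figures in the table above can be reproduced with simple arithmetic. The sketch below assumes that lifespan is the death year minus the birth year, ignoring months, which matches the published values. The helper name `works_per_year` is hypothetical.

```python
# (author, works, birth year, death year), taken from the table above.
authors = [
    ("Shakespeare", 334, 1564, 1616),
    ("Dickens", 197, 1812, 1870),
    ("Twain", 251, 1835, 1910),
    ("Bulwer-Lytton", 226, 1803, 1873),
    ("Balzac", 159, 1799, 1850),
    ("Ebers", 177, 1837, 1898),
    ("London", 114, 1876, 1916),
    ("Stevenson", 114, 1850, 1894),
    ("Dumas", 165, 1802, 1870),
    ("Verne", 177, 1828, 1905),
]


def works_per_year(works: int, born: int, died: int) -> float:
    """Works divided by years of life (death year minus birth year)."""
    return works / (died - born)


# Rank authors from most to least productive per year of life.
ranked = sorted(
    ((name, works_per_year(works, born, died)) for name, works, born, died in authors),
    key=lambda row: row[1],
    reverse=True,
)

for name, rate in ranked:
    print(f"{name:<15}{rate:.2f}")
```

Because the denominator is the whole lifespan rather than the writing career, the measure favors authors who died young.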

Brief Lives, Lasting Words

Some of literature's most enduring voices were silenced early. Wilhelm Hauff died at 25, yet left 26 works. Stephen Crane lived 29 years; Shelley, who drowned in 1822, and Robert E. Howard just 30. Byron produced 32 works before dying of fever at 36.

Author             Life        Lifespan   Works
Wilhelm Hauff      1802–1827   25 years   26
Stephen Crane      1871–1900   29 years   19
Percy Shelley      1792–1822   30 years   17
Robert E. Howard   1906–1936   30 years   29
Lord Byron         1788–1824   36 years   32
Pushkin            1799–1837   38 years   19

Author Longevity Over Time

Authors' lifespans have increased steadily. Writers born in the 1500s lived a median of 63 years; by the 1900s, this had risen to 77. The modal lifespan cluster is 70–79 years (5,228 authors), with surprisingly many reaching their 90s.

Born     Median lifespan   Authors (n)
1600s    68                332
1700s    71                1,745
1800s    72                14,953
1900s    77                655
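
The median-by-century calculation can be sketched as follows. The sample is an assumption made for illustration: ten authors whose dates appear in this article, several of them chosen precisely because they died young. The medians it prints are therefore far lower than the archive-wide figures and should not be read as results.

```python
from collections import defaultdict
from statistics import median

# (author, birth year, death year) for ten authors named in this article.
# A deliberately small, unrepresentative sample used only to show the method.
authors = [
    ("Shakespeare", 1564, 1616),
    ("Byron", 1788, 1824),
    ("Shelley", 1792, 1822),
    ("Pushkin", 1799, 1837),
    ("Balzac", 1799, 1850),
    ("Dickens", 1812, 1870),
    ("Twain", 1835, 1910),
    ("Verne", 1828, 1905),
    ("London", 1876, 1916),
    ("Howard", 1906, 1936),
]

# Group lifespans (death year minus birth year) by century of birth.
lifespans = defaultdict(list)
for _name, born, died in authors:
    century = f"{born // 100 * 100}s"
    lifespans[century].append(died - born)

# Median lifespan per birth century.
medians = {century: median(ages) for century, ages in lifespans.items()}

for century in sorted(medians):
    print(century, medians[century], f"n={len(lifespans[century])}")
```

The median is a reasonable choice over the mean here because a few very short or very long lives would otherwise pull the figure for a small century group.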
Figure: Distribution of author ages at death, in bins: under 40, 40–49, 50–59, 60–69, 70–79, 80–89, 90+.

The Library of Congress View

Using Library of Congress classification codes, American literature (PS) leads English literature (PR) by about 1,400 works. Juvenile fiction (PZ) claims third place, a reminder that children's literature was a massive Victorian industry.

Code   Class                 Works
PS     American Literature   12,013
PR     English Literature    10,641
PZ     Juvenile Fiction      7,860
PQ     Romance Literature    5,281
PT     Germanic Literature   3,242
PH     Finno-Ugrian          1,810

What They Wrote About

Science fiction leads all subject headings with 3,208 works, a testament to Gutenberg's pulp-era acquisitions. Short stories, general fiction, and adventure stories follow, with historical fiction and detective stories close behind. The archive is, above all, a treasury of popular fiction.

Subject heading                 Works
Science fiction                 3.2k
Short stories                   3.0k
Fiction                         2.0k
Adventure stories               1.6k
Historical fiction              1.0k
Detective and mystery stories   939
Love stories                    935
Poetry                          681

This analysis draws on four linked datasets: metadata for 79,491 works, biographical records for 26,077 authors, 76,205 language assignments, and 255,312 subject classifications. The data reflects not just literary history but the history of digitization itself—what we chose to preserve, and when.

Data: TidyTuesday 2025-06-03 / Project Gutenberg