Overview
Portal for linguistics and literary studies in Switzerland

English corpora

British National Corpus (BNC)

The British National Corpus (BNC) is a 100-million-word collection of samples of written and spoken language from a wide range of sources, designed to represent a broad cross-section of British English from the later part of the 20th century. The latest edition is the BNC XML Edition, released in 2007.

Corpus of Contemporary American English (COCA)

The Corpus of Contemporary American English (COCA) is the largest freely available corpus of English, and the only large and balanced corpus of American English.
The corpus contains more than 450 million words of text, divided equally among spoken language, fiction, popular magazines, newspapers, and academic texts. It includes 20 million words for each year from 1990 to 2012 and is updated regularly (the most recent texts are from summer 2012).
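The size figures above can be cross-checked with simple arithmetic. The following is a minimal sketch that uses only the numbers quoted in the description and assumes that the five genres are exactly equal in size; the actual per-genre counts may differ slightly.

```python
# Figures quoted in the COCA description above.
WORDS_PER_YEAR = 20_000_000
FIRST_YEAR, LAST_YEAR = 1990, 2012
GENRES = ["spoken", "fiction", "popular magazines", "newspapers", "academic"]

# Number of years covered, counting both end points.
years = LAST_YEAR - FIRST_YEAR + 1

# Approximate total size implied by the yearly figure.
total_words = years * WORDS_PER_YEAR

# Assumption: an exactly even split across the five genres.
words_per_genre_per_year = WORDS_PER_YEAR // len(GENRES)
words_per_genre = total_words // len(GENRES)

print(f"{years} years, about {total_words:,} words in total")
print(f"about {words_per_genre_per_year:,} words per genre per year")
print(f"about {words_per_genre:,} words per genre overall")
```

The implied total of 460 million words is consistent with the stated size of "more than 450 million words".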

Corpus of Global Web-Based English (GloWbE)

The Corpus of Global Web-Based English (GloWbE) is composed of 1.9 billion words from 1.8 million web pages in 20 English-speaking countries. It was created by Mark Davies of Brigham Young University and released in 2013.

GloWbE (pronounced like "globe") is related to other large corpora created by Mark Davies, including the 450-million-word Corpus of Contemporary American English (COCA) and the 400-million-word Corpus of Historical American English (COHA). Together, these three corpora allow researchers to examine variation in English (by dialect, by genre, and over time) in ways that are not possible with any other large corpora of English.

Corpus of Historical American English (COHA)

The Corpus of Historical American English (COHA) is the largest structured corpus of historical English. The corpus was created by Mark Davies of Brigham Young University.

COHA allows you to quickly and easily search more than 400 million words of American English text from 1810 to 2009. You can see how words, phrases, and grammatical constructions have increased or decreased in frequency, how words have changed meaning over time, and how stylistic changes have taken place in the language.
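Comparing frequencies over time only makes sense when raw counts are normalised by the size of each period's subcorpus, since the periods are not necessarily the same size. The following is a minimal sketch of that normalisation (occurrences per million words). The function name `per_million` is hypothetical, and the hit counts and decade sizes are invented placeholders, not figures taken from COHA.

```python
def per_million(raw_count: int, corpus_size: int) -> float:
    """Normalise a raw hit count to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * 1_000_000 / corpus_size


# Placeholder numbers for illustration only; NOT real COHA figures.
decades = {
    1810: {"hits": 12, "size": 1_200_000},
    1900: {"hits": 440, "size": 22_000_000},
    2000: {"hits": 295, "size": 29_500_000},
}

# Normalised rate for each decade.
rates = {
    decade: per_million(d["hits"], d["size"])
    for decade, d in decades.items()
}

for decade, rate in sorted(rates.items()):
    print(f"{decade}s: {rate:.1f} per million words")
```

In this made-up example the raw count for the 2000s is far higher than for the 1810s, yet the normalised rates are the same, which is why per-million figures rather than raw counts are the basis for claims about increase or decrease.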
