(1) Design Criteria

(a) Monolingual
(b) Professional writing (texts of academic standard)
(c) Synchronic (1995-2002)
(d) Regional variety (AmE/BrE/etc.)
(e) Sample (50,000 words per journal)
(f) Selection criteria

  To ensure an objective selection of journal texts, the project team decided to base content decisions on data from the Journal Citation Reports (JCR). The JCR provides quantifiable statistical data that allow the relative importance of journals within their subject categories to be determined objectively and systematically. As of 2001, the Science Edition of the JCR covered about 5,700 journals. Its key indicator, the "Impact Factor," offers a way to evaluate and compare a journal's relative importance as reflected in how often other journals in the same field cite it. Using these data, the journals ranking in the top 20% by impact factor in each subject category were selected for inclusion in the PERC Corpus. The JCR subject categories were also used to define the subject fields.
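The selection step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the PERC project's actual procedure. It uses the standard two-year impact factor: citations received in year Y to items published in the two preceding years, divided by the number of citable items published in those years. It also assigns each journal to a single category and rounds the 20% cutoff up, so that every category keeps at least one journal. The `Journal` class, the `select_top_percent` function, and all journal data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Journal:
    title: str
    category: str       # hypothetical: one JCR subject category per journal
    citations: int      # citations in year Y to items published in Y-1 and Y-2
    citable_items: int  # citable items published in Y-1 and Y-2

    @property
    def impact_factor(self) -> float:
        return self.citations / self.citable_items


def select_top_percent(journals, top_percent=20):
    """Return, per category, the titles of journals in the top `top_percent` by impact factor."""
    by_category = defaultdict(list)
    for journal in journals:
        by_category[journal.category].append(journal)

    selected = {}
    for category, members in by_category.items():
        ranked = sorted(members, key=lambda j: j.impact_factor, reverse=True)
        # Integer ceiling of len * top_percent / 100, keeping at least one journal (assumption).
        k = max(1, (len(ranked) * top_percent + 99) // 100)
        selected[category] = [j.title for j in ranked[:k]]
    return selected


# Hypothetical data: five Food Science journals and two Earth Science journals.
journals = [
    Journal("J1", "Food Science", 300, 100),  # IF 3.0
    Journal("J2", "Food Science", 150, 100),  # IF 1.5
    Journal("J3", "Food Science", 50, 100),   # IF 0.5
    Journal("J4", "Food Science", 200, 100),  # IF 2.0
    Journal("J5", "Food Science", 100, 100),  # IF 1.0
    Journal("J6", "Earth Science", 90, 30),   # IF 3.0
    Journal("J7", "Earth Science", 40, 40),   # IF 1.0
]
result = select_top_percent(journals)
print(result)  # {'Food Science': ['J1'], 'Earth Science': ['J6']}
```

With five journals in a category, 20% yields exactly one; with two, the rounded-up cutoff still keeps the single highest-ranked journal. A real selection would also have to handle journals listed under several JCR categories.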

(i) Domains: science and technology, including life science. Texts from approximately 170 subdomains are classified into the following 22 domains, each of which can be accessed separately as a sub-corpus (for further details, see the sub-corpus sections of the concordancer):

     Civil Engineering
     Computer Science
     Construction & Building Technology
     Earth Science
     Electrical & Electronic Engineering
     Environmental Sciences
     Food Science
     General Science
     Materials Science
     Metallurgy & Metallurgical Engineering
     Nuclear Science & Technology

(ii) Media: academic journals

(2) Text Encoding

The following information is indicated by the mark-up:

1. Sentence boundaries, parts of speech, and lemmas

2. Meta-textual information on the source and encoding of individual texts: each text carries a header with detailed descriptive information, including the author's name, title, publication year, and journal title, among other details.
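To illustrate the kind of information this mark-up carries, the following Python sketch models one encoded text as a header plus a list of sentences made up of (word, part of speech, lemma) tokens. Everything concrete here is an assumption for illustration, since this section does not describe the PERC Corpus's actual encoding format. That includes the one-token-per-line `word<TAB>POS<TAB>lemma` layout with blank lines marking sentence boundaries, the Penn Treebank-style tags, the header values, and every class and function name.

```python
from dataclasses import dataclass, field


@dataclass
class TextHeader:
    # Hypothetical header fields mirroring the descriptive information listed above.
    author: str
    title: str
    year: int
    journal: str


@dataclass
class Token:
    word: str
    pos: str    # part-of-speech tag (Penn Treebank style assumed)
    lemma: str


@dataclass
class EncodedText:
    header: TextHeader
    sentences: list[list[Token]] = field(default_factory=list)


def parse_vertical(header: TextHeader, lines: list[str]) -> EncodedText:
    """Group word<TAB>POS<TAB>lemma lines into sentences; a blank line ends a sentence."""
    text = EncodedText(header)
    current: list[Token] = []
    for line in lines:
        if not line.strip():
            if current:  # close the sentence at a boundary
                text.sentences.append(current)
                current = []
            continue
        word, pos, lemma = line.split("\t")
        current.append(Token(word, pos, lemma))
    if current:  # the last sentence may not be followed by a blank line
        text.sentences.append(current)
    return text


header = TextHeader("Doe, J.", "A hypothetical article", 2001, "Hypothetical Journal")
lines = [
    "Results\tNNS\tresult", "were\tVBD\tbe", "obtained\tVBN\tobtain", ".\t.\t.",
    "",
    "They\tPRP\tthey", "vary\tVBP\tvary", ".\t.\t.",
]
doc = parse_vertical(header, lines)
print(len(doc.sentences), doc.sentences[0][1].lemma)  # 2 be
```

Keeping the header separate from the token stream mirrors the distinction drawn above between meta-textual information and the linguistic annotation of sentences, parts of speech, and lemmas.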