Character Frequency Lists


Frequency lists show characters in decreasing order of frequency, with the most common characters first. Knowing how often characters appear in stories, articles, and other documents makes it easier to decide which characters to learn first. To view the current frequency list, click the “Character Frequency” tab just above the Reader. To choose which list is current, use “Frequency List” on the main menu. Three lists are available, and they differ in how character frequency is measured (for the published lists, in the corpus used):

  1. Dynamic Character Frequency List: This list is generated dynamically when you import documents into Chinese Toolbox 2012. Before the first import, the list is empty. After importing, a list of characters appears when you click the “Character Frequency” tab just above the Reader frame (refer to Interface). Each time you import a document, the program counts its characters and updates the frequency counts. The counts cover all imported documents, not just the most recent import. To get a frequency count for a single document or a specific set of documents, first click “Clear character frequency counts” on the File menu. The next document you import will then start a fresh count. (Screenshot: FreqListDynamic)
  2. Modern Chinese Character Frequency List (Jun Da): A published frequency list used by permission. The online list can be found here. Also see Jun Da’s WebCentral and the white paper that provides details of the study that resulted in this frequency list. Thirty characters in this list do not exist in the current Chinese Toolbox character dictionary and are therefore not used. They are (by entry number): 6088, 6693, 6933, 7049, 7074, 7188, 7324, 7399, 7525, 7526, 7609, 7963, 8006, 8221, 8222, 8749, 8750, 8751, 8752, 8753, 8771, 8945, 8946, 9020, 9038, 9064, 9711, 9712, 9713, 9764. (Screenshot: FreqListJunDa)
  3. Frequency and Stroke Counts of Chinese Characters (Chih-Hao Tsai): A published frequency list used by permission. The online list can be found here. See also Chih-Hao Tsai’s Technology Page and Chih-Hao Tsai’s Blog (in Chinese). All characters in this list exist in the Chinese Toolbox character dictionary, so none are excluded. (Screenshot: FreqListChihHaoTsai)
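The cumulative behaviour of the dynamic list can be illustrated with a minimal Python sketch. This is not the program’s actual code: the class name DynamicFrequencyList, its methods, and the is_cjk test (which only checks the basic CJK Unified Ideographs block) are all assumptions made for illustration.

```python
from collections import Counter


def is_cjk(ch: str) -> bool:
    """Rough test for a Chinese character (basic CJK Unified Ideographs block only)."""
    return "\u4e00" <= ch <= "\u9fff"


class DynamicFrequencyList:
    """Hypothetical stand-in for the dynamic list, for illustration only."""

    def __init__(self) -> None:
        self.counts = Counter()

    def import_document(self, text: str) -> None:
        # Counts accumulate across imports; they are not replaced.
        self.counts.update(ch for ch in text if is_cjk(ch))

    def clear(self) -> None:
        # Plays the role of "Clear character frequency counts" on the File menu.
        self.counts.clear()

    def ranked(self) -> list:
        # (character, count) pairs, most common first.
        return self.counts.most_common()


freq = DynamicFrequencyList()
freq.import_document("我的书")
freq.import_document("你的书的")
print(freq.ranked()[0])  # ('的', 3): counts cover both imports

freq.clear()
freq.import_document("你好")
print(freq.ranked())  # only the characters of the latest import remain
```

Clearing before an import is what limits the counts to a single document; without it, every import adds to the running totals.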

The “Show details in Character Frequency display” checkbox in the Settings dialog controls whether the Character Frequency view shows additional detail. For the published character frequency lists, the only additional detail currently shown is the position of each character in the list. The dynamic list includes much more information, as this screenshot shows: (Screenshot: FreqListDynamicDetailed)

The first line indicates how many characters have been imported into Chinese Toolbox 2012 via the “Import Document” button/tab. The second line indicates how many of these characters are unique. Both counts are reset when you select “Clear character frequency counts” from the File menu.

Characters of the dynamic character frequency list are displayed one character per line, and each line includes the following information:

  1. Character position in the list
  2. The character
  3. How frequently the character occurs in all imported documents, as a percentage of all imported characters
  4. The actual character frequency count (the number of occurrences)

So in the example above, you can see that 29773 characters have been imported, of which 2209 are unique. The first character, 的, occurred 1291 times, or 4.33% of all imported characters (1291 ÷ 29773).
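The arithmetic behind these lines can be sketched in Python. The function names and the exact line layout below are assumptions, not the program’s actual output format. The sketch truncates the percentage to two decimals because that reproduces the 4.33% figure (1291 ÷ 29773 is roughly 4.336%, which plain rounding would show as 4.34%); whether Chinese Toolbox truncates or rounds is not stated, so treat that choice as an assumption too.

```python
import math


def share_percent(count: int, total: int) -> float:
    """Percentage of all imported characters, truncated to two decimals (assumed)."""
    return math.floor(10000 * count / total) / 100


def detail_lines(counts: dict) -> list:
    """Build the two summary lines plus one line per character (hypothetical layout)."""
    total = sum(counts.values())
    lines = [
        f"Total characters: {total}",
        f"Unique characters: {len(counts)}",
    ]
    # Most frequent characters first; ties keep their original order.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for position, (ch, count) in enumerate(ranked, start=1):
        # Position, character, percentage share, raw count.
        lines.append(f"{position}. {ch} {share_percent(count, total):.2f}% {count}")
    return lines


print(share_percent(1291, 29773))  # 4.33
for line in detail_lines({"的": 3, "书": 2, "我": 1}):
    print(line)
```

The percentage and the raw count carry the same information relative to the total, which is why both summary counts reset together with the per-character counts.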

The character frequency list is an important part of Document Analysis in Chinese Toolbox 2012. It is also integrated into the character entry context menu in the Character Dictionary window. Upcoming releases of Chinese Toolbox will integrate character frequency lists more closely, especially in the Search window. See Upcoming Release for more information.