Character Dictionary Export and Import

This guide was originally written for an earlier version of Chinese Toolbox. With the new import/export commands on the File menu, it is no longer needed for migrating Chinese Toolbox character data from one version to another. However, it may still be useful (1) for sharing a customized character dictionary with other Chinese Toolbox users, and (2) for importing non-English (e.g. Chinese/German or Chinese/French) character dictionaries into Chinese Toolbox.

Here is a summary of the main import/export points:

  1. Chinese Toolbox character data falls into two main categories: dictionary data and user data (known and need-to-learn). In Chinese Toolbox, all character data is exported when you select “Export character dictionary” on the File menu.
  2. The program writes the exported data to CharacterDictionary.txt in the user’s Chinese Toolbox documents directory.
  3. The import reads from the same file.
  4. CharacterDictionary.txt is written in Unicode UTF-8 format. If you edit this file, you must save it in UTF-8 format before attempting to import it back into Chinese Toolbox. The import will fail if the file is written in UTF-16 or any other format.
  5. When you edit the file, you can remove any columns except the first, the one with the header ctsf_CHAR_SINGLE.
  6. When you edit the file, you can remove any rows except the first. This first row contains the column headers which are used to associate the column data with target frames in Chinese Toolbox.
  7. Columns can be reordered, and rows can be sorted on any column. When sorting, be sure to specify that the file has a header row, so that the header row is not sorted with the data.
  8. By removing data (rows and/or columns) from CharacterDictionary.txt, you can control what is imported back into Chinese Toolbox (beyond the menu commands included on the File menu). For example, if you remove all the character dictionary columns from the file, leaving only the Known (understanding) data, then the Chinese Toolbox built-in dictionary will not be affected when you import data back into Chinese Toolbox. Only the Known (understanding) data in the program is updated. (In Chinese Toolbox 2012, there is a menu command that does just this.)
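
The column trimming described in point 8 can also be done with a short script instead of a spreadsheet. The Python sketch below is only an illustration and is not part of Chinese Toolbox. It assumes the export is plain tab-delimited text with no quoted fields (the Excel import step later in this guide points to tabs, but the absence of quoting is an assumption). The function name keep_columns is hypothetical, and it writes Windows line endings and no byte order mark, which may or may not match what the program itself writes.

```python
KEY_HEADER = "ctsf_CHAR_SINGLE"  # the one column the guide says must stay


def keep_columns(src_path, dst_path, wanted_headers):
    """Copy a tab-delimited export, keeping only the named columns.

    The key column is always kept and written first. Returns the list
    of headers that were written.
    """
    # "utf-8-sig" reads UTF-8 with or without a leading byte order mark.
    with open(src_path, encoding="utf-8-sig") as src:
        lines = [line for line in src.read().split("\n") if line != ""]
    if not lines:
        raise ValueError("file is empty; the header row is required")

    header = lines[0].split("\t")
    names = [KEY_HEADER] + [h for h in wanted_headers if h != KEY_HEADER]
    missing = [h for h in names if h not in header]
    if missing:
        raise ValueError("headers not found in file: " + ", ".join(missing))
    indexes = [header.index(h) for h in names]

    # Assumption: Windows line endings, UTF-8 without a byte order mark.
    with open(dst_path, "w", encoding="utf-8", newline="\r\n") as dst:
        for line in lines:
            fields = line.split("\t")
            # Pad short rows so every kept column has a value.
            fields += [""] * (len(header) - len(fields))
            dst.write("\t".join(fields[i] for i in indexes) + "\n")
    return names
```

To use it, pass the header names copied from the first row of your own export as wanted_headers. The output file would then need to be named CharacterDictionary.txt and placed in the Chinese Toolbox documents directory before importing.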

If any of this is unclear, proceed with the remainder of this guide.

First, let’s start by exporting the character dictionary:


This is the easy part. Just click on “Export character dictionary” on the File menu. The export process takes only a few seconds. When finished, a new file will exist in the Chinese Toolbox documents directory as CharacterDictionary.txt.

Note that the program will always write the exported data to a CharacterDictionary.txt file in the Chinese Toolbox documents directory. If this file already exists, the next export will overwrite the original CharacterDictionary.txt without any warning. So before you export a second time, rename any existing export file to something other than CharacterDictionary.txt.
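
The renaming step can be scripted so it is not forgotten. The following Python sketch is an illustration under assumptions, not a feature of Chinese Toolbox: the function name back_up_export and the timestamped naming scheme are hypothetical, and the location of the documents directory is passed in by the caller because this guide does not give its full path.

```python
from datetime import datetime
from pathlib import Path


def back_up_export(documents_dir, now=None):
    """Rename an existing CharacterDictionary.txt so the next export
    cannot overwrite it. Returns the new path, or None if there was
    nothing to rename."""
    export = Path(documents_dir) / "CharacterDictionary.txt"
    if not export.exists():
        return None
    # Hypothetical naming scheme: append a timestamp to the file name.
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    backup = export.with_name(f"CharacterDictionary-{stamp}.txt")
    if backup.exists():
        raise FileExistsError(f"backup already exists: {backup}")
    export.rename(backup)
    return backup
```

Run it against the Chinese Toolbox documents directory before each new export. Any file name other than CharacterDictionary.txt would serve equally well.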

In this guide, I use Microsoft Excel 2003 to remove the columns that contain only dictionary data, leaving just the known and need-to-learn data. After the new file is saved in the proper format, it can be moved to another computer. However, Excel 2003 cannot write text files in UTF-8 format; the only Unicode format it supports is UTF-16. So after saving the file in Excel, I’ll need to use another program to convert it to UTF-8 format. Let’s get started.

First, click on the Open menu item under the Excel File menu. The Excel Open dialog will appear. Select the CharacterDictionary.txt file in the Chinese Toolbox documents directory as shown below, and click on the Open button.

The Excel import wizard will display three dialogs, one after the other. In the first, select “Delimited” for the file type and “65001 : Unicode (UTF-8)” for the file origin, as shown below:

At the second import wizard dialog (below), just click the Next button. The “Tab” checkbox should already be checked. If it isn’t, check it.

At the third import wizard dialog (below), you shouldn’t need to make any changes. Just click on the Finish button.

After a few seconds, Excel will display the dictionary file in its window. After a little formatting (wrapping text and widening columns), the Excel window appears as follows: 

At this point the columns unrelated to your understanding of characters can be removed. You should be left with the following:

Now save this file in Unicode format. Select “Save As” from the Excel File menu. The following will appear:

Click the Save button, and the following dialog will appear requesting overwrite confirmation.

Click the “Yes” button, then at the following dialog click “Yes” to confirm writing of the file in Unicode format.

At this point you can close Excel. When you do so, Excel will present the following dialog:

This appears because the file has not been saved in the native Excel spreadsheet format. At this dialog, click “No” to confirm that you do not want to save the file again.

CharacterDictionary.txt now contains the character understanding data and need-to-learn data from the original Chinese Toolbox dictionary. However, the file is not yet in the proper format. It still needs to be converted from one Unicode format to another, that is, from UTF-16 to UTF-8. A number of free Unicode text editors are available from various download sites. Two that work for me are BabelPad and TxtEdit. At the time of this writing, the BabelStone web site was down, but BabelPad is available from several other software download sites; just do a search.
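
If Python is available, the UTF-16 to UTF-8 conversion can be done without a text editor. This is a minimal sketch rather than a tested replacement for the editors named above. The function name utf16_to_utf8 is hypothetical. It assumes the file Excel saved begins with a byte order mark, which Python's "utf-16" codec uses to detect the byte order, and it writes UTF-8 without a byte order mark. Whether Chinese Toolbox expects one is not stated in this guide; if the import rejects the result, writing with encoding="utf-8-sig" instead would add it.

```python
def utf16_to_utf8(src_path, dst_path):
    """Re-encode a UTF-16 text file as UTF-8, leaving the content and
    line endings unchanged."""
    # newline="" turns off line-ending translation in both directions.
    with open(src_path, encoding="utf-16", newline="") as src:
        text = src.read()
    # The whole file is read before writing, so src and dst may be the
    # same path.
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

For example, utf16_to_utf8("CharacterDictionary.txt", "CharacterDictionary.txt") would convert the file in place.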

After you convert the file to UTF-8 format, you’re ready to copy the file to another computer running Chinese Toolbox. Just place the file, CharacterDictionary.txt, in the Chinese Toolbox documents directory on the second computer. Start up Chinese Toolbox, and click on File, then “Import all data from CharacterDictionary.txt”. The program will automatically shut down when the import is complete.  Start Chinese Toolbox again to begin using it with updated character understanding data.
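
Before running the import, it can help to confirm that the edited file still meets the two requirements from the summary: UTF-8 encoding and a header row that includes ctsf_CHAR_SINGLE. The Python sketch below is an assumption-laden illustration. The function name check_import_file is hypothetical, the checks reflect only the rules stated in this guide, and they are not the validation Chinese Toolbox itself performs.

```python
KEY_HEADER = "ctsf_CHAR_SINGLE"


def check_import_file(path):
    """Return a list of problems found; an empty list means the file
    passed these basic checks."""
    with open(path, "rb") as handle:
        raw = handle.read()

    # A UTF-16 file normally starts with one of these byte order marks.
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        return ["file starts with a UTF-16 byte order mark; convert it to UTF-8"]
    try:
        text = raw.decode("utf-8-sig")  # accepts UTF-8 with or without a BOM
    except UnicodeDecodeError as exc:
        return [f"file is not valid UTF-8: {exc}"]

    problems = []
    # Assumption: columns are separated by tabs, as in the Excel import step.
    first_line = text.replace("\r\n", "\n").split("\n", 1)[0]
    if KEY_HEADER not in first_line.split("\t"):
        problems.append(f"header row is missing the {KEY_HEADER} column")
    return problems
```

An empty result does not guarantee that the import will succeed; it only rules out the two most likely editing mistakes.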




