Unicode File Transfers
EFT’s support for UTF-8 encoded Unicode characters extends to:
Inbound protocols:
- HTTP and SFTP, when EFT acts as a server
Event Rules:
- Copy/Move and Download action wizards (all protocols) when specifying "UTF-8" as the filename encoding, and when using wildcards for the source filename (for example, *.dat or *.*)
- Advanced Workflow Engine (AWE) when passed filename-related context variables (for example, %FS.PATH%, %FS.FILE_NAME%, etc.)
Auditing:
- EFT’s summary Client Log (CL)
- EFT’s extended client logs (e.g. LAN copy, FTP extended, and SFTP debug logs)
- EFT’s debug log (Log4Cplus)
Exclusions*:
- FTP protocol (inbound)
- All Event Rule actions that process filename-related context variables (e.g. %FS.PATH%, %FS.FILE_NAME%, etc.); the only exception is Advanced Workflow Engine (AWE) actions
- Folder Monitor events. Windows notifies EFT when a Unicode file is dropped into a monitored folder, but EFT cannot (at present) pass the UTF-8 encoded filename context variable to Event Rule actions for processing. The only exception is an AWE action, in which case UTF-8 encoding is preserved. Do not use wildcards as the source filename for Folder Monitor rules (even if only polling is used), as this leads to race conditions and other problems. Use wildcards only in rules that don’t use filename context variables, such as Timer, user, or system-related events.
- ARM database and all logs not explicitly mentioned above
- User interface (UI) components. You cannot specify Unicode characters in Event Rules or anywhere else in the administration interface
- COM API
- UTF-8 filenames over AS2
*UTF-8 support will be more comprehensive in a future version. Refer to Unicode Exceptions for more information.
Unicode FAQs
The FAQs below are provided to answer questions you may have regarding EFT's Unicode support.
What is Unicode?
Unicode is a standard that provides a unique number for every character, regardless of platform, program, or language. Systems that don’t support Unicode, or that lack the proper ANSI code page, render characters such as 大きい魚 as question marks (????) or other placeholder characters.
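To make the "unique number for every character" idea concrete, the following minimal Python sketch prints the Unicode code point of each character in the sample string. It only illustrates the standard and is not part of EFT; the helper name code_points is hypothetical.

```python
def code_points(text):
    """Return the Unicode code point of each character in U+XXXX notation."""
    # ord() gives the number that Unicode assigns to a character;
    # that number is the same on every platform and in every program.
    return [f"U+{ord(ch):04X}" for ch in text]


if __name__ == "__main__":
    sample = "大きい魚"
    for ch, cp in zip(sample, code_points(sample)):
        print(ch, cp)
```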
Does EFT support Unicode?
EFT partially supports Unicode and is moving towards full support.
What about UTF-8?
UTF-8 is simply a popular mechanism for encoding Unicode characters using one or more bytes. Prior to supporting UTF-8, EFT used ANSI code pages to view filenames in the intended format (on the target system when browsing with the WTC or PTC).
What other mechanisms for encoding Unicode characters does EFT support?
EFT uses double-byte UCS-2 encoding at the file system (I/O) level, UTF-8 encoding within EFT, and ASCII wherever Unicode is not yet supported.
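The difference between these encodings can be seen by counting bytes. The Python sketch below is an illustration only, under one stated assumption: Python has no codec named UCS-2, so "utf-16-le" is used as a stand-in, which produces the same two bytes per character for characters in the Basic Multilingual Plane. The helper name encoded_sizes is hypothetical.

```python
def encoded_sizes(text):
    """Return the encoded length in bytes of text under two encodings."""
    return {
        # Stand-in for UCS-2 (assumption): two bytes per BMP character.
        "utf-16-le": len(text.encode("utf-16-le")),
        # UTF-8 uses one byte for ASCII and more for other characters.
        "utf-8": len(text.encode("utf-8")),
    }


if __name__ == "__main__":
    for name in ("A", "梅", "梅雨右折車線_XYZ.ISO"):
        print(name, encoded_sizes(name))
    # Plain ASCII text has the same bytes in UTF-8 and in ASCII.
    print("XYZ".encode("utf-8") == "XYZ".encode("ascii"))
```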
Does EFT support UTF-8 for file transfers?
When certain conditions are met (see the next question), EFT preserves UTF-8 encoded filenames when transferring files over HTTP and SFTP when acting as a server, and over all supported protocols when acting as a client.
What about EFT’s Event Rules?
EFT’s Copy/Move and Download Action wizards (across all protocols) support Unicode when you specify "UTF-8" as the filename encoding method (radio button in the wizard) and use wildcards for the source filename (for example, *.dat or *.*). However, UTF-8 is not supported for these Actions if you use %FS.PATH% or any other variable for the source filename, which means the Folder Monitor Event cannot be used to offload files and conserve their Unicode filenames. The only Action that supports UTF-8 encoded filenames through context variables is an AWE workflow task.
Which client applications can I use to see Unicode filenames when I transfer files to EFT?
EFT's Web Transfer Client (WTC) supports UTF-8. For file transfer applications that do NOT support UTF-8, Unicode filenames will appear as "???????.exe" when using them to transfer files to/from EFT. CuteFTP v9 supports UTF-8.
Can EFT audit or log filenames or other data with Unicode characters?
EFT’s summary Client Log (CL), extended client logs (LAN transfer logs, FTP logs, SFTP debug logs), debug log (eft.log), and AWE’s logs all support Unicode characters. EFT’s EX logs, cmd out logs, and ARM (both auditing and reporting) do NOT support Unicode characters.
If this filename: 梅雨右折車線_XYZ.ISO is transferred to EFT, how will it appear on disk? In reports? In EFT’s Event Rules?
EFT will store the file to disk and conserve the original Unicode filename. The filename will be audited properly to EFT’s eft.log, but will be down-converted to ASCII when audited to the EX log and to the ARM database, resulting in a filename that may look like this: ??????_XYZ.ISO, which is also how it appears in EFT’s reports. The trailing _XYZ and the file extension are conserved because UTF-8 encodes ASCII characters, such as the English letters A-Z, with the same byte values as ASCII, so those characters lose no meaning (fidelity) in a UTF-8-to-ASCII conversion. The same UTF-8-to-ASCII conversion applies when EFT hands off the filename to the Event Rule dispatcher, except where an AWE action exists, in which case the filename context variable retains the original UTF-8 encoded filename. If you need data integration of UTF-8 encoded filenames, consider deploying AWE tasks alongside EFT’s Event Rules.
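The down-conversion described above can be approximated with a short Python sketch. EFT's actual conversion routine is not documented here, so this is an assumption-laden illustration: it uses Python's built-in "replace" error handler, which substitutes a question mark for every character that ASCII cannot represent and leaves ASCII characters, including their case, untouched. The function name down_convert_to_ascii is hypothetical.

```python
def down_convert_to_ascii(filename):
    """Approximate a UTF-8-to-ASCII down-conversion of a filename.

    Assumption: each non-ASCII character becomes a single '?',
    which is what Python's 'replace' error handler does.
    """
    return filename.encode("ascii", errors="replace").decode("ascii")


if __name__ == "__main__":
    original = "梅雨右折車線_XYZ.ISO"
    print(down_convert_to_ascii(original))  # prints ??????_XYZ.ISO
```

Because the six replaced characters all collapse to the same placeholder, the conversion cannot be reversed, which is why the original name has to be kept end to end (for example, through AWE) when it matters.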
How do Unicode filenames appear in EFT’s administration interface?
EFT’s administration interface (AI) does not support Unicode characters. UTF-8 is always down-converted to ASCII in the AI. This means you can’t specify a unique UTF-8 encoded filename in EFT’s offload wizard, nor a UTF-8 encoded username, path, or anything else. The ONLY way to process Unicode filenames in the Copy/Move and Download Actions is to use wildcards (*.*, *.dat, etc.) as the source filename, instead of a specific filename such as 梅雨右折車線.ISO.
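The reason wildcards work where a literal name does not is that the pattern itself contains only ASCII characters, yet it can still select files whose names contain Unicode characters. The Python sketch below illustrates the idea with the standard fnmatch module as a stand-in; EFT's own wildcard matcher may behave differently, and the helper name select_by_wildcard is hypothetical.

```python
import fnmatch


def select_by_wildcard(filenames, pattern):
    """Return the filenames that match an ASCII-only wildcard pattern."""
    # fnmatchcase compares case-sensitively on every platform.
    return [name for name in filenames if fnmatch.fnmatchcase(name, pattern)]


if __name__ == "__main__":
    files = ["梅雨右折車線.ISO", "report.dat"]
    print(select_by_wildcard(files, "*.ISO"))
    print(select_by_wildcard(files, "*.*"))
```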
Will Unicode-encoded filenames be preserved in EFT Server’s context variables, such as %FS.FILE_NAME% or %FS.PATH%?
Yes and no. For all Event Rule Events, Conditions, and Actions, EFT down-converts the UTF-8 characters to ASCII. The only exception is when those variables are passed to AWE. In that case alone, EFT conserves the UTF-8 encoded filename so that AWE, which is fully UTF-8 compliant, can consume the original filename.
Does EFT’s internal handling of the file differ depending on whether the file was received in ASCII or Unicode?
No. Internally, EFT handles everything in Unicode. Conversion back to ASCII occurs only when working with a system or capability that doesn’t support Unicode.