The standards defined here apply to the Facsimile project's source files. Many of these standards (ideally, all of them) are testable, and those tests run as part of the build process: source files that do not conform are rejected and the build fails. This ensures that the standards are applied consistently and without exception.
Only the source file types listed in the following table may be submitted for inclusion in the project. For each type, the table gives its standard file extension, its encoding, whether a byte-order mark (BOM) is required, its MIME type and its indentation spacing.
| Type | File Extension | Encoding | BOM | MIME Type | Indentation (spaces) |
|---|---|---|---|---|---|
| Java source file | .java | UTF-8 | No | text/x-java-source | 4 |
| Text file | .txt | UTF-8 | Yes | text/plain | 4 |
| XML file | .xml | UTF-8 | Yes | application/xml | 4 |
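Because these standards are meant to be checked by the build, the table above is easiest to enforce if a checker holds it as data. The following is a minimal sketch, not part of the Facsimile build: the enum `SourceFileType`, its field names and the `forExtension` method are hypothetical. Since every supported type uses UTF-8, the encoding is not stored per type.

```java
import java.util.Optional;

// Hypothetical model of the supported source file types table.
// Every supported type is UTF-8 encoded, so the encoding is not stored per type.
enum SourceFileType {
    JAVA_SOURCE(".java", false, "text/x-java-source", 4),
    TEXT(".txt", true, "text/plain", 4),
    XML(".xml", true, "application/xml", 4);

    final String extension;    // the single standard file extension for the type
    final boolean bomRequired; // true if the file must start with the UTF-8 BOM
    final String mimeType;     // the type's standard MIME type
    final int indentSpaces;    // indentation spacing, in spaces

    SourceFileType(String extension, boolean bomRequired, String mimeType, int indentSpaces) {
        this.extension = extension;
        this.bomRequired = bomRequired;
        this.mimeType = mimeType;
        this.indentSpaces = indentSpaces;
    }

    // Looks up a type by its exact (case-sensitive) extension, including the dot.
    // An empty result means the file type is not supported.
    static Optional<SourceFileType> forExtension(String extension) {
        for (SourceFileType type : values()) {
            if (type.extension.equals(extension)) {
                return Optional.of(type);
            }
        }
        return Optional.empty();
    }
}
```

For example, `SourceFileType.forExtension(".xml")` yields `XML`, whose `bomRequired` flag is `true`, while `forExtension(".cc")` is empty because no such type is supported.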
Requests for new source file types, with a detailed rationale, must be made to a project administrator.
The more source file types that are supported, the larger the set of dependencies required to build a Facsimile implementation, and the wider the range of skills required of developers and contributors. Keeping the set of supported source file types to a minimum supports the project goal of Simplicity.
Each source file must have the standard file extension corresponding to the type of the source file. Refer to the list of supported source file types to find the appropriate file extension.
Using numerous file extensions for the same type of file (for example, using ".cpp", ".C" and ".cc" for C++ source files) complicates build and test scripts and can cause confusion for developers and contributors. By standardizing on a single file extension for each source file type, the project goal of Simplicity is supported.
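A build check for this rule only needs to extract a file's extension and compare it, case-sensitively, against the standard extensions. The sketch below is illustrative only: the class `ExtensionCheck` and its methods are hypothetical, and the extension set is copied from the supported source file types table.

```java
import java.nio.file.Path;
import java.util.Optional;
import java.util.Set;

// Hypothetical extension check: accepts only the single standard extension for
// each supported type, so variants such as ".JAVA" or ".text" are rejected.
class ExtensionCheck {
    // Standard extensions taken from the supported source file types table.
    private static final Set<String> STANDARD_EXTENSIONS = Set.of(".java", ".txt", ".xml");

    // Returns the extension (including the dot) of the path's file name, if any.
    static Optional<String> extensionOf(Path path) {
        Path name = path.getFileName();
        if (name == null) {
            return Optional.empty(); // e.g. a root path such as "/"
        }
        String fileName = name.toString();
        int dot = fileName.lastIndexOf('.');
        // No dot at all, or only a leading dot (a hidden file such as ".gitignore").
        if (dot <= 0) {
            return Optional.empty();
        }
        return Optional.of(fileName.substring(dot));
    }

    // Case-sensitive comparison: "Foo.JAVA" does not match ".java".
    static boolean hasStandardExtension(Path path) {
        return extensionOf(path).map(STANDARD_EXTENSIONS::contains).orElse(false);
    }
}
```

Only the last dot is considered, so a name such as `a.b.xml` is treated as an XML file; whether multi-dot names should be allowed at all is a policy question this sketch does not settle.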
Each source file must have the standard encoding corresponding to the type of the source file. Refer to the list of supported source file types to find the appropriate encoding.
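Since every type in the table uses UTF-8, one part of an encoding check is confirming that a file's bytes decode as well-formed UTF-8. This sketch uses the standard `java.nio.charset` decoder with errors reported rather than replaced; the class and method names are hypothetical. Note its limits: a pure ASCII file also passes, because ASCII is a subset of UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical encoding check: does a file's content decode as well-formed UTF-8?
class Utf8Check {
    static boolean isWellFormedUtf8(byte[] content) {
        // A fresh decoder per call. REPORT makes malformed input throw, instead of
        // being silently replaced with U+FFFD as new String(bytes, UTF_8) would do.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(content));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

In a build, the bytes would typically come from `java.nio.file.Files.readAllBytes(path)`. A file saved as Latin-1 containing an accented letter such as "é" (byte `0xE9`) fails this check, because `0xE9` is not a valid stand-alone UTF-8 byte sequence.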
The preferred encoding is UTF-8, which is the standard encoding for every source file type whose parsers support it; where UTF-8 is not supported, the standard encoding is the 8-bit ISO 8859-1 (Latin-1) character set.
UTF-8 is preferred because it is widely supported and can encode every Unicode character, covering virtually all of the world's writing systems. By contrast, UTF-16 is less widely supported, whilst the 8-bit ISO 8859-1 (Latin-1) character set can represent at most 256 characters.
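The coverage difference can be demonstrated with the standard `CharsetEncoder.canEncode` method. In this illustrative sketch (the class `EncodingCoverage` is hypothetical), the euro sign and an emoji are rejected by the Latin-1 encoder but accepted by UTF-8, which spends one to four bytes per character depending on the code point.

```java
import java.nio.charset.StandardCharsets;

// Illustrates the coverage difference between ISO 8859-1 (Latin-1) and UTF-8.
class EncodingCoverage {
    static boolean latin1CanEncode(String text) {
        // A fresh encoder per call, since canEncode may modify encoder state.
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    static boolean utf8CanEncode(String text) {
        return StandardCharsets.UTF_8.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",           // U+0041 LATIN CAPITAL LETTER A (ASCII)
            "\u00E9",      // U+00E9 LATIN SMALL LETTER E WITH ACUTE (in Latin-1)
            "\u20AC",      // U+20AC EURO SIGN (not in Latin-1)
            "\uD83D\uDE00" // U+1F600, outside the Basic Multilingual Plane
        };
        for (String s : samples) {
            System.out.printf("U+%04X latin1=%b utf8=%b utf8Bytes=%d%n",
                    s.codePointAt(0), latin1CanEncode(s), utf8CanEncode(s),
                    s.getBytes(StandardCharsets.UTF_8).length);
        }
    }
}
```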
A UTF-8 encoded file can be recognized by the UTF-8 byte-order mark (BOM) at the start of the file. The UTF-8 BOM is the UTF-8 encoding of the Unicode "zero-width no-break space" character (U+FEFF): the byte sequence 0xEF, 0xBB, 0xBF. A text file that does not start with this three-byte sequence will not necessarily be recognized as UTF-8 encoded. Some UTF-8 encoded files are exempt from including the BOM where it causes problems for parsers, as is the case with all GNU C++ compilers prior to release 4.4.
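Checking the BOM rule amounts to comparing a file's first three bytes with 0xEF, 0xBB, 0xBF and matching the result against the BOM column of the supported source file types table. The sketch below is hypothetical (the class `Utf8Bom` and its methods are not part of the Facsimile build); it also shows how a BOM might be stripped before handing content to a tool that rejects it.

```java
import java.util.Arrays;

// Hypothetical helpers for the UTF-8 byte-order mark (BOM) rule.
class Utf8Bom {
    // U+FEFF encoded in UTF-8.
    private static final byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // True if the content starts with the three-byte UTF-8 BOM.
    static boolean startsWithBom(byte[] content) {
        return content.length >= BOM.length
                && Arrays.equals(Arrays.copyOf(content, BOM.length), BOM);
    }

    // True if BOM presence matches the rule for the file's type, taken from the
    // BOM column of the table (e.g. required for .xml, forbidden for .java).
    static boolean bomConforms(byte[] content, boolean bomRequired) {
        return startsWithBom(content) == bomRequired;
    }

    // Java's UTF-8 decoder does not drop a leading BOM (it decodes to U+FEFF),
    // so content destined for BOM-intolerant tools may need it removed first.
    static byte[] stripBom(byte[] content) {
        return startsWithBom(content)
                ? Arrays.copyOfRange(content, BOM.length, content.length)
                : content;
    }
}
```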