|
Line ending conversion

Overview and background on line ending conversion

A line break in computer text is represented by one or more invisible characters. On Unix systems, the break is the single character "\n" (ASCII 10, octal \012). Classic Mac systems (Mac OS 9 and earlier) use "\r" (ASCII 13, octal \015). Windows systems and Internet protocols use both characters together: "\r\n".

Systems that transfer files or text streams from one platform to another sometimes need to convert the line endings to match the convention of the destination platform.

To convert or not to convert line endings

Source code files often need to be forced to use a certain line ending convention. In particular, scripts and code files that execute on a Unix server may need the \n line ending. JavaScript code (and HTML files with embedded JavaScript) will often need either \n or \r\n endings. The Mac "\r" endings will prevent JavaScript from executing properly in many web browsers.

Aside from those few special cases, most text files do not require conversion. Most programs that interact with files are line-ending tolerant and will accept any of the known endings.

Furthermore, binary files will become corrupted if they undergo line-ending conversion. Binary files don't have logical line endings, but they do contain embedded "\r" and "\n" bytes. Those bytes will be changed, and possibly expanded or contracted, by the algorithm. The result is a binary file whose size and content have changed in unpredictable ways, and which likely cannot be read by applications.

Line-ending conversion is a process that destroys information. It is not possible to recover a file that has been corrupted in this way. Because of the risk, line ending conversion should only be performed when needed, and only on well-understood text-only content.
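The loss of information can be seen in a few lines of Python. This is only an illustrative sketch: the helper to_unix is a hypothetical name, not part of Genesis, and the sample bytes are the 8-byte signature that begins every PNG file, which happens to contain both "\r\n" and a bare "\n".

```python
def to_unix(data: bytes) -> bytes:
    # Naive conversion of every line-ending style to a single LF byte.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

# Two different byte sequences, as might appear inside binary files.
original_a = b"\x89PNG\r\n\x1a\n"   # the real PNG signature
original_b = b"\x89PNG\n\x1a\n"     # differs only by the missing CR

converted_a = to_unix(original_a)
converted_b = to_unix(original_b)

print(len(original_a), len(converted_a))   # 8 7  (the file shrank)
print(converted_a == converted_b)          # True (the inputs are now indistinguishable)
```

Because two different inputs produce the same output, no reverse step can tell which one was the original.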
Settings related to line endings

Because of the risk of corruption, and because conversion is needed only in very special circumstances, the Genesis Web Authoring System limits line-ending conversion to files with specific extensions:

    pl cgi sh php php3 js txt htm html shtml css

Conversion can be disabled altogether by going to Admin Page => My Account => Line Ending Conversion. The set of file extensions used for conversion can also be customized there, as can the type of line endings to use (Mac, Unix, or Windows). All conversion settings are controlled on a per-user basis.

Situations in which Genesis will convert line endings
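Whatever triggers a conversion, the per-user on/off switch and the extension list described in the settings act as a gate in front of it. A minimal Python sketch of such a gate follows. The function name should_convert, its parameters, and the case-insensitive matching are assumptions made for illustration; only the default extension list comes from the text.

```python
import os

# Default extension list as quoted in the settings description.
DEFAULT_EXTENSIONS = frozenset(
    ["pl", "cgi", "sh", "php", "php3", "js", "txt", "htm", "html", "shtml", "css"]
)

def should_convert(filename, enabled=True, extensions=DEFAULT_EXTENSIONS):
    # A user who has disabled conversion never gets any.
    if not enabled:
        return False
    # Take the text after the last dot; lower-casing is an assumption.
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return ext in extensions

print(should_convert("index.HTML"))    # True
print(should_convert("manual.pdf"))    # False
```

Files without an extension fall through to "no conversion", which matches the document's preference for converting only when there is a clear reason to.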
Advanced: Conversion algorithm

All conversion code has been extracted to a subroutine. The algorithm is based on force_CRLF from the Common library. All line endings are first converted to the Internet-standard \r\n using the following three conversions:
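As an illustration, a three-step sequence with this effect can be sketched in Python. This is not the source of force_CRLF: the name force_crlf_sketch and the particular order of steps (collapse pairs, collapse bare "\r", then expand) are assumptions chosen because they yield the behavior described.

```python
def force_crlf_sketch(data: bytes) -> bytes:
    # Step 1: collapse existing CRLF pairs to LF so they are not doubled later.
    data = data.replace(b"\r\n", b"\n")
    # Step 2: turn any remaining bare CR into LF.
    data = data.replace(b"\r", b"\n")
    # Step 3: expand every LF to a full CRLF.
    return data.replace(b"\n", b"\r\n")

print(force_crlf_sketch(b"one\rtwo\nthree\r\nfour"))
# b'one\r\ntwo\r\nthree\r\nfour'
```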
In this way, all original \r\n are preserved, while any bare \r or \n are expanded to full \r\n. This is all covered by the standard force_CRLF call. Next:
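Given that the settings offer three endings (Mac, Unix, or Windows), a plausible final step is to map the normalized "\r\n" to the user's chosen ending. The Python sketch below assumes exactly that. The function crlf_to_target, the ENDINGS table, and the style names are hypothetical, and the input is assumed to contain only "\r\n" line endings already.

```python
# Hypothetical style names mapped to the three endings named in the settings.
ENDINGS = {"mac": b"\r", "unix": b"\n", "windows": b"\r\n"}

def crlf_to_target(data: bytes, style: str) -> bytes:
    # Input is assumed to be CRLF-normalized, so one replacement is enough.
    return data.replace(b"\r\n", ENDINGS[style])

normalized = b"first\r\nsecond\r\n"
print(crlf_to_target(normalized, "unix"))     # b'first\nsecond\n'
print(crlf_to_target(normalized, "mac"))      # b'first\rsecond\r'
```

Normalizing to one form first means this last step needs only a single substitution per target style, with no risk of doubling an ending.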
History

In Genesis versions 2.1.0.0024 and earlier, each user had a setting "Upload All Text Files in ASCII Mode", which was enabled by default and which applied "\n" endings to all files that passed Perl's -T is-text-file test. The -T test returned false positives on PDF files (forcing conversion and corrupting their contents), which led to a change in approach.

With build 0025 and newer, the setting has been renamed "Line Ending Conversion" and is complemented by a custom file extension list that does not include PDF. The -T switch is no longer used. In addition, users can choose which of the three standard line endings they want, instead of being forced to accept Unix \n. |