Line ending conversion

Overview and background on line ending conversion

A line break in computer text is represented by one or more invisible control characters. On Unix systems (including Linux and Mac OS X), the break is the single "\n" (line feed, ASCII 10, octal \012). Classic Mac OS systems (before Mac OS X) use "\r" (carriage return, ASCII 13, octal \015). Windows systems and Internet protocols use both characters together: "\r\n".
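As an illustration, the following Python sketch counts how many of each kind of line ending appear in a block of bytes. The function name count_line_endings is hypothetical and is not part of Genesis. A "\r\n" pair is counted once as a Windows ending, not as one Mac ending plus one Unix ending.

```python
def count_line_endings(data: bytes) -> dict:
    r"""Count each kind of line ending in a byte string.

    Each \r\n pair contains exactly one \r and one \n, so subtracting
    the pair count leaves only the bare \r (Mac) and bare \n (Unix).
    """
    crlf = data.count(b"\r\n")
    return {
        "windows": crlf,                    # \r\n pairs
        "mac": data.count(b"\r") - crlf,    # bare \r
        "unix": data.count(b"\n") - crlf,   # bare \n
    }

print(count_line_endings(b"a\r\nb\nc\rd"))
# prints {'windows': 1, 'mac': 1, 'unix': 1}
```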

Systems which transfer files or text streams from one platform to another will sometimes need to convert the line endings to match the convention of the destination platform.

To convert or not to convert line endings

Source code files often must use a specific line-ending convention. In particular, scripts and code files that execute on a Unix server may need \n line endings; a "#!" interpreter line that ends in "\r", for example, names an interpreter that does not exist, and the script will fail to run. JavaScript code (and HTML files with embedded JavaScript) will often need either \n or \r\n endings, because the Mac "\r" endings will prevent JavaScript from executing properly in many web browsers.

Aside from those few special cases, most text files do not require conversion. Most programs that read text files are line-ending tolerant and will accept any of the known endings.

Furthermore, binary files will be corrupted if they undergo line-ending conversion. Binary files have no logical line endings, but they do contain embedded "\r" and "\n" bytes. The conversion algorithm will change those bytes, possibly inserting or removing bytes in the process. The result is a binary file whose size and content have changed in unpredictable ways, and which most likely cannot be read by applications. Line-ending conversion destroys information: it is not possible to recover a file that has been corrupted in this way.
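The loss of information is easy to demonstrate. In the following Python sketch, to_unix is a hypothetical, simplified stand-in for conversion to Unix endings (it turns both "\r\n" and bare "\r" into "\n"), and the two byte strings are arbitrary made-up payloads. Because two different inputs produce identical output, there is no way to tell afterward which one was the original.

```python
import re

def to_unix(data: bytes) -> bytes:
    r"""Simplified conversion to Unix endings: \r\n and bare \r become \n."""
    return re.sub(rb"\r\n?", b"\n", data)

# Two different (made-up) binary payloads...
payload_a = bytes([0x89, 0x0D, 0x0A, 0x1A])   # contains \r\n
payload_b = bytes([0x89, 0x0A, 0x1A])         # contains only \n

# ...collapse to the same output, and payload_a also shrinks by one byte.
print(to_unix(payload_a) == to_unix(payload_b))     # prints True
print(len(payload_a), len(to_unix(payload_a)))      # prints 4 3
```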

Because of the risk, line ending conversion should only be performed when needed, and should only be performed on well-understood text-only content.

Settings related to line endings

Because of the risk of corruption, and because conversion is needed only in very special circumstances, the Genesis Web Authoring System limits line-ending conversion to files with specific extensions: pl cgi sh php php3 js txt htm html shtml css.

Conversion can be disabled altogether by going to Admin Page => My Account => Line Ending Conversion. The set of file extensions used for conversion can also be customized there, as well as the type of line endings to use (Mac, Unix, or Windows). All conversion settings are controlled on a per-user basis.
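One way to picture these per-user settings is as a small record holding an on/off switch, a target ending, and an extension list. The Python sketch below is only a model: the class and field names are hypothetical, and it assumes that extensions are matched case-insensitively against the text after the last dot in the file name, which may differ from what Genesis actually does.

```python
from dataclasses import dataclass, field

# Default extension list, as documented for Genesis.
DEFAULT_EXTENSIONS = ["pl", "cgi", "sh", "php", "php3", "js",
                      "txt", "htm", "html", "shtml", "css"]

@dataclass
class LineEndingSettings:
    """Hypothetical per-user settings record (names are illustrative)."""
    enabled: bool
    target: str  # assumed to be one of "mac", "unix", "windows"
    extensions: list = field(default_factory=lambda: list(DEFAULT_EXTENSIONS))

    def applies_to(self, filename: str) -> bool:
        """Return True if a file with this name should be converted."""
        if not self.enabled or "." not in filename:
            return False
        # Assumption: compare the last extension case-insensitively.
        ext = filename.rsplit(".", 1)[1].lower()
        return ext in {e.lower() for e in self.extensions}

settings = LineEndingSettings(enabled=True, target="unix")
print(settings.applies_to("foo.html"), settings.applies_to("report.pdf"))
# prints True False
```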

Situations in which Genesis will convert line endings

  • Template Editor => Build Template

    All multi-line textarea inputs that contain line breaks will be converted *if* the Convert Line Endings setting is enabled. The file extension list is not consulted for these inputs.

    No status message is displayed to confirm whether line-ending conversion was applied to these multi-line inputs.

  • Edit File interface

    The textarea input from HTML Editor => Edit or HTML Editor => Create New File will be converted *if* the Convert Line Endings setting is enabled, and *if* the file extension of the edited file matches the list.

    If line-ending conversion was applied, the status message will read:

    "wrote to file 'foo.html' in ASCII mode"

    Otherwise, the status message will simply read:

    "wrote to file 'foo.html'"

  • Uploaded and Imported Files

    All uploaded files -- including those uploaded within a template, those uploaded from the single file upload form, those from the multi-file upload form, and those from the Add Web Files interface -- will be converted *if* the Convert Line Endings setting is enabled, and *if* the file extension of the uploaded or imported file matches the list.

    The status message will indicate whether conversion was applied:

    "file 'foo.html' has been uploaded in ascii/text mode" (line ending conversion applied)

    "file 'foo.html' has been uploaded in binary mode" (no conversions)

    For HTTP imports, the messages are:

    "saved 123 bytes in ascii/text mode" (line ending conversion applied)

    "saved 123 bytes in binary mode" (no conversions)

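Taken together, the rules above reduce to a simple decision plus a status message. The Python sketch below models them under assumed names (will_convert, edit_status, upload_status, and import_status are hypothetical, not Genesis functions); the message strings are copied from the examples above.

```python
def will_convert(source: str, enabled: bool, ext_matches: bool) -> bool:
    """Decide whether conversion applies.

    source is assumed to be "template" (Build Template textareas),
    "edit" (Edit / Create New File), "upload", or "import".
    """
    if source == "template":
        return enabled               # extension list is not consulted
    return enabled and ext_matches   # file-based paths need both

def edit_status(filename: str, converted: bool) -> str:
    suffix = " in ASCII mode" if converted else ""
    return f"wrote to file '{filename}'{suffix}"

def upload_status(filename: str, converted: bool) -> str:
    mode = "ascii/text" if converted else "binary"
    return f"file '{filename}' has been uploaded in {mode} mode"

def import_status(nbytes: int, converted: bool) -> str:
    mode = "ascii/text" if converted else "binary"
    return f"saved {nbytes} bytes in {mode} mode"

converted = will_convert("upload", enabled=True, ext_matches=True)
print(upload_status("foo.html", converted))
# prints file 'foo.html' has been uploaded in ascii/text mode
```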
Advanced: Conversion algorithm

All line-ending conversion code has been consolidated in the subroutine force_eoln(), which is defined in the script file "genesis/index.pl". You may edit that subroutine if you need to customize the behavior.

The algorithm is based on force_CRLF from the Common library. All line endings are first converted to the Internet-standard \r\n using the following three conversions:

  • First, all \r\n are converted to \n

  • Next, all remaining \r are converted to \n

  • Finally, all \n are converted to \r\n

In this way, all original \r\n pairs are preserved, while every bare \r or \n is expanded to a full \r\n. These three steps are exactly what the standard force_CRLF call performs. Next, the result is converted to the chosen line ending:

  • For Windows standard \r\n, nothing is done.

  • For Unix standard \n, a single \r\n => \n conversion is done.

  • For Mac standard \r, a single \r\n => \r conversion is done.
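The steps above can be sketched in Python as follows. This is not the actual force_eoln() or force_CRLF code, which is Perl; it is a minimal model that operates on strings, assumes the target is given as "mac", "unix", or "windows", and performs the same sequence of replacements.

```python
def force_crlf(text: str) -> str:
    """Normalize every line ending to CR LF (the three force_CRLF steps)."""
    text = text.replace("\r\n", "\n")   # 1. \r\n -> \n
    text = text.replace("\r", "\n")     # 2. remaining bare \r -> \n
    return text.replace("\n", "\r\n")   # 3. every \n -> \r\n

def force_eoln(text: str, target: str) -> str:
    """Convert all line endings to the target convention."""
    text = force_crlf(text)
    if target == "windows":
        return text                        # already \r\n, nothing to do
    if target == "unix":
        return text.replace("\r\n", "\n")  # single \r\n -> \n pass
    if target == "mac":
        return text.replace("\r\n", "\r")  # single \r\n -> \r pass
    raise ValueError(f"unknown target: {target!r}")

# A mixed input containing one ending of each kind:
print(repr(force_eoln("a\r\nb\nc\rd", "mac")))   # prints 'a\rb\rc\rd'
```

Normalizing to \r\n first means that any input, including a file with mixed endings, needs only one final replacement to reach the target convention.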

History

In Genesis versions 2.1.0.0024 and earlier, each user had a setting, "Upload All Text Files in ASCII Mode", which was enabled by default and which applied "\n" endings to every file that passed Perl's -T file test (a heuristic check for whether a file contains text). The -T test returned false positives on PDF files, forcing conversion and corrupting their contents, which led to a change in approach.

With build 0025 and newer, the setting has been renamed "Line Ending Conversion" and has been complemented by a customizable file extension list, which does not include PDF. The -T test is no longer used. In addition, users can choose which of the three standard line endings they want, instead of being forced to accept Unix \n.