Bush hid the facts


"Bush hid the facts" is a common name for a bug in Microsoft Windows that causes text encoded in ASCII to be interpreted as if it were UTF-16LE, resulting in garbled text. When the string "Bush hid the facts" (without the quotation marks) was typed into a new Notepad document, saved, closed, and reopened, the nonsensical sequence of Chinese characters "畂桳栠摩琠敨映捡獴" appeared instead.[citation needed]

While "Bush hid the facts" is the sentence most commonly presented to induce the error, the bug can be triggered by other strings, for example "hhhh hhh hhh hhhhh"[1] or "this app can break",[2] and even "a " or "z!".[3]
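The misreading itself can be reproduced outside Notepad. The following is a minimal Python sketch, not Notepad's code: the helper name misread_as_utf16le is hypothetical, and it simply decodes ASCII bytes as UTF-16LE to show where the Chinese characters come from.

```python
def misread_as_utf16le(text: str) -> str:
    """Encode text as ASCII, then decode the same bytes as UTF-16LE."""
    raw = text.encode("ascii")
    # UTF-16LE uses two bytes per code unit; a trailing odd byte cannot be
    # decoded, so errors="replace" turns it into U+FFFD instead of raising.
    return raw.decode("utf-16-le", errors="replace")


garbled = misread_as_utf16le("Bush hid the facts")
# Each pair of ASCII bytes becomes one code unit: "Bu" is 0x42 0x75,
# which little-endian UTF-16 reads as U+7542.
assert garbled == "畂桳栠摩琠敨映捡獴"
print([f"U+{ord(ch):04X}" for ch in garbled])
```

The 18 ASCII bytes collapse into 9 two-byte code units, all of which happen to fall in the CJK ideograph range.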

Diagram explaining the bug

The bug occurs when the string is passed to the Win32 charset detection function IsTextUnicode. IsTextUnicode guesses that the text is Unicode if the "high bytes" (those at odd indexes) vary less than a third as much as the "low bytes" (those at even indexes).[3] If so, it returns true, and the application then incorrectly interprets the text as UTF-16LE.[4]
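The statistical test described above can be modelled roughly as follows. This is a toy Python sketch, not the Windows implementation: the function name looks_like_utf16le, the use of summed absolute differences as the measure of "change", the weight parameter, and the rejection of odd-length input are all assumptions made for illustration, and the sketch is not expected to agree with IsTextUnicode on every input.

```python
def looks_like_utf16le(data: bytes, weight: int = 3) -> bool:
    """Toy heuristic: guess UTF-16LE when high bytes vary far less than low bytes."""
    # Assumption: odd-length or very short input is never treated as UTF-16.
    if len(data) < 2 or len(data) % 2:
        return False
    low = data[0::2]   # bytes at even indexes (low byte of each 16-bit unit)
    high = data[1::2]  # bytes at odd indexes (high byte of each 16-bit unit)
    # Measure "change" as the sum of absolute differences between neighbours.
    low_variation = sum(abs(a - b) for a, b in zip(low, low[1:]))
    high_variation = sum(abs(a - b) for a, b in zip(high, high[1:]))
    return weight * high_variation < low_variation


print(looks_like_utf16le(b"Bush hid the facts"))  # spaces land on even indexes
print(looks_like_utf16le(b"Hello world!"))        # space lands on an odd index
```

In "Bush hid the facts" every space falls on an even index, so the low bytes jump between 0x20 and lowercase letters while the high bytes stay within a narrow range of lowercase letters; under this model that imbalance is what makes the ASCII text resemble UTF-16LE.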

The bug had existed since IsTextUnicode was introduced with Windows NT 3.5 in 1994, but was not discovered until early 2004.[5] Many text editors and tools exhibit this behavior on Windows because they use IsTextUnicode to determine the encoding of text files. As of Windows Vista, Notepad has been modified to use a different detection algorithm that does not exhibit the bug, but IsTextUnicode remains unchanged in the operating system, so any other tools that use the function are still affected.[6]

Workarounds

Several workarounds exist for this bug:

  • Add a character so the string is an odd number of bytes long.
  • Save the file as "UTF-8" (before 2018) or "UTF-8 with BOM" (after 2018) rather than "ANSI". The text then loads correctly because Notepad prepends a UTF-8 byte order mark, a pattern that does not trigger the bug.[citation needed] Opening a file that is valid UTF-8 but lacks the byte order mark would still trigger the bug, because the string is encoded identically in UTF-8 and in ASCII.
  • Save the file as "Unicode", which in Microsoft Windows means UTF-16LE. When this text is loaded, IsTextUnicode should (and does) return true, and the text is displayed correctly.
  • To retrieve the original text using Notepad, bring up the "Open a file" dialog box, select the file, select "ANSI" or "UTF-8" in the "Encoding" list box, and click Open. Under Windows 2000, Notepad lacks the "Encoding" list box. WordPad appears to load the text correctly without choosing the encoding, since it uses its own encoding detection.
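The first three workarounds can be illustrated at the byte level. The Python sketch below is illustrative only: the helper encoding_from_bom is hypothetical and does not represent Notepad's detection logic, and it assumes that the "Unicode" option writes a UTF-16LE byte order mark. It shows that a byte order mark lets a reader identify the encoding without guessing, and that adding one character makes the byte count odd.

```python
import codecs


def encoding_from_bom(data: bytes):
    """Return a codec name if data starts with a known byte order mark, else None."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decoder strips the UTF-8 BOM
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16"     # decoder reads the BOM to pick the byte order
    return None


text = "Bush hid the facts"

as_ansi = text.encode("ascii")                                # no BOM: reader must guess
as_utf8_bom = text.encode("utf-8-sig")                        # EF BB BF + ASCII bytes
as_utf16 = codecs.BOM_UTF16_LE + text.encode("utf-16-le")     # FF FE + UTF-16LE bytes
padded = (text + ".").encode("ascii")                         # one extra character

print(encoding_from_bom(as_ansi))                             # None
print(as_utf8_bom.decode(encoding_from_bom(as_utf8_bom)))
print(as_utf16.decode(encoding_from_bom(as_utf16)))
print(len(as_ansi) % 2, len(padded) % 2)                      # even length, then odd
```

Only the plain ASCII bytes leave the encoding ambiguous; the other variants either announce their encoding up front or cannot be a whole number of 16-bit units.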

