Byte

The byte (pronounced /ˈbaɪt/) is a unit of digital information in computing and telecommunications. It is an ordered collection of bits, in which each bit denotes the binary value of 1 or 0. Historically, a byte was the number of bits (typically 6, 7, 8, or 9) used to encode a character of text in a computer,[1][2] and for this reason it is the basic addressable element in many computer architectures. The size of a byte is hardware dependent, but the modern de facto standard is 8 bits, a convenient power of 2. Most of the numeric values used by many applications are representable in 8 bits, and processor designers optimize for this common usage. Signal processing applications tend to operate on larger values, and some digital signal processors have 16 or 40 bits as the smallest unit of addressable storage; on such processors a byte may be defined to contain this number of bits.

The term octet was explicitly defined to denote a sequence of 8 bits because of the ambiguity associated with the term byte and is widely used in communications protocol specifications.

History

The term byte was coined by Dr. Werner Buchholz in July 1956, during the early design phase for the IBM Stretch computer.[3][4][5] Originally, the byte size was specified in instructions by a 4-bit field, allowing sixteen values; typical I/O equipment of the period used six-bit bytes. A fixed eight-bit byte size was later adopted and promulgated as a standard by the System/360. The term byte stems from bite, as in the smallest amount of data a computer could bite at once.[citation needed] The spelling change not only reduced the chance of a bite being mistaken for a bit, but was also consistent with the penchant of early computer scientists for making up words and changing spellings. A byte was also often explicitly qualified as an 8-bit byte, reinforcing the notion that it was a tuple of bits and that other sizes were possible.

The term byte has also been used to mean:

  • A contiguous sequence of binary bits in a serial data stream, such as in modem or satellite communications, which is the smallest meaningful unit of data. These bytes might include start bits, stop bits, or parity bits, and thus could vary from 7 to 12 bits to contain a single 7-bit ASCII code.
  • A data type in certain programming languages. The C and C++ programming languages, for example, define a byte as an "addressable unit of data storage large enough to hold any member of the basic character set of the execution environment" (clause 3.6 of the C standard). Since the C char integral data type must contain at least 8 bits (clause 5.2.4.2.1), a byte in C can hold at least 256 different values. Various implementations of C and C++ define a byte as 8, 9, 16, 32, or 36 bits.[6][7] The actual number of bits in a particular implementation is given by the macro CHAR_BIT, defined in the limits.h header. Java's primitive byte data type is always 8 bits wide and signed, holding values from −128 to 127.
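The figures in the second item can be checked with a short sketch. It is written in Python purely for illustration; the function names are hypothetical, and reading the bit pattern as two's complement is an assumption of the sketch (it matches how Java defines its byte type).

```python
def distinct_values(bits):
    """Number of different values a byte of the given width can hold."""
    return 2 ** bits


def as_signed_byte(pattern):
    """Interpret the low 8 bits of `pattern` as a signed two's-complement value."""
    pattern &= 0xFF  # keep only eight bits
    return pattern - 256 if pattern >= 128 else pattern


if __name__ == "__main__":
    # Widths historically used for character bytes
    for width in (6, 7, 8, 9):
        print(width, "bits:", distinct_values(width), "values")
    # Smallest and largest values of an 8-bit signed byte
    print(as_signed_byte(0x80), as_signed_byte(0x7F))
```

An 8-bit byte therefore covers 256 values, which a signed interpretation maps onto the range −128 to 127.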

Early microprocessors, such as the Intel 8008 (the direct predecessor of the 8080, and then the 8086), supported a small number of operations on four-bit quantities, such as the DAA (decimal adjust) instruction and the half-carry flag, which were used to implement decimal arithmetic routines. These four-bit quantities were called nibbles, in homage to the then-common 8-bit bytes.
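To make the relationship between nibbles and decimal arithmetic concrete, the following Python sketch splits an 8-bit value into its two nibbles and stores one decimal digit in each. The function names are hypothetical, and the packed layout (tens digit in the high nibble) is a common convention assumed here rather than something specified above.

```python
def split_nibbles(byte):
    """Return the (high, low) four-bit halves of an 8-bit value."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return byte >> 4, byte & 0x0F


def pack_decimal(number):
    """Store a two-digit decimal number in one byte, one digit per nibble."""
    if not 0 <= number <= 99:
        raise ValueError("expected a value from 0 to 99")
    tens, ones = divmod(number, 10)
    return (tens << 4) | ones


def unpack_decimal(byte):
    """Recover the decimal number from a byte produced by pack_decimal."""
    high, low = split_nibbles(byte)
    return high * 10 + low
```

With this layout the decimal number 42 is stored as the byte 0x42, so its hexadecimal digits read the same as its decimal digits.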

Historical IETF documents cite varying examples of byte sizes. RFC 608 mentions byte sizes for FTP hosts (the FTP-BYTE-SIZE attribute in host tables for the ARPANET) to be 36 bits for PDP-10 computers and 32 bits for IBM 360 systems.[8]

Size

Architectures that did not have eight-bit bytes include the CDC 6000 series scientific mainframes, which divided their 60-bit floating-point words into 10 six-bit bytes. These bytes conveniently held character data from punched Hollerith cards, typically the upper-case alphabet and decimal digits. CDC also often referred to 12-bit quantities as bytes, each holding two 6-bit display code characters, due to the 12-bit I/O architecture of the machine. The PDP-10 used the assembly instructions LDB and DPB to load and deposit bytes of any width from 1 to 36 bits; these operations survive today in Common Lisp. Bytes of six, seven, or nine bits were used on some computers, for example within the 36-bit word of the PDP-10. The UNIVAC 1100/2200 series computers (now Unisys) addressed their 36-bit words in both 6-bit (Fieldata) and 9-bit (ASCII) modes.
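The idea of loading and depositing a byte of arbitrary width can be sketched in a few lines of Python. This is an illustration loosely modeled on the Common Lisp operators of the same names, not a description of the PDP-10 instruction encoding; the argument order and the convention that the position counts bits up from the least significant end of the word are assumptions of the sketch.

```python
WORD_BITS = 36  # word length of the PDP-10


def _check_field(size, position):
    """Reject byte fields that do not fit inside one word."""
    if size < 1 or position < 0 or size + position > WORD_BITS:
        raise ValueError("byte does not fit in a 36-bit word")


def ldb(size, position, word):
    """Load the `size`-bit byte that starts `position` bits above bit 0."""
    _check_field(size, position)
    return (word >> position) & ((1 << size) - 1)


def dpb(value, size, position, word):
    """Return `word` with the selected byte replaced by the low bits of `value`."""
    _check_field(size, position)
    mask = ((1 << size) - 1) << position
    return (word & ~mask) | ((value << position) & mask)


if __name__ == "__main__":
    # Pack six 6-bit character codes into one 36-bit word, leftmost first.
    word = 0
    for index, code in enumerate([1, 2, 3, 4, 5, 6]):
        word = dpb(code, 6, 30 - 6 * index, word)
    print(oct(word), ldb(6, 30, word), ldb(6, 0, word))
```

The same two functions handle six-, seven-, or nine-bit bytes simply by changing the size argument.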

Factors behind the ubiquity of the eight-bit byte include the popularity of the IBM System/360 architecture, introduced in the 1960s, and of the 8-bit microprocessors introduced in the 1970s. The term octet unambiguously specifies an eight-bit byte and is used, for example, in protocol definitions.

Unit symbol or abbreviation

Prefixes for bit and byte multiples

Decimal
  Value    SI
  1000     k  kilo
  1000^2   M  mega
  1000^3   G  giga
  1000^4   T  tera
  1000^5   P  peta
  1000^6   E  exa
  1000^7   Z  zetta
  1000^8   Y  yotta

Binary
  Value    IEC       JEDEC
  1024     Ki kibi   K kilo
  1024^2   Mi mebi   M mega
  1024^3   Gi gibi   G giga
  1024^4   Ti tebi
  1024^5   Pi pebi
  1024^6   Ei exbi
  1024^7   Zi zebi
  1024^8   Yi yobi

IEEE 1541 and Metric-Interchange-Format specify B as the symbol for byte (e.g., MB means megabyte), while IEC 60027 appears to be silent on the subject. B is, however, also the symbol for the bel, another unit used in the same field. The use of B to stand for bel is consistent with the metric system convention that capitalized symbols are for units named after a person (in this case Alexander Graham Bell); the use of a capital B to stand for byte is not consistent with this convention. There is nevertheless little danger of confusion, because the decibel (dB) is used almost exclusively for bel measurements, while the decibyte (1/10 of a byte) is never used.

The unit symbol KB is commonly used for kilobyte, but is often confused with kb, which means kilobit. IEEE 1541 specifies b as the symbol for bit; IEC 60027 and Metric-Interchange-Format, however, specify bit (e.g., Mbit for megabit) as the symbol, achieving maximum disambiguation from byte.
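The practical cost of confusing the two symbols is a factor equal to the byte size. The Python sketch below converts a quantity in megabits to megabytes; the function name is hypothetical, and the default of 8 bits per byte is an assumption that holds only for octet-sized bytes.

```python
BITS_PER_BYTE = 8  # assumption: bytes are octets


def megabits_to_megabytes(megabits, bits_per_byte=BITS_PER_BYTE):
    """Convert a quantity in Mbit to MB for the given byte size."""
    return megabits / bits_per_byte


if __name__ == "__main__":
    # A link rated at 100 Mbit per second, expressed in MB per second
    print(megabits_to_megabytes(100))
```

Reading Mb as MB, or kb as KB, therefore overstates a quantity eightfold when bytes are octets.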

The lowercase letter o for octet is a commonly used symbol in several non-English-speaking countries, and is also used with metric prefixes (for example, ko and Mo).

Today the harmonized ISO/IEC 80000-13:2008 standard (Quantities and units, Part 13: Information science and technology) cancels and replaces subclauses 3.8 and 3.9 of IEC 60027-2:2005 (those related to information theory and prefixes for binary multiples). See Units of Information for a detailed discussion of names for derived units.

Unit multiples

[Figure: percentage difference between the decimal and binary interpretations of the unit prefixes, plotted against the logarithm of storage size; the difference grows linearly.]

There has been considerable confusion about the meanings of SI (or metric) prefixes used with the unit byte, especially concerning prefixes such as kilo (k or K) and mega (M), as shown in the table Prefixes for bit and byte multiples. Since computer memory is designed with binary logic, multiples are expressed in powers of 2 rather than 10. The software and computer industries often use binary estimates of the SI-prefixed quantities, while producers of computer storage devices prefer the SI values. This is why a computer hard drive may be specified as having a capacity of, say, 100 GB when it contains about 93 GiB of storage space, which software using the binary interpretation may report as 93 GB.
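The hard drive arithmetic can be reproduced with a minimal Python sketch. The unit tables and the function name are illustrative only; the multipliers are the decimal (SI) and binary (IEC) values from the prefix table.

```python
# Bytes per unit under the decimal (SI) and binary (IEC) interpretations
DECIMAL = {"kB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4}
BINARY = {"KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}
UNITS = {**DECIMAL, **BINARY}


def convert(amount, from_unit, to_unit):
    """Convert a storage size between any two of the units above."""
    return amount * UNITS[from_unit] / UNITS[to_unit]


if __name__ == "__main__":
    # A drive sold as 100 GB, expressed in GiB
    print(round(convert(100, "GB", "GiB"), 2))
```

A drive sold as 100 GB thus holds roughly 93.13 GiB; the capacity is the same number of bytes in both cases, and only the unit differs.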

While the numerical difference between the decimal and binary interpretations is small for the prefixes kilo and mega, it grows to over 20% for the prefix yotta, as illustrated in the linear-log graph of difference versus storage size.
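The growth of the discrepancy can be tabulated directly. The following Python sketch is illustrative (the function name is hypothetical) and computes, for each prefix, how much larger the binary multiple is than the decimal one.

```python
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def percent_difference(power):
    """Excess of 1024**power over 1000**power, as a percentage."""
    return (1024 ** power / 1000 ** power - 1) * 100


if __name__ == "__main__":
    for power, name in enumerate(PREFIXES, start=1):
        print(f"{name:>5}: {percent_difference(power):5.2f} %")
```

The difference is 2.4% for kilo and just under 21% for yotta. Because each step multiplies the ratio by the same factor of 1.024, the logarithm of the ratio rises by a constant amount per prefix.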

References

  1. R.W. Bemer and W. Buchholz (1962). "Chapter 6, Character Set". In Werner Buchholz, Planning a Computer System - Project Stretch. http://archive.computerhistory.org/resources/text/IBM/Stretch/pdfs/Buchholz_102636426.pdf.
  2. R.W. Bemer (1959). "A proposal for a generalized card code of 256 characters". Commun. ACM 2 (9), pp. 19–23.
  3. Bob Bemer. "Origins of the Term 'BYTE'". http://www.trailing-edge.com/~bobbemer/BYTE.HTM. Retrieved 2007-08-12.
  4. Werner Buchholz (July 1956). "TIMELINE OF THE IBM STRETCH/HARVEST ERA (1956–1961)". computerhistory.org. http://archive.computerhistory.org/resources/text/IBM/Stretch/102636400.txt.
  5. "byte definition". http://catb.org/~esr/jargon/html/B/byte.html.
  6. "[26] Built-in / intrinsic / primitive data types". C++ FAQ Lite.
  7. "Integer Types In C and C++".
  8. M.D. Kudlick, SRI-ARC (January 10, 1974). RFC 608, "Host Names On-Line".