Shebang

Shebang - Wikipedia, the free encyclopedia (Chinese edition) https://zh.wikipedia.org/wiki/Shebang

Shebang (Unix) - Wikipedia https://en.wikipedia.org/wiki/Shebang_%28Unix%29

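In computing, a shebang (also called a hashbang) is the character sequence #!, made up of a number sign and an exclamation mark, that appears as the first two characters of the first line of a text file. When a file contains a shebang, the program loader of a Unix-like operating system parses the rest of the line as an interpreter directive, invokes that interpreter, and passes the path of the file containing the shebang to it as an argument.[1][2]

For example, a file that begins with #!/bin/sh is run by invoking the program /bin/sh (usually the Bourne shell or a compatible shell such as bash or dash). This line is also the standard first line of a shell script.

Because the # character marks a comment in many scripting languages, the shebang line is automatically ignored by their interpreters. In languages where # is not a comment marker, such as Scheme, the interpreter may still ignore a first line that begins with #! in order to stay compatible with the shebang.[3]

The name "shebang" or "hashbang" is sometimes also used for fragment identifiers in Ajax applications, where they preserve state in the browser; Google Webmaster Central notes that fragment identifiers beginning with an exclamation mark (that is, ...url#!state...) are indexed by Google's web crawler.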

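Syntax

The shebang syntax begins with #!, a number sign followed by an exclamation mark. The opening characters may be followed by one or more whitespace characters and then the absolute path of the interpreter to invoke. When the script is invoked directly, the caller uses the information in the shebang to run the corresponding interpreter, so a script file can be invoked in the same way as an ordinary executable.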

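Etymology and history

The name shebang comes from a contraction of SHArp bang or haSH bang, the typical Unix names for the two characters in #!. In Unix jargon, the number sign is usually called sharp, hash, or mesh, and the exclamation mark is often called bang. Another view is that the sh in shebang comes from sh, the name of the default shell (the Bourne shell), because it is so often invoked through a shebang.[4][1]

The 2010 edition (revision 6.2) of the Advanced Bash Scripting Guide calls the shebang "sha-bang" and notes that it is also written she-bang or sh-bang, but the form "shebang" does not appear in that document.[1]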

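When Dennis Ritchie was asked what he called this feature, he replied:

From: "Ritchie, Dennis M (Dennis)** CTR **" <dmr@[redacted]>

To: <[redacted]@talisman.org>

Date: Thu, 19 Nov 2009 18:37:37 -0600

Subject: RE: What do -you- call your #!<something> line?

I can't recall that we ever gave it a proper name. It was pretty late that it went in--I think that I got the idea from someone at one of the UCB conferences on Berkeley Unix; I may have been one of the first to actually install it, but it was an idea that I got from elsewhere.

As for the name: probably something descriptive like "hash-bang" though this has a specifically British flavor, but in any event I don't recall particularly using a pet name for the construction.

Regards,

Dennis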

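Examples

Some typical shebang interpreter directives:

  • #!/bin/sh – Execute the script using sh, the Bourne shell or a compatible shell
  • #!/bin/csh – Execute using csh, the C shell
  • #!/usr/bin/perl -w – Execute using Perl with warnings enabled
  • #!/usr/bin/python -O – Execute using Python with code optimization enabled
  • #!/usr/bin/php – Execute using the PHP command-line interpreter

On many systems, /bin/sh is a symbolic or hard link to Bash, and /bin/csh is a link to tcsh, so naming the former interpreters actually runs a compatible or enhanced version.

A shebang line can also carry options that are passed to the interpreter (see the Perl example above). However, the way options are passed varies between implementations.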

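Usage

Interpreter directives allow scripts and data files to act as system commands without the user having to name the interpreter at invocation time, which hides their implementation details from users and other programs.

Suppose /usr/local/bin/foo is a Bourne shell script that begins with the line

#!/bin/sh -x

and it is invoked as follows ("$" is the command prompt):

$ foo bar

The output of that command is equivalent to that of

$ /bin/sh -x /usr/local/bin/foo bar

except that argv[0] is set to the file name of the script rather than that of the interpreter.

Because sh reads commands from the file named on its command line, the command above executes the commands in /usr/local/bin/foo, with bar passed as the argument $1 of foo.

Because the number sign that opens the shebang is also the comment character in the Bourne shell and many other interpreted languages, those interpreters treat the directive as a plain comment and skip it. Not every interpreter ignores the shebang line, however: for the script below, cat writes both lines of the file to standard output.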

#!/bin/cat
Hello world!

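Using #!/usr/bin/env followed by the name of the script interpreter is a common way to locate the interpreter correctly across different platforms.

Files on Linux systems are generally encoded in UTF-8. If a script file begins with the UTF-8 BOM (0xEF 0xBB 0xBF), the exec functions will not start the interpreter named in the shebang to run the script. Linux script files should therefore not include a UTF-8 BOM at the start of the file.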


In computing, a shebang is the character sequence consisting of the characters number sign and exclamation mark (#!) at the beginning of a script. It is also called sha-bang,[1][2] hashbang,[3][4] pound-bang,[5][6] or hash-pling.[7]

When a text file with a shebang is used as if it is an executable in a Unix-like operating system, the program loader mechanism parses the rest of the file's initial line as an interpreter directive. The loader executes the specified interpreter program, passing to it as an argument the path that was initially used when attempting to run the script, so that the program may use the file as input data.[8] For example, if a script is named with the path path/to/script, and it starts with the following line, #!/bin/sh, then the program loader is instructed to run the program /bin/sh, passing path/to/script as the first argument. In Linux, this behavior is the result of both kernel and user-space code.[9]

The shebang line is usually ignored by the interpreter, because the "#" character is a comment marker in many scripting languages; some language interpreters that do not use the hash mark to begin comments still may ignore the shebang line in recognition of its purpose.[10]
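Python is one language where this can be observed directly. The following is a small sketch, not a description of any particular loader: the two-line source string and the variable names are made up for the illustration. It shows that Python's tokenizer classifies the shebang line as an ordinary comment, so the rest of the source runs unaffected.

```python
import io
import tokenize

# Hypothetical two-line script; the first line is a shebang.
SOURCE = "#!/usr/bin/env python3\nanswer = 6 * 7\n"

# The tokenizer reports the shebang line as a plain comment token.
tokens = list(tokenize.generate_tokens(io.StringIO(SOURCE).readline))
first_token = tokens[0]

# Executing the source works because the comment is skipped.
namespace = {}
exec(compile(SOURCE, "<example>", "exec"), namespace)

print(tokenize.tok_name[first_token.type], repr(first_token.string))
print(namespace["answer"])
```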

Syntax

The form of a shebang interpreter directive is as follows:[8]

#!interpreter [optional-arg]

in which interpreter is generally an absolute path to an executable program. The optional argument is a string representing a single argument. White space after #! is optional.
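The form above can be sketched as a small parser. This is an illustration under stated assumptions rather than the code any kernel uses: the function name parse_shebang is hypothetical, and the remainder of the line is deliberately kept as one string to match the single-argument form described here.

```python
from typing import Optional, Tuple


def parse_shebang(first_line: bytes) -> Optional[Tuple[str, Optional[str]]]:
    """Split a shebang line into (interpreter, optional_arg).

    Returns None when the line does not start with the two bytes '#!'
    or names no interpreter. Everything after the interpreter is kept
    as ONE argument.
    """
    if not first_line.startswith(b"#!"):
        return None
    # Drop the marker and the newline, then surrounding blanks.
    rest = first_line[2:].rstrip(b"\n").strip(b" \t").decode()
    if not rest:
        return None
    # Split once: the interpreter path, then the remainder as one string.
    parts = rest.split(None, 1)
    interpreter = parts[0]
    optional_arg = parts[1].strip() if len(parts) > 1 else None
    return interpreter, optional_arg


print(parse_shebang(b"#!/bin/sh\n"))
print(parse_shebang(b"#! /usr/bin/perl -w\n"))
print(parse_shebang(b"echo hello\n"))
```

The second call shows that a blank after #! makes no difference to the result.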

In Linux, the file specified by interpreter can be executed if it has the execute right and contains code which the kernel can execute directly, if it has a wrapper defined for it via sysctl (such as for executing Microsoft .exe binaries using wine), or if it contains a shebang. On Linux and Minix, an interpreter can also be a script. A chain of shebangs and wrappers yields a directly executable file that gets the encountered scripts as parameters in reverse order. For example, if file /bin/A is an executable file in ELF format, file /bin/B contains the shebang #!/bin/A optparam, and file /bin/C contains the shebang #!/bin/B, then executing file /bin/C resolves to /bin/B /bin/C, which finally resolves to /bin/A optparam /bin/B /bin/C.
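The /bin/A, /bin/B, /bin/C chain can be traced with a short simulation. This is only a model of the resolution order described above: the FILES table stands in for the file system, the resolve function is hypothetical, and the max_depth limit is an arbitrary value chosen for the sketch.

```python
from typing import Dict, List, Optional

# Hypothetical file table mirroring the example: None stands for a
# directly executable binary (e.g. ELF); a string is the shebang line.
FILES: Dict[str, Optional[str]] = {
    "/bin/A": None,
    "/bin/B": "#!/bin/A optparam",
    "/bin/C": "#!/bin/B",
}


def resolve(argv: List[str], files: Dict[str, Optional[str]],
            max_depth: int = 4) -> List[str]:
    """Expand argv until argv[0] names a directly executable file."""
    for _ in range(max_depth + 1):
        shebang = files[argv[0]]
        if shebang is None:
            return argv
        # Interpreter and its optional single argument go in front of
        # the argument vector built so far.
        parts = shebang[2:].strip().split(None, 1)
        argv = parts + argv
    raise RuntimeError("too many levels of interpreter nesting")


print(resolve(["/bin/C"], FILES))
```

Each level prepends its interpreter, which is why the scripts appear after the final binary in the reverse of the order in which they were encountered.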

In Solaris- and Darwin-derived operating systems (such as macOS), the file specified by interpreter must be an executable binary and cannot itself be a script.[11]

Examples

Some typical shebang lines:

  • #!/bin/sh – Execute the file using the Bourne shell, or a compatible shell, assumed to be in the /bin directory
  • #!/bin/bash – Execute the file using the Bash shell
  • #!/usr/bin/pwsh – Execute the file using PowerShell
  • #!/usr/bin/env python3 – Execute with a Python interpreter, using the env program search path to find it
  • #!/bin/false – Do nothing, but return a non-zero exit status, indicating failure. Used to prevent stand-alone execution of a script file intended for execution in a specific context, such as by the . command from sh/bash, source from csh/tcsh, or as a .profile, .cshrc, or .login file.

Shebang lines may include specific options that are passed to the interpreter. However, implementations vary in the parsing behavior of options; for portability, only one option should be specified without any embedded whitespace. Further portability guidelines are found below.

Purpose

Interpreter directives allow scripts and data files to be used as commands, hiding the details of their implementation from users and other programs, by removing the need to prefix scripts with their interpreter on the command line.

A Bourne shell script that is identified by the path some/path/to/foo, has the initial line,

#!/bin/sh -x

and is executed with parameters bar and baz as

some/path/to/foo bar baz

provides a similar result as having actually executed the following command line instead:

/bin/sh -x some/path/to/foo bar baz

If /bin/sh specifies the Bourne shell, then the end result is that all of the shell commands in the file some/path/to/foo are executed with the positional variables $1 and $2 having the values bar and baz, respectively. Also, because the initial number sign is the character used to introduce comments in the Bourne shell language (and in the languages understood by many other interpreters), the whole shebang line is ignored by the interpreter.

However, it is up to the interpreter to ignore the shebang line; thus, a script consisting of the following two lines simply echoes both lines to standard output when run:

#!/bin/cat
Hello world!
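The difference between the two kinds of interpreter can be modeled in a few lines. Both functions below are toy stand-ins invented for this sketch, not the real cat or sh: one passes its input through untouched, the other drops lines that begin with #.

```python
SCRIPT = "#!/bin/cat\nHello world!\n"


def cat_like(text: str) -> str:
    """Toy model of cat: every line, including the shebang, is output."""
    return text


def comment_skipping(text: str) -> str:
    """Toy model of an interpreter that treats '#' lines as comments."""
    kept = [line for line in text.splitlines(keepends=True)
            if not line.startswith("#")]
    return "".join(kept)


print(cat_like(SCRIPT), end="")
print(comment_skipping(SCRIPT), end="")
```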

Strengths

When compared to the use of global association lists between file extensions and the interpreting applications, the interpreter directive method allows users to use interpreters not known at a global system level, and without administrator rights. It also allows specific selection of interpreter, without overloading the filename extension namespace (where one file extension refers to more than one file type), and allows the implementation language of a script to be changed without changing its invocation syntax by other programs. Invokers of the script need not know what the implementation language is as the script itself is responsible for specifying the interpreter to use.

Portability

Program location

Shebangs must specify absolute paths (or paths relative to the current working directory) to system executables; this can cause problems on systems that have a non-standard file system layout. Even when systems have fairly standard paths, it is quite possible for variants of the same operating system to have different locations for the desired interpreter. Python, for example, might be in /usr/bin/python3, /usr/local/bin/python3, or even something like /home/username/bin/python3 if installed by an ordinary user.

A similar problem exists for the POSIX shell, since POSIX only required its name to be sh, but did not mandate a path. A common value is /bin/sh, but some systems such as Solaris have the POSIX-compatible shell at /usr/xpg4/bin/sh.[12] In many Linux systems, /bin/sh is a hard or symbolic link to /bin/bash, the Bourne Again shell (BASH). Using bash-specific syntax while maintaining a shebang pointing to sh is also not portable.[13]

Because of this it is sometimes required to edit the shebang line after copying a script from one computer to another because the path that was coded into the script may not apply on a new machine, depending on the consistency in past convention of placement of the interpreter. For this reason and because POSIX does not standardize path names, POSIX does not standardize the feature.[14] The GNU Autoconf tool can test for system support with the macro AC_SYS_INTERPRETER.[15]

Often, the program /usr/bin/env can be used to circumvent this limitation by introducing a level of indirection: #! is followed by /usr/bin/env, followed by the desired command without its full path, as in this example:

#!/usr/bin/env sh

This mostly works because the path /usr/bin/env is commonly used for the env utility, and it invokes the first sh found in the user's $PATH, typically /bin/sh.
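The search that env performs can be sketched as a walk over the entries of $PATH. This is a simplified model, not the env source: lookup_like_env is a hypothetical name, and the executables set is a stand-in for the file system of an imaginary machine so that the example does not depend on what is installed locally.

```python
import posixpath
from typing import Iterable, Optional


def lookup_like_env(command: str, path_var: str,
                    executables: Iterable[str]) -> Optional[str]:
    """Return the first PATH entry that contains `command`, or None.

    `executables` stands in for the file system: the paths that exist
    and are executable on the imaginary machine.
    """
    available = set(executables)
    for directory in path_var.split(":"):
        if not directory:
            directory = "."  # an empty entry means the current directory
        candidate = posixpath.join(directory, command)
        if candidate in available:
            return candidate
    return None


# Hypothetical machine with two Python installations.
machine = {"/bin/sh", "/usr/local/bin/python3", "/usr/bin/python3"}
search_path = "/usr/local/bin:/usr/bin:/bin"
print(lookup_like_env("sh", search_path, machine))
print(lookup_like_env("python3", search_path, machine))
```

The second lookup shows why the result depends on the user's environment: whichever directory comes first in $PATH wins.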

This still has some portability issues with OpenServer 5.0.6 and Unicos 9.0.2 which have only /bin/env and no /usr/bin/env.

Character interpretation

Another portability problem is the interpretation of the command arguments. Some systems, including Linux, do not split up the arguments;[16] for example, when running the script with the first line like,

#!/usr/bin/env python3 -c

all text after the first space is treated as a single argument, that is, python3 -c will be passed as one argument to /usr/bin/env, rather than two arguments. Cygwin also behaves this way.
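The two behaviors can be compared side by side. The functions below are a sketch with hypothetical names, and ./script is a placeholder path; they only model how the argument vector is assembled, not how any specific kernel parses the line.

```python
from typing import List


def argv_single_argument(shebang: str, script: str) -> List[str]:
    """Non-splitting style: all text after the interpreter is ONE argument."""
    parts = shebang[2:].strip().split(None, 1)
    return parts + [script]


def argv_whitespace_split(shebang: str, script: str) -> List[str]:
    """Splitting style: each whitespace-separated word is its own argument."""
    return shebang[2:].split() + [script]


line = "#!/usr/bin/env python3 -c"
print(argv_single_argument(line, "./script"))
print(argv_whitespace_split(line, "./script"))
```

Under the non-splitting model, env receives the single string "python3 -c" and would look for a program with that literal name.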

Complex interpreter invocations are possible through the use of an additional wrapper. FreeBSD 6.0 (2005) introduced a -S option to its env as it changed the shebang-reading behavior to non-splitting. This option tells env to split the string itself.[17] The GNU env utility since coreutils 8.30 (2018) also includes this feature.[18] Although using this option mitigates the portability issue on the kernel end with splitting, it adds the requirement that env supports this particular extension.
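What the -S option changes can be approximated as follows. This is a rough model only: command_after_split is a hypothetical helper, and Python's shlex.split is used as a stand-in for env's own quoting and escape rules, which differ in detail. The -u flag passed to python3 is just an example option.

```python
import shlex
from typing import List


def command_after_split(argv: List[str]) -> List[str]:
    """Return the command env would try to run for the given argv.

    Assumption: shlex.split only approximates the splitting rules of
    a real `env -S`.
    """
    # argv as delivered by a non-splitting kernel, for example:
    #   ["/usr/bin/env", "-S python3 -u", "./script"]
    if len(argv) > 1 and argv[1].startswith("-S"):
        return shlex.split(argv[1][2:]) + argv[2:]
    return argv[1:]


with_s = ["/usr/bin/env", "-S python3 -u", "./script"]
without_s = ["/usr/bin/env", "python3 -u", "./script"]
print(command_after_split(with_s))
print(command_after_split(without_s))
```

Without -S the first word of the command is the whole string "python3 -u", which is not the name of any program.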

Another problem is scripts containing a carriage return character immediately after the shebang line, perhaps as a result of being edited on a system that uses DOS line breaks, such as Microsoft Windows. Some systems interpret the carriage return character as part of the interpreter command, resulting in an error message.[19]
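The effect of a DOS line ending is easy to see at the byte level. The helper names below are hypothetical, and the reader function models a loader that treats only the newline as the end of the line, which is the situation described above.

```python
def interpreter_from_first_line(first_line: bytes) -> bytes:
    """Return the interpreter path as a newline-only loader would read it."""
    # Only b"\n" ends the line here, so a carriage return before it
    # stays attached to the path.
    return first_line[2:].split(b"\n", 1)[0].strip(b" \t")


def has_dos_line_ending(first_line: bytes) -> bool:
    """Report whether the line ends with CR LF."""
    return first_line.endswith(b"\r\n")


unix_line = b"#!/bin/sh\n"
dos_line = b"#!/bin/sh\r\n"
print(interpreter_from_first_line(unix_line))
print(interpreter_from_first_line(dos_line))
print(has_dos_line_ending(dos_line))
```

A check like has_dos_line_ending can be run over scripts before they are copied from a Windows machine.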

Magic number

The shebang is actually a human-readable instance of a magic number in the executable file, the magic byte string being 0x23 0x21, the two-character encoding in ASCII of #!. This magic number is detected by the "exec" family of functions, which determine whether a file is a script or an executable binary. The presence of the shebang will result in the execution of the specified executable, usually an interpreter for the script's language. It has been claimed[20] that some old versions of Unix expect the normal shebang to be followed by a space and a slash (#! /), but this appears to be untrue;[21] rather, blanks after the shebang have traditionally been allowed, and sometimes documented with a space (see the 1980 email in history section below).

The shebang characters are represented by the same two bytes in extended ASCII encodings, including UTF-8, which is commonly used for scripts and other text files on current Unix-like systems. However, UTF-8 files may begin with the optional byte order mark (BOM); if the "exec" function specifically detects the bytes 0x23 and 0x21, then the presence of the BOM (0xEF 0xBB 0xBF) before the shebang will prevent the script interpreter from being executed. Some authorities recommend against using the byte order mark in POSIX (Unix-like) scripts,[22] for this reason and for wider interoperability and philosophical concerns. Additionally, a byte order mark is not necessary in UTF-8, as that encoding does not have endianness issues; it serves only to identify the encoding as UTF-8.
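The byte-level check and the effect of a BOM can be sketched like this. The function names are hypothetical and the sample script content is made up; the check mirrors the strict comparison against 0x23 0x21 described above.

```python
import codecs

SHEBANG_MAGIC = b"\x23\x21"  # the ASCII bytes for "#!"


def starts_with_shebang(data: bytes) -> bool:
    """Strict check: the first two bytes must be exactly 0x23 0x21."""
    return data[:2] == SHEBANG_MAGIC


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 byte order mark (0xEF 0xBB 0xBF) if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


plain = b"#!/bin/sh\necho hi\n"
with_bom = codecs.BOM_UTF8 + plain

print(starts_with_shebang(plain))
print(starts_with_shebang(with_bom))
print(starts_with_shebang(strip_utf8_bom(with_bom)))
```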

Etymology

An executable file starting with an interpreter directive is simply called a script, often prefaced with the name or general classification of the intended interpreter. The name shebang for the distinctive two characters may have come from an inexact contraction of SHArp bang or haSH bang, referring to the two typical Unix names for them. Another theory on the sh in shebang is that it is from the default shell sh, usually invoked with shebang.[23] This usage was current by December 1989,[24] and probably earlier.

History

The shebang was introduced by Dennis Ritchie between Edition 7 and 8 at Bell Laboratories. It was also added to the BSD releases from Berkeley's Computer Science Research (present at 2.8BSD[25] and activated by default by 4.2BSD). As AT&T Bell Laboratories Edition 8 Unix, and later editions, were not released to the public, the first widely known appearance of this feature was on BSD.

The lack of an interpreter directive, but support for shell scripts, is apparent in the documentation from Version 7 Unix in 1979,[26] which describes instead a facility of the Bourne shell where files with execute permission would be handled specially by the shell, which would (sometimes depending on initial characters in the script, such as ":" or "#") spawn a subshell which would interpret and run the commands contained in the file. In this model, scripts would only behave as other commands if called from within a Bourne shell. An attempt to directly execute such a file via the operating system's own exec() system trap would fail, preventing scripts from behaving uniformly as normal system commands.

In later versions of Unix-like systems, this inconsistency was removed. Dennis Ritchie introduced kernel support for interpreter directives in January 1980, for Version 8 Unix, with the following description:[25]

From uucp Thu Jan 10 01:37:58 1980
>From dmr Thu Jan 10 04:25:49 1980 remote from research

The system has been changed so that if a file being executed
begins with the magic characters #! , the rest of the line is understood
to be the name of an interpreter for the executed file.
Previously (and in fact still) the shell did much of this job;
it automatically executed itself on a text file with executable mode
when the text file's name was typed as a command.
Putting the facility into the system gives the following
benefits.

1) It makes shell scripts more like real executable files,
because they can be the subject of 'exec.'

2) If you do a 'ps' while such a command is running, its real
name appears instead of 'sh'.
Likewise, accounting is done on the basis of the real name.

3) Shell scripts can be set-user-ID.[a]

4) It is simpler to have alternate shells available;
e.g. if you like the Berkeley csh there is no question about
which shell is to interpret a file.

5) It will allow other interpreters to fit in more smoothly.

To take advantage of this wonderful opportunity,
put

  #! /bin/sh
 
at the left margin of the first line of your shell scripts.
Blanks after ! are OK.  Use a complete pathname (no search is done).
At the moment the whole line is restricted to 16 characters but
this limit will be raised.

The feature's creator didn't give it a name, however:[28]

From: "Ritchie, Dennis M (Dennis)** CTR **" <dmr@[redacted]>
To: <[redacted]@talisman.org>
Date: Thu, 19 Nov 2009 18:37:37 -0600
Subject: RE: What do -you- call your #!<something> line?

 I can't recall that we ever gave it a proper name.
It was pretty late that it went in--I think that I
got the idea from someone at one of the UCB conferences
on Berkeley Unix; I may have been one of the first to
actually install it, but it was an idea that I got
from elsewhere.

As for the name: probably something descriptive like
"hash-bang" though this has a specifically British flavor, but
in any event I don't recall particularly using a pet name
for the construction.

Kernel support for interpreter directives spread to other versions of Unix, and one modern implementation can be seen in the Linux kernel source in fs/binfmt_script.c.[29]

This mechanism allows scripts to be used in virtually any context normal compiled programs can be, including as full system programs, and even as interpreters of other scripts. As a caveat, though, some early versions of kernel support limited the length of the interpreter directive to roughly 32 characters (just 16 in its first implementation), would fail to split the interpreter name from any parameters in the directive, or had other quirks. Additionally, some modern systems allow the entire mechanism to be constrained or disabled for security purposes (for example, set-user-id support has been disabled for scripts on many systems).

Note that, even in systems with full kernel support for the #! magic number, some scripts lacking interpreter directives (although usually still requiring execute permission) are still runnable by virtue of the legacy script handling of the Bourne shell, still present in many of its modern descendants. Scripts are then interpreted by the user's default shell.


posted @ 2021-12-16 13:41  papering