#---TABSTOP=4
*Title: Unix Seminar Notes - Advanced
-------------------------------------------------------------------------------
* About this document.

This document covers _advanced_ Unix _user_ commands and information.

This document was first written by Alex Batko in 1999; it was updated over the years, most recently in September 2003.

This document is licensed under the GNU Free Documentation License:
http://www.gnu.org/copyleft/fdl.html

This document along with others is available at the following URL:
http://www.cs.mcgill.ca/~abatko/computers/unix/seminars/

Please refer all complaints and comments to: "Alex Batko"
-------------------------------------------------------------------------------
*TOC
-------------------------------------------------------------------------------
* Who is giving this seminar ?

Seminars are given by system staff of the School of Computer Science (SOCS), of McGill University, as a service offered at the start of each fall and winter term. Apart from systems and network administration, SOCS system staff also provide a daily user support service called the SOCS Help Desk.

For information about upcoming seminars, please consult:
http://www.cs.mcgill.ca/socsinfo/seminars/
-------------------------------------------------------------------------------
* The SOCS Help Desk.

The SOCS Help Desk is the user support centre for answers to computing questions of students, faculty, staff, and anyone else having a valid account at the School of Computer Science at McGill University.

The SOCS Help Desk is physically located at the McConnell Engineering Building, in room 209N. For help, call 398-7087 or send an email to help@cs.mcgill.ca.
For more information about the SOCS Help Desk, please consult:
http://www.cs.mcgill.ca/socsinfo/helpdesk/

When sending mail to the Help Desk make sure to complete the subject field. Write one to four key words that are relevant to your email. Write short messages that get to the point. In most cases you should include the name(s) of the host(s), or computer(s), involved in your particular problem.

For information about issues specific to SOCS, such as user accounts, the network, and services, please consult:
http://www.cs.mcgill.ca/socsinfo/
-------------------------------------------------------------------------------
* File properties and default permissions.

An "inode" is a data structure that describes a file. Within any file system, the number of inodes, and hence the maximum number of files, is set when the filesystem is created. An inode holds most of the important information about the file, including the on-disk address of the file's data blocks. Each inode has its own unique identification number, called an "i-number". An inode also stores the file's ownership, access mode, timestamps, and type.

^<<
% stat F - Display the status of file F, including: device number, device type, inode number, access rights, number of hard links, UID, GID, total size in bytes, number of blocks allocated, time of last access, time of last modification, and time of last change.
^>>

The following is the output of the command "stat welcome.html":

^<<
File: "welcome.html"
Size: 7970 Blocks: 16 Regular File
Access: (0644/-rw-r--r--) Uid: (11374/ abatko) Gid: ( 10/ wheel)
Device: 4 Inode: 1627279 Links: 1
Access: Thu Sep 9 11:44:37 1999
Modify: Mon Sep 6 15:47:12 1999
Change: Mon Sep 6 15:47:12 1999
^>>
-------------------------------------------------------------------------------
* Links.

It is possible to have special files called "links" that point to other files. UNIX provides two different kinds of links, namely hard links and soft links.
A soft link is merely a pointer to a file that is associated with a set of data blocks, whereas a hard link is another name for the set of blocks associated with the file to which it points; in essence, a hard link refers directly to those data blocks rather than to a file name. Amongst other information, the filesystem associates a file's data blocks with an inode, a name, and the number of links to the data blocks.

Consider a regular file "foo". Upon creation, foo has one link to its data blocks. By making a hard link to foo, called "fooH", we create another name for the same regular file, making another link to foo's data blocks, thus increasing the link count to 2. Next if we make a soft link to foo, called "fooS", we create a special file known as a symbolic link, that refers to foo, but does not increment the link count; this is because a symbolic link has its own inode.

If we were to delete foo, then fooS would become a "dangling link" since what it had once pointed to is now gone, namely the file foo; however, fooH will still have full access to foo's data blocks, and the original inode's link count will be reduced by 1.

Assuming we had not deleted anything yet, if we delete fooH, then foo will still exist, and only the number of links to foo will be decremented, totalling 1. This is because data blocks are only "lost" when the link count goes to zero. Thus fooS will still be a valid symbolic link.

When speaking of links, the default is a hard link; thus if speaking of soft links, or symbolic links, an explicit distinction must be made.

In summary, when a file is created it has one link (a hard link) to its data blocks, namely its own name. When a hard link is created, another link is made to the same data blocks, thus increasing the number of hard links to a particular set of data blocks. However, when a soft link (a symbolic link) is made, the number of links to the data blocks does not increase; instead the symbolic link acts as a pointer.

There are some important differences between hard and symbolic links.
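
Before going through them, the foo, fooH, and fooS walk-through above can be tried at the shell prompt. The following is only a sketch: it assumes a scratch directory created with mktemp -d (any empty directory would do), and the variable names after_rm and dangling are made up for the illustration.

```shell
# Work in a throwaway directory so that nothing of value is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

echo "some data" > foo      # foo starts with one link to its data blocks
ln foo fooH                 # hard link: a second name for the same inode
ln -s foo fooS              # soft link: a separate file that stores the path "foo"

ls -li foo fooH fooS        # foo and fooH show the same i-number and a link count of 2

rm foo                      # removes one name; the data blocks survive through fooH
after_rm=$(cat fooH)        # the data is still readable through the hard link

# -e follows the symbolic link, so it fails once the target name is gone.
if [ -e fooS ]; then dangling=no; else dangling=yes; fi
```

After the rm, fooS still exists as a symbolic link (test -L fooS succeeds), but it dangles because the name foo is gone.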
A hard link is a directory entry for the same inode as the original name; it takes no additional space for the data, but the data blocks are not freed until every hard link has been removed. A symbolic link contains only the path to the file it points to. Unlike hard links, symbolic links (aka symlinks) can span filesystems, or even computer systems if a network file system is being used.

^<<
% ln F1 F2 - Create a hard link to file F1, naming it F2.
% ln -s F1 F2 - Create a soft link to file F1, naming it F2.
% ls -l F - List in long format, information for file F. If the file is a symbolic link, the filename is printed followed by "->" and the path name of the referenced file.
% ls -Ll F - List in long format, information for the file linked to by symbolic link. The -L flag can be considered as one that dereferences a symbolic link.
^>>
-------------------------------------------------------------------------------
* Advanced commands.

Advanced commands are ones that an intermediate user is not concerned with. Not knowing about the existence of such commands does not inhibit getting work done; however they can be considered as power tools.

^<<
% tee F - Copy standard input to standard output, sending a second copy to file F. This command is useful when used with a pipe, since one copy of the output goes to standard output (and can thus be sent on through the pipe), while the second copy goes to file F.
% script F - Make a typescript of a terminal session, saving the dialogue in file F. If no file name is provided, the typescript is saved in a file called typescript. Press Ctrl-D or type `exit` to quit script.
% stty -a - Write to standard output all of the option settings for the terminal.
% split -l n F - Split a file F into a set of files having at most n lines each. The original file F is left unchanged.
% splitvt - Run two shells in a split window. Use Ctrl-W to toggle between the windows.
% xargs -n num U A... - Construct a command line consisting of the utility U and the argument(s) A(...). Invoke the constructed command line and wait for its completion.
    The -n flag specifies how many standard input arguments to use.
% basename F s - Strip all directory components from the file name F, as well as the possible suffix s.
^>>
-------------------------------------------------------------------------------
* Regular Expressions:

A regular expression is a pattern that describes a set of strings. Used in combination with the grep utility, regular expressions aid in searching for character patterns in files (described later).

^<<
% man 7 regex - User's regular expressions manual.
% man 3 regex - Programmer's regular expressions manual.
^>>

Do a man on grep, and search for the part REGULAR EXPRESSIONS. Most of the following is a shameless transcription of selected portions of the aforementioned. Note that regular expressions are defined in POSIX 1003.2, and come in two forms: modern or "extended", and obsolete or "basic".

^<<
% grep P F - Search file F for all occurrences of pattern P.
^>>

Most characters, including all letters and digits, are regular expressions that match themselves.

^<<
a - Match the single character 'a'.
hello - Match the sequence of characters 'hello'.
i85 - Match the sequence of characters 'i85'.
^>>

Any metacharacter with special meaning may be quoted by preceding it with a backslash. In extended regular expressions the metacharacters (described later) are:

^<<
. ? * + ^ $ { | ( ) [ \

\\ - Match the single character '\'.
\\\\ - Match the two characters '\\'.
^>>

A "bracket" expression is a list of characters enclosed by [ and ]; it matches any single character in that list. If the first character in the list is the caret ^ then it matches any character _not_ in the list.

^<<
[234567] - Match any single digit from '2' to '7'.
[^3x] - Match any single character other than '3' or 'x'.
^>>

A range of ASCII characters may be specified by giving the first and last characters, separated by a hyphen.

^<<
[2-7] - Match any single character in the ASCII range from '2' to '7'.
[a-z] - Match any single character in the ASCII range from 'a' to 'z'.
[0-9A-Za-z] - Depending upon the character encoding, this may match any character that is a digit or an upper or lower case letter. Note that ranges cannot share endpoints. Note that there are predefined classes of characters, such as [:alnum:], that are independent of encoding, and are thus portable.
^>>

Within a bracket expression, a multi-character collating element can be enclosed in ``[.'' and ``.]''.

^<<
[[.ch.]]*c - If the collating sequence includes a ch collating element, matches the first five characters of chchccc.
^>>

Most metacharacters lose their special meaning inside lists, or "bracketed" expressions. To include a literal ] place it first in the list (following a possible caret ^). Similarly, to include a literal ^ place it anywhere but first. Finally, to include a literal - place it last.

^<<
[]a-d] - Match any single character in the list of ']' and the range 'a' to 'd'.
[ab^d] - Match any single character in the list of 'a', 'b', '^', and 'd'.
[ad2-] - Match any single character in the list of 'a', 'd', '2', and '-'.
^>>

The period . matches any single character.

A regular expression matching a single character may be followed by one of several repetition operators:

^<<
? - The preceding item will be matched 0 or 1 times.
* - The preceding item will be matched 0 or more times.
+ - The preceding item will be matched 1 or more times.
{n} - The preceding item is matched exactly n times.
{n,} - The preceding item is matched n or more times.
{,m} - The preceding item is optional and is matched at most m times.
{n,m} - The preceding item is matched at least n times, but not more than m times.
^>>

Two regular expressions may be concatenated. Two regular expressions may be joined by the infix operator | resulting in a regular expression matching any string that matches either subexpression. Repetition takes precedence over concatenation, which in turn takes precedence over alternation. Parentheses ( ) override these precedence rules.
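
The precedence rules above can be checked with grep -E. The following is a small sketch rather than anything taken from the grep manual: the sample words and the variable names are made up, and it uses the -x flag (match whole lines only), which these notes do not otherwise cover, so that the effect of grouping is easy to see.

```shell
# Four sample lines to search (made up for this illustration).
words='ab
cd
abd
acd'

# Alternation binds loosest: 'ab|cd' means (ab)|(cd), not a(b|c)d.
loose=$(printf '%s\n' "$words" | grep -E -x 'ab|cd')
grouped=$(printf '%s\n' "$words" | grep -E -x 'a(b|c)d')

# Repetition binds tightest: 'ab*' is a followed by any number of b,
# whereas '(ab)*' repeats the whole pair.
star_plain=$(printf 'abb\nabab\n' | grep -E -x 'ab*')
star_group=$(printf 'abb\nabab\n' | grep -E -x '(ab)*')

printf 'ab|cd    selects: %s\n' "$(echo $loose)"
printf 'a(b|c)d  selects: %s\n' "$(echo $grouped)"
printf 'ab*      selects: %s\n' "$star_plain"
printf '(ab)*    selects: %s\n' "$star_group"
```

Without the parentheses, 'ab|cd' selects the lines ab and cd; with them, 'a(b|c)d' selects abd and acd instead.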

Note that "basic" regular expressions are somewhat different from "extended" regular expressions. Two differences to keep in mind are that delimiters for bounds are \{ and \}, and parentheses for nested subexpressions are \( and \). Basic regular expressions also have one additional type of atom, the back reference: \ followed by a non-zero decimal digit d. It matches the same sequence of characters matched by the dth parenthesized subexpression.

^<<
\([bc]\)\1 - Matches bb or cc but not bc.
^>>
-------------------------------------------------------------------------------
* Matching patterns in files.

Grep searches the named input file(s) for lines containing a match to the given pattern. Grep understands "basic" (obsolete) regular expressions, and "extended" (modern) regular expressions. The pattern given to grep is by default (implicitly) interpreted as a basic regular expression. It can also be made explicit by the flag -G. To interpret the pattern as an extended regular expression, use the -E flag.

^<<
% grep P F - Search the input file F for lines containing a match to pattern P, a basic regular expression.
% grep -G P F - Interpret pattern P as a basic regular expression (default).
% grep -E P F - Interpret pattern P as an extended regular expression.
% grep -N P F - Grep, where N is a number (as in grep -2 P F). If a match is found, print N lines of leading and trailing context.
% grep -c P F - Grep. Suppress normal output; print a count of matching lines.
% grep -i P F - Grep. Ignore case distinctions in both the pattern and input file.
% grep -n P F - Grep. Prefix each line of output with the line number within input file F.
% grep -v P F - Grep. Invert the sense of matching, to select non-matching lines.
% grep P - Grep standard input for pattern P.
% grep P - - Grep standard input for pattern P; the file name - stands for standard input.
^>>
-------------------------------------------------------------------------------
* Command-line Operability.

Luc Boulianne's Theorem (aka Luc's Theorem): "The study of Computer Science is the study of Minimizing Keystrokes."

bash:

^<<
C-a - Move to the start of the current line.
C-e - Move to the (e)nd of the current line.
C-f - Move (f)orward a character.
C-b - Move (b)ack a character.
M-f - Move (f)orward to the end of the next word.
M-b - Move (b)ack to the start of this, or the previous word.
C-p - Fetch the (p)revious command from the history list, moving back in the list.
C-n - Fetch the (n)ext command from the history list, moving forward in the list.
C-h - Delete the character behind the cursor.
C-d - (d)elete the character under the cursor.
M-d - (d)elete from the cursor to the end of the current word, or if between words, to the end of the next word.
C-w - Kill the (w)ord behind the cursor.
C-k - (k)ill from the cursor to the end of the line.
C-u - (u)nix-line-discard: kill from the cursor back to the beginning of the line.
C-y - (y)ank the top of the kill ring into the buffer at the cursor.
C-l - Clear the screen, leaving the current line at the top of the screen.
C-t - (t)ranspose characters: drag the character before point forward over the character at point.
M-t - (t)ranspose words: drag the word behind the cursor past the word in front of the cursor.
C-_ - Incremental undo, separately remembered for each line.
C-x C-u - Incremental undo, separately remembered for each line.
M-# - Make the current line a shell comment.
^>>
-------------------------------------------------------------------------------
* make.

make is a utility for maintaining, updating, and regenerating groups of related programs and files. The purpose of the make utility is to automatically determine which pieces of a large program need to be recompiled, and issue the commands to recompile them. make can be used with any programming language whose compiler can be invoked from the shell. Note that make is not limited to programs; it can be used to update files from others, whenever the others change.
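
The decision that make takes for each file can be imitated in plain shell. The sketch below is not how make is implemented; it only illustrates the idea of rebuilding a file when it is missing or older than the file it is derived from. The file names and the needs_rebuild function are hypothetical, timestamps are set by hand with touch -t to keep the outcome predictable, and the -nt test operator is assumed to be available (bash and most other shells provide it, although POSIX does not require it).

```shell
# Hypothetical file names in a scratch directory; not part of any real project.
dir=$(mktemp -d)
cd "$dir" || exit 1

# needs_rebuild TARGET SOURCE: succeed if TARGET is missing or older than SOURCE.
needs_rebuild() {
    [ ! -e "$1" ] || [ "$2" -nt "$1" ]
}

echo "input" > notes.txt
touch -t 200309010000 notes.txt      # pretend the source was last edited on Sep 1

# First pass: the target does not exist yet, so it must be built.
if needs_rebuild notes.out notes.txt; then
    first=rebuilt
    cp notes.txt notes.out
    touch -t 200309020000 notes.out  # stamp the build one day after the edit
else
    first=skipped
fi

# Second pass: the target is newer than the source, so there is nothing to do.
if needs_rebuild notes.out notes.txt; then second=rebuilt; else second=skipped; fi

# Third pass: a later edit to the source makes the target stale again.
touch -t 200309030000 notes.txt
if needs_rebuild notes.out notes.txt; then third=rebuilt; else third=skipped; fi

echo "$first $second $third"
```

make performs this kind of comparison for every target, and spares you from writing the logic by hand.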

The command ``make'' relies on a file named "Makefile", which you must write to describe the relationships among files in your program and the commands for updating each file. make executes commands in the makefile associated with each target, typically to create or update a file of the same name. A target entry has the form:

^<<
target [:|::] [dependency] ... [; command ] ...
	[command ]
	...
^>>

If no target is specified upon invocation of make, the first target in the makefile is built, after its dependencies have been checked recursively. Once a makefile exists, typing `make` suffices to perform all necessary recompilations.

^<<
% make - Perform all necessary recompilations to programs specified in the file called Makefile. The make program uses the makefile database and the last-modification times of the files to decide which of the files need to be updated.
^>>

The following is a great example of a short Makefile. If the first non-TAB character of a command line is a ``@'', that command will not be printed before being executed.

^<<
PODFILE = hash.pod
TITLE = 'Perl Hash Howto'
OUTFILE = index.html
JUNK = pod2htm*

all: $(PODFILE)
	@pod2html --infile=$(PODFILE) --outfile=$(OUTFILE) --title=$(TITLE)
	@rm -f $(JUNK)
	@if [ -r $(OUTFILE) ] ; then \
		chmod 644 $(OUTFILE); \
	fi
^>>

The following is another example of a Makefile:

^<<
COMPILER = gcc
MAIN_SOURCE = str2wrd.c
OBJ = someobjectfile.o
LIB = -lm
OUT_NAME = a.out

str2wrd: $(MAIN_SOURCE)
#	$(COMPILER) -o $(OUT_NAME) $(OBJ) $(LIB) $(MAIN_SOURCE)
	$(COMPILER) -o $(OUT_NAME) $(MAIN_SOURCE)

clean:
	\rm -f $(OBJ) $(OUT_NAME)
^>>
-------------------------------------------------------------------------------
* Revision Control System.

Programs, documentation, projects, and other such files that undergo frequent revisions or updates can be managed using the Revision Control System (RCS).

^<<
% man 1 rcsintro - Manual containing an introduction to rcs.
^>>

Someone new to RCS need only learn two commands.

^<<
% ci - (c)heck (i)n.
    Deposit the contents of a file into an archival file called an RCS file.
% co - (c)heck (o)ut. Retrieve revisions from an RCS file.
^>>

Consider an assignment that will undergo frequent revisions. Let the file be called foo.c. Let's assume that foo.c resides at ~/courses/2000.1/cs537/ass/ass04/foo.c.

^<<
% cd ~/courses/2000.1/cs537/ass/ass04/ - Change directory to the place where foo.c lives.
% mkdir RCS - Make an RCS directory called RCS.
% ci foo.c - Check in file foo.c, thereby creating a corresponding RCS file in the RCS directory, storing foo.c into it as revision 1.1, and deleting foo.c (use `co -l foo.c` to retrieve it again for editing).
^>>
-------------------------------------------------------------------------------
* Useful tricks.

Use xargs to help kill processes:

^<<
% ps axwww | grep http | awk '{print $1}' | xargs -n1 kill -9
    - Run ps, pipe it to grep. Pipe the grep output to awk, sending field 1 of each line to xargs. xargs executes `kill -9` on each incoming line of awk output.

% tar -cvf - D1 | (cd /tmp; tar -xvf -)
    - Archive directory D1 to standard output, piping it to a tar extract that runs in /tmp and reads from standard input.

% tar -cvf - jsse1.0 | (cd /usr/local ; tar -xvf -)

% for i in `grep "u1/" /var/etc/teaching.cs.mcgill.ca/passwd | grep \* | awk \
    -F: '{ print $1 }' ` ; do echo $i ; done

% perl -pi -e 's/hello/goodbye/g' F
    - In-place text substitution, replacing every occurrence of the word 'hello' with the word 'goodbye' in file F.

^Z - Suspend the current job. Some programs don't allow suspension. For example `pine` must be invoked with -z to enable suspension.
~^Z - Suspend current login session.
^>>
-------------------------------------------------------------------------------
* Vim.

Vim is Vi IMproved. Vi is short for "visual". The underlying line editor of both Vi and Vim is ex, a descendant of ed. Both Vi and Vim have two modes of operation: command mode and insert mode.

Range selection in vim can be done using v, V, and ^V.

^<<
v - visual text selection per character
V - visual text selection per line
^V - visual text selection per block (rectangular shape)
^>>

After making the desired selection, press 'y' to 'yank' (copy) the text into vim's buffer. Next move the cursor to a desired location and press 'p' to paste the selected text after the cursor, or 'P' to paste it before the cursor.

Searching and replacing. Over the whole document (%), substitute (s) 'goodbye' for 'hello', and confirm (c) each replacement.

:%s/hello/goodbye/c
replace with goodbye (y/n/a/q/^E/^Y)?

^<<
y - yes
n - no
a - all
q - quit
^E - scroll up one line
^Y - scroll down one line
^>>

Both vi and vim can be invoked with a -r flag followed by the name of a .swp file. This will recover the swp file which may have been left behind after a system crash or corrupted session.

^<<
% vim -r F.swp - Recover file F using swap file F.swp.
^>>

Note: after recovering the file using the swap file you should delete the swap file, especially before attempting to vim the recovered file again.

Type ":help" in Vim to get started. Type ":help subject" to get help on a specific subject.

Some useful vim commands follow:

^<<
:set paste - Turn on pasting mode.
:set nopaste - Turn off pasting mode.
:set ai - Turn on autoindenting.
:set noai - Turn off autoindenting.
:set textwidth=0 - Disable maximum text width.
:set textwidth=78 - Set the maximum width of inserted text to 78.
:set list - Show tabs as '^I' and end of line characters as '$'.
:set nolist - Turn off list mode.
:syntax on - Turn on syntax highlighting.
:syntax off - Turn off syntax highlighting.
^>>

Variation of an Emacs/vi joke:

``Daddy! Daddy! Why are we hiding from the police?''
``Because they use Emacs, son, and we use vi.''
-------------------------------------------------------------------------------
* Subshells.

Like processes and subprocesses, when a shell starts another shell, the new shell is called a subshell.
The child process (the subshell) inherits its parent's environment; however, changes to the child's environment do not affect the parent. A shell script runs in a subshell.
-------------------------------------------------------------------------------
* Shell scripting (programming).

You can read Tom Christiansen's essay "_Csh Programming Considered Harmful_" to learn why csh (and by extension, tcsh) should not be used for writing shell scripts:
http://www.cs.mcgill.ca/socsinfo/seminars/csh-whynot
-------------------------------------------------------------------------------
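
The rule from the Subshells section, that a child inherits its parent's environment but cannot change it, can be checked with a few lines of Bourne-style shell. This is only a sketch; the variable name colour and the values used are made up for the illustration.

```shell
# A variable changed inside a subshell does not leak back to the parent.
colour=red
( colour=blue; echo "inside the subshell: $colour" )
after_paren=$colour

# An exported variable is inherited by a child shell,
# but an assignment made by the child stays private to the child.
export colour
seen_by_child=$(sh -c 'echo "$colour"')
sh -c 'colour=green'
after_child=$colour

# The same holds for the working directory.
start=$PWD
( cd / )
end=$PWD

echo "parent still has: $after_paren, $after_child"
```

In each case the parent shell keeps its own value: colour is still red, and the working directory is unchanged after the parenthesized cd.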