1K Linux Commands


1024 Commands: An Introduction to the Linux Command Line

A shell is just a program stored on a computer. When executed, the shell provides an interactive process for issuing commands. For users accustomed to interacting with a computer through a GUI, mouse, or touchscreen, a command line can seem daunting, particularly since the "prompt" (a terse snippet of characters indicating the input location) does very little prompting and often spits back cryptic error messages.

This brief tutorial demonstrates some of the basics of the Linux command line environment using the Bash shell. A reader should have no trouble using another shell as this tutorial largely avoids shell-specific features. Instead, it focuses on a general set of Linux commands.

The CPSC Department Tech Staff have a quick start command line guide at:

https://www.cpsc.ucalgary.ca/tech_support/help/unix_commands

The Basics

A shell is a program that helps a user interact with a computing environment. The core functionality of most shells, particularly in a Unix environment, is to locate and execute programs on behalf of the user. The user types a command; commands typically include the name of a program and some arguments or data for that program to operate on. The shell locates the named program, supplies the arguments to the program, and waits for the program to complete.

It is important to realize that the shell typically acts like a dispatcher -- except in the cases where it uses its own built-in functionality (or in cases where the user invokes shell-specific scripting commands), the shell merely looks up the location of the named program and asks the operating system to execute that program. Commands (most of which are separate programs) are typically stored in a few well-known locations in the file system, such as /bin, /usr/bin, /sbin, /usr/sbin, and the user's own bin/ directory in their home directory. For example, the /bin directory of a recent Linux distribution includes more than 100 commands, including ones like `arch', `cat', `cp', and `date'. The same system has over 1800 commands in /usr/bin, 280 in /sbin, and 375 in /usr/sbin. This amounts to more than 2500 possible programs to run.
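
You can get a rough count on your own system by listing one of these directories and counting the entries (the | character and the `wc' command used here are both explained later in this tutorial; the numbers shown are illustrative and will vary from one distribution to another):

[michael@host 1k]$ ls /bin | wc -l
103
[michael@host 1k]$ ls /usr/bin | wc -l
1842
[michael@host 1k]$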

Unix and Linux environments come with a large variety of commands. Commands are typically external programs, although some simple functionality is built into the shell itself.

Takeaway Message: the shell is a program that waits for your input. You provide input by typing commands. Commands start with the name of some program and have optional arguments or parameters.

Getting Oriented

So your prompt is waiting for you. Patiently. Before we tell it to do something, let's give you a fallback so you know where to get help --- right in your shell (without resorting to digging through Google results or bugging your friends or system administrator).

The first place to start is knowing how to access documentation. Most Linux shell environments come equipped with a command to do this: `man' (short for "manual"). You can supply the `man' command with a number of arguments to control its behavior in various ways, but in its simplest form, you give it just the name of a program whose functionality you'd like to learn about. For example, typing the following command and hitting <enter>:

[michael@host 1k]$ man man

causes the manual page for the `man' command to be displayed in the terminal window. Man pages provide the name and brief explanation of the command, a short synopsis, and a list of accepted arguments to the command. You can stop viewing the manual page by typing the letter `q'.
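
If you don't know the name of the command you need, `man' can also search the short descriptions of all installed manual pages with the -k argument. For example (the exact matches and wording will vary from system to system):

[michael@host 1k]$ man -k copy
cp (1)               - copy files and directories
dd (1)               - convert and copy a file
...
[michael@host 1k]$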

Takeaway Message: Use the manual pages to explore the rich functionality of the many different Linux commands available on a typical system. When in doubt, read the man page!

1024 Commands

This tutorial is based on a history of 1024 commands I gathered from one of my terminal windows (I have multiple open, each with its own extensive history). The history of each shell session is maintained in memory by the shell during runtime and written to the file .bash_history when the shell exits.
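
On most systems running Bash, you can see where this history file lives and how many entries the shell will remember by inspecting two shell variables; the values shown below are illustrative and depend on your configuration:

[michael@host 1k]$ echo $HISTFILE
/home/michael/.bash_history
[michael@host 1k]$ echo $HISTSIZE
10000
[michael@host 1k]$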

One thousand commands didn't seem like enough, and 1024 is a nice power of two.

So, what have I been typing for the last one thousand twenty-four commands? The simplest way to find out is to ask the shell with the `history' command:

[michael@host 1k]$ history
1  clear
2  more disorder.c
3  more ../include/disorder.h
4  more disorder.c
5  clear
6  cat disorder.c
7  clear
8  grep 8080 /etc/services
...
[michael@host 1k]$

What the shell displays is a numbered list of all commands it has kept track of. Already we can see a few different commands in play: `clear', `more', `cat', and `grep' -- and these commands seem to be operating on some additional data.

How can I save this history into a file so I can work with it? Fortunately, the shell provides input and output redirection. As we just saw, the `history' command dumps its output to the terminal screen (known as stdout, for "standard output"). Using the output redirection character >, we can send this output to a new file named 1k.dat.

[michael@host 1k]$ history 1024 > 1k.dat
[michael@host 1k]$

Here, we've asked the shell to print the history of the last 1024 commands, but instead of sending the output to stdout, we've asked the shell to write it to a file named `1k.dat' (there is nothing special about this file name, and nothing special about the extension ".dat" -- we could just as easily have named the file `1k'; file names in Unix do not require extensions).
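
We can quickly confirm that the redirection worked by counting the lines in the new file with the `wc' command (which we will return to later in this tutorial); as expected, there is one line per saved command:

[michael@host 1k]$ wc -l 1k.dat
1024 1k.dat
[michael@host 1k]$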

One thing you'll notice about the command line is that it isn't very chatty -- when things work, it just returns a prompt to you. When things don't, it often returns a terse error message:

[michael@host 1k]$ foo
bash: foo: command not found
[michael@host 1k]$

Here, the shell (named `bash') tried to look up the program named `foo', but did not find it (how this lookup proceeds in detail is outside the scope of this tutorial, but the shell basically searches some common locations like /bin and /usr/bin).
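
Briefly, the shell consults the list of directories held in the PATH environment variable, and you can ask Bash how it resolves a particular name with the `type' built-in. The directories shown below are typical but will differ from system to system:

[michael@host 1k]$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/michael/bin
[michael@host 1k]$ type cat
cat is /bin/cat
[michael@host 1k]$ type history
history is a shell builtin
[michael@host 1k]$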

Now that we have our last 1024 commands saved in a file, let's find out what those commands actually were. One way to do this would be to spit the file out to standard output again using the `cat' command or scroll through it a pageful at a time with the `more' command. I could even peek at the first few lines or last few lines using `head' or `tail', respectively.

[michael@host 1k]$ head -4 1k.dat
1  clear
2  more disorder.c
3  more ../include/disorder.h
4  more disorder.c
[michael@host 1k]$

But these approaches aren't really helpful for summarizing a thousand lines, particularly since we can already see that I use some commands quite frequently. What I am really after, then, is a summary of this command history. What is the relative frequency of the commands I use?

One solution is to whip up a quick AWK script to process the saved history file. From our `head' command above, we notice that the format of the file is whitespace, a number enumerating the command, more whitespace, and then the command and its arguments (each separated by whitespace). Each line is terminated with a newline character. AWK is a particularly useful scripting language that is designed to manipulate and process files containing records. Using AWK, we can treat each line of the history file as a record and extract only the command name:

[michael@host 1k]$ gawk '{print $2}' 1k.dat
clear
more
more
more
clear
cat
clear
grep
clear
yum
ls
more
ls
ls
ls
...
[michael@host 1k]$

We now see only the 1024 command names on the terminal's standard output, and the repetition of certain commands is obvious, even from this small sample. Is there a way to make this information more compact? Can I ask the shell to count the relative frequencies of these 1024 commands?

It turns out that I can. Successfully. But first let's go back and try to understand what happened with AWK above. I issued the command `gawk' (which is the name of the GNU version of `awk'). `gawk' is an interpreter of AWK programs; that is, it understands the AWK language and will take action based on statements in that language. In this case, we specified a very simple AWK script as the first argument to `gawk'. The entire AWK program is contained in the single quotes: '{print $2}' and simply means "print out the second field of each line" that `gawk' encounters in its input file. The second argument to `gawk' is the name of that file -- 1k.dat. So, gawk reads 1k.dat a line at a time and prints out the second field, which just happens to be the name of the command.
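
You can see the field splitting at work by asking for a different field. By default, `gawk' treats runs of whitespace as field separators, so $1 on each line of 1k.dat is the number that `history' wrote and $2 is the command name. Asking for the first field instead prints those numbers:

[michael@host 1k]$ gawk '{print $1}' 1k.dat
1
2
3
4
...
[michael@host 1k]$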

This information still isn't summarized, though. We can summarize it by introducing another feature of modern shells: pipes.

Pipes are similar to the output redirection construct we saw above, but instead of connecting a program's output with a different target file, a pipe (denoted by the | character) connects two programs. This is a beautiful idea.

With a pipe, the output of one program becomes the input to another. Very powerful processing chains can be built using this simple construct. For example, we can take the output of the `gawk' command above and pipe it to a command that cleans up that output by sorting it with the `sort' command:

[michael@host 1k]$ gawk '{print $2}' 1k.dat | sort
ant
./a.out
./a.out
...
yum
yum
yum
[michael@host 1k]$

What we see is the output of gawk sorted alphabetically (actually, lexicographically).

Notice that in order to extract a data field from an arbitrary file and then sort the results, we didn't have to write a single line of Java or C or implement quicksort.

We can continue to build our processing pipeline. We are now in a position to identify only unique strings in this output. Doing so will help us boil down these 1024 commands to the actual subset of Linux commands that I used in this particular collection. Fortunately, Linux has another helpful program for me: the `uniq' command. `uniq's basic operation is to eliminate duplicate lines; because we've previously sorted the output, all "duplicates" are next to each other in the output. For our purposes, removing the duplicates isn't terribly useful, but if we pass the -c argument to `uniq', it will actually count those duplicate instances and report the count!

[michael@host 1k]$ gawk '{print $2}' 1k.dat | sort | uniq -c
 ...
 1 xpd
 4 xpdf
12 yes
 7 yum
[michael@host 1k]$

What we have now is output that lists the frequency of each command. But we can go a step further and sort this output in descending order by the frequency! How? By piping again to another instance of `sort', but this time we'll tell it to sort numerically and in reverse (i.e., descending) order:

[michael@host 1k]$ gawk '{print $2}' 1k.dat | sort | uniq -c | sort -nr
 193 ls
 111 cd
  80 man
  60 clear
  51 more
 ...
   1 chown
   1 bc
   1 arp
   1 ant
[michael@host 1k]$

Finally, by adding just another stage to our pipeline, we can count how many lines this output has using the `wc' command:

[michael@host 1k]$ gawk '{print $2}' 1k.dat | sort | uniq -c | sort -nr | wc
   89    178    1211
[michael@host 1k]$

We see that those 1024 commands are really repetitions of 89 individual commands:

[michael@host 1k]$ gawk '{print $2}' 1k.dat | sort | uniq -c | sort -nr
193 ls
111 cd
 80 man
 60 clear
 51 more
 41 make
 40 exit
 40 emacs
 33 cat
 30 printf
 28 ./a.out
 23 strace
 17 objdump
 17 echo
 16 id
 15 ps
 15 ll
 12 yes
 11 su
  9 readelf
  9 mkdir
  9 gcc
  8 cp
  8 ./snyfer
  7 yum
  5 pwd
  5 mv
  5 history
  5 ./sh3-fixed
  5 ./sh1
  4 xpdf
  4 rmdir
  4 nasm
  4 hexdump
  4 grep
  4 execstack
  4 env
  4 chmod
  4 ./sh3fixed
  3 udcli
  3 top
  3 rm
  3 netstat
  3 locate
  3 ifconfig
  3 gdb
  3 file
  2 w
  2 unlink
  2 touch
  2 reset
  2 pushd
  2 pstree
  2 popd
  2 ndisasm
  2 last
  2 kill
  2 df
  2 date
  2 arch
  2 ./hdb
  1 xpd
  1 whoami
  1 which
  1 uptime
  1 uname
  1 tty
  1 traceroute
  1 ssh
  1 srtol
  1 ps2pdf
  1 ping
  1 open
  1 no
  1 lsof
  1 lks
  1 javac
  1 java
  1 finger
  1 dmesg
  1 ddd
  1 cpio
  1 chown
  1 bc
  1 arp
  1 ant
  1 ./redpill
  1 ./mzr
  1 ./llist
[michael@host 1k]$

So what is on this list?

The 89 Commands

We'll take a look at each of these commands in turn.

Let's start with the ten most frequently invoked:

193 ls
111 cd
 80 man
 60 clear
 51 more
 41 make
 40 exit
 40 emacs
 33 cat
 30 printf

The most common command is `ls', which causes the shell to list the files in the current directory. This is similar to a windowing environment displaying the files and folders (i.e., directories) in the current folder. Here we see the typical Unix preference for terse commands. Clearly, this user often asks the environment to display what files are located in the current working directory; it's a useful way to know what your context is. Other context commands include `pwd' (print the name of the working directory) and `env' (display the name-value pairs of environment variables).

The `ls' command takes a number of arguments: -l produces more detail, -a displays "hidden" files (those whose names start with a . character), -h displays human-readable sizes rather than raw byte counts, -S sorts the listing by size, and -F prints a trailing character that helps convey the semantics of each file name (such as whether it is a directory, an executable, a symbolic link, a socket, or a named pipe). In Unix, almost every OS-level construct can be treated as a file, so these meta-characters help distinguish at a glance between the types of resources represented by the listed file names.
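
For example, combining several of these flags gives a detailed, human-readable, annotated listing of the tutorial's working directory (the sizes and dates shown are just illustrative):

[michael@host 1k]$ ls -alhF
total 28K
drwxr-xr-x  2 michael michael 4.0K Mar 30 18:25 ./
drwx------ 41 michael michael 4.0K Mar 30 18:20 ../
-rw-rw-r--  1 michael michael  24K Mar 30 18:25 1k.dat
[michael@host 1k]$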

The `cd' command is a comparatively close second; it helps the user actually move around the file system namespace. So viewing and navigating the file system name space seem to be common operations. `cd' is typically invoked with one argument: the name of the "destination" directory to change context to. Another thing to note is that `cd' is typically a command built into the shell, not an external program that the shell invokes, and it is built on the system call chdir(2), which changes the value of the current working directory (often exposed in the shell environment via the $PWD variable).
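
A short round trip illustrates how `cd', `pwd', and $PWD interact (the path /home/michael/1k is just an assumption about where this tutorial's working directory lives; `cd -' returns to the previous directory):

[michael@host 1k]$ pwd
/home/michael/1k
[michael@host 1k]$ cd /tmp
[michael@host tmp]$ echo $PWD
/tmp
[michael@host tmp]$ cd -
/home/michael/1k
[michael@host 1k]$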

We see that this user frequently asks the `man' system for help in looking at the syntax and semantics of various command line programs. This shell session has been used for a lot of teaching tasks, which is why `man' sits so near the top, but even in non-teaching, pure administration sessions, consulting the manual is a great idea and a common task!

The `clear' command helps manage the shell's display: it ignores any command line arguments and attempts to clear the screen according to the specific terminal type.

This user also likes to look at files via the `more' command; `more' (or `less') displays file content a screenful at a time. The `cat' command, a few lines down, is another way to look at file content, but it dumps the entire file to the screen at once. You can also pipe (via |) program output to `more'.
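
For example, we can page through the saved history file one screenful at a time; the space bar advances a screen and `q' quits, just as with man pages (the percentage shown by `more' is illustrative):

[michael@host 1k]$ more 1k.dat
1  clear
2  more disorder.c
3  more ../include/disorder.h
...
--More--(2%)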

The `make' command invokes the GNU make program, a program for controlling the compilation and building of other programs. Make typically reads a "makefile" for instructions on how to build something. This is a lot easier than constantly typing or recalling complicated gcc command lines.
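
As a minimal sketch (this particular makefile is hypothetical, not one taken from the history above), suppose the directory contains disorder.c and a makefile with a single rule; note that the indented recipe line must begin with a tab character. Running `make' then issues the compiler command on our behalf, and only when disorder.c has changed:

[michael@host 1k]$ cat Makefile
disorder: disorder.c
	gcc -Wall -o disorder disorder.c
[michael@host 1k]$ make
gcc -Wall -o disorder disorder.c
[michael@host 1k]$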

The `exit' command terminates the current shell session (and, if it is a login shell, logs the user out).

The `emacs' command invokes and enters the Emacs text editor. Emacs is a powerful editor (among other things) and IDE.

The `printf' command is a built-in shell command for formatting and displaying output strings.
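
For example, `printf' takes a format string containing conversion specifiers such as %s (a string) and %d (a decimal integer), followed by the values to substitute for them:

[michael@host 1k]$ printf "%s contains %d commands\n" 1k.dat 1024
1k.dat contains 1024 commands
[michael@host 1k]$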

Continuing with the next nine commands, until the frequencies dip below double digits:

 28 ./a.out
 23 strace
 17 objdump
 17 echo
 16 id
 15 ps
 15 ll
 12 yes
 11 su

Other Resources

  • In the Beginning Was the Command Line (Neal Stephenson)