Linux
UNIX
Primarily notes from chapter 1 of TLPI
- Originally a specific operating system, with the first version developed at Bell Labs by Ken Thompson in 1969
- Came in a series of versions (editions)
- Nowadays, UNIX more often denotes a "UNIX-like" operating system, such as Linux distributions, macOS and BSD variants.
- The Open Group owns the UNIX trademark
- They issue certifications for conformance to the Single UNIX Specification (SUS), of which there are multiple versions
- For example, macOS 14 (Sonoma) is a certified SUSv3 OS
- No Linux distribution I know of is a certified UNIX product
Fundamentals
Primarily notes from chapter 2 of TLPI.
Kernel
- Resides at /boot/vmlinuz, or something similar
- The trailing z denotes that the file is compressed
- Tasks performed by the kernel include
- Process scheduling (Linux employs preemptive multitasking)
- Creation and termination of processes
- Memory management
- File system provisioning
- Device access - devices are exposed as files in /dev
- Networking
- System call API
- Multi-user environment - each user gets the abstraction of a virtual private computer
- Kernel mode vs user mode
- Modern CPUs can run in at least two modes, kernel mode and user mode, and provide hardware instructions for switching between them.
- Areas of virtual memory can be marked as belonging to either kernel space or user space.
- Kernel space and user space
- In kernel mode the CPU can access both kernel space and user space, whereas in user mode it can only access user space
- Kernel-specific data structures reside in kernel space, which user processes cannot access
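A minimal sketch of "devices are exposed as files", assuming a Linux system with the usual /dev/null and /dev/zero device files. It uses Python's os module, whose functions are thin wrappers around the open(), read() and close() system calls; read_device is a hypothetical helper, not a standard API.

```python
import os
import stat


def read_device(path: str, nbytes: int) -> bytes:
    """Read up to nbytes from a device file with the ordinary file system calls."""
    fd = os.open(path, os.O_RDONLY)  # open() system call
    try:
        return os.read(fd, nbytes)   # read() system call
    finally:
        os.close(fd)                 # close() system call


# /dev/null and /dev/zero are character devices, not regular files
print(stat.S_ISCHR(os.stat("/dev/null").st_mode))  # True
print(read_device("/dev/zero", 4))  # b'\x00\x00\x00\x00': an endless supply of zero bytes
print(read_device("/dev/null", 4))  # b'': reading /dev/null always gives end-of-file
```

The same read_device call works unchanged on a regular file; the kernel decides what a read means for each kind of file.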
Users and groups
- User consists of
- Login name (username)
- Numeric user ID (UID)
- Group ID - the first group the user belongs to
- Home directory
- Login shell
- Users are stored in the /etc/passwd file
- For security reasons, passwords are usually stored separately in /etc/shadow
- Groups are stored in /etc/group, and each group consists of
- Group name
- Numeric group ID (GID)
- User list - comma-separated list of usernames that belong to the group
- Superuser
- UID=0
- Login name is most often root
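Each line of /etc/passwd holds seven colon-separated fields (login name, password placeholder, UID, GID, comment, home directory, login shell), which map onto the user attributes above. A minimal sketch: the PasswdEntry type and parse_passwd_line helper are hypothetical, and the pwd lookup assumes the usual "root" superuser account exists.

```python
import pwd
from typing import NamedTuple


class PasswdEntry(NamedTuple):
    login_name: str
    password: str      # usually "x": the real password hash lives in /etc/shadow
    uid: int
    gid: int           # the user's first (primary) group
    comment: str
    home_dir: str
    login_shell: str


def parse_passwd_line(line: str) -> PasswdEntry:
    """Split one /etc/passwd line into its seven colon-separated fields."""
    name, password, uid, gid, comment, home, shell = line.rstrip("\n").split(":")
    return PasswdEntry(name, password, int(uid), int(gid), comment, home, shell)


entry = parse_passwd_line("root:x:0:0:root:/root:/bin/bash")
print(entry.login_name, entry.uid, entry.home_dir, entry.login_shell)  # root 0 /root /bin/bash

# The pwd module performs the same lookup through the C library's getpwnam()
root = pwd.getpwnam("root")
print(root.pw_uid)  # 0: the superuser
```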
Directory hierarchy
- Single tree structure (i.e. just one root)
- Contrast this to Windows, where each device (drive) has its own directory hierarchy (e.g. C:, D:, etc.)
- Devices are mounted into the single directory hierarchy
- Files
- In Linux, "everything is a file"
- Files have different types: regular files, directories, pipes, sockets, symbolic links, device files, etc.
- Directory
- A file whose contents are a list of filenames, each paired with a reference to the corresponding file
- Always has at least two entries: . (the directory itself) and .. (its parent; in the root directory, .. refers to the root itself, so . == ..)
- (Hard) link
- An association between a filename and a specific file
- Symbolic link
- A file that contains a path to another file
- Sometimes called a soft link to contrast it with a normal (or hard) link
- Filenames
- The name of a file in a directory
- Recommended to keep to the portable filename character set specified by SUSv3: [-._a-zA-Z0-9]
- This holds even though most modern systems can handle a wide range of UTF-8 characters
- Characters that have special meaning must be escaped
- Some environments/programs may not support escaping, and then you're kind of pooped
- Pathnames
- A /-separated list of directory names, except for the final entry, which may be any kind of file
- Absolute pathnames start with /, relative pathnames do not
- Current working directory
- Every process has a current working directory
- The working directory is inherited from the parent process and can be changed with the chdir() system call
- A login shell always starts in the home directory of the logged-in user
- File permissions
- Each file has an associated UID and GID, identifying its owner and group
- Each file has a set of permissions
- read (r) - allows a file to be read
- write (w) - allows a file to be written
- execute (x) - allows a file to be executed
- Permission bits are in the order rwx (r = 4, w = 2, x = 1), so e.g. 7 is all permissions, while 4 is only r permission
- There are 9 permission bits, 3 each for owner, group and other
- So e.g. 744 translates to rwx for the owner, and only r for group and other
- Directories have a slightly different interpretation of permissions
- read - allows the filenames in the directory to be listed
- write - allows filenames to be added or removed
- execute (or search) - allows files in the directory to be accessed, subject to the permissions on the files themselves
- A file with 777 permissions in a directory that lacks x still cannot be read (by an unprivileged process)!
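A minimal sketch of the octal arithmetic above: each octal digit is the sum of r = 4, w = 2 and x = 1 for one of owner, group and other. mode_to_rwx is a hypothetical helper; stat.filemode and os.chmod are the standard-library counterparts used to cross-check it on a temporary file.

```python
import os
import stat
import tempfile


def mode_to_rwx(mode: int) -> str:
    """Render the 9 permission bits (owner, group, other) as an rwx string."""
    out = []
    for shift in (6, 3, 0):                       # owner, group, other
        triple = (mode >> shift) & 0o7            # one octal digit
        for letter, bit in zip("rwx", (4, 2, 1)):
            out.append(letter if triple & bit else "-")
    return "".join(out)


print(mode_to_rwx(0o744))  # rwxr--r--
print(mode_to_rwx(0o640))  # rw-r-----

# Apply 744 to a temporary file and read the permission bits back
with tempfile.NamedTemporaryFile() as f:
    os.chmod(f.name, 0o744)
    st_mode = os.stat(f.name).st_mode
    perms = stat.S_IMODE(st_mode)
    print(oct(perms), stat.filemode(st_mode))  # 0o744 -rwxr--r--
```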
File I/O
- Universality of I/O
- All I/O is done through accessing files with system calls such as open(), read(), write() etc.
- It doesn't matter if the file is a device or a regular file
- A file is simply a stream of bytes; for files that support it (e.g. regular files, but not pipes or terminals), the lseek() system call can move to any position in that stream
- File descriptors
- A small non-negative integer that refers to an open file within a process
- Each process has its own table of open file descriptors
- Normally a process inherits descriptors 0 (stdin), 1 (stdout) and 2 (stderr) from its parent process
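A minimal sketch of the byte-stream model using Python's os wrappers for write(), lseek() and read(): the file offset is moved explicitly and reads continue from there, while a pipe rejects lseek() with ESPIPE. seek_and_read is a hypothetical helper.

```python
import errno
import os
import tempfile


def seek_and_read(fd: int, offset: int, nbytes: int) -> bytes:
    """Move the file offset with lseek(), then read up to nbytes from there."""
    os.lseek(fd, offset, os.SEEK_SET)
    return os.read(fd, nbytes)


fd, path = tempfile.mkstemp()        # mkstemp() returns an already-open descriptor
try:
    os.write(fd, b"hello world")     # the file offset is now 11
    tail = seek_and_read(fd, 6, 5)   # b'world'
    head = seek_and_read(fd, 0, 5)   # b'hello'
finally:
    os.close(fd)
    os.unlink(path)

# Not every file supports seeking: lseek() on a pipe fails with ESPIPE
r, w = os.pipe()
pipe_errno = None
try:
    os.lseek(r, 0, os.SEEK_SET)
except OSError as e:
    pipe_errno = e.errno
finally:
    os.close(r)
    os.close(w)

print(head, tail, pipe_errno == errno.ESPIPE)  # b'hello' b'world' True
```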
Processes
- A process is an instance of an executing program
- Process memory layout:
- Text - program instructions
- Data - static variables
- Heap - dynamically allocated memory
- Stack - automatically allocated memory
- Processes are created with the fork() system call
- fork() clones the currently running process and creates a child process
- A child process inherits copies of the parent's data, stack and heap
- The text segment is shared between parent and child process
- The child process will often immediately call the execve() system call to run a new program
- Process ID (PID)
- A positive integer uniquely identifying the process
- Parent process ID (PPID)
- Process ID of the parent process
- Process group ID (PGID)
- Job-control shells place all processes of a job (e.g. a pipeline) in their own process group
- Each process in a process group has the same PGID
- The process group leader has PID == PGID
- Termination status
- A small non-negative integer that a terminated process leaves for its parent, which can retrieve it with wait()
- By convention, a non-zero termination status indicates an error
- Exit status
- The termination status that the process sets itself by passing it to the _exit() system call (a process killed by a signal instead gets a status describing that signal)
- Note that _exit() is a system call: the process requests its own termination, and the kernel carries it out
- Processes have user and group ids
- Real UID and GID - identify the user and group to which the process belongs (i.e. whoever started it)
- Effective UID and GID - identify the user and group whose privileges the process runs with
- Most often the same as the real UID/GID, but they can differ, e.g. when running a set-user-ID (setuid) program
- Supplementary GIDs - additional groups the process belongs to
- Privileged processes
- Any process running with effective UID 0 (root)
- Bypasses any normal permission checks
- Either started by root, or running a set-user-ID program owned by root
- Capabilities
- Introduced in Linux 2.2, capabilities divide superuser privileges into distinct units that can be granted to a process individually
- The superuser effectively holds all capabilities
- The init process
- The first process started when the OS starts
- The parent of all processes, has PID 1
- Cannot be killed (even by the superuser)
- Runs the program /sbin/init
- Daemon (service)
- Long-lived process that runs in the background
- Usually does not have any associated terminal
- Usually starts at system boot and is managed by something like systemd
- Environment list
- A process has a list of environment variables
- Inherited from the parent process and thus provides a way for a parent process to "pass arguments" to a child process
- Resource limits
- The resources a process can use can be limited with the setrlimit() system call
- Soft limit - the current resource limit; the process itself can adjust it, up to the hard limit
- Hard limit - the absolute resource limit; the soft limit cannot be raised beyond this point
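A minimal sketch of the fork()/_exit()/wait() cycle using Python's os wrappers (Unix only). The child changes its own copy of a variable and exits with status 7, an arbitrary value; the parent retrieves the termination status with waitpid() and still sees its original data. The execve() step is left out to keep the example self-contained.

```python
import os

value = "parent's copy"
child_status = None

pid = os.fork()                        # fork(): duplicate the calling process
if pid == 0:
    # Child: this assignment changes only the child's copy of the data
    value = "changed in child"
    os._exit(7)                        # _exit(): terminate with exit status 7
else:
    # Parent: wait for the child, then decode its termination status
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        child_status = os.WEXITSTATUS(status)

print(child_status, value)             # 7 parent's copy
```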
Memory mappings
- The mmap() system call creates a memory mapping
- File mapping
- Maps a region of a file into the process address space
- Anonymous mapping
- Does not need a corresponding file
- Initializes the memory to 0s
- Can be used e.g. for:
- Initializing a process's text segment from the executable file
- Allocation of new memory (filled with 0s)
- Memory-mapped I/O
- Communication between processes using shared mappings
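A minimal sketch of an anonymous shared mapping used for communication between a parent and a forked child. It relies on Python's mmap module, where passing -1 as the file descriptor requests an anonymous mapping and the default flags on Unix are MAP_SHARED, so both processes see the same zero-initialized pages.

```python
import mmap
import os

# Anonymous mapping: no backing file, zero-filled, shared across fork()
shared = mmap.mmap(-1, mmap.PAGESIZE)
initially_zero = shared[:16] == bytes(16)

pid = os.fork()
if pid == 0:
    shared[:5] = b"hello"      # the child writes into the shared pages
    os._exit(0)

os.waitpid(pid, 0)             # wait until the child has finished writing
message = shared[:5]           # the parent sees the child's write
shared.close()

print(initially_zero, message)  # True b'hello'
```

A private mapping (MAP_PRIVATE) would instead give each process its own copy-on-write view, and the parent would still read zeros.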
Libraries
- Static libraries
- Bundle of compiled object modules
- Statically linked into a program at link time - the needed object modules are copied into the executable file
- Shared libraries
- Loaded and linked by the dynamic linker when the program starts; a single copy in memory is shared by all programs using the library
- Requires that the shared library is available on the system the program executes on
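A minimal, Linux-specific sketch for seeing which shared libraries the dynamic linker has mapped into a running process: /proc/self/maps lists every mapping of the current process, with an optional pathname as the sixth column. mapped_shared_libraries is a hypothetical helper, and filtering on ".so" in the pathname is a heuristic.

```python
from __future__ import annotations


def mapped_shared_libraries(maps_path: str = "/proc/self/maps") -> set[str]:
    """Return pathnames of shared objects (names containing ".so") mapped into a process."""
    libs: set[str] = set()
    with open(maps_path) as f:
        for line in f:
            # Format: address perms offset dev inode [pathname]
            fields = line.split(maxsplit=5)
            if len(fields) == 6 and ".so" in fields[5]:
                libs.add(fields[5].strip())
    return libs


for lib in sorted(mapped_shared_libraries()):
    print(lib)  # e.g. the C library and the dynamic linker itself
```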
Interprocess communication (IPC)
- Processes need to communicate with each other, and can do so with
- signals - to indicate that something has occurred
- pipes and FIFOs - to transfer data
- sockets - to transfer data between processes, possibly on different hosts
- file locking - to disallow other processes from reading or writing part of a file
- message queues - to exchange messages
- semaphores - to synchronize concurrent actions
- shared memory - to read and write to the same memory
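A minimal sketch of the classic pipe pattern using Python's os wrappers for pipe(), fork(), write() and read(): the child writes, the parent reads until end-of-file. Each process closes the pipe end it does not use; otherwise the parent would never see EOF.

```python
import os

r, w = os.pipe()                   # r: read end, w: write end

pid = os.fork()
if pid == 0:
    # Child: write a message into the pipe, then exit
    os.close(r)
    os.write(w, b"hello from child")
    os.close(w)
    os._exit(0)

# Parent: close the unused write end, then read until end-of-file
os.close(w)
chunks = []
while True:
    chunk = os.read(r, 1024)
    if not chunk:                  # b"" means all write ends are closed
        break
    chunks.append(chunk)
os.close(r)
os.waitpid(pid, 0)

received = b"".join(chunks)
print(received)                    # b'hello from child'
```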
Signals
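- Used as IPC
- Sent by the kernel, by another process or by the process itself to notify a process of an event, e.g. when
- A user pressed CTRL-C (SIGINT)
- A user used the kill command
- A timer expired
- A process performed an invalid operation, such as accessing memory it is not allowed to access (SIGSEGV)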
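A minimal sketch of installing a handler and sending a signal to the current process with os.kill(), the call behind the kill command. SIGUSR1 is an arbitrary choice; note that CPython runs Python-level handlers in the main thread, so this must run there.

```python
import os
import signal

caught = []


def on_sigusr1(signum, frame):
    """Signal handler: record which signal arrived."""
    caught.append(signum)


previous = signal.signal(signal.SIGUSR1, on_sigusr1)  # install the handler
try:
    os.kill(os.getpid(), signal.SIGUSR1)  # send SIGUSR1 to ourselves
finally:
    signal.signal(signal.SIGUSR1, previous)  # restore the previous disposition

print(caught == [signal.SIGUSR1])
```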
Threads
- A process has one or more threads
- Threads (of a process) share, among other things
- Resource limits
- Address space
- Program code
- Each thread has its own stack
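A minimal sketch with Python's threading module: every thread appends to the same list because the threads share one address space, each thread's local variables are its own, and all of them report the same PID since they belong to one process. The lock is needed because the list is shared.

```python
import os
import threading

shared_results = []            # one list, visible to every thread
lock = threading.Lock()


def worker(n: int) -> None:
    local = n * n              # local variable: private to this thread's stack
    with lock:                 # serialize access to the shared list
        shared_results.append((n, local, os.getpid()))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(shared_results))
```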
Sessions
- A session is a collection of process groups
- Session leader - the process that created the session
- All processes in a session have the same session id
- All processes created by a job-control shell belong to the same session as the shell itself
- Usually associated with a controlling terminal
- Established when the session leader opens a terminal device
- Usually the terminal with which the user logged in
- The session leader becomes the controlling process
- The controlling process receives a SIGHUP if the terminal disconnects
- Foreground process group - the current "focused" process group
- Background process group - any process group that is not the foreground process group
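A minimal sketch of creating a new session with setsid(), via Python's os wrappers. The call is made in a forked child because a process that is already a process group leader cannot call setsid(); afterwards the child is both session leader and process group leader, and has no controlling terminal. child_becomes_session_leader is a hypothetical helper.

```python
import os


def child_becomes_session_leader() -> bool:
    """Fork a child that calls setsid() and report whether it leads the new session."""
    pid = os.fork()
    if pid == 0:
        code = 1
        try:
            os.setsid()                   # new session and new process group
            me = os.getpid()
            if os.getsid(0) == me and os.getpgrp() == me:
                code = 0
        finally:
            os._exit(code)                # never fall through into the parent's code
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


print("session id:", os.getsid(0), "process group id:", os.getpgrp())
print(child_becomes_session_leader())  # True
```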
File I/O
- Each process has a file descriptor table
- File descriptors 0, 1 and 2 are by convention assigned to stdin, stdout and stderr
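- What these descriptors refer to can be changed, e.g. with the dup2() system call, or, for the stdio streams, with the C library function freopen() (which is not a system call)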
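A minimal sketch of redirecting standard output at the descriptor level with dup2(), using Python's os wrappers. The redirect happens in a forked child so the calling process's own stdout is untouched; run_with_stdout_redirected is a hypothetical helper, and the stdio-level freopen() route is not shown.

```python
import os
import tempfile


def run_with_stdout_redirected(path: str, message: bytes) -> int:
    """Fork a child whose fd 1 (stdout) is redirected to path via dup2()."""
    pid = os.fork()
    if pid == 0:
        code = 1
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            os.dup2(fd, 1)         # fd 1 now refers to the file
            os.close(fd)           # the duplicate on fd 1 keeps the file open
            os.write(1, message)   # a plain write to "stdout" lands in the file
            code = 0
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


with tempfile.TemporaryDirectory() as d:
    out_path = os.path.join(d, "out.txt")
    exit_code = run_with_stdout_redirected(out_path, b"redirected\n")
    with open(out_path, "rb") as f:
        captured = f.read()

print(exit_code, captured)  # 0 b'redirected\n'
```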