This article covers various Linux system calls in C, providing a brief explanation and example code for each topic. This overview should help you understand how to interact with Linux system resources directly from C programs.
1. Linux System Calls
System calls are the interface between user-space applications and the Linux kernel. They allow your programs to request services from the kernel, such as file operations, process management, and inter-process communication.
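As a minimal sketch of that interface, the function below asks the kernel for the process ID through glibc's generic syscall() entry point instead of the usual getpid() wrapper. The name raw_getpid is made up for this illustration; most programs should simply call the wrapper.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid */
#include <sys/types.h>     /* pid_t */
#include <unistd.h>        /* syscall, getpid */

/* Hypothetical helper: invoke the getpid system call by number. */
pid_t raw_getpid(void) {
    return (pid_t)syscall(SYS_getpid);
}
```

Calling raw_getpid() should return the same value as getpid(), since the library wrapper ends up making the same request to the kernel.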
2. Using strace
strace is a diagnostic, debugging, and instructional utility for Linux. It traces the system calls a process makes and the signals it receives. strace provides insight into how applications interact with the kernel, which can be invaluable for debugging, performance tuning, and understanding the inner workings of programs.
2.1 Basic Usage
To start tracing a program, run strace followed by the command you wish to run:
strace ls
This command traces the ls command, showing all system calls made by ls.
2.2 Tracing a Running Process
You can attach strace to an already running process using the -p option followed by the process ID (PID):
strace -p 1234
Replace 1234 with the actual PID of the process you want to trace.
2.3 Filtering Traced Output
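strace can generate a lot of output, making it hard to find relevant information. You can limit the output to certain system calls using the -e option. For example, to trace only the open, openat, and close system calls, you can use:
strace -e trace=open,openat,close <command>
Including openat matters because on current Linux systems the C library usually implements open() through the openat system call.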
2.4 Writing Output to a File
To save the output of strace to a file, use the -o option:
strace -o output.txt <command>
This command runs <command> and writes the trace output to output.txt.
2.5 Tracing Specific Events
The -e option selects more than individual system calls. It also accepts whole classes of calls, such as trace=network, trace=process, and trace=file, and a separate signal= qualifier restricts which signals are reported. Check the man page (man strace) for a full list of what can be traced.
2.6 Tracing Child Processes
By default, strace traces only the process it starts or attaches to. To trace child processes created by fork or similar system calls, use the -f option:
strace -f <command>
2.7 Useful Options
- -c: Provides a summary of system calls made by the program, including how many times each was called and the time spent in each call.
- -d: Debug mode for strace itself, useful for diagnosing problems with strace.
- -t: Prefix each line of the strace output with the time of day.
- -T: Show the time spent in each system call.
2.8 Example Use Case
Suppose you have a program that's failing to open a configuration file, but you're not sure why. You can use strace to trace file operations:
strace -e trace=file <program>
This command can help you identify attempts to open files, showing both successful and failed operations, along with the paths being accessed. This can quickly lead you to the problem, such as attempting to open a non-existent file or lacking the necessary permissions.
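To make the scenario concrete, here is a small sketch of the kind of call such a trace reveals. The helper name probe_config is hypothetical; it tries to open a file read-only and returns the errno value the kernel reported, which is the same error code strace prints next to the failing call.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to open a configuration file read-only.
 * Returns 0 on success, or the errno value set by open() on failure. */
int probe_config(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        return errno;   /* for example ENOENT (missing) or EACCES (no permission) */
    }
    close(fd);
    return 0;
}
```

A program that logs this return value gives you the same clue without a tracer, but strace remains useful when you cannot modify or rebuild the program.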
strace is a versatile tool that, once mastered, becomes an indispensable part of the Linux programmer's and system administrator's toolkit. Its ability to reveal what a program is doing "under the hood" makes it an excellent tool for learning, debugging, and optimizing code.
3. access: Test the Permissions of a File
The access system call checks whether the calling process can access a file in a particular way: whether the file exists and whether it can be read, written, or executed. The access function is declared in <unistd.h> and its prototype looks like this:
int access(const char *pathname, int mode);
- pathname: The path to the file you want to check.
- mode: Either F_OK, which tests only for the existence of the file, or a mask consisting of one or more of the following flags ORed together:
  - R_OK: Test for read permission.
  - W_OK: Test for write permission.
  - X_OK: Test for execute permission.
When using access, you're asking the question, "Does the user running this program have the specified access to the file?" This check is based on the real UID (user ID) and GID (group ID) of the process, rather than the effective IDs. This is important in programs that may run with elevated privileges (e.g., setuid programs).
Here's an expanded version of the access example, demonstrating how to check for different permissions as well as the existence of a file:
#include <unistd.h>
#include <stdio.h>
int main() {
const char *filepath = "example.txt";
// Check for the existence of the file
if (access(filepath, F_OK) == 0) {
printf("The file exists.\n");
// Check for read permission
if (access(filepath, R_OK) == 0) {
printf("Read permission granted.\n");
} else {
printf("Read permission denied.\n");
}
// Check for write permission
if (access(filepath, W_OK) == 0) {
printf("Write permission granted.\n");
} else {
printf("Write permission denied.\n");
}
// Check for execute permission
if (access(filepath, X_OK) == 0) {
printf("Execute permission granted.\n");
} else {
printf("Execute permission denied.\n");
}
} else {
printf("The file does not exist.\n");
}
return 0;
}
This code snippet uses access to check a file's permissions. It first checks whether the file exists using F_OK. If the file exists, it then checks for read, write, and execute permissions in turn. Note that access returns -1 for any failure, so the final branch also runs when, for example, a directory in the path is not searchable; errno distinguishes the cases. Treat the result as advisory only: the file's permissions can change between the access call and a later open or exec, so a program must still handle errors from those calls rather than rely on the earlier check.
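Because access tests the real IDs, a program that wants to know what its effective IDs allow needs a different call. One option, sketched here under the assumption of a glibc-based Linux system, is faccessat with the AT_EACCESS flag. The wrapper name access_effective is made up for this example.

```c
#define _GNU_SOURCE
#include <fcntl.h>    /* AT_FDCWD, AT_EACCESS */
#include <unistd.h>   /* faccessat, F_OK, R_OK, W_OK, X_OK */

/* Hypothetical wrapper: like access(), but checks against the
 * effective user and group IDs instead of the real ones.
 * Returns 0 if access is allowed, -1 otherwise (errno is set). */
int access_effective(const char *path, int mode) {
    return faccessat(AT_FDCWD, path, mode, AT_EACCESS);
}
```

As with access, the answer can be stale by the time the file is opened, so the result is a hint rather than a guarantee.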
4. fcntl: Locks and File Operations
fcntl inspects and changes the properties of a file descriptor that's already open, and it is also the interface for placing locks on a file.
4.1 Understanding File Locks with fcntl
File locks are mechanisms that allow synchronization between different processes to prevent them from concurrently modifying a file, potentially leading to data corruption. In Unix-like systems, fcntl can be used to apply advisory file locks. These locks are "advisory" because they do not prevent other processes from accessing the file unless those processes also use and check for the same locks. It's a cooperative mechanism, not enforced by the system.
- F_WRLCK: Requests a write lock. No other process can hold a write or read lock.
- F_RDLCK: Requests a read lock. Other processes can also hold a read lock, but not a write lock.
- F_UNLCK: Releases a lock.
The F_SETLKW command tells fcntl to set the lock and wait if the lock cannot be acquired immediately, as opposed to F_SETLK, which returns immediately if the lock cannot be acquired.
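The non-blocking variant can be sketched as follows. The function name try_write_lock is an assumption for illustration; it uses F_SETLK and treats EACCES or EAGAIN as "someone else holds a conflicting lock", which is how POSIX says the failure is reported.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to write-lock the whole file without waiting.
 * Returns 1 if the lock was acquired, 0 if another process holds a
 * conflicting lock, and -1 on any other error. */
int try_write_lock(int fd) {
    struct flock lock;
    memset(&lock, 0, sizeof(lock));
    lock.l_type = F_WRLCK;     /* exclusive (write) lock */
    lock.l_whence = SEEK_SET;  /* l_start is relative to the start of the file */
    lock.l_start = 0;
    lock.l_len = 0;            /* 0 means "through the end of the file" */

    if (fcntl(fd, F_SETLK, &lock) == 0) {
        return 1;
    }
    if (errno == EACCES || errno == EAGAIN) {
        return 0;              /* lock is busy; the caller can retry later */
    }
    return -1;
}
```

The file descriptor must be open for writing for a write lock to be granted. A caller that gets 0 back can do other work and try again instead of blocking.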
4.2 Example Program: File Locking with fcntl
This example demonstrates how to use fcntl to place a write lock on a file specified by the command-line argument. The program waits for the user to press Enter before releasing the lock.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
int main(int argc, char* argv[]) {
if (argc < 2) {
printf("Usage: %s <file>\n", argv[0]);
return 1;
}
char* file = argv[1];
int fd;
printf("Opening %s\n", file);
// Open a file descriptor.
fd = open(file, O_WRONLY);
if (fd == -1) {
perror("open");
return 1;
}
printf("Locking\n");
// Initialize the flock structure.
struct flock lock;
memset(&lock, 0, sizeof(lock));
lock.l_type = F_WRLCK; // Request a write lock
// Attempt to place a write lock on the file.
if (fcntl(fd, F_SETLKW, &lock) == -1) {
perror("fcntl");
close(fd);
return 1;
}
printf("Locked; press Enter to unlock... ");
// Wait for the user to press Enter.
getchar();
printf("Unlocking\n");
// Release the lock.
lock.l_type = F_UNLCK;
if (fcntl(fd, F_SETLKW, &lock) == -1) {
perror("fcntl");
close(fd);
return 1;
}
close(fd);
return 0;
}
4.3 How It Works
- The program first checks for the correct usage, requiring a filename as an argument.
- It then attempts to open the specified file in write-only mode.
- If successful, it initializes a struct flock and requests a write lock using F_SETLKW. Because the structure is zeroed, l_whence, l_start, and l_len are all 0, which locks the whole file. This call will block if the lock cannot be immediately acquired, waiting until the lock is available.
- The program waits for the user to press Enter. Upon receiving input, it sets the lock type to F_UNLCK to release the lock and then closes the file descriptor.
This example provides a straightforward demonstration of using file locks in C to coordinate access to a file between different processes. It's essential to handle potential errors, such as the file not opening or the locking mechanism failing, to ensure the program behaves as expected under various conditions.
5. fsync and fdatasync: Flushing Disk Buffers
To ensure data integrity, especially in scenarios where your application maintains a log, database, or any other form of critical data storage, it's essential to have a mechanism that guarantees the data has been physically written to the storage device. In Linux, this is where the fsync and fdatasync system calls come into play.
5.1 fsync and fdatasync: Flushing Disk Buffers
When your application writes data to a file, the data might initially be placed in a buffer (in-memory cache) by the kernel to improve performance. However, buffered data can be lost if the system crashes or loses power before the buffer is flushed (i.e., written out to the disk). To mitigate this risk, you can use fsync or fdatasync.
- fsync(int fd): Synchronizes a file's in-memory state with that on the physical disk to ensure that all modifications are written out. fsync flushes both data and metadata (like file modification times).
- fdatasync(int fd): Similar to fsync, but it skips metadata that is not needed to read the data back, such as access and modification times. Metadata that is required, such as a changed file size, is still flushed. This can be more efficient when timestamp updates need not be preserved immediately.
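A minimal sketch of the fdatasync variant is shown below. The helper name write_all_and_sync is hypothetical. It also loops on write, because a single write call may store fewer bytes than requested.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer to fd, then flush the
 * file's data to the storage device. Returns 0 on success, -1 on error. */
int write_all_and_sync(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n == -1) {
            if (errno == EINTR) {
                continue;      /* interrupted before writing anything: retry */
            }
            return -1;
        }
        buf += n;              /* skip past the bytes already written */
        len -= (size_t)n;
    }
    return fdatasync(fd);      /* data, plus any metadata needed to read it back */
}
```

Swapping fdatasync for fsync in the last line gives the stricter behavior, at the cost of also flushing timestamp updates.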
5.2 Example: Ensuring Data Integrity in a Journaling System
The following example demonstrates how to use fsync to ensure that a journal entry is physically written to disk, thus preventing data loss in the event of a system crash or power failure. It checks the result of each system call so that failures are reported.
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
const char* journal_filename = "journal.log";
void write_journal_entry(const char* entry) {
// Open the journal file with appropriate flags and permissions
int fd = open(journal_filename, O_WRONLY | O_CREAT | O_APPEND, 0660);
if (fd == -1) {
perror("Failed to open journal file");
return;
}
// Write the journal entry to the file
if (write(fd, entry, strlen(entry)) == -1) {
perror("Failed to write entry");
close(fd);
return;
}
// Write a newline character after the entry
if (write(fd, "\n", 1) == -1) {
perror("Failed to write newline");
close(fd);
return;
}
// Ensure the entry is physically written to the disk
if (fsync(fd) == -1) {
perror("Failed to fsync");
close(fd);
return;
}
// Close the file descriptor
close(fd);
}
int main() {
// Example usage
char* entry = "Sample journal entry";
write_journal_entry(entry);
return 0;
}
5.3 How It Works
- Open the File: The file is opened (or created if it doesn't exist) with write-only access. The O_APPEND flag ensures that each write operation appends data at the end of the file.
- Write the Entry: The journal entry is written to the file followed by a newline character. Each write call is checked for errors. The example treats a short write as success; production code should also handle the case where write stores fewer bytes than requested.
- Synchronize: fsync is called to flush the file's data and metadata to disk. This is crucial for ensuring the durability of the journal entry.
- Close the File: Finally, the file descriptor is closed.
Error handling is crucial in file operations, especially when dealing with critical data. This example includes basic error checks after each system call to handle potential failures gracefully.
6. getrlimit and setrlimit: Resource Limits
The getrlimit and setrlimit system calls in Linux allow processes to get and set resource limits, respectively. These limits control the amount of system resources a process can consume, which prevents individual processes from exhausting system resources and helps keep the system stable and fair among multiple processes.
6.1 Resource Limits Overview
Each process in a Unix-like system has associated resource limits, which are constraints on the system resources that the process may consume. Examples of such resources include the maximum number of file descriptors a process can open (RLIMIT_NOFILE), the maximum size of the process's data segment, which includes the heap (RLIMIT_DATA), and the maximum size of the process stack (RLIMIT_STACK).
6.2 Using getrlimit and setrlimit
The getrlimit function retrieves the current limits for a specified resource, while setrlimit sets new limits. The resource limits are specified by two values:
- rlim_cur: The soft limit, which is the value that the kernel enforces for the corresponding resource.
- rlim_max: The hard limit, which is the ceiling for the soft limit. Any process may lower its hard limit, but only privileged processes (typically those with root privileges) can raise it.
6.3 Example: Checking and Modifying File Descriptor Limit
This example demonstrates how to use getrlimit and setrlimit to check and then modify the maximum number of file descriptors that the current process can open.
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
int main() {
struct rlimit limit;
// Get the current limit on file descriptors
if (getrlimit(RLIMIT_NOFILE, &limit) != 0) {
perror("getrlimit failed");
return EXIT_FAILURE;
}
// rlim_t is an unsigned type, so cast explicitly to match the format specifier
printf("Current Limits: soft = %llu, hard = %llu\n", (unsigned long long)limit.rlim_cur, (unsigned long long)limit.rlim_max);
// Attempt to increase the soft limit to the hard limit value
limit.rlim_cur = limit.rlim_max;
if (setrlimit(RLIMIT_NOFILE, &limit) != 0) {
perror("setrlimit failed");
// Raising the soft limit up to the hard limit needs no privilege, so a failure here is unexpected
} else {
printf("Soft limit raised to hard limit: %llu\n", (unsigned long long)limit.rlim_cur);
}
// Verify the change
if (getrlimit(RLIMIT_NOFILE, &limit) != 0) {
perror("getrlimit failed");
return EXIT_FAILURE;
}
printf("Updated Limits: soft = %llu, hard = %llu\n", (unsigned long long)limit.rlim_cur, (unsigned long long)limit.rlim_max);
return EXIT_SUCCESS;
}
6.4 How It Works
- Retrieve Current Limits: The program first calls getrlimit for RLIMIT_NOFILE to fetch the current soft and hard limits on the number of file descriptors.
- Modify the Soft Limit: It then raises the soft limit (rlim_cur) to match the hard limit (rlim_max). This is a common way to use the full allowance without privileges, since any process may set its soft limit to any value up to the hard limit. A request with the soft limit above the hard limit fails with EINVAL, and raising the hard limit itself requires privilege (the CAP_SYS_RESOURCE capability on Linux).
- Verify the Change: Finally, the program calls getrlimit again to verify that the soft limit was successfully updated.
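The rule that the soft limit may not exceed the hard limit can be folded into a small helper. The sketch below is illustrative only; the name set_nofile_soft and its clamping policy are assumptions, not part of any library.

```c
#include <sys/resource.h>

/* Hypothetical helper: set the soft RLIMIT_NOFILE to `wanted`, clamped
 * to the current hard limit so the request cannot fail with EINVAL for
 * exceeding it. Stores the resulting soft limit in *result.
 * Returns 0 on success, -1 on error. */
int set_nofile_soft(rlim_t wanted, rlim_t *result) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        return -1;
    }
    if (wanted > lim.rlim_max) {
        wanted = lim.rlim_max;   /* never ask for more than the hard limit */
    }
    lim.rlim_cur = wanted;       /* the hard limit is left unchanged */
    if (setrlimit(RLIMIT_NOFILE, &lim) != 0) {
        return -1;
    }
    *result = lim.rlim_cur;
    return 0;
}
```

Because only rlim_cur changes, this works for unprivileged processes, and the limit can be raised again later up to the same hard limit.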
Note that changing resource limits can have significant implications for system stability and security. Increasing limits for certain resources may allow processes to consume more system resources, potentially leading to resource exhaustion. Therefore, adjustments to resource limits should be made judiciously, with a clear understanding of the implications.
7. getrusage: Process Statistics
The getrusage system call reports the resource usage of processes in Unix-like operating systems. It provides detailed statistics about the system resources that a specific process or group of processes has consumed. This information is invaluable for performance analysis, debugging, and system monitoring.
7.1 Understanding getrusage
Prototype:
#include <sys/resource.h>
int getrusage(int who, struct rusage *usage);
- who: Specifies which process(es) to retrieve the usage information for. Common values are:
  - RUSAGE_SELF: To get the resource usage of the calling process.
  - RUSAGE_CHILDREN: To get the resource usage of all children of the calling process that have terminated and been waited for.
- usage: A pointer to a struct rusage structure where the resource usage information will be stored.
The struct rusage structure contains many fields, including CPU time used, maximum resident set size, number of page faults, and number of context switches.
7.2 Example: Monitoring CPU Time and Page Faults
The following example demonstrates how to use getrusage to obtain and print the CPU time used by the current process and the number of minor and major page faults it has caused.
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
int main() {
struct rusage usage;
if (getrusage(RUSAGE_SELF, &usage) == -1) {
perror("getrusage failed");
return EXIT_FAILURE;
}
// User CPU time and system CPU time
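// Cast the timeval fields to long so they match the %ld format on every platform
printf("User CPU time used: %ld.%06ld sec\n", (long)usage.ru_utime.tv_sec, (long)usage.ru_utime.tv_usec);
printf("System CPU time used: %ld.%06ld sec\n", (long)usage.ru_stime.tv_sec, (long)usage.ru_stime.tv_usec);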
// Page faults
printf("Minor page faults: %ld\n", usage.ru_minflt);
printf("Major page faults: %ld\n", usage.ru_majflt);
return EXIT_SUCCESS;
}
7.3 Understanding the Output
- User CPU time: The amount of time the CPU spent executing instructions in user mode (outside the kernel) on behalf of the process.
- System CPU time: The amount of time the CPU spent executing system calls (inside the kernel) on behalf of the process.
- Minor page faults: These occur when the process accesses a page that is not yet mapped into its page tables but can be supplied without disk I/O (e.g., a page that is already in the page cache or swap cache, or the first touch of newly allocated memory).
- Major page faults: These occur when the process accesses a page that is not in memory, requiring disk access to retrieve.
This example gives a snapshot of the process's resource consumption at the time getrusage is called. By calling getrusage at different points in a program's execution, you can measure the resources consumed during specific operations, which is useful for profiling and optimization.
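Measuring a specific operation this way amounts to taking two snapshots and subtracting. The helper below is a sketch; the name cpu_seconds is made up, and it reports user plus system time as a double for convenience.

```c
#include <sys/resource.h>

/* Hypothetical helper: total CPU time (user + system) consumed so far
 * by the calling process, in seconds. Returns a negative value on error. */
double cpu_seconds(void) {
    struct rusage u;
    if (getrusage(RUSAGE_SELF, &u) == -1) {
        return -1.0;
    }
    return (double)u.ru_utime.tv_sec + (double)u.ru_utime.tv_usec / 1e6
         + (double)u.ru_stime.tv_sec + (double)u.ru_stime.tv_usec / 1e6;
}
```

Call cpu_seconds() before and after the code of interest; the difference is the CPU time that code consumed, which excludes time the process spent sleeping or waiting for I/O.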
8. gettimeofday: System Time
The gettimeofday system call returns the current time of day. Unlike time(), which provides the current time in whole seconds since the Epoch (1970-01-01 00:00:00 UTC), gettimeofday also reports microseconds. It is specified by POSIX, but POSIX.1-2008 marks it obsolescent in favor of clock_gettime(), which offers nanosecond resolution and a choice of clocks. In particular, gettimeofday reads the wall clock, which can jump forward or backward when the system time is adjusted (manually or by NTP), so it is a poor basis for measuring intervals. Daylight saving time shifts do not affect the value, because it counts seconds since the Epoch in UTC.
8.1 Prototype of gettimeofday
#include <sys/time.h>
int gettimeofday(struct timeval *tv, struct timezone *tz);
- tv: Pointer to a struct timeval structure where the current time will be stored.
- tz: This argument is obsolete and should generally be specified as NULL. Historically, it was used to obtain timezone information, but this usage is now deprecated.
The struct timeval structure is defined as follows:
struct timeval {
time_t tv_sec; // seconds since Jan. 1, 1970
suseconds_t tv_usec; // and microseconds
};
8.2 Example: Getting the Current Time with gettimeofday
Here's a simple example demonstrating how to use gettimeofday to fetch the current time with microsecond precision:
#include <stdio.h>
#include <sys/time.h>
int main() {
struct timeval tv;
int res;
// Get the current time
res = gettimeofday(&tv, NULL);
if (res == 0) {
printf("Current time: %ld seconds and %ld microseconds since the Epoch\n",
(long)tv.tv_sec, (long)tv.tv_usec);
} else {
perror("gettimeofday failed");
return 1;
}
return 0;
}
8.3 How It Works
- The gettimeofday function fills in the struct timeval you provide with the current time in seconds and microseconds since the Epoch (00:00:00 UTC, January 1, 1970).
- The function returns 0 on success, and -1 on failure, setting errno to indicate the error.
- This example prints the current time with microsecond precision. Note that the actual resolution of the system clock may vary and may not always provide microsecond accuracy.
While gettimeofday is adequate for obtaining wall-clock timestamps with microsecond resolution, new applications should prefer clock_gettime(): use CLOCK_REALTIME for timestamps and CLOCK_MONOTONIC for measuring intervals, since the monotonic clock is not affected by changes to the system time.
9. The mlock Family: Physical Memory Lock
The mlock family of system calls in Linux is used to control the memory locking of a process's address space. Memory locking ensures that the locked pages of a process's virtual memory are not swapped out to the swap area (disk or similar storage), so they remain in RAM. This capability is crucial for real-time applications or those that handle sensitive information, where it's necessary to prevent delays due to page faults or to ensure that sensitive information is not written to disk.
9.1 The mlock Family of Functions
- mlock: Locks a specified region of the process's address space, preventing those pages from being paged out.
- munlock: Unlocks a specified region of the process's address space, allowing those pages to be paged out again.
- mlockall: Locks all pages mapped into the address space of the calling process.
- munlockall: Unlocks all pages mapped into the address space of the calling process.
9.2 Using mlock and munlock
The prototypes and parameters of mlock and munlock are shown first, followed by a complete example.
9.3 Prototype
#include <sys/mman.h>
int mlock(const void *addr, size_t len);
int munlock(const void *addr, size_t len);
- addr: The starting address of the memory to lock or unlock.
- len: The length in bytes of the memory region to lock or unlock. The kernel operates on whole pages, so every page that contains part of the range is affected.
9.4 Example: Locking Memory to Prevent Swapping
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
int main() {
const size_t size = 1024 * 1024; // 1 MB of memory
void *buffer = malloc(size);
if (!buffer) {
perror("malloc failed");
return 1;
}
// Initialize the memory with some data
memset(buffer, 0, size);
// Lock the memory to prevent swapping
if (mlock(buffer, size) == -1) {
perror("mlock failed");
free(buffer);
return 1;
}
printf("Memory is locked in RAM.\n");
// Here you can work with the locked memory
// Unlock the memory
if (munlock(buffer, size) == -1) {
perror("munlock failed");
}
free(buffer);
return 0;
}
9.5 Important Considerations
- Permissions: An unprivileged process can lock memory only up to its RLIMIT_MEMLOCK resource limit, which can be adjusted with the ulimit command or in /etc/security/limits.conf; privileged processes are not subject to this limit. The limit exists because locked memory is guaranteed to stay in RAM, so unrestricted locking could exhaust system resources.
- Use Cases: Memory locking is typically used in applications where timing is critical (real-time applications) or where it's imperative that sensitive information (e.g., cryptographic keys) not be written to disk, even in a swap area.
- Resource Management: Excessive use of memory locking can negatively impact system performance by reducing the amount of RAM available for other processes and the operating system's caching mechanisms. It's crucial to lock only as much memory as necessary and to unlock it as soon as it's no longer needed.
This feature should be used judiciously, keeping in mind the overall system performance and the security implications of locking sensitive information into physical memory.
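Since the locked-memory allowance is a resource limit, a program can inspect it before calling mlock. The sketch below uses a hypothetical helper, fits_memlock_limit, and makes a simplifying assumption: it compares only the requested size against the soft limit and ignores memory the process has already locked.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical helper: report whether `len` bytes are within the soft
 * RLIMIT_MEMLOCK limit. Returns 1 if so, 0 if not, -1 on error.
 * Simplification: memory that is already locked is not accounted for. */
int fits_memlock_limit(size_t len) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_MEMLOCK, &lim) != 0) {
        return -1;
    }
    if (lim.rlim_cur == RLIM_INFINITY) {
        return 1;                       /* no limit configured */
    }
    return len <= lim.rlim_cur ? 1 : 0;
}
```

This is only a pre-check for friendlier error messages. The return value of mlock itself remains the authoritative answer.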
10. mprotect: Set Memory Permissions
The mprotect system call in Linux changes the access permissions of pages in the virtual address space of a process. It lets a program control whether a region of memory is readable, writable, or executable, or some combination of these. It's particularly useful for hardening, such as making initialized data read-only, keeping data regions non-executable, or placing inaccessible guard pages next to a buffer or stack to catch overflows.
10.1 Prototype
#include <sys/mman.h>
int mprotect(void *addr, size_t len, int prot);
- addr: The starting address of the memory region whose access permissions are to be changed. This address must be aligned to a page boundary.
- len: The length of the memory region whose access permissions are to be changed.
- prot: The new protection flags for the memory region, which can be a combination of:
  - PROT_NONE: Pages cannot be accessed.
  - PROT_READ: Pages can be read.
  - PROT_WRITE: Pages can be written.
  - PROT_EXEC: Pages can be executed.
10.2 Example: Changing Memory Permissions with mprotect
The following example demonstrates how mprotect can be used to change a memory region's permissions to read-only after initializing it, and then back to read-write.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
int main() {
// Map one page of memory. mmap returns page-aligned memory, which mprotect
// requires; a pointer returned by malloc is generally not page-aligned.
size_t pagesize = sysconf(_SC_PAGESIZE);
char* buffer = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (buffer == MAP_FAILED) {
perror("mmap failed");
return 1;
}
// Initialize the memory with some data
strcpy(buffer, "Hello, mprotect!");
// Change the memory permissions to read-only
if (mprotect(buffer, pagesize, PROT_READ) == -1) {
perror("mprotect failed to set read-only");
munmap(buffer, pagesize);
return 1;
}
printf("Memory set to read-only: %s\n", buffer);
// Attempt to write to the memory (this will cause a segmentation fault)
// strcpy(buffer, "Write attempt"); // Uncommenting this line will crash the program
// Change the memory permissions back to read-write
if (mprotect(buffer, pagesize, PROT_READ | PROT_WRITE) == -1) {
perror("mprotect failed to set read-write");
munmap(buffer, pagesize);
return 1;
}
// Now writing to the memory is possible again
strcpy(buffer, "Now in read-write mode");
printf("Updated buffer: %s\n", buffer);
// Release the mapping
munmap(buffer, pagesize);
return 0;
}
10.3 Important Notes
- Page Alignment: The address passed to mprotect must be aligned to a page boundary. If it's not, mprotect fails with EINVAL.
- Security Implications: Memory protections are a security tool. For example, keeping writable data regions non-executable mitigates code-injection attacks, because injected bytes cannot be run as code.
- Error Handling: Check the return value of mprotect, because a failed call leaves the old permissions in place. Separately, accessing memory in a way not allowed by its current permissions (e.g., writing to a read-only area) results in a segmentation fault.
This functionality is a cornerstone of modern security and memory management techniques, allowing for dynamic control over how applications interact with their allocated memory.
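One common application is a guard page. The sketch below, with the made-up name alloc_guarded_page, maps two pages and makes the second one inaccessible, so that running off the end of the first page faults immediately instead of silently corrupting other memory.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map one usable page followed by one PROT_NONE
 * guard page. Stores the page size in *page_out and returns the base
 * address, or NULL on failure. Release with munmap(base, 2 * page). */
void *alloc_guarded_page(size_t *page_out) {
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0) {
        return NULL;
    }
    size_t page = (size_t)ps;

    char *base = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        return NULL;
    }
    /* Revoke all access to the second page; base + page is page-aligned. */
    if (mprotect(base + page, page, PROT_NONE) == -1) {
        munmap(base, 2 * page);
        return NULL;
    }
    *page_out = page;
    return base;
}
```

Bytes 0 through page - 1 are usable as normal memory, while any access at offset page or beyond raises SIGSEGV.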
11. nanosleep: Pause in High Precision
The nanosleep system call is used in Linux to suspend the execution of the calling thread for a specified duration, expressed with nanosecond precision. It provides a higher precision alternative to functions like sleep or usleep, which provide second and microsecond precision, respectively. nanosleep is particularly useful in real-time programming where precise timing is crucial.
11.1 Using nanosleep: Prototype
#include <time.h>
int nanosleep(const struct timespec *req, struct timespec *rem);
- req: A pointer to a struct timespec that specifies the desired sleep time. The timespec structure contains two fields: tv_sec (seconds) and tv_nsec (nanoseconds, from 0 to 999,999,999).
- rem: If non-NULL, nanosleep will store the remaining time not slept if the call is interrupted by a signal handler. This allows the program to resume sleeping for the remaining duration if desired.
11.2 Example: High Precision Sleep with nanosleep
This example demonstrates how to use nanosleep to pause the program execution for a specific duration, specified in seconds and nanoseconds.
#include <stdio.h>
#include <time.h>
int main() {
// Specify the sleep time: 2.5 seconds
struct timespec req = {
.tv_sec = 2, // 2 seconds
.tv_nsec = 500000000L // 500 million nanoseconds (0.5 seconds)
};
printf("Sleeping for 2.5 seconds...\n");
// Sleep for the requested duration
if (nanosleep(&req, NULL) == -1) {
perror("nanosleep");
return 1;
}
printf("Wake up!\n");
return 0;
}
11.3 Handling Interrupts
If nanosleep is interrupted by a signal handler, it returns -1 with errno set to EINTR. You can use the rem parameter to determine how much of the requested time has not been slept, and then call nanosleep again with rem as the req parameter to complete the intended duration. This approach ensures that your program sleeps for the total specified time, even if interrupted.
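The restart logic can be sketched as a small loop. The function name sleep_fully is hypothetical; it relies only on the documented behavior that an interrupted nanosleep fails with EINTR and fills in the remaining time.

```c
#include <errno.h>
#include <time.h>

/* Hypothetical helper: sleep for the whole duration, resuming after
 * signal interruptions. Returns 0 on success, -1 on any other error
 * (for example EINVAL for an out-of-range tv_nsec). */
int sleep_fully(struct timespec duration) {
    struct timespec rem;
    while (nanosleep(&duration, &rem) == -1) {
        if (errno != EINTR) {
            return -1;
        }
        duration = rem;   /* continue with the time that was not slept */
    }
    return 0;
}
```

Each restart can add a little rounding error, so a program that is interrupted very often may sleep slightly longer than requested in total.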
11.4 Precision and Accuracy
While nanosleep accepts durations with nanosecond precision, the actual resolution is limited by the system's timer, which may not provide nanosecond accuracy. The sleep duration might be rounded up to the nearest value supported by the system timer. Additionally, system load and scheduling behavior can affect the timing accuracy.
11.5 Use Cases
nanosleep suits applications that need fine-grained control over delays, such as multimedia applications, scientific simulations, or programs that poll hardware at short intervals.
This function is a powerful tool for managing precise timing and delays in Linux programs, enabling developers to implement sleep intervals with much greater accuracy than traditional sleep functions.
12. readlink: Reading Symbolic Links
The readlink system call is used in Linux and Unix-like operating systems to read the value of a symbolic link. Symbolic links are essentially shortcuts or references to other files or directories. Unlike hard links, which act as direct references to file data, symbolic links point to another entry in the filesystem by name. Reading a symbolic link means obtaining the path to which the symbolic link points.
12.1 Using readlink: Prototype
#include <unistd.h>
ssize_t readlink(const char *restrict path, char *restrict buf, size_t bufsize);
- path: The pathname of the symbolic link.
- buf: A buffer where the link's target path will be stored. This buffer will not be null-terminated automatically.
- bufsize: The size of the buffer. This determines the maximum number of bytes that can be read.
12.2 Example: Reading a Symbolic Link
The following example demonstrates how to use readlink to read the target of a symbolic link and print it. This code includes handling to ensure the result is null-terminated and thus can be treated as a valid C string.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(int argc, char *argv[]) {
if (argc != 2) {
fprintf(stderr, "Usage: %s <symlink>\n", argv[0]);
return EXIT_FAILURE;
}
char *symlinkPath = argv[1];
char buf[1024]; // Buffer for the symlink target
ssize_t len = readlink(symlinkPath, buf, sizeof(buf) - 1);
if (len == -1) {
perror("readlink");
return EXIT_FAILURE;
}
buf[len] = '\0'; // Ensure null-termination
printf("The symbolic link '%s' points to '%s'\n", symlinkPath, buf);
return EXIT_SUCCESS;
}
12.3 Important Notes
- Buffer Size: It's crucial to provide a buffer that is large enough to hold the entire path plus the null terminator. If the buffer is too small, readlink silently truncates the result to bufsize bytes and reports no error, so a return value equal to bufsize means the path may be incomplete. The example passes sizeof(buf) - 1 so that one byte always remains for the terminator.
- Null Termination: readlink does not append a null terminator to the buffer. You must do this yourself, as shown in the example, by setting buf[len] = '\0'.
- Return Value: On success, readlink returns the number of bytes placed in the buffer. On failure, it returns -1 and sets errno to indicate the error.
Using readlink, you can programmatically resolve the targets of symbolic links, which can be particularly useful in scripts or programs that need to work with filesystem structures or navigate through directories that contain symbolic links.
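When the maximum target length is not known in advance, one approach is to grow the buffer until the result clearly fits. The following is a sketch with a made-up helper name, read_link_alloc, and an arbitrary starting size of 128 bytes.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: return the target of a symbolic link as a
 * newly allocated, null-terminated string, or NULL on error.
 * The caller must free() the result. */
char *read_link_alloc(const char *path) {
    size_t size = 128;                 /* arbitrary starting size */
    for (;;) {
        char *buf = malloc(size);
        if (buf == NULL) {
            return NULL;
        }
        ssize_t len = readlink(path, buf, size);
        if (len == -1) {
            free(buf);
            return NULL;
        }
        if ((size_t)len < size) {      /* fits, with room for the terminator */
            buf[len] = '\0';
            return buf;
        }
        free(buf);                     /* possibly truncated: retry with more space */
        size *= 2;
    }
}
```

A return value equal to the buffer size is treated as possible truncation, which is why the loop retries until the length is strictly smaller than the buffer.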
13. sendfile: Fast Data Transfers
The sendfile system call is a specialized Linux mechanism designed for transferring data between two file descriptors without the need to copy data into user space. This offers a more efficient way to move data, especially useful for high-performance network servers or file manipulation utilities, because it can significantly reduce CPU usage and increase throughput by letting the kernel handle data transfers directly.
13.1 Using sendfile
: Prototype
#include <sys/sendfile.h>
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);
- out_fd: The file descriptor to write to, typically a socket for network operations.
- in_fd: The file descriptor of the input file from which data will be read.
- offset: A pointer to an off_t variable that specifies the starting point for the data transfer. If offset is not NULL, sendfile starts reading from this offset in the input file, updates the variable to the position just past the last byte read, and leaves the file offset of in_fd unchanged. If offset is NULL, sendfile starts reading from the current file offset and updates the file offset accordingly.
- count: The number of bytes to transfer.
On success, sendfile returns the number of bytes written to out_fd, which may be fewer than count. On failure, it returns -1 and sets errno.
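To make the offset semantics above concrete, here is a minimal sketch that sends a byte range of a named file to an already open descriptor. The helper name send_file_range and its parameters are hypothetical, chosen for illustration only. It assumes a blocking out_fd and does not retry on EINTR or EAGAIN.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send up to `count` bytes of the file at src_path,
 * starting at byte `start`, to out_fd. Returns the number of bytes
 * transferred (fewer than `count` if the file ends early) or -1 on error. */
ssize_t send_file_range(int out_fd, const char *src_path, off_t start, size_t count) {
    int in_fd = open(src_path, O_RDONLY);
    if (in_fd == -1)
        return -1;

    off_t offset = start;          /* sendfile advances this variable for us */
    size_t remaining = count;
    while (remaining > 0) {
        ssize_t n = sendfile(out_fd, in_fd, &offset, remaining);
        if (n == -1) {
            int saved = errno;     /* keep sendfile's error across close() */
            close(in_fd);
            errno = saved;
            return -1;
        }
        if (n == 0)
            break;                 /* reached the end of the input file */
        remaining -= (size_t)n;
    }
    close(in_fd);
    return (ssize_t)(count - remaining);
}
```

Because a non-NULL offset is passed, each call resumes exactly where the previous one stopped, so a short transfer needs no extra bookkeeping.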
13.2 Example: Copying File Content with sendfile
The following example demonstrates using sendfile to copy the content from one file to another. This could be part of a file copy utility or a server sending a file to a client over a socket.
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/sendfile.h>
#include <sys/stat.h>

int main(int argc, char *argv[]) {
    if (argc != 3) {
        fprintf(stderr, "Usage: %s <source_file> <destination_file>\n", argv[0]);
        return EXIT_FAILURE;
    }
    // Open the source file for reading
    int src = open(argv[1], O_RDONLY);
    if (src == -1) {
        perror("Failed to open source file");
        return EXIT_FAILURE;
    }
    // Open the destination file for writing
    int dest = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (dest == -1) {
        perror("Failed to open destination file");
        close(src);
        return EXIT_FAILURE;
    }
    // Get the size of the source file
    struct stat stat_src;
    if (fstat(src, &stat_src) == -1) {
        perror("Failed to stat source file");
        close(src);
        close(dest);
        return EXIT_FAILURE;
    }
    // Perform the file copy. A single sendfile call may transfer fewer
    // bytes than requested, so loop until the whole file has been sent.
    off_t offset = 0;
    while (offset < stat_src.st_size) {
        ssize_t bytesSent = sendfile(dest, src, &offset,
                                     (size_t)(stat_src.st_size - offset));
        if (bytesSent == -1) {
            perror("Failed to send file");
            close(src);
            close(dest);
            return EXIT_FAILURE;
        }
        if (bytesSent == 0) {
            break; // Source ended early (for example, it was truncated)
        }
    }
    printf("Copied %lld bytes from %s to %s\n", (long long)offset, argv[1], argv[2]);
    close(src);
    close(dest);
    return EXIT_SUCCESS;
}
Notes
- Efficiency: sendfile is particularly efficient for copying data from a file to a socket because the transfer occurs entirely within the kernel, avoiding the overhead of moving data to and from user space.
- Limitations: Before Linux 2.6.33, out_fd had to refer to a socket. Since 2.6.33 it can be any file. The input descriptor should refer to a file that supports mmap-like operations, so it generally cannot be a socket.
- Partial Transfers: A single call may transfer fewer bytes than requested. Check the return value and call sendfile again until all data has been sent.
- Use Cases: sendfile is widely used in web servers and FTP servers for efficiently sending files over the network. It is also useful in applications that duplicate files or relay file data without needing to inspect it.
This direct data transfer capability makes sendfile a valuable tool for developing high-performance file handling and network communication applications.
14. setitimer: Create Timers
The setitimer function in Unix-like operating systems lets a process arm a timer that delivers a signal after a specified interval, and optionally at regular intervals after that. This offers a mechanism for periodic operations and timeouts, and is especially useful where non-blocking or asynchronous behavior is desired.
14.1 Understanding setitimer
The setitimer function can set three types of timers:
- ITIMER_REAL: Counts down in real (wall-clock) time. Upon expiration, SIGALRM is delivered.
- ITIMER_VIRTUAL: Counts down only while the process is executing in user mode. Upon expiration, SIGVTALRM is delivered.
- ITIMER_PROF: Counts down both when the process executes and when the system is executing on behalf of the process. Upon expiration, SIGPROF is delivered.
14.2 Using setitimer: Prototype
#include <sys/time.h>
int setitimer(int which, const struct itimerval *new_value, struct itimerval *old_value);
- which: The timer type (ITIMER_REAL, ITIMER_VIRTUAL, or ITIMER_PROF).
- new_value: Specifies the new timer value.
- old_value: If not NULL, setitimer stores the previous timer value here before it is updated.
14.3 The itimerval Structure
struct itimerval {
    struct timeval it_interval; // Reload value after each expiration (zero for a one-shot timer)
    struct timeval it_value;    // Time until the next expiration (initial countdown)
};
Each timeval structure represents an amount of time as seconds (tv_sec) and microseconds (tv_usec).
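Splitting a duration into tv_sec and tv_usec by hand is easy to get wrong, so a small conversion helper can be worthwhile. The sketch below builds an itimerval from millisecond values. The name make_itimerval and the choice of milliseconds as the input unit are assumptions made for illustration, and the helper expects non-negative inputs.

```c
#include <sys/time.h>

/* Hypothetical helper: build an itimerval from millisecond values.
 * first_ms  is the delay before the first expiration (0 disarms the timer).
 * period_ms is the reload interval (0 makes the timer one-shot). */
struct itimerval make_itimerval(long first_ms, long period_ms) {
    struct itimerval t;
    t.it_value.tv_sec = first_ms / 1000;             /* whole seconds */
    t.it_value.tv_usec = (first_ms % 1000) * 1000;   /* leftover ms as microseconds */
    t.it_interval.tv_sec = period_ms / 1000;
    t.it_interval.tv_usec = (period_ms % 1000) * 1000;
    return t;
}
```

For example, `struct itimerval t = make_itimerval(1500, 0);` describes a one-shot timer that expires after 1.5 seconds, and its address can be passed as the new_value argument of setitimer.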
14.4 Example: Setting a Periodic Timer
Here's how to set up a periodic ITIMER_REAL timer that sends SIGALRM every 2 seconds, demonstrating basic usage of setitimer.
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

void handle_sigalrm(int sig) {
    (void)sig; // Unused
    // write() is async-signal-safe; printf() is not
    const char msg[] = "Timer expired\n";
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);
}

int main(void) {
    struct itimerval timer;
    struct sigaction sa;
    // Set up the signal handler
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_sigalrm;
    if (sigaction(SIGALRM, &sa, NULL) == -1) {
        perror("sigaction");
        return 1;
    }
    // Configure the timer to expire after 2 seconds...
    timer.it_value.tv_sec = 2;
    timer.it_value.tv_usec = 0;
    // ...and every 2 seconds after that.
    timer.it_interval.tv_sec = 2;
    timer.it_interval.tv_usec = 0;
    // Start the timer
    if (setitimer(ITIMER_REAL, &timer, NULL) == -1) {
        perror("setitimer");
        return 1;
    }
    // Main loop
    while (1) {
        // Your application logic here
        pause(); // Wait for signals
    }
    return 0;
}
14.5 Notes and Best Practices
- Signal Handling: Set up a signal handler for the timer's signal (SIGALRM, SIGVTALRM, or SIGPROF) before starting the timer. The default action for all three signals is to terminate the process. Inside the handler, restrict yourself to async-signal-safe functions.
- One-Shot Timers and Disarming: Set it_interval to zero for a timer that fires only once. Setting it_value to zero disarms the timer.
- Accuracy and Resolution: While setitimer allows specifying time in microseconds, the actual resolution depends on the kernel's timer configuration. The tick rate is a kernel build option (commonly 100, 250, or 1000 Hz), and signal delivery can be delayed further by scheduling.
- Use in Modern Applications: For new applications, consider using timerfd (timerfd_create, timerfd_settime, etc.) for timer functionality, especially in event-driven programs that use I/O multiplexing (select, poll, epoll). timerfd integrates better with such models by providing a file descriptor that can be monitored for timer expiration events.
- Portability: setitimer and its associated signals are specified by POSIX and available across Unix-like systems, although POSIX.1-2008 marks setitimer as obsolescent in favor of timer_settime, and resolution and some behaviors vary between systems. timerfd is Linux-specific, so it is a good fit for Linux-only applications that need fine-grained timer control.
15. sysinfo: Retrieving System Statistics
The sysinfo system call returns overall system statistics in a single call: the time since boot, load averages, memory and swap usage, and the number of current processes. It is Linux-specific and is a lightweight alternative to parsing files under /proc when you only need these headline figures.
15.1 Using sysinfo: Prototype
#include <sys/sysinfo.h>
int sysinfo(struct sysinfo *info);
- info: A pointer to a struct sysinfo that the kernel fills with the statistics.
On success, sysinfo returns 0. On failure, it returns -1 and sets errno to indicate the error.
The struct sysinfo is defined as follows (padding omitted):
struct sysinfo {
    long uptime;             // Seconds since boot
    unsigned long loads[3];  // 1, 5, and 15 minute load averages
    unsigned long totalram;  // Total usable main memory size
    unsigned long freeram;   // Available memory size
    unsigned long sharedram; // Amount of shared memory
    unsigned long bufferram; // Memory used by buffers
    unsigned long totalswap; // Total swap space size
    unsigned long freeswap;  // Swap space still available
    unsigned short procs;    // Number of current processes
    unsigned long totalhigh; // Total high memory size
    unsigned long freehigh;  // Available high memory size
    unsigned int mem_unit;   // Memory unit size in bytes
};
15.2 Key Points
- Memory Units: All memory and swap sizes are expressed in multiples of mem_unit bytes. Multiply each field by mem_unit to get a byte count.
- Load Averages: The values in loads are fixed-point numbers scaled by 65536 (2^16). Divide by 65536.0 to obtain the familiar floating-point load averages.
- Snapshot Semantics: Each call returns a snapshot taken at the time of the call. Call sysinfo again whenever you need fresh values.
- Portability: sysinfo is Linux-specific. Programs that must run on other Unix-like systems need a different mechanism.
This call provides a compact way to monitor overall system health from within a program without opening or parsing any files.
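Since this section is titled for sysinfo, a minimal sketch of calling it may be useful here. It assumes the Linux struct sysinfo layout from <sys/sysinfo.h>, in which memory sizes are multiples of mem_unit bytes and load averages are fixed-point values scaled by 65536. The helper names to_mib and print_system_stats are hypothetical.

```c
#include <stdio.h>
#include <sys/sysinfo.h>

/* Hypothetical helper: convert a sysinfo memory field to mebibytes.
 * The multiplication is done in 64 bits to avoid overflow on 32-bit systems. */
unsigned long long to_mib(unsigned long value, unsigned int mem_unit) {
    return (unsigned long long)value * mem_unit / (1024ULL * 1024ULL);
}

/* Prints a short summary of system statistics.
 * Returns 0 on success, or -1 if sysinfo fails. */
int print_system_stats(void) {
    struct sysinfo info;
    if (sysinfo(&info) == -1) {
        perror("sysinfo");
        return -1;
    }
    printf("Uptime:        %ld seconds\n", (long)info.uptime);
    /* loads[] holds fixed-point values scaled by 65536 (2^16) */
    printf("Load (1 min):  %.2f\n", info.loads[0] / 65536.0);
    printf("Total RAM:     %llu MiB\n", to_mib(info.totalram, info.mem_unit));
    printf("Free RAM:      %llu MiB\n", to_mib(info.freeram, info.mem_unit));
    printf("Processes:     %u\n", (unsigned)info.procs);
    return 0;
}
```

Calling print_system_stats() from main prints one line per statistic. The actual values depend on the machine, so no sample output is shown.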
16. uname
The uname system call in Unix-like operating systems provides a simple way to retrieve basic system information, including the operating system name, release, version, and hardware architecture. This information is particularly useful for programs that need to adjust their behavior based on the system they're running on.
16.1 Using uname: Prototype
#include <sys/utsname.h>
int uname(struct utsname *buf);
- buf: A pointer to a struct utsname that will be filled with the system information.
The struct utsname is defined as follows:
struct utsname {
    char sysname[];    // Operating system name (e.g., "Linux")
    char nodename[];   // Name within "some implementation-defined network"
    char release[];    // Operating system release (e.g., "4.15.0-54-generic")
    char version[];    // Operating system version
    char machine[];    // Hardware identifier (e.g., "x86_64")
    char domainname[]; // NIS or YP domain name
};
16.2 Example: Retrieving and Displaying System Information
This example demonstrates how to use uname to fetch and display the system's information:
#include <stdio.h>
#include <sys/utsname.h>

int main(void) {
    struct utsname unameData;
    // Fetch the system information
    if (uname(&unameData) < 0) {
        perror("uname");
        return 1;
    }
    // Display the fetched information
    printf("System Name: %s\n", unameData.sysname);
    printf("Node Name: %s\n", unameData.nodename);
    printf("Release: %s\n", unameData.release);
    printf("Version: %s\n", unameData.version);
    printf("Machine: %s\n", unameData.machine);
    // Compiled only if _GNU_SOURCE is defined before the first #include
#ifdef _GNU_SOURCE
    printf("Domain Name: %s\n", unameData.domainname); // GNU extension
#endif
    return 0;
}
16.3 Key Points
- The uname system call fills in a struct utsname with information about the system.
- This information includes the operating system name, the network node hostname, the OS release level, the OS version, and the hardware type.
- The domainname field is a GNU extension and might not be present on all systems. Use conditional compilation (as shown) to ensure portability when accessing this field.
- The uname command-line utility on Unix-like systems is based on the same system call and provides similar information.
Using uname, applications can identify the operating environment, making it possible to perform platform-specific operations or optimizations. This is particularly useful for portable applications that need to run across different Unix-like systems.
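One way a program might act on this information is to compare the running kernel's release against a minimum version. The sketch below assumes the release string begins with "major.minor", as in the "4.15.0-54-generic" example above. The helper names parse_release and kernel_at_least are hypothetical. Version checks of this kind are brittle, so probing for the feature itself is often the more robust approach.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper: extract "major.minor" from a release string such as
 * "4.15.0-54-generic". Returns 0 on success, -1 if the string does not
 * start with two dot-separated integers. */
int parse_release(const char *release, int *major, int *minor) {
    return sscanf(release, "%d.%d", major, minor) == 2 ? 0 : -1;
}

/* Returns 1 if the running kernel is at least want_major.want_minor,
 * 0 if it is older, and -1 if the version could not be determined. */
int kernel_at_least(int want_major, int want_minor) {
    struct utsname u;
    int major, minor;
    if (uname(&u) == -1 || parse_release(u.release, &major, &minor) == -1)
        return -1;
    return major > want_major || (major == want_major && minor >= want_minor);
}
```

For instance, a program could call `kernel_at_least(2, 6)` before relying on a feature introduced in the 2.6 series and fall back to another code path when the result is 0.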
This tutorial covers a broad range of system calls you can use to interact with the Linux kernel and manipulate various system resources. Each example provides a basic usage scenario for the corresponding system call. Remember, when working with system calls, always check the return value for errors and handle them appropriately in your real applications.
🏷️ Author position : Embedded Software Engineer