A Detailed Walkthrough of the Arm Linux System Call Path

Date: 2016-11-09  Source: Internet
On Linux, most of the interfaces between user-mode processes and hardware devices are implemented by issuing system calls to the kernel.

System calls are services provided by the operating system. User programs use them to request the various services the kernel offers. Executing a system call traps the user program into the kernel, and on ARM that trap is performed by the swi software-interrupt instruction.

Article URL: http://m.butianyuan.cn/article/201611/318013.htm

1. User code can make system calls in two ways

The first is through C library functions, which include the library's wrapper functions for system calls as well as its other ordinary functions.

The second is the _syscall macros. Kernels up to 2.6.18 define seven _syscall macros in include/asm-i386/unistd.h:

_syscall0(type,name)
_syscall1(type,name,type1,arg1)
_syscall2(type,name,type1,arg1,type2,arg2)
_syscall3(type,name,type1,arg1,type2,arg2,type3,arg3)
_syscall4(type,name,type1,arg1,type2,arg2,type3,arg3,type4,arg4)
_syscall5(type,name,type1,arg1,type2,arg2,type3,arg3,type4,arg4,type5,arg5)
_syscall6(type,name,type1,arg1,type2,arg2,type3,arg3,type4,arg4,type5,arg5,type6,arg6)

Here type is the return type of the generated system call function, name is the name of the system call, and typeN and argN are the type and name of the N-th parameter. The number of typeN/argN pairs equals the digit after _syscall.

Each macro creates a function called name; the digit after _syscall gives the number of parameters that function takes.

For example, the sysinfo system call, which returns overall system statistics, is defined with a _syscall macro as:

_syscall1(int, sysinfo, struct sysinfo *, info);

On i386 this expands to:

int sysinfo(struct sysinfo * info)
{
        long __res;
        __asm__ volatile("int $0x80" : "=a" (__res) : "0" (116),"b" ((long)(info)));
        do {
                if ((unsigned long)(__res) >= (unsigned long)(-(128 + 1))) {
                        errno = -(__res);
                        __res  = -1;
                }
                return (int) (__res);
        } while (0);
}

So _syscall1(int, sysinfo, struct sysinfo *, info) expands into a function named sysinfo. The macro argument int becomes the return type, and the arguments struct sysinfo * and info become the type and name of the function's parameter.

Once a system call has been defined with a _syscall macro in a source file, the rest of the code can invoke it directly by name. The following example uses the sysinfo system call.

Listing 5.1: using the sysinfo system call

/* The header names were lost from the original listing; the ones below
   are what the code needs on a kernel that still provides _syscall1. */
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <linux/unistd.h>       /* for the _syscallN macros */
#include <linux/kernel.h>       /* for struct sysinfo */

_syscall1(int, sysinfo, struct sysinfo *, info);

int main(void)
{
        struct sysinfo s_info;
        int error;

        error = sysinfo(&s_info);
        printf("code error = %d\n", error);
        printf("Uptime = %lds\nLoad: 1 min %lu / 5 min %lu / 15 min %lu\n"
               "RAM: total %lu / free %lu / shared %lu\n"
               "Memory in buffers = %lu\nSwap: total %lu / free %lu\n"
               "Number of processes = %d\n",
               s_info.uptime,
               s_info.loads[0], s_info.loads[1], s_info.loads[2],
               s_info.totalram, s_info.freeram, s_info.sharedram,
               s_info.bufferram, s_info.totalswap, s_info.freeswap,
               s_info.procs);
        exit(EXIT_SUCCESS);
}

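However, the _syscall macros were removed starting with 2.6.19. Instead, use the syscall function, which invokes a system call given its number and a list of arguments.

The prototype of syscall is:

int syscall(int number, ...);

(Current glibc declares it as long syscall(long number, ...).) Here number is the system call number, followed by all of the system call's arguments in order. The following example calls gettid.

Listing 5.2: using the gettid system call

/* The header names were lost from the original listing; these are the
   ones the code needs. */
#include <sys/types.h>          /* pid_t */
#include <unistd.h>             /* syscall() */
#include <sys/syscall.h>        /* SYS_* constants */

#define __NR_gettid      224    /* the gettid number on i386 and ARM */

int main(int argc, char *argv[])
{
        pid_t tid;

        tid = syscall(__NR_gettid);
}

Most system calls have a SYS_ symbolic constant that maps to their system call number, so the syscall(__NR_gettid) line above can be rewritten as:

tid = syscall(SYS_gettid);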
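As a rough illustration of how Listing 5.1 might be ported to kernels without the _syscall macros, the sketch below builds a wrapper on top of syscall(). DEMO_SYSCALL1 and the demo_* names are hypothetical helpers invented for this example; they are not part of glibc, uClibc, or the kernel headers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* makes unistd.h declare syscall() */
#endif
#include <assert.h>
#include <sys/syscall.h>       /* SYS_sysinfo, SYS_gettid */
#include <sys/sysinfo.h>       /* struct sysinfo */
#include <sys/types.h>         /* pid_t */
#include <unistd.h>            /* syscall() */

/* Hypothetical helper (an assumption of this sketch): generates a
 * one-argument wrapper on top of syscall(), in the spirit of _syscall1. */
#define DEMO_SYSCALL1(type, fname, nr, type1, arg1) \
    type fname(type1 arg1)                          \
    {                                               \
        return (type)syscall((nr), arg1);           \
    }

/* Expands to: int demo_sysinfo(struct sysinfo *info) { ... } */
DEMO_SYSCALL1(int, demo_sysinfo, SYS_sysinfo, struct sysinfo *, info)

/* gettid takes no arguments, so a plain function is enough. */
pid_t demo_gettid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Usage sketch: returns the uptime in seconds, or -1 on failure. */
long demo_uptime(void)
{
    struct sysinfo s;

    if (demo_sysinfo(&s) != 0)
        return -1;             /* errno was already set by syscall() */
    assert(s.uptime >= 0);
    return s.uptime;
}
```

Unlike the old macro, the wrapper does not touch errno itself: syscall() already converts a negative kernel result into -1 plus errno.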

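2. The difference between system calls and application programming interfaces (APIs)

An application programming interface (API) differs from a system call in that the former is only a function definition that specifies how to obtain a given service, whereas the latter is an explicit request to the kernel made through a software interrupt. The POSIX standard addresses APIs, not system calls. Unix systems give programmers many API library functions. Some of the APIs defined by the standard C library, libc, refer to wrapper routines, whose only purpose is to issue a system call. Usually each system call has a corresponding wrapper routine, and the wrapper routine defines the API that applications use. The converse does not hold: an API does not have to correspond to any particular system call. From the programmer's point of view the distinction between an API and a system call does not matter; the only things that matter are the function name, the parameter types, and the meaning of the return code. From the kernel designer's point of view the distinction does matter, because system calls belong to the kernel and user-mode library functions do not.

Most wrapper routines return an integer whose meaning depends on the corresponding system call. A return value of -1 usually means the kernel could not satisfy the process's request. A failure in the system call handler may be caused by invalid parameters, by a lack of available resources, by a hardware problem, and so on. The errno variable, defined in the libc library, holds the specific error code. Each error code is defined as a constant macro.

When a user-mode process invokes a system call, the CPU switches to kernel mode and starts executing a kernel function. Because the kernel implements many different system calls, the process must pass a parameter called the system call number to identify the one it wants. All system calls return an integer value. The conventions for these return values differ from those of the wrapper routines. In the kernel, a positive value or 0 means the system call completed successfully, and a negative value means an error condition. In the latter case, the value is the negated error code that must be returned to the application in the errno variable.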
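The two return conventions described above can be sketched as a small conversion function. This is only an illustration: demo_wrapper_result is a made-up name, and the 4095 cut-off is an assumption borrowed from the convention glibc uses (the older i386 macro shown earlier in the article used 128 instead).

```c
#include <assert.h>
#include <errno.h>

/* Assumption of this sketch: raw kernel results in [-4095, -1] are
 * error codes; everything else is a successful result. */
#define DEMO_MAX_ERRNO 4095L

/* Convert a raw kernel return value into the wrapper convention:
 * on error, store the positive code in errno and return -1. */
long demo_wrapper_result(long raw)
{
    if ((unsigned long)raw >= (unsigned long)-DEMO_MAX_ERRNO) {
        errno = (int)-raw;
        return -1;
    }
    return raw;
}

/* Usage sketch showing both outcomes. */
void demo_wrapper_check(void)
{
    errno = 0;
    assert(demo_wrapper_result(3) == 3 && errno == 0);
    assert(demo_wrapper_result(-ENOENT) == -1 && errno == ENOENT);
}
```

The unsigned comparison is the same trick the expanded sysinfo macro uses: the small negative error values map to the very top of the unsigned range, so one comparison separates them from valid results.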

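3. How a system call is executed

ARM Linux uses the SWI instruction to enter kernel space from user space, so let's look at that instruction first. SWI raises a software interrupt: the processor switches from User mode to Supervisor mode, saves CPSR into the Supervisor-mode SPSR, and branches to the SWI vector. SWI can also be used in other modes, and the processor switches to Supervisor mode in the same way. The instruction format is:

SWI{cond} immed_24

where:

immed_24 is a 24-bit immediate, an integer from 0 to 16777215.

There are two common ways to pass parameters with SWI so that the SWI exception handler can provide the requested service. Both are purely software conventions. To obtain the 24-bit immediate, the SWI exception handler has to read the SWI instruction that raised the exception.

1) The 24-bit immediate in the instruction specifies the requested service, and the parameters are passed in general-purpose registers. For example:

MOV R0, #34
SWI 12

2) The 24-bit immediate is ignored; the requested service is determined by the value of R0, and the parameters are passed in the other general-purpose registers. For example:

MOV R0, #12
MOV R1, #34
SWI 0

In the SWI exception handler, the immediate is extracted as follows: first determine whether the SWI instruction that raised the exception was an ARM or a Thumb instruction, which can be read from SPSR; then obtain the address of that SWI instruction, which can be derived from LR (LR - 4 for ARM, LR - 2 for Thumb); finally read the instruction and extract the immediate (the low 24 bits of an ARM instruction).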
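The extraction steps just described can be sketched in portable C. The function and macro names are invented for this example, the instruction memory is modelled as a plain byte array, and little-endian instruction encoding is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PSR_T_BIT 0x20u   /* Thumb state bit in CPSR/SPSR */

/* ARM state: bits 27..24 are 1111, bits 23..0 hold the immediate. */
bool demo_is_arm_swi(uint32_t insn)
{
    return (insn & 0x0f000000u) == 0x0f000000u;
}

uint32_t demo_arm_swi_imm(uint32_t insn)
{
    return insn & 0x00ffffffu;
}

/* Thumb state: SWI is encoded as 0xDFxx with an 8-bit immediate. */
bool demo_is_thumb_swi(uint16_t insn)
{
    return (insn & 0xff00u) == 0xdf00u;
}

uint32_t demo_thumb_swi_imm(uint16_t insn)
{
    return insn & 0x00ffu;
}

/* code/code_base: hypothetical view of instruction memory starting at
 * address code_base. lr and spsr are the values seen by the handler. */
uint32_t demo_swi_imm_from_state(const uint8_t *code, uint32_t code_base,
                                 uint32_t lr, uint32_t spsr)
{
    if (spsr & DEMO_PSR_T_BIT) {
        /* Thumb: the SWI is the 2-byte instruction just before LR. */
        const uint8_t *p = code + (lr - 2 - code_base);
        uint16_t insn = (uint16_t)(p[0] | (p[1] << 8));
        assert(demo_is_thumb_swi(insn));
        return demo_thumb_swi_imm(insn);
    } else {
        /* ARM: the SWI is the 4-byte instruction just before LR. */
        const uint8_t *p = code + (lr - 4 - code_base);
        uint32_t insn = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                        ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
        assert(demo_is_arm_swi(insn));
        return demo_arm_swi_imm(insn);
    }
}
```

For instance, the ARM instruction SWI 12 with condition AL is the word 0xEF00000C, and the Thumb form is the halfword 0xDF0C; both decode to 12.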

Entering a system call from user space

Normally our code reaches system calls through the C library's wrappers. Taking open in uClibc 0.9.30 as an example, let's trace step by step how the wrapper ends up making the system call. include/fcntl.h contains:

# define open open64

so open is really just another name for open64.

libc/sysdeps/linux/common/open64.c contains:

extern __typeof(open64) __libc_open64;
extern __typeof(open) __libc_open;

So open64 is in turn just an alias for __libc_open64, which is defined in the same file:

libc_hidden_proto(__libc_open64)
int __libc_open64 (const char *file, int oflag, ...)
{
        mode_t mode = 0;

        if (oflag & O_CREAT)
        {
                va_list arg;
                va_start (arg, oflag);
                mode = va_arg (arg, mode_t);
                va_end (arg);
        }

        return __libc_open(file, oflag | O_LARGEFILE, mode);
}
libc_hidden_def(__libc_open64)

__libc_open64 finally calls __libc_open, which is defined in libc/sysdeps/linux/common/open.c:

libc_hidden_proto(__libc_open)
int __libc_open(const char *file, int oflag, ...)
{
        mode_t mode = 0;

        if (oflag & O_CREAT) {
                va_list arg;
                va_start (arg, oflag);
                mode = va_arg (arg, mode_t);
                va_end (arg);
        }

        return __syscall_open(file, oflag, mode);
}
libc_hidden_def(__libc_open)

__syscall_open is defined in the same file:

static __inline__ _syscall3(int, __syscall_open, const char *, file, int, flags, __kernel_mode_t, mode)

libc/sysdeps/linux/arm/bits/syscalls.h contains:

#undef _syscall3
#define _syscall3(type,name,type1,arg1,type2,arg2,type3,arg3) \
type name(type1 arg1,type2 arg2,type3 arg3) \
{ \
return (type) (INLINE_SYSCALL(name, 3, arg1, arg2, arg3)); \
}

This macro defines a function. Its first argument is the function's return type, its second is the function name, and the remaining arguments are, as their names suggest, the types and names of the function's parameters. __syscall_open therefore becomes:

int __syscall_open (const char * file,int flags, __kernel_mode_t mode)
{
        return (int) (INLINE_SYSCALL(__syscall_open, 3, file, flags, mode));
}

INLINE_SYSCALL is a macro defined in the same file:

#undef INLINE_SYSCALL
#define INLINE_SYSCALL(name, nr, args...) \
  ({ unsigned int _inline_sys_result = INTERNAL_SYSCALL (name, , nr, args); \
     if (__builtin_expect (INTERNAL_SYSCALL_ERROR_P (_inline_sys_result, ), 0)) \
       { \
         __set_errno (INTERNAL_SYSCALL_ERRNO (_inline_sys_result, )); \
         _inline_sys_result = (unsigned int) -1; \
       } \
     (int) _inline_sys_result; })

#undef INTERNAL_SYSCALL
#if !defined(__thumb__)
#if defined(__ARM_EABI__)
#define INTERNAL_SYSCALL(name, err, nr, args...) \
  ({unsigned int __sys_result; \
     { \
       register int _a1 __asm__ ("r0"), _nr __asm__ ("r7"); \
       LOAD_ARGS_##nr (args) \
       _nr = SYS_ify(name); \
       __asm__ __volatile__ ("swi  0x0   @ syscall " #name \
                             : "=r" (_a1) \
                             : "r" (_nr) ASM_ARGS_##nr \
                             : "memory"); \
       __sys_result = _a1; \
     } \
     (int) __sys_result; })
#else /* defined(__ARM_EABI__) */
#define INTERNAL_SYSCALL(name, err, nr, args...) \
  ({ unsigned int __sys_result; \
     { \
       register int _a1 __asm__ ("a1"); \
       LOAD_ARGS_##nr (args) \
       __asm__ __volatile__ ("swi  %1 @ syscall " #name \
                             : "=r" (_a1) \
                             : "i" (SYS_ify(name)) ASM_ARGS_##nr \
                             : "memory"); \
       __sys_result = _a1; \
     } \
     (int) __sys_result; })
#endif
#else /* !defined(__thumb__) */
/* We can't use push/pop inside the asm because that breaks
   unwinding (ie. thread cancellation).  */
#define INTERNAL_SYSCALL(name, err, nr, args...) \
  ({ unsigned int __sys_result; \
     { \
       int _sys_buf[2]; \
       register int _a1 __asm__ ("a1"); \
       register int *_v3 __asm__ ("v3") = _sys_buf; \
       *_v3 = (int) (SYS_ify(name)); \
       LOAD_ARGS_##nr (args) \
       __asm__ __volatile__ ("str   r7, [v3, #4]\n" \
                             "\tldr   r7, [v3]\n" \
                             "\tswi   0  @ syscall " #name "\n" \
                             "\tldr   r7, [v3, #4]" \
                             : "=r" (_a1) \
                             : "r" (_v3) ASM_ARGS_##nr \
                             : "memory"); \
       __sys_result = _a1; \
     } \
     (int) __sys_result; })
#endif /*!defined(__thumb__)*/

Here are the LOAD_ARGS macros from the same file as well:

#define LOAD_ARGS_0()
#define ASM_ARGS_0
#define LOAD_ARGS_1(a1) \
  _a1 = (int) (a1); \
  LOAD_ARGS_0 ()
#define ASM_ARGS_1 ASM_ARGS_0, "r" (_a1)
#define LOAD_ARGS_2(a1, a2) \
  register int _a2 __asm__ ("a2") = (int) (a2); \
  LOAD_ARGS_1 (a1)
#define ASM_ARGS_2 ASM_ARGS_1, "r" (_a2)
#define LOAD_ARGS_3(a1, a2, a3) \
  register int _a3 __asm__ ("a3") = (int) (a3); \
  LOAD_ARGS_2 (a1, a2)

These macros load each argument into its register. The SYS_ify macro obtains the system call number:

#define SYS_ify(syscall_name)  (__NR_##syscall_name)

which here is __NR___syscall_open. Its definition can be found in libc/sysdeps/linux/common/open.c:

#define __NR___syscall_open __NR_open

__NR_open is defined in the kernel headers. The system call number is placed in r7, and the arguments are passed in much the same way as for an ordinary function call (a1 to a3 are the procedure-call-standard names for r0 to r2).
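To make the register convention concrete, here is a minimal sketch of a three-argument raw system call. demo_raw_syscall3 is a hypothetical name. The inline assembly path is only compiled for ARM EABI in ARM (non-Thumb) state and follows the same pattern as the uClibc macro above; on any other target the sketch falls back to libc's syscall() and converts the result back to the raw "negative errno" form, so it is an approximation there.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* makes unistd.h declare syscall() */
#endif
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns the raw kernel result: >= 0 on success, -errno on failure. */
long demo_raw_syscall3(long nr, long a1, long a2, long a3)
{
#if defined(__arm__) && defined(__ARM_EABI__) && !defined(__thumb__)
    /* EABI convention: number in r7, arguments in r0..r2, result in r0. */
    register long r0 __asm__("r0") = a1;
    register long r1 __asm__("r1") = a2;
    register long r2 __asm__("r2") = a3;
    register long r7 __asm__("r7") = nr;

    __asm__ __volatile__("swi 0x0"
                         : "+r"(r0)
                         : "r"(r1), "r"(r2), "r"(r7)
                         : "memory");
    return r0;
#else
    /* Fallback for other targets (an assumption of this sketch):
     * undo the libc convention of returning -1 and setting errno. */
    long ret = syscall(nr, a1, a2, a3);
    return (ret == -1) ? -(long)errno : ret;
#endif
}

/* Usage sketch: getpid through the raw path matches the libc wrapper. */
void demo_raw_syscall_check(void)
{
    assert(demo_raw_syscall3(SYS_getpid, 0, 0, 0) == (long)getpid());
}
```

The Thumb case is left out on purpose: as the uClibc comment notes, r7 needs to be saved and restored around the swi there.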

Note the EABI here. What is EABI? ABI stands for Application Binary Interface. Under the newer EABI convention the system call number is placed in register r7, whereas under the old OABI the number is encoded in the swi instruction itself; that is, the old calling convention (Old ABI) passes the number in the immediate that follows swi. The system call numbers of the two conventions also differ, as can be seen in the kernel file arch/arm/include/asm/unistd.h:

#define __NR_OABI_SYSCALL_BASE  0x900000

#if defined(__thumb__) || defined(__ARM_EABI__)
#define __NR_SYSCALL_BASE       0
#else
#define __NR_SYSCALL_BASE       __NR_OABI_SYSCALL_BASE
#endif

/*
 * This file contains the system call numbers.
 */

#define __NR_restart_syscall    (__NR_SYSCALL_BASE+  0)
#define __NR_exit               (__NR_SYSCALL_BASE+  1)
#define __NR_fork               (__NR_SYSCALL_BASE+  2)
#define __NR_read               (__NR_SYSCALL_BASE+  3)
#define __NR_write              (__NR_SYSCALL_BASE+  4)
#define __NR_open               (__NR_SYSCALL_BASE+  5)
……
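As a small worked example of the two numbering schemes, the sketch below computes the instruction word an OABI program would execute for a given system call and contrasts it with EABI. The DEMO_* names are made up for the example; the base value 0x900000 is the kernel's __NR_OABI_SYSCALL_BASE, and 0xEF000000 is the ARM encoding of swi with condition AL.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_OABI_SYSCALL_BASE 0x900000u   /* __NR_OABI_SYSCALL_BASE */
#define DEMO_EABI_SYSCALL_BASE 0u          /* __NR_SYSCALL_BASE for EABI/Thumb */
#define DEMO_SWI_COND_AL       0xef000000u /* swi opcode, condition AL */

/* Value of __NR_xxx for a given base and offset in the table. */
uint32_t demo_syscall_nr(uint32_t base, uint32_t offset)
{
    return base + offset;
}

/* OABI: the full number travels in the 24-bit immediate of swi. */
uint32_t demo_oabi_swi_word(uint32_t offset)
{
    uint32_t nr = demo_syscall_nr(DEMO_OABI_SYSCALL_BASE, offset);

    assert(nr <= 0x00ffffffu);             /* must fit in 24 bits */
    return DEMO_SWI_COND_AL | nr;
}

/* EABI: the instruction is always "swi 0"; the number goes in r7. */
uint32_t demo_eabi_swi_word(void)
{
    return DEMO_SWI_COND_AL;
}
```

For open (offset 5), an OABI binary executes the word 0xEF900005, while an EABI binary executes 0xEF000000 with r7 = 5.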

Next, let's see how the operating system handles the system call. Go back to the ARM Linux exception vector table: when swi executes, the processor fetches the handler address through the vector table and jumps to the handler. In arch/arm/kernel/entry-armv.S:

/*
 * We group all the following data together to optimise
 * for CPUs with separate I & D caches.
 */
        .align  5

.LCvswi:
        .word   vector_swi

        .globl  __stubs_end
__stubs_end:

        .equ    stubs_offset, __vectors_start + 0x200 - __stubs_start

        .globl  __vectors_start
__vectors_start:
 ARM(   swi     SYS_ERROR0      )
 THUMB( svc     #0              )
 THUMB( nop                     )
        W(b)    vector_und + stubs_offset
        W(ldr)  pc, .LCvswi + stubs_offset
        W(b)    vector_pabt + stubs_offset
        W(b)    vector_dabt + stubs_offset
        W(b)    vector_addrexcptn + stubs_offset
        W(b)    vector_irq + stubs_offset
        W(b)    vector_fiq + stubs_offset

        .globl  __vectors_end
__vectors_end:

.LCvswi is defined in the same file as:

.LCvswi:
        .word   vector_swi

So the routine vector_swi is what ultimately handles the system call. Let's look at vector_swi, which is defined in arch/arm/kernel/entry-common.S:

/*=============================================================================
 * SWI handler
 *-----------------------------------------------------------------------------
 */

        /* If we're optimising for StrongARM the resulting code won't
           run on an ARM7 and we can save a couple of instructions.
                                                                --pb */
#ifdef CONFIG_CPU_ARM710
#define A710(code...) code
.Larm710bug:
        ldmia   sp, {r0 - lr}^                  @ Get calling r0 - lr
        mov     r0, r0
        add     sp, sp, #S_FRAME_SIZE
        subs    pc, lr, #4
#else
#define A710(code...)
#endif

        .align  5
ENTRY(vector_swi)
        sub     sp, sp, #S_FRAME_SIZE
        stmia   sp, {r0 - r12}                  @ Calling r0 - r12
 ARM(   add     r8, sp, #S_PC           )
 ARM(   stmdb   r8, {sp, lr}^           )       @ Calling sp, lr
 THUMB( mov     r8, sp                  )
 THUMB( store_user_sp_lr r8, r10, S_SP  )       @ calling sp, lr
        mrs     r8, spsr                        @ called from non-FIQ mode, so ok.
        str     lr, [sp, #S_PC]                 @ Save calling PC
        str     r8, [sp, #S_PSR]                @ Save CPSR
        str     r0, [sp, #S_OLD_R0]             @ Save OLD_R0
        zero_fp

        /*
         * Get the system call number.
         */

#if defined(CONFIG_OABI_COMPAT)

        /*
         * If we have CONFIG_OABI_COMPAT then we need to look at the swi
         * value to determine if it is an EABI or an old ABI call.
         */
#ifdef CONFIG_ARM_THUMB
        tst     r8, #PSR_T_BIT
        movne   r10, #0                         @ no thumb OABI emulation
        ldreq   r10, [lr, #-4]                  @ get SWI instruction
#else
        ldr     r10, [lr, #-4]                  @ get SWI instruction
  A710( and     ip, r10, #0x0f000000            @ check for SWI         )
  A710( teq     ip, #0x0f000000                                         )
  A710( bne     .Larm710bug                                             )
#endif
#ifdef CONFIG_CPU_ENDIAN_BE8
        rev     r10, r10                        @ little endian instruction
#endif

#elif defined(CONFIG_AEABI)

        /*
         * Pure EABI user space always put syscall number into scno (r7).
         */
  A710( ldr     ip, [lr, #-4]                   @ get SWI instruction   )
  A710( and     ip, ip, #0x0f000000             @ check for SWI         )
  A710( teq     ip, #0x0f000000                                         )
  A710( bne     .Larm710bug                                             )

#elif defined(CONFIG_ARM_THUMB)

        /* Legacy ABI only, possibly thumb mode. */
        tst     r8, #PSR_T_BIT                  @ this is SPSR from save_user_regs
        addne   scno, r7, #__NR_SYSCALL_BASE    @ put OS number in
        ldreq   scno, [lr, #-4]

#else

        /* Legacy ABI only. */
        ldr     scno, [lr, #-4]                 @ get SWI instruction
  A710( and     ip, scno, #0x0f000000           @ check for SWI         )
  A710( teq     ip, #0x0f000000                                         )
  A710( bne     .Larm710bug                                             )

#endif

#ifdef CONFIG_ALIGNMENT_TRAP
        ldr     ip, __cr_alignment
        ldr     ip, [ip]
        mcr     p15, 0, ip, c1, c0              @ update control register
#endif
        enable_irq

        // tsk is an alias for register r9, defined in arch/arm/kernel/entry-header.S:
        //     tsk .req r9     @ current thread_info
        // Get the base address of the thread_info object.
        get_thread_info tsk

        // tbl is an alias for register r8, defined in arch/arm/kernel/entry-header.S:
        //     tbl .req r8     @ syscall table pointer
        // It holds the pointer to the system call table, which is used for the call further down.
        adr     tbl, sys_call_table             @ load syscall table pointer

#if defined(CONFIG_OABI_COMPAT)
        /*
         * If the swi argument is zero, this is an EABI call and we do nothing.
         *
         * If this is an old ABI call, get the syscall number into scno and
         * get the old ABI syscall table address.
         */
        bics    r10, r10, #0xff000000
        eorne   scno, r10, #__NR_OABI_SYSCALL_BASE
        ldrne   tbl, =sys_oabi_call_table
#elif !defined(CONFIG_AEABI)
        // scno is an alias for register r7
        bic     scno, scno, #0xff000000         @ mask off SWI op-code
        eor     scno, scno, #__NR_SYSCALL_BASE  @ check OS number
#endif

        ldr     r10, [tsk, #TI_FLAGS]           @ check for syscall tracing
        stmdb   sp!, {r4, r5}                   @ push fifth and sixth args

#ifdef CONFIG_SECCOMP
        tst     r10, #_TIF_SECCOMP
        beq     1f
        mov     r0, scno
        bl      __secure_computing
        add     r0, sp, #S_R0 + S_OFF           @ pointer to regs
        ldmia   r0, {r0 - r3}                   @ have to reload r0 - r3
1:
#endif

        tst     r10, #_TIF_SYSCALL_TRACE        @ are we tracing syscalls?
        bne     __sys_trace

        cmp     scno, #NR_syscalls              @ check upper syscall limit
        adr     lr, BSYM(ret_fast_syscall)      @ return address
        ldrcc   pc, [tbl, scno, lsl #2]         @ call sys_* routine

        add     r1, sp, #S_OFF
        // why is also an alias for register r8
2:      mov     why, #0                         @ no longer a real syscall
        cmp     scno, #(__ARM_NR_BASE - __NR_SYSCALL_BASE)
        eor     r0, scno, #__NR_SYSCALL_BASE    @ put OS number back
        bcs     arm_syscall
        b       sys_ni_syscall                  @ not private func
ENDPROC(vector_swi)

        /*
         * This is the really slow path.  We're going to be doing
         * context switches, and waiting for our parent to respond.
         */
__sys_trace:
        mov     r2, scno
        add     r1, sp, #S_OFF
        mov     r0, #0                          @ trace entry [IP = 0]
        bl      syscall_trace

        adr     lr, BSYM(__sys_trace_return)    @ return address
        mov     scno, r0                        @ syscall number (possibly new)
        add     r1, sp, #S_R0 + S_OFF           @ pointer to regs
        cmp     scno, #NR_syscalls              @ check upper syscall limit
        ldmccia r1, {r0 - r3}                   @ have to reload r0 - r3
        ldrcc   pc, [tbl, scno, lsl #2]         @ call sys_* routine
        b       2b

__sys_trace_return:
        str     r0, [sp, #S_R0 + S_OFF]!        @ save returned r0
        mov     r2, scno
        mov     r1, sp
        mov     r0, #1                          @ trace exit [IP = 1]
        bl      syscall_trace
        b       ret_slow_syscall

        .align  5
#ifdef CONFIG_ALIGNMENT_TRAP
        .type   __cr_alignment, #object
__cr_alignment:
        .word   cr_alignment
#endif
        .ltorg

/*
 * This is the syscall table declaration for native ABI syscalls.
 * With EABI a couple syscalls are obsolete and defined as sys_ni_syscall.
 */
#define ABI(native, compat) native
#ifdef CONFIG_AEABI
#define OBSOLETE(syscall) sys_ni_syscall
#else
#define OBSOLETE(syscall) syscall
#endif

        .type   sys_call_table, #object
ENTRY(sys_call_table)
#include "calls.S"
#undef ABI
#undef OBSOLETE

The zero_fp used above is a macro defined in arch/arm/kernel/entry-header.S:

        .macro  zero_fp
#ifdef CONFIG_FRAME_POINTER
        mov     fp, #0
#endif
        .endm

// fp is register r11.

Like every exception handler, the first thing vector_swi does is save the context. Next it obtains the system call number.

It then uses the number as an index into the system call table. If the number is valid, the corresponding handler routine is called through the ldrcc pc, [tbl, scno, lsl #2] instruction shown above, and the call returns through the routine ret_fast_syscall.
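The bounds check followed by an indexed jump can be modelled in C with an array of function pointers. This is a simplified sketch with invented names and toy handlers: every handler takes three long arguments, and any out-of-range number goes straight to a "not implemented" handler, whereas the real kernel first checks for the ARM-private range.

```c
#include <assert.h>
#include <errno.h>

typedef long (*demo_syscall_fn)(long a, long b, long c);

/* Counterpart of sys_ni_syscall: always "not implemented". */
static long demo_sys_ni(long a, long b, long c)
{
    (void)a; (void)b; (void)c;
    return -ENOSYS;
}

/* Two toy handlers standing in for real sys_* routines. */
static long demo_sys_add(long a, long b, long c)
{
    (void)c;
    return a + b;
}

static long demo_sys_neg(long a, long b, long c)
{
    (void)b; (void)c;
    return -a;
}

/* The table index is the system call number. */
static const demo_syscall_fn demo_call_table[] = {
    demo_sys_add,   /* 0 */
    demo_sys_ni,    /* 1: retired slot */
    demo_sys_neg,   /* 2 */
};

#define DEMO_NR_SYSCALLS \
    (sizeof(demo_call_table) / sizeof(demo_call_table[0]))

long demo_dispatch(unsigned long scno, long a, long b, long c)
{
    /* cmp scno, #NR_syscalls ; ldrcc pc, [tbl, scno, lsl #2] */
    if (scno < DEMO_NR_SYSCALLS)
        return demo_call_table[scno](a, b, c);
    return demo_sys_ni(a, b, c);
}

/* Usage sketch. */
void demo_dispatch_check(void)
{
    assert(demo_dispatch(0, 2, 3, 0) == 5);
    assert(demo_dispatch(99, 0, 0, 0) == -ENOSYS);
}
```

The lsl #2 in the assembly is the multiplication by the pointer size (4 bytes on 32-bit ARM) that C array indexing performs implicitly.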

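Let's continue with the ABI question here. First look at two configuration macros: CONFIG_OABI_COMPAT means compatibility with the old ABI, and CONFIG_AEABI means the native convention is EABI. As far as the code above is concerned, both may be set, neither may be set, or either one alone. (In the kernel's Kconfig, OABI_COMPAT depends on AEABI, so the second case below should not occur in a normally configured kernel.) Let's see how the kernel handles this. sys_call_table is a jump table in the kernel. It stores a series of function pointers, which point to the system call functions, such as sys_open. A system call uses its number (normally the index into the table) to find which kernel function to call, and is carried out by running that function.
For the old ABI, the kernel builds a separate system call table called sys_oabi_call_table. In compatibility mode there are therefore two tables: system calls made the old-ABI way run functions from sys_oabi_call_table, and system calls made the EABI way use the function pointers in sys_call_table.
There are four possible configurations:
First, both macros are set: the behaviour is as just described.
Second, only CONFIG_OABI_COMPAT is set: old-ABI calls use sys_oabi_call_table and EABI calls use sys_call_table, which is essentially the same as the first case; the first case is just more explicit.
Third, only CONFIG_AEABI is set: sys_oabi_call_table does not exist and old-ABI calls are not supported. Only EABI calls work, using sys_call_table.

Fourth, neither is set: the system only allows old-ABI calls, but sys_oabi_call_table does not exist, so calls are ultimately dispatched through sys_call_table.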
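For the case where both options are enabled, the decision made in vector_swi can be sketched in C as follows. The type and function names are invented for the example; the two constants mirror the bics and eorne instructions in the CONFIG_OABI_COMPAT branch of the handler, and only ARM-state callers are modelled.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_OABI_SYSCALL_BASE 0x900000u   /* __NR_OABI_SYSCALL_BASE */

enum demo_abi { DEMO_EABI, DEMO_OABI };

struct demo_decoded {
    enum demo_abi abi;   /* which table to use */
    uint32_t scno;       /* index into that table */
};

/* swi_insn: the instruction word at LR - 4; r7: caller's r7. */
struct demo_decoded demo_decode_swi(uint32_t swi_insn, uint32_t r7)
{
    struct demo_decoded d;
    uint32_t imm = swi_insn & 0x00ffffffu;  /* bics r10, r10, #0xff000000 */

    if (imm == 0) {
        /* swi 0: EABI call, number already in r7, use sys_call_table */
        d.abi = DEMO_EABI;
        d.scno = r7;
    } else {
        /* eorne scno, r10, #__NR_OABI_SYSCALL_BASE; use sys_oabi_call_table */
        d.abi = DEMO_OABI;
        d.scno = imm ^ DEMO_OABI_SYSCALL_BASE;
    }
    return d;
}

/* Usage sketch: open (offset 5) under both conventions. */
void demo_decode_check(void)
{
    struct demo_decoded e = demo_decode_swi(0xef000000u, 5);
    struct demo_decoded o = demo_decode_swi(0xef900005u, 0);

    assert(e.abi == DEMO_EABI && e.scno == 5);
    assert(o.abi == DEMO_OABI && o.scno == 5);
}
```

Both conventions end up with the same table index for the same system call; only the table that the index is applied to differs.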

Depending on the ABI, the base address of the appropriate system call table is loaded into tbl, that is, r8. Now for the system call tables themselves. As mentioned, there are two, both in arch/arm/kernel/entry-common.S:

/*
 * This is the syscall table declaration for native ABI syscalls.
 * With EABI a couple syscalls are obsolete and defined as sys_ni_syscall.
 */
#define ABI(native, compat) native
#ifdef CONFIG_AEABI
#define OBSOLETE(syscall) sys_ni_syscall
#else
#define OBSOLETE(syscall) syscall
#endif

        .type   sys_call_table, #object
ENTRY(sys_call_table)
#include "calls.S"
#undef ABI
#undef OBSOLETE

The other one is:

/*
 * Let's declare a second syscall table for old ABI binaries
 * using the compatibility syscall entries.
 */
#define ABI(native, compat) compat
#define OBSOLETE(syscall) syscall

        .type   sys_oabi_call_table, #object
ENTRY(sys_oabi_call_table)
#include "calls.S"
#undef ABI
#undef OBSOLETE

The two declarations look almost identical: both include calls.S and differ only in how the ABI and OBSOLETE macros are defined. This unusual use of the #include directive is interesting too: the content of each table is simply the whole of arch/arm/kernel/calls.S. That file looks like this (it is long, so only the beginning is shown):

/*
 *  linux/arch/arm/kernel/calls.S
 *
 *  Copyright (C) 1995-2005 Russell King
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 *
 *  This file is included thrice in entry-common.S
 */
/* 0 */         CALL(sys_restart_syscall)
                CALL(sys_exit)
                CALL(sys_fork_wrapper)
                CALL(sys_read)
                CALL(sys_write)
/* 5 */         CALL(sys_open)
                CALL(sys_close)
                CALL(sys_ni_syscall)            /* was sys_waitpid */
                CALL(sys_creat)
                CALL(sys_link)
...

The CALL() macro is defined in arch/arm/kernel/entry-common.S as well:

        .equ NR_syscalls,0
#define CALL(x) .equ NR_syscalls,NR_syscalls+1
#include "calls.S"
#undef CALL
#define CALL(x) .long x
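The same "expand the list twice" idea can be reproduced in plain C with an X-macro, which may make the trick easier to see. Everything here is a toy: the list, the handlers and the DEMO_* names are assumptions of this sketch, and a macro parameter stands in for the repeated #include of calls.S.

```c
#include <assert.h>

/* The single source of truth, playing the role of calls.S. */
#define DEMO_CALLS(CALL) \
    CALL(demo_restart)   \
    CALL(demo_exit)      \
    CALL(demo_fork)

/* Pass 1: every entry contributes "+1", giving the entry count. */
#define DEMO_COUNT(x) +1
enum { DEMO_NR_SYSCALLS = 0 DEMO_CALLS(DEMO_COUNT) };

/* Toy handlers; each returns its own index. */
long demo_restart(void) { return 0; }
long demo_exit(void)    { return 1; }
long demo_fork(void)    { return 2; }

/* Pass 2: every entry becomes one table slot, like ".long x". */
#define DEMO_ENTRY(x) x,
long (*const demo_call_table[])(void) = { DEMO_CALLS(DEMO_ENTRY) };

/* Usage sketch: the count and the table can never get out of step. */
void demo_calls_check(void)
{
    assert(DEMO_NR_SYSCALLS == 3);
    assert(sizeof(demo_call_table) / sizeof(demo_call_table[0])
           == DEMO_NR_SYSCALLS);
}
```

Adding a fourth line to DEMO_CALLS updates both the count and the table without touching anything else, which is the property the kernel relies on.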

One last remark: searching for sys_open will not find the definition of the open system call, because system call functions are defined through macros. For open, fs/open.c has:

SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, int, mode)
{
        long ret;

        if (force_o_largefile())
                flags |= O_LARGEFILE;

        ret = do_sys_open(AT_FDCWD, filename, flags, mode);
        /* avoid REGPARM breakage on x86: */
        asmlinkage_protect(3, ret, filename, flags, mode);
        return ret;
}

Back to vector_swi: if the system call number is not a valid table index, the call is not dispatched through the table. Numbers at or above __ARM_NR_BASE - __NR_SYSCALL_BASE are passed to arm_syscall, and the rest go to sys_ni_syscall. arm_syscall is defined in arch/arm/kernel/traps.c:

/*
 * Handle all unrecognised system calls.
 *  0x9f0000 - 0x9fffff are some more esoteric system calls
 */
#define NR(x) ((__ARM_NR_##x) - __ARM_NR_BASE)
asmlinkage int arm_syscall(int no, struct pt_regs *regs)
{
        struct thread_info *thread = current_thread_info();
        siginfo_t info;

        if ((no >> 16) != (__ARM_NR_BASE>> 16))
                return bad_syscall(no, regs);

        switch (no & 0xffff) {
        case 0: /* branch through 0 */
                info.si_signo = SIGSEGV;
                info.si_errno = 0;
                info.si_code  = SEGV_MAPERR;
                info.si_addr  = NULL;

                arm_notify_die("branch through zero", regs, &info, 0, 0);
                return 0;

        case NR(breakpoint): /* SWI BREAK_POINT */
                regs->ARM_pc -= thumb_mode(regs) ? 2 : 4;
                ptrace_break(current, regs);
                return regs->ARM_r0;

        /*
         * Flush a region from virtual address r0 to virtual address r1
         * _exclusive_.  There is no alignment requirement on either address;
         * user space does not need to know the hardware cache layout.
         *
         * r2 contains flags.  It should ALWAYS be passed as ZERO until it
         * is defined to be something else.  For now we ignore it, but may
         * the fires of hell burn in your belly if you break this rule. ;)
         *
         * (at a later date, we may want to allow this call to not flush
         * various aspects of the cache.  Passing 0 will guarantee that
         * everything necessary gets flushed to maintain consistency in
         * the specified region).
         */
        case NR(cacheflush):
                do_cache_op(regs->ARM_r0, regs->ARM_r1, regs->ARM_r2);
                return 0;

        case NR(usr26):
                if (!(elf_hwcap & HWCAP_26BIT))
                        break;
                regs->ARM_cpsr &= ~MODE32_BIT;
                return regs->ARM_r0;

        case NR(usr32):
                if (!(elf_hwcap & HWCAP_26BIT))
                        break;
                regs->ARM_cpsr |= MODE32_BIT;
                return regs->ARM_r0;

        case NR(set_tls):
                thread->tp_value = regs->ARM_r0;
                if (tls_emu)
                        return 0;
                if (has_tls_reg) {
                        asm ("mcr p15, 0, %0, c13, c0, 3"
                                : : "r" (regs->ARM_r0));
                } else {
                        /*
                         * User space must never try to access this directly.
                         * Expect your app to break eventually if you do so.
                         * The user helper at 0xffff0fe0 must be used instead.
                         * (see entry-armv.S for details)
                         */
                        *((unsigned int *)0xffff0ff0) = regs->ARM_r0;
                }
                return 0;

#ifdef CONFIG_NEEDS_SYSCALL_FOR_CMPXCHG
        /*
         * Atomically store r1 in *r2 if *r2 is equal to r0 for user space.
         * Return zero in r0 if *MEM was changed or non-zero if no exchange
         * happened.  Also set the user C flag accordingly.
         * If access permissions have to be fixed up then non-zero is
         * returned and the operation has to be re-attempted.
         *
         * *NOTE*: This is a ghost syscall private to the kernel.  Only the
         * __kuser_cmpxchg code in entry-armv.S should be aware of its
         * existence.  Don't ever use this from user code.
         */
        case NR(cmpxchg):
        for (;;) {
                extern void do_DataAbort(unsigned long addr, unsigned int fsr,
                                         struct pt_regs *regs);
                unsigned long val;
                unsigned long addr = regs->ARM_r2;
                struct mm_struct *mm = current->mm;
                pgd_t *pgd; pmd_t *pmd; pte_t *pte;
                spinlock_t *ptl;

                regs->ARM_cpsr &= ~PSR_C_BIT;
                down_read(&mm->mmap_sem);
                pgd = pgd_offset(mm, addr);
                if (!pgd_present(*pgd))
                        goto bad_access;
                pmd = pmd_offset(pgd, addr);
                if (!pmd_present(*pmd))
                        goto bad_access;
                pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
                if (!pte_present(*pte) || !pte_dirty(*pte)) {
                        pte_unmap_unlock(pte, ptl);
                        goto bad_access;
                }
                val = *(unsigned long *)addr;
                val -= regs->ARM_r0;
                if (val == 0) {
                        *(unsigned long *)addr = regs->ARM_r1;
                        regs->ARM_cpsr |= PSR_C_BIT;
                }
                pte_unmap_unlock(pte, ptl);
                up_read(&mm->mmap_sem);
                return val;

                bad_access:
                up_read(&mm->mmap_sem);
                /* simulate a write access fault */
                do_DataAbort(addr, 15 + (1 << 11), regs);
        }
#endif

        default:
                /* Calls 9f00xx..9f07ff are defined to return -ENOSYS
                   if not implemented, rather than raising SIGILL.  This
                   way the calling program can gracefully determine whether
                   a feature is supported.  */
                if ((no & 0xffff) <= 0x7ff)
                        return -ENOSYS;
                break;
        }
#ifdef CONFIG_DEBUG_USER
        /*
         * experience shows that these seem to indicate that
         * something catastrophic has happened
         */
        if (user_debug & UDBG_SYSCALL) {
                printk("[%d] %s: arm syscall %d\n",
                       task_pid_nr(current), current->comm, no);
                dump_instr("", regs);
                if (user_mode(regs)) {
                        __show_regs(regs);
                        c_backtrace(regs->ARM_fp, processor_mode(regs));
                }
        }
#endif
        info.si_signo = SIGILL;
        info.si_errno = 0;
        info.si_code  = ILL_ILLTRP;
        info.si_addr  = (void __user *)instruction_pointer(regs) -
                         (thumb_mode(regs) ? 2 : 4);

        arm_notify_die("Oops - bad syscall(2)", regs, &info, no, 0);
        return 0;
}

Then there is sys_ni_syscall, defined in kernel/sys_ni.c. All it does is return the error code ENOSYS to user space.

/*  we can't #include <linux/syscalls.h> here,
    but tell gcc to not warn with -Wmissing-prototypes  */
asmlinkage long sys_ni_syscall(void);

/*
 * Non-implemented system calls get redirected here.
 */
asmlinkage long sys_ni_syscall(void)
{
        return -ENOSYS;
}

Whether or not the system call number is valid, the call finally returns through the routine ret_fast_syscall, which is also in arch/arm/kernel/entry-common.S:

        .align  5
/*
 * This is the fast syscall return path.  We do as little as
 * possible here, and this includes saving r0 back into the SVC
 * stack.
 */
ret_fast_syscall:
 UNWIND(.fnstart        )
 UNWIND(.cantunwind     )
        disable_irq                             @ disable interrupts
        ldr     r1, [tsk, #TI_FLAGS]
        tst     r1, #_TIF_WORK_MASK
        bne     fast_work_pending
#if defined(CONFIG_IRQSOFF_TRACER)
        asm_trace_hardirqs_on
#endif

        /* perform architecture specific actions before user return */
        arch_ret_to_user r1, lr

        restore_user_regs fast = 1, offset = S_OFF
 UNWIND(.fnend          )

4. Macros for declaring system calls

The interfaces for defining system call functions in Linux:

1. SYSCALL_DEFINE1 to SYSCALL_DEFINE6 (include/linux/syscalls.h)

#define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE2(name, ...) SYSCALL_DEFINEx(2, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE3(name, ...) SYSCALL_DEFINEx(3, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE4(name, ...) SYSCALL_DEFINEx(4, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE5(name, ...) SYSCALL_DEFINEx(5, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)

2. SYSCALL_DEFINEx

#ifdef CONFIG_FTRACE_SYSCALLS
#define SYSCALL_DEFINEx(x, sname, ...) \
        static const char *types_##sname[] = { \
                __SC_STR_TDECL##x(__VA_ARGS__) \
        }; \
        static const char *args_##sname[] = { \
                __SC_STR_ADECL##x(__VA_ARGS__) \
        }; \
        SYSCALL_METADATA(sname, x); \
        __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
#else
#define SYSCALL_DEFINEx(x, sname, ...) \
        __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
#endif

3. __SYSCALL_DEFINEx

#ifdef CONFIG_HAVE_SYSCALL_WRAPPERS

#define SYSCALL_DEFINE(name) static inline long SYSC_##name

#define __SYSCALL_DEFINEx(x, name, ...) \
        asmlinkage long sys##name(__SC_DECL##x(__VA_ARGS__)); \
        static inline long SYSC##name(__SC_DECL##x(__VA_ARGS__)); \
        asmlinkage long SyS##name(__SC_LONG##x(__VA_ARGS__)) \
        { \
                __SC_TEST##x(__VA_ARGS__); \
                return (long) SYSC##name(__SC_CAST##x(__VA_ARGS__)); \
        } \
        SYSCALL_ALIAS(sys##name, SyS##name); \
        static inline long SYSC##name(__SC_DECL##x(__VA_ARGS__))

#else /* CONFIG_HAVE_SYSCALL_WRAPPERS */

#define SYSCALL_DEFINE(name) asmlinkage long sys_##name
#define __SYSCALL_DEFINEx(x, name, ...) \
        asmlinkage long sys##name(__SC_DECL##x(__VA_ARGS__))

#endif /* CONFIG_HAVE_SYSCALL_WRAPPERS */

4. The macros that start with __SC_

#define __SC_DECL1(t1, a1)      t1 a1
#define __SC_DECL2(t2, a2, ...) t2 a2, __SC_DECL1(__VA_ARGS__)
#define __SC_DECL3(t3, a3, ...) t3 a3, __SC_DECL2(__VA_ARGS__)
#define __SC_DECL4(t4, a4, ...) t4 a4, __SC_DECL3(__VA_ARGS__)
#define __SC_DECL5(t5, a5, ...) t5 a5, __SC_DECL4(__VA_ARGS__)
#define __SC_DECL6(t6, a6, ...) t6 a6, __SC_DECL5(__VA_ARGS__)

#define __SC_LONG1(t1, a1)      long a1
#define __SC_LONG2(t2, a2, ...) long a2, __SC_LONG1(__VA_ARGS__)
#define __SC_LONG3(t3, a3, ...) long a3, __SC_LONG2(__VA_ARGS__)
#define __SC_LONG4(t4, a4, ...) long a4, __SC_LONG3(__VA_ARGS__)
#define __SC_LONG5(t5, a5, ...) long a5, __SC_LONG4(__VA_ARGS__)
#define __SC_LONG6(t6, a6, ...) long a6, __SC_LONG5(__VA_ARGS__)

#define __SC_CAST1(t1, a1)      (t1) a1
#define __SC_CAST2(t2, a2, ...) (t2) a2, __SC_CAST1(__VA_ARGS__)
#define __SC_CAST3(t3, a3, ...) (t3) a3, __SC_CAST2(__VA_ARGS__)
#define __SC_CAST4(t4, a4, ...) (t4) a4, __SC_CAST3(__VA_ARGS__)
#define __SC_CAST5(t5, a5, ...) (t5) a5, __SC_CAST4(__VA_ARGS__)
#define __SC_CAST6(t6, a6, ...) (t6) a6, __SC_CAST5(__VA_ARGS__)
...

5. Working through SYSCALL_DEFINE1(close, unsigned int, fd)

Given

#define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)

SYSCALL_DEFINE1(close, unsigned int, fd) reduces to SYSCALL_DEFINEx(1, _close, __VA_ARGS__) (## is the token-pasting operator). By the definition of SYSCALL_DEFINEx,

this reduces to __SYSCALL_DEFINEx(1, _close, __VA_ARGS__). By the definition of __SYSCALL_DEFINEx, that becomes:

#define __SYSCALL_DEFINEx(1, _close, ...) \
        asmlinkage long sys_close(__SC_DECL1(__VA_ARGS__)); \
        static inline long SYSC_close(__SC_DECL1(__VA_ARGS__)); \
        asmlinkage long SyS_close(__SC_LONG1(__VA_ARGS__)) \
        { \
                __SC_TEST1(__VA_ARGS__); \
                return (long) SYSC_close(__SC_CAST1(__VA_ARGS__)); \
        } \
        SYSCALL_ALIAS(sys_close, SyS_close); \
        static inline long SYSC_close(__SC_DECL1(__VA_ARGS__))

Here __VA_ARGS__ stands for the macro's variable arguments, which in this case are unsigned int, fd.

Expanding the __SC_ macros gives:

#define __SYSCALL_DEFINEx(1, _close, ...) \
        asmlinkage long sys_close(unsigned int fd); \
        static inline long SYSC_close(unsigned int fd); \
        asmlinkage long SyS_close(long fd) \
        { \
                BUILD_BUG_ON(sizeof(unsigned int) > sizeof(long)); \
                return (long) SYSC_close((unsigned int)fd); \
        } \
        SYSCALL_ALIAS(sys_close, SyS_close); \
        static inline long SYSC_close(unsigned int fd)

This declares the function sys_close.

It defines the function SyS_close, whose body calls SYSC_close and returns its result.

The SYSCALL_ALIAS macro

#define SYSCALL_ALIAS(alias, name) \
        asm ("\t.globl " #alias "\n\t.set " #alias ", " #name)

inserts assembly that makes calling sys_close equivalent to calling SyS_close.

(# is the preprocessor's stringizing operator.)

The BUILD_BUG_ON macro is a compile-time check.

The last line is the start of the definition of SYSC_close,

so the SYSCALL_DEFINE1 invocation is followed directly by the function body enclosed in {}.

6. From the analysis in 5 we can conclude

The 1 in SYSCALL_DEFINE1 means that sys_close takes one parameter.

Likewise, the ? in SYSCALL_DEFINE? means that sys_name takes ? parameters.

7. System call functions are defined with the SYSCALL_DEFINE macros

The external declarations of the system call functions are in the header include/linux/syscalls.h.
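To see the wrapper pattern from this section in isolation, here is a cut-down user-space imitation for the one-argument case. DEMO_SYSCALL_DEFINE1 and the function it defines are invented for this sketch; the alias step and the kernel-only annotations (asmlinkage, SYSCALL_ALIAS) are left out, and C11 _Static_assert stands in for BUILD_BUG_ON.

```c
#include <assert.h>

/* Simplified imitation of __SYSCALL_DEFINEx for one argument:
 *  - SYSC_<name> is the typed body written by the developer;
 *  - SyS_<name> is the entry point that takes a long and casts it. */
#define DEMO_SYSCALL_DEFINE1(name, t1, a1)                      \
    static inline long SYSC_##name(t1 a1);                      \
    long SyS_##name(long a1)                                    \
    {                                                           \
        _Static_assert(sizeof(t1) <= sizeof(long),              \
                       "argument type wider than long");        \
        return (long)SYSC_##name((t1)a1);                       \
    }                                                           \
    static inline long SYSC_##name(t1 a1)

/* The braces that follow the macro become the body of SYSC_demo_double. */
DEMO_SYSCALL_DEFINE1(demo_double, unsigned int, value)
{
    return (long)value * 2;
}

/* Usage sketch: callers go through the long-typed entry point. */
void demo_define_check(void)
{
    assert(SyS_demo_double(21) == 42);
}
```

Passing every argument as long and casting inside the wrapper means the typed body always sees a properly truncated or extended value, whatever was left in the upper bits of the register.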

5. Adding a new system call

First, open arch/arm/kernel/calls.S and append a pointer to the new system call function at the end, for example:

CALL(sys_set_senda)

A side note on NR_syscalls: this constant is the total number of system calls. In newer kernels it is computed in arch/arm/kernel/entry-common.S:

        .equ NR_syscalls,0
#define CALL(x) .equ NR_syscalls,NR_syscalls+1
#include "calls.S"
#undef CALL
#define CALL(x) .long x

Quite clever, isn't it? Every system call added to the table automatically increments NR_syscalls. NR_syscalls is computed here first and then CALL(x) is redefined, so the construction of the system call table later in the file is not affected.

Second, open include/asm-arm/unistd.h and add a macro for the system call number. This step can arguably be skipped, because the numbers defined here are mainly used by C libraries such as uClibc and glibc. For example:

#define __NR_plan_set_senda             (__NR_SYSCALL_BASE+365)

For backward compatibility, system calls can only be added, never removed, and new numbers must be assigned in order; otherwise the kernel will malfunction.

Third, implement the system call, for example:

SYSCALL_DEFINE1(set_senda, int,iset)
{
        if(iset)
                UART_PUT_CR(&at91_port[2],AT91C_US_SENDA);
        else
                UART_PUT_CR(&at91_port[2],AT91C_US_RSTSTA);
        return 0;
}

Fourth, open include/linux/syscalls.h and add the function declaration:

asmlinkage long sys_set_senda(int iset);

Fifth, call the system call from an application; the uClibc implementation can serve as a reference.

Sixth, done.


