您的当前位置：首页词法分析器设计思路

词法分析器设计思路

来源：纷纭教育

词法分析设计思路

一、实验目的

设计、编制并调试一个词法分析程序，加深对词法分析原理的理解。

二、实验要求

2.1 待分析的简单的词法

（1）关键字：

begin if then while do end 所有的关键字都是小写。（2）运算符和界符

:= + - * / < <= <> > >= = ; ( ) #

（3）其他单词是标识符（ID）和整型常数（SUM），通过以下正规式定义： ID = letter (letter | digit)* NUM = digit digit*

（4）空格有空白、制表符和换行符组成。空格一般用来分隔ID、SUM、运算符、界符和关键字，词法分析阶段通常被忽略。

2.2 各种单词符号对应的种别码：

表2.1 各种单词符号对应的种别码单词符号种别码单词符号种别码 bgin 1 : 17 If 2 := 18 then while do end lettet（letter|digit）* dight dight* + - * / 3 4 5 6 10 11 13 14 15 16 < <> <= > >= = ; ( ) # 20 21 22 23 24 25 26 27 28 0 2.3 词法分析程序的功能：输入：所给文法的源程序input.txt文本文档。

输出：四元组（type，name，rows，cols）构成的output.txt文本文档。其中：type为单词种别码； Name为单词名称； Rows为所在文件行号； Cols为所在文件列号。

例如：对源程序begin Y:=9: if Y>9 then Y:=2*x+1/3; end #的源文件，经过词法分析后输出如下序列：

(1,begin)(10,x)(18,:=)(11,9)(26,;)(2,if)……

词法分析状态转换图（终结状态右上角*表示多读一个符号）空格/换行0#1字母/数字字母2数字数字4非数字5*数字非字母数字3*标识符（需进一步判断是否为关键字）#，结束+6非==78*++=*-9非==1011--=...()1213()

三、词法分析程序的算法思想：

算法的基本任务是从字符串表示的源程序中识别出具有意义的单词符号，其基本思想是根据扫描到单词符号的第一个字符的种类，拼出相应的单词符号。 3.1 主程序示意图：

主程序示意图如图3-1所示。其中初始包括以下两个方面： ⑴ 关键字表的初值。

关键字作为特殊标识符处理，把它们预先安排在一张表格中（称为关键字表），当扫描程序识别出标识符时，查关键字表。如能查到匹配的单词，则该单词为关键字，否则为一般标识符。关键字表为一个字符串数组，其描述如下：

Char *rwtab[6] = {“begin”, “if”, “then”, “while”, “do”, “end”};

否

置初值调用扫描子程序输出单词二元组输入串结束是结束图3-1 （2）程序中需要用到的主要变量为syn,token和sum 3.2 扫描子程序的算法思想：

首先设置3个变量：①token用来存放构成单词符号的字符串；②sum用来存放整型单词；③syn用来存放单词符号的种别码。扫描子程序主要部分流程如图3-2所示。变量初始化忽略空格是返回是否文件结束？否字母数字其他运算符、符号拼字符串界符等符号拼数否是否关键字？对不同符号给是报错出相应的syn值 syn=10 syn为对应关键字的 syn=11单词种别码 11 返回图 3-2

四、词法分析程序的C++语言程序源代码：

#include\"stdafx.h\"

#include\"conio.h\" //包含getch函数的头文件 #include\"stdlib.h\" //包含exit函数的头文件 #include #include

char prog[81],token[8],ch,tmp; int syn,p,m,n,sum,rows;

char *rwtab[6]={\"begin\void scaner(); void main() {

FILE *out; FILE *in;

rows=1;

if((out=fopen(\"d:\\\\output.txt\{printf(\"\\non error buiding d:\\\\output.txt\\n\"); getch(); exit(1); }

if((in=fopen(\"d:\\\\input.txt\{printf(\"\\nConn't found d:\\\\input.txt\\n\"); getch(); exit(1);

}/* printf(\"\\n please input a string(end with '#'):\\n\");*/ rewind(in); rewind(out);

fprintf(out,\"%-5s%-10s%-5s%5s\\n\do{ p=0; do

{ /*scanf(\"%c\ if(p==80)

{printf(\"\\non error colunm's length more than 80!\\n\"); getch(); exit(1); }

ch=fgetc(in); prog[p++]=ch;

}while(ch!=EOF&&ch!='\\n'); prog[p]='#'; tmp=ch; p=0; do{

scaner(); switch(syn)

{case 11:fprintf(out,\"%-5d%-10d%-5d%5d\\n\ break;

case -1:printf(\"you have input a wrong string,rows cols=(%d,%d)\\n\ getch(); exit(0);

default:if(syn!=0) fprintf(out,\"%-5d%-10s%-5d%5d\\n\ break; }

}while(syn!=0);

if(tmp=='\\n') rows++; }while(ch!=EOF); fclose(in); fclose(out);

}

void scaner() { sum=0;

for(m=0;m<8;m++)token[m++]=NULL; ch=prog[p++]; m=0;

while((ch==' ')||(ch=='\\n'))ch=prog[p++];

if(((ch<='z')&&(ch>='a'))||((ch<='Z')&&(ch>='A')))

{ while(((ch<='z')&&(ch>='a'))||((ch<='Z')&&(ch>='A'))||((ch>='0')&&(ch<='9'))) { token[m++]=ch; ch=prog[p++]; } p--; syn=10;

for(n=0;n<6;n++)

if(strcmp(token,rwtab[n])==0) { syn=n+1; break; } }

else if((ch>='0')&&(ch<='9')) { while((ch>='0')&&(ch<='9')) { sum=sum*10+ch-'0'; ch=prog[p++]; } p--; syn=11; }

else switch(ch)

{ case '<':token[m++]=ch; ch=prog[p++]; if(ch=='=') { syn=22;

token[m++]=ch; } else

{ syn=20; p--; } break;

case '>':token[m++]=ch; ch=prog[p++]; if(ch=='=') { syn=24;

token[m++]=ch; } else

{ syn=23; p--; } break;

case '+': token[m++]=ch; ch=prog[p++]; if(ch=='+') { syn=17;

token[m++]=ch; } else

{ syn=13;

p--; } break;

case '-':token[m++]=ch; ch=prog[p++]; if(ch=='-') { syn=29;

token[m++]=ch; } else

{ syn=14; p--; } break;

case '!':ch=prog[p++]; if(ch=='=') { syn=21;

token[m++]=ch; } else

{ syn=31; p--; } break;

case '=':token[m++]=ch; ch=prog[p++]; if(ch=='=') { syn=25;

token[m++]=ch; } else

{ syn=18; p--; } break;

case '*': syn=15; token[m++]=ch; break;

case '/': syn=16; token[m++]=ch; break;

case '(': syn=27; token[m++]=ch; break;

case ')': syn=28; token[m++]=ch; break;

case '{': syn=5; token[m++]=ch; break;

case '}': syn=6; token[m++]=ch; break;

case ';': syn=26; token[m++]=ch; break;

case '\\\"': syn=30;

token[m++]=ch; break;

case EOF:syn=0;break; case '#': syn=0; break; case ':':syn=17; token[m++]=ch; break;

default: syn=-1; break; }

token[m++]='\\0';

} 五、结果分析：

运行程序，加载input.txt文档（begin x:=9: if x>9 then x:=2*x+1/3; end # ）后经词法分析输出如下序列：(begin 1)(x 10)(：17)(= 18)(9 11)(；26)(if 2)…… 如图5-1所示：

图5-1

六、总结：

词法分析的基本任务是从字符串表示的源程序中识别出具有意义的单词符号，其基本思想是根据扫描到单词符号的第一个字符的种类，拼出相应的单词符号。通过本试验的完成，更加加深了对词法分析原理的理解。

因篇幅问题不能全部显示，请点此查看更多更全内容

查看全文