User:Toady Nakamura/LindenAIML

From Second Life Wiki

LindenAIML is inspired by Richard Wallace's AIML (Artificial Intelligence Markup Language). AIML exploits the fact that humans, in conversation, often return to the same patterns of speech again and again, varying the words that determine the sentence's reference while keeping the mechanical structure largely the same. AIML replaces these variable portions of utterances with wildcards and attempts to match patterns in speech based on the invariant portions. For more information, read about ALICE. True AIML is an XML-compliant language and resembles HTML more than anything else. LindenAIML, while not XML-compliant at this writing, imports most of the more commonly used features of AIML and supplements them with features created specifically for the Second Life platform.

The basic code for the LindenAIML interpreter is as follows:

//The following code was generated by Luciftias Neurocam and Azrael Baphomet.
//It is freely distributable under the terms of the GNU public license.
//Please report significant mods to either Azrael or Luciftias.  Thank you.


string gName="filename";
key gQueryID;
list AIMList;
list REPLYList;
string data; 
string newmessage;
integer gLine;
integer handle;
integer touched=0;

integer myParse(string message)    
{
    integer begin; //index of message string marking beginning of matched pattern
    integer begin2; 
    integer begina; //as above
    integer chat_channel=0;
    integer check_line; //no longer implemented
    integer end; //length of matched pattern, so ending index actually begin+end-1
    integer end_tag;
    integer end2;
    integer end3;
    integer enda;
    integer error_cond=0; //use later
    integer i=0; //loop (line) number
    integer ind;
    integer ind2;
    integer j;
    integer lenaimlist; //length of AIMList
    integer lenclust;
    integer poundindex;
    integer reply_index=-1;
    integer s_ind;
    integer starindex; //beginning index of *ed expression in message
    list cl; //individual words in clusters parsed to individual list entries
    list cl_a;
    list cl_b;
    list cl2; //individual words in cluster2 blah blah blah
    list cluster_a;
    list cluster_b;
    list cluster2;//cluster of post * words
    list clusters; //first cluster of pre * words
    list headmsglist;
    list msglist; //string message as parsed to list, separated by " "
    list new_reply_list;
    list newmsglist;
    list poundlist;
    list prestarlist;
    list starlist;
    list startopoundlist; //grabs text between star and pound
    list tailmsglist; 
    list temp_reply_list;
    list unparsed_clusters;
    string last_message; //not used yet
    string newmsg;
    string parsed_reply;
    string poundstring;
    string resp_line;
    string starstring; //string corresponding to *ed characters
    string string1;
    string tempmessage="";
    string unparsed_clusters_string;
    string unparsed_reply;

    //begin actual program
    integer punc_index=llStringLength(message);
    punc_index--;
    if(llGetSubString(message,punc_index,punc_index)=="." || llGetSubString(message,punc_index,punc_index)=="!" || llGetSubString(message,punc_index,punc_index)=="?")
    {
        message=llDeleteSubString(message,punc_index,punc_index);
    }
    lenaimlist=llGetListLength(AIMList);
    reply_index=-1;
    while(reply_index==-1 && i< lenaimlist)//for(i=0;i<lenaimlist;i++)
    {
        clusters=[]; 
        cluster2=[];
        cl=[];
        cl2=[];
        //override loop:  if this string occurs anywhere in input, ignore all other parsing and respond with template following override
        if(llSubStringIndex(llList2String(AIMList,i),"<override>")!=-1)
        {
            //strip out <override> and </override>

            ind=llSubStringIndex(llList2String(AIMList,i),"<override>");
            string1=llDeleteSubString(llList2String(AIMList,i),ind,ind+9);
            ind2=llSubStringIndex(string1,"</override>");
            string override_string=llDeleteSubString(string1,ind2,ind2+10);//this now represents string to match to message
            if(llSubStringIndex(message,override_string)!=-1)
            {

                reply_index=i;
            }
        }         


        //find clusters/cluster2 in AIMList entry
        if(llSubStringIndex(llList2String(AIMList,i),"<pattern>")!=-1)
        {
            //strip out <pattern> and </pattern> from every line
            ind=llSubStringIndex(llList2String(AIMList,i),"<pattern>");
            string1=llDeleteSubString(llList2String(AIMList,i),ind,ind+8);
            ind2=llSubStringIndex(string1,"</pattern>");
            string unparsed_clusters_string=llDeleteSubString(string1,ind2,ind2+9);//this now represents string to match to message
            if(llSubStringIndex(unparsed_clusters_string,"*")!=-1)// && llSubStringIndex(unparsed_clusters_string,"#")==-1) //parse only lines using wildcard this way.  non-wildcard parse follows
            {
                unparsed_clusters=llParseString2List(unparsed_clusters_string,["*"],[]); //break into 2 clusters, before * and after *



                lenclust=llGetListLength(unparsed_clusters);
                //unparsed_clusters=llDeleteSubList(unparsed_clusters,lenclust,lenclust);//what does this do?    
                clusters=llList2List(unparsed_clusters,0,0); //Before * cluster            
                cluster2=llList2List(unparsed_clusters,1,1);//after * cluster
                if(llSubStringIndex(unparsed_clusters_string,"*")==0)
                {
                    cluster2=clusters;
                    clusters=[];
                }
                cl=llParseString2List( (string) clusters,[" "],[]); //breaks out individual words in clusters
                cl2=llParseString2List( (string) cluster2,[" "],[]); //breaks out individual words in cluster2
                msglist=llParseString2List(message,[" "],[]); //breaks up input message into list of individual words.            
                if(cl2==[])
                    cl2=["asdfasfads"]; //insert unmatcheable value into cl2
                if(cl==[]) 
                    cl=["hgsjgh"];

                //Case of cluster *
                if(llListFindList(msglist,cl)!=-1 && llListFindList(msglist,cl2)==-1)
                {

                    begin =llListFindList(msglist,cl); //find cl in msglist
                    end=llGetListLength(cl);
                    end--;
                    tailmsglist=llDeleteSubList(msglist,begin,begin+end); //return only * and post * words

                    starindex=llListFindList(msglist,tailmsglist);  //locates *
                    end2=llGetListLength(msglist);
                    end2--; 
                    //if(end>=0 && begin>=0 && starindex >=0 && end2>=0) //so no negative indexes
                    //    newmsg=llDumpList2String(llListReplaceList(msglist,["*"],starindex,starindex+end2)," ");  //replaces phrase with * in message
                    //what to replace * with in reply
                    starlist=llList2List(msglist,starindex,starindex+end2);

                    //check similarity of matched pattern with message string
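                    if(llDumpList2String(llListReplaceList(msglist,["*"],starindex,starindex+end2)," ")==llDumpList2String(cl+["*"]," ")) //compare as strings: == on lists compares only their lengths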
                    {
                        reply_index=i;
                        starstring=llDumpList2String(starlist," ");                                  
                    }
                }

                //Case of * cluster2
                if(llListFindList(msglist,cl)==-1 && llListFindList(msglist,cl2)!=-1)
                {            
                    begin=llListFindList(msglist,cl2);
                    end=llGetListLength(cl2);
                    end--;
                    headmsglist=llDeleteSubList(msglist,begin, begin+end);
                    starindex=llListFindList(msglist,headmsglist);
                    end2=llGetListLength(headmsglist);
                    end2--;
                    //if(end>=0 && begin >=0 && starindex>=0 && end2>=0)
                    //    new
                    starlist=llList2List(msglist, starindex, starindex+end2);
                    starstring=llDumpList2String(starlist," ");
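                    if(llDumpList2String(llListReplaceList(msglist,["*"],starindex,starindex+end2)," ")==llDumpList2String(["*"]+cl2," ")) //compare as strings: == on lists compares only their lengths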
                    {
                        reply_index=i;
                        starstring=llDumpList2String(starlist," ");

                    }
                }

                //Case of clusters * cluster2
                if(llListFindList(msglist,cl)!=-1 && llListFindList(msglist,cl2)!=-1)
                {
                    //first delete clusters from message

                    begin=llListFindList(msglist, cl);
                    end=llGetListLength(cl);
                    end--;
                    headmsglist=llDeleteSubList(msglist,begin,begin+end);

                    //then delete cluster2 from tailmsglist
                    begina =llListFindList(headmsglist, cl2);
                    enda=llGetListLength(cl2);
                    enda--;
                    starlist=llDeleteSubList(headmsglist,begina,begina+enda);
                    starindex=llListFindList(msglist,starlist);
                    end2=llGetListLength(starlist);
                    end2--;
                    // starstring=llDumpList2String(starlist," ");
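                    if(llDumpList2String(llListReplaceList(msglist,["*"],starindex,starindex+end2)," ")==llDumpList2String(cl+["*"]+cl2," ")) //compare as strings: == on lists compares only their lengths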
                    {
                        reply_index=i;
                        starstring=llDumpList2String(starlist," ");
                    }
                }
                //case of cluster * cluster_a # cluster_b
                if(llSubStringIndex(unparsed_clusters_string,"#")!=-1)
                {
                    ////first delete clusters from message

                    begin=llListFindList(msglist, cl);
                    end=llGetListLength(cl);
                    end--;
                    headmsglist=llDeleteSubList(msglist,begin,begin+end);
                    prestarlist=llList2List(msglist,begin,begin+end);
                    //parse cluster2 into cluster_a and cluster_b
                    startopoundlist=llParseString2List((string) cluster2, ["#"],[]);
                    cluster_a=llList2List(startopoundlist,0,0);
                    list cl_a=llParseString2List( (string) cluster_a,[" "],[]);
                    //then delete cl_a from tailmsglist
                    
                    begina =llListFindList(headmsglist, cl_a);
                    enda=llGetListLength(cl_a);
                    enda--;
                   
                   
                     //replace starlist in msglist first;
                     integer endb=llGetListLength(headmsglist);
                     starlist=llDeleteSubList(headmsglist,begina,endb--);
                     starindex=llListFindList(msglist,starlist);
                     end2=llGetListLength(starlist);
                    end2--;
                    //newmsglist=llListReplaceList(tailmsglist,["*"],starindex,starindex+end2);
                     if(llGetListLength(startopoundlist)>1)
                    {
                        cluster_b=llList2List(startopoundlist,1,1);
                        cl_b=llParseString2List( (string) cluster_b,[" "],[]);
                    //find index of cl_b in msglist
                    integer clbstart=llListFindList(headmsglist,llParseString2List((string) llList2List(startopoundlist,1,1),[" "],[]));                    //delete all but this
                    integer lenmsglist=llGetListLength(headmsglist);
                    poundlist=llDeleteSubList(headmsglist,clbstart,lenmsglist--);                
                    enda=llGetListLength(llParseString2List( (string) llList2List(startopoundlist,0,0),[" "],[]));
                    poundlist=llDeleteSubList(poundlist,0,enda--) ;
                    }
                
                    //integer endb=llGetListLength(headmsglist);
                    // starlist=llDeleteSubList(headmsglist,begina,endb--);
                    
                    
                    starindex=llListFindList(msglist,starlist);
                   
                    // 
                    integer starlength=llGetListLength(starlist);
                    starlength--;
                    newmsglist=llListReplaceList(msglist,["*"],starindex,starindex+starlength--);
                    integer poundindex=llListFindList(newmsglist,poundlist);
                    integer poundlength=llGetListLength(poundlist);
                    poundlength--;
                    newmsglist=llListReplaceList(newmsglist,["#"],poundindex,poundindex+poundlength--);
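                    if(llDumpList2String(newmsglist," ")==llDumpList2String(cl+["*"]+cl_a+["#"]+cl_b," ")) //compare as strings: == on lists compares only their lengths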
                    {
                        reply_index=i;
                        starstring=llDumpList2String(starlist," ");
                        poundstring=llDumpList2String(poundlist," ");
                    }

                }
            }
            else
            {
                if(unparsed_clusters_string==message)
                    reply_index=i;
            }
        }

        i++;

    }

    if(reply_index!=-1) //if a match exists
    { 
        if(llSubStringIndex(llList2String(AIMList,reply_index+1),"<channel>")!=-1)
        {
            ind=llSubStringIndex(llList2String(AIMList,reply_index+1),"<channel>");
            string1=llDeleteSubString(llList2String(AIMList,reply_index+1),ind,ind+8);
            ind2=llSubStringIndex(string1,"</channel>");
            string chat_channel_string=llDeleteSubString(string1,ind2,ind2+9);          
            chat_channel=(integer) chat_channel_string;

        }
        unparsed_reply=llList2String(REPLYList, reply_index);
        end_tag=llSubStringIndex(unparsed_reply,"</template>");
        end_tag--;
        parsed_reply=llGetSubString(unparsed_reply,10,end_tag--); //from end of <template> to </template>
        if(llSubStringIndex(parsed_reply, "<star/>")==-1)
        {
            if(llSubStringIndex(parsed_reply,"<srai")==-1)
            {
                llSay(chat_channel,parsed_reply);

                reply_index=0;
            }
            else
            {
                error_cond=-1;
                integer lenp=llStringLength(parsed_reply); 
                integer termin=lenp-8; //index of the last character before </srai>
                // integer starindex2=llSubStringIndex(parsed_reply,"<star/>");
                // starindex2--;
                //if(starindex2!=-1)
                //{
                //    parsed_reply=llGetSubString(parsed_reply,6,starindex) +" " +starstring;
                //}
                //else
                //{

                newmessage=llGetSubString(parsed_reply,6,termin);
                // }
            }
        }
        else // if <star/> expression exists, insert antecedant
        {
            //if <pound/> expression exists
            if(llSubStringIndex(parsed_reply,"<poun")!=-1)
            {
                temp_reply_list=llParseString2List(parsed_reply,[" "],[]);                   
                s_ind=llListFindList(temp_reply_list,["<pound/>"]);
                new_reply_list=llListReplaceList(temp_reply_list,[poundstring],s_ind,s_ind);
                parsed_reply=llDumpList2String(new_reply_list," ");
                //llSay(0,(string)new_reply_list); //debug output; the finished reply is spoken further down

            }
            //else new_reply_list=llParseString2List(parsed_reply,[" "],[]);
            if(llSubStringIndex(parsed_reply,"<srai")==-1)
            {
                temp_reply_list=llParseString2List(parsed_reply,[" "],[]);    
                integer s_ind2=llListFindList(temp_reply_list,["<star/>"]);
                temp_reply_list=llListReplaceList(temp_reply_list,[starstring],s_ind2,s_ind2);
                
                parsed_reply=llDumpList2String(temp_reply_list," ");
                llSay(chat_channel,parsed_reply);
                
                reply_index=0;
            }
            else 
            {   
                //strip out <srai> tags
                integer srai_ind1=llSubStringIndex(parsed_reply,"<srai>");
                integer srai_ind2=llSubStringIndex(parsed_reply,"</srai>");
                //parsed_reply=llDeleteSubString(parsed_reply,srai_ind1,srai_ind1+5);

                parsed_reply=llGetSubString(parsed_reply,srai_ind1+6,srai_ind2-=1);
                //if pound expression exists, insert the text matched by #
                integer poundindex2=llSubStringIndex(parsed_reply,"<pound/>");
                if(poundindex2!=-1)
                {
                    parsed_reply=llDeleteSubString(parsed_reply,poundindex2,poundindex2+7);
                    parsed_reply=llInsertString(parsed_reply,poundindex2,poundstring);
                }
                //insert the text matched by * in place of <star/>
                integer starindex2=llSubStringIndex(parsed_reply,"<star/>");
                if(starindex2!=-1)
                {
                    parsed_reply=llDeleteSubString(parsed_reply,starindex2,starindex2+6);
                    parsed_reply=llInsertString(parsed_reply,starindex2,starstring);
                }
                error_cond=-1; //tells the caller to parse newmessage
                newmessage=parsed_reply; //the reduced text is resubmitted as a new message

            }

        }
    }
    else
    {
        llSay(0,"I'm afraid I don't understand what you said...yet");
        //email message to some address dedicated to receiving unmatched patterns.  Uncomment line after inserting correct address
      //llEmail("address@whatever.com", "LindenAIML: New Template Needed", "no match for: " +message); // send email to self
    }
    return error_cond;
}


default
{
    state_entry()    
    {
        llSetText("LindenAIML Concierge", <1,1,1>, 1.0);
        gQueryID=llGetNotecardLine(gName,gLine);//request first line
        gLine++; //increase line count
    }

    dataserver(key query_id,string data)
    {
        if(query_id==gQueryID)
        {    
            if(data!=EOF)
            {
                if( llGetSubString(data,0,3)=="<pat" || llGetSubString(data,0,3)=="<ove" || llGetSubString(data,0,3)=="<cha")
                {
                    if(gLine==0) //for now ignore all but pattern or template lines
                    {
                        AIMList=(list) [data];
                    }    
                    else
                    {
                        AIMList=AIMList + [data];
                    }
                }
                if(llGetSubString(data,0,3)=="<tem"  || llGetSubString(data,0,3)=="<cha"  )
                {
                    if(gLine==0) //for now ignore all but pattern or template lines    
                    {
                        REPLYList=(list) [data];
                    }    
                    else
                    {
                        REPLYList=REPLYList + [data];

                    }

                }
                gQueryID=llGetNotecardLine(gName,gLine); //request next line; no further requests once EOF is reached
                gLine++;
            }
        }
    }

    touch_start(integer total_number)
    {


        if(touched==0)
        {
            handle=llListen(0,"",llGetOwner(),"");
            llSay(0,"AIML on");
            llGiveInventory(llDetectedKey(0), "LindenAIML");
            llGiveInventory(llDetectedKey(0), "LAIMLDocs");
            touched++;
        }
        else
        {
            llListenRemove(handle);
            llSay(0,"AIML off");
            touched=0;
        }            

    }

    listen(integer channel, string name, key id, string msg)
    {

        integer error_cond=myParse(msg); 
        if(error_cond==-1)
        {
            //message=newmessage;

            myParse(newmessage);
        }
    } 
}

In addition to the LindenAIML interpreter code above, one also needs a “dialogue template” written in LindenAIML designed to coordinate responses to queries made on Second Life chat.

The text below is intended to be incorporated as a notecard in an object containing the interpreter. This file, referred to as "filename" in the above code, contains the AIML tagged text to be used as a script for the interpreter. The syntax of LindenAIML is strict, as will be outlined below.

<category>
<pattern>How are you *</pattern>
<template><star/> is fine</template>
</category>
<category>
<pattern>Are you well * bot</pattern>
<template>This <star/> bot is great</template>
</category>
<category>
<pattern>Who is *</pattern>
<template>I don't know, who is <star/></template>
</category>
<category>
<override>fish</override>
<template>I hate fish</template>
</category>
<category>
<pattern>Are there too many * in this sim</pattern>
<template>I don't know much about <star/></template>
</category>
<category>
<pattern>Are you OK * bot</pattern>
<template><srai>Are you well <star/> bot</srai></template>
</category>
<category>
<pattern>Reply on channel 1</pattern>
<template>OK</template>
<channel>1</channel>
</category>
<category>
<pattern>The * in spain falls mainly # the plain</pattern>
<template>That confounded <star/> falls mainly <pound/> that blasted plain</template>
</category>

A brief explanation of the tags follows. As of this writing, the <category> tag is not used. In the future it will contain flags about topic and the like. For now it is included only to keep the LindenAIML file looking familiar to AIML users.

The <pattern> tags contain patterns to match against user utterances. The <template> tags contain the replies to those patterns. The <channel> tag (optional, so not present in every <category></category> pair) lets the user determine which channel the reply is spoken on, so the bot can issue commands to other devices listening on those channels.
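
The interpreter above reads each of these tags the same way: it finds the opening tag, deletes it by a fixed character count, and then cuts at the closing tag. As a minimal sketch, that step could be factored into a single helper. The name tagContent and the small test script around it are hypothetical and are not part of the original LindenAIML code.

```lsl
// Hypothetical helper (not part of the interpreter above): returns the text
// between <tag> and </tag> on a single line, or "" if either tag is missing.
string tagContent(string line, string tag)
{
    string open = "<" + tag + ">";
    string close = "</" + tag + ">";
    integer start = llSubStringIndex(line, open);
    integer stop = llSubStringIndex(line, close);
    if (start == -1 || stop == -1) return "";
    start += llStringLength(open);   // first character after the opening tag
    if (stop <= start) return "";    // empty element, or tags out of order
    return llGetSubString(line, start, stop - 1);
}

default
{
    state_entry()
    {
        // Pattern text of a notecard line
        llOwnerSay(tagContent("<pattern>Who is *</pattern>", "pattern"));
        // Channel number, cast the same way the interpreter casts it
        llOwnerSay((string)((integer)tagContent("<channel>1</channel>", "channel")));
    }
}
```

Because the helper measures the tag itself, no per-tag offsets such as ind+8 or ind+9 are needed.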

<override> represents a special case of <pattern>. In the example, if the user says anything that contains the word "fish", the reply will be that in the <template></template> tags immediately following the <override></override> pair.
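
In the interpreter above the override test is a plain substring search, so an override of "fish" would presumably also fire on "selfish". The sketch below contrasts that test with a stricter whole-word variant. Both function names are hypothetical, and the whole-word variant only handles single-word overrides with no punctuation attached.

```lsl
// Substring test, equivalent to the llSubStringIndex check used for <override>.
integer containsText(string message, string text)
{
    return llSubStringIndex(message, text) != -1;
}

// Hypothetical stricter variant: true only when the override is one whole word.
integer containsWord(string message, string word)
{
    return llListFindList(llParseString2List(message, [" "], []), [word]) != -1;
}

default
{
    state_entry()
    {
        llOwnerSay((string)containsText("do not be selfish", "fish")); // 1
        llOwnerSay((string)containsWord("do not be selfish", "fish")); // 0
        llOwnerSay((string)containsWord("I like fish", "fish"));       // 1
    }
}
```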

<srai></srai> is the "syntactic reduction" tag pair. If a pattern produces a template whose response is enclosed in syntactic reduction tags, that response is fed back to the algorithm as a new user message to parse. The upshot: a pattern whose template uses <srai> is treated as equivalent to the text contained in the <srai> tag pair, which should be defined as a pattern elsewhere in the file (by convention, earlier in the file). Too many of these can slow operation considerably. Be warned.
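
The listen handler above reparses a reduced message only once. A chain of reductions, or two patterns that reduce to each other, would need a loop with a cap. The sketch below shows only that control flow. The lists INPUTS and TARGETS, the cap MAX_REDUCTIONS, and the function reduce are hypothetical stand-ins for patterns and their <srai> targets, and the cap of 5 is an arbitrary assumption.

```lsl
// Hypothetical stand-ins: INPUTS[i] reduces to TARGETS[i].
list INPUTS  = ["Are you OK", "Are you alright"];
list TARGETS = ["Are you well", "Are you OK"];
integer MAX_REDUCTIONS = 5; // assumed cap, stops runaway or circular chains

// Follows reductions until the message no longer reduces or the cap is hit.
string reduce(string msg)
{
    integer depth = 0;
    integer idx = llListFindList(INPUTS, [msg]);
    while (idx != -1 && depth < MAX_REDUCTIONS)
    {
        msg = llList2String(TARGETS, idx);
        depth++;
        idx = llListFindList(INPUTS, [msg]);
    }
    return msg;
}

default
{
    state_entry()
    {
        // "Are you alright" -> "Are you OK" -> "Are you well"
        llOwnerSay(reduce("Are you alright"));
    }
}
```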

The <star/> tag is a standalone tag and marks the insertion point for the phrase matched by the wildcard "*" in the pattern. So if I say "I hate you bot" and the pattern to match is "I * you bot", the template might be "Well, I <star/> you too.", resulting in the reply "Well, I hate you too."
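
The example in the previous paragraph can be traced with a small string-based sketch. This is not the word-list algorithm the interpreter uses. The helpers matchStar and fillStar are hypothetical, handle a single "*" only, and compare text case-sensitively.

```lsl
// Hypothetical helpers, not taken from the interpreter above.

// Returns the text matched by "*", or "" if the message does not fit the pattern.
string matchStar(string pattern, string message)
{
    integer star = llSubStringIndex(pattern, "*");
    if (star == -1) return "";
    string prefix = llDeleteSubString(pattern, star, -1); // text before *
    string suffix = llDeleteSubString(pattern, 0, star);  // text after *
    integer plen = llStringLength(prefix);
    integer slen = llStringLength(suffix);
    integer mlen = llStringLength(message);
    if (mlen < plen + slen + 1) return ""; // nothing left for * to match
    if (plen > 0 && llGetSubString(message, 0, plen - 1) != prefix) return "";
    if (slen > 0 && llGetSubString(message, mlen - slen, mlen - 1) != suffix) return "";
    return llGetSubString(message, plen, mlen - slen - 1);
}

// Replaces every <star/> in the template with the matched text.
string fillStar(string template, string matched)
{
    return llDumpList2String(llParseStringKeepNulls(template, ["<star/>"], []), matched);
}

default
{
    state_entry()
    {
        string matched = matchStar("I * you bot", "I hate you bot");
        if (matched != "")
            llOwnerSay(fillStar("Well, I <star/> you too.", matched));
    }
}
```

Since the replacement works on the raw string rather than on space-separated words, it does not depend on <star/> being surrounded by spaces.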

A second wildcard has been implemented, using the “#” character. Just as the text matched by “*” replaces the <star/> tag, the text matched by “#” replaces the <pound/> tag in the reply string. As of this writing the second wildcard only handles patterns of the form "string1 * string2 # string3". Updates to LindenAIML will also allow patterns such as "* string #".
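
The "string1 * string2 # string3" form can be sketched with plain string operations. This is an illustration under assumptions, not the interpreter's own procedure. The function matchStarPound is hypothetical, matches case-sensitively, and takes the leftmost occurrence of string2, so "*" gets the shortest possible match.

```lsl
// Hypothetical helper, not taken from the interpreter above.
// Handles only patterns of the form "string1 * string2 # string3".
// Returns [star text, pound text], or [] if the message does not fit.
list matchStarPound(string pattern, string message)
{
    list parts = llParseStringKeepNulls(pattern, ["*", "#"], []);
    integer star = llSubStringIndex(pattern, "*");
    // exactly two wildcards are required, with * before #
    if (llGetListLength(parts) != 3 || star == -1 || star > llSubStringIndex(pattern, "#")) return [];
    string head = llList2String(parts, 0);
    string mid  = llList2String(parts, 1);
    string tail = llList2String(parts, 2);
    integer hlen = llStringLength(head);
    integer tlen = llStringLength(tail);
    integer mlen = llStringLength(message);
    // need room for head, tail, the middle text and one character per wildcard
    if (mid == "" || mlen < hlen + tlen + llStringLength(mid) + 2) return [];
    if (hlen > 0 && llGetSubString(message, 0, hlen - 1) != head) return [];
    if (tlen > 0 && llGetSubString(message, mlen - tlen, mlen - 1) != tail) return [];
    string body = llGetSubString(message, hlen, mlen - tlen - 1);
    integer midAt = llSubStringIndex(body, mid);
    integer rest = midAt + llStringLength(mid);
    if (midAt < 1 || rest >= llStringLength(body)) return [];
    return [llGetSubString(body, 0, midAt - 1), llGetSubString(body, rest, -1)];
}

default
{
    state_entry()
    {
        list m = matchStarPound("The * in spain falls mainly # the plain",
                                "The rain in spain falls mainly on the plain");
        if (llGetListLength(m) == 2)
            llOwnerSay("star=" + llList2String(m, 0) + " pound=" + llList2String(m, 1));
    }
}
```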

Beware of punctuation following <star/> or <pound/> tags. I have not, as yet, integrated that kind of punctuation into the parsing procedures. However, end-of-utterance punctuation is stripped from the user message, so it is not necessary to include punctuation at the end of patterns.
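
The interpreter above removes at most one trailing ".", "!" or "?", so a message ending in "?!" or "..." keeps part of its punctuation. A minimal sketch that removes the whole run is shown below. The function name stripTrailingPunctuation is hypothetical.

```lsl
// Hypothetical helper: removes every trailing '.', '!' or '?' from a message.
string stripTrailingPunctuation(string message)
{
    integer last = llStringLength(message) - 1;
    while (last >= 0 && llSubStringIndex(".!?", llGetSubString(message, last, last)) != -1)
    {
        last--;
    }
    if (last < 0) return ""; // the message was empty or punctuation only
    return llGetSubString(message, 0, last);
}

default
{
    state_entry()
    {
        llOwnerSay(stripTrailingPunctuation("Who is Toady?!"));
    }
}
```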

It is important to note that LindenAIML is sensitive to the format of the AIML file. There must be no extra spaces between tags, and the line format should be as above. For example, if you put a space between the <pattern> tag and the first word of the actual pattern, you run the risk of confusing the interpreter. Also, tags must be lower case at this point. All tag sets except <category></category> must be on a single line of text. In future versions, we hope to make LindenAIML more flexible stylistically.
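
Since the interpreter does not report formatting problems, a notecard can be checked line by line before use. The sketch below encodes the rules from the previous paragraph as I read them. The function checkLine and the list LINE_TAGS are hypothetical, and the check is illustrative rather than exhaustive.

```lsl
// Hypothetical notecard line checker; not part of the interpreter above.
list LINE_TAGS = ["pattern", "override", "template", "channel"];

// Returns "" when the line follows the layout rules, otherwise a short complaint.
string checkLine(string line)
{
    if (line == "" || line == "<category>" || line == "</category>") return "";
    if (llStringTrim(line, STRING_TRIM) != line) return "leading or trailing whitespace";
    integer gt = llSubStringIndex(line, ">");
    if (llGetSubString(line, 0, 0) != "<" || gt < 2) return "line must start with a tag";
    string tag = llGetSubString(line, 1, gt - 1);
    if (llListFindList(LINE_TAGS, [tag]) == -1)
    {
        if (llListFindList(LINE_TAGS, [llToLower(tag)]) != -1) return "tags must be lower case";
        return "unknown tag <" + tag + ">";
    }
    integer closeAt = llSubStringIndex(line, "</" + tag + ">");
    if (closeAt == -1) return "closing tag must be on the same line";
    if (closeAt == gt + 1) return "empty element";
    string inner = llGetSubString(line, gt + 1, closeAt - 1);
    if (llStringTrim(inner, STRING_TRIM) != inner) return "space between a tag and its text";
    return "";
}

default
{
    state_entry()
    {
        llOwnerSay("1: " + checkLine("<pattern>Who is *</pattern>"));
        llOwnerSay("2: " + checkLine("<pattern> Who is *</pattern>"));
        llOwnerSay("3: " + checkLine("<Pattern>Who is *</Pattern>"));
    }
}
```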

-Azrael Baphomet and Luciftias Neurocam --copied from [1] for the Linden Script Library. Visit my LSL wiki page for my library of scripts! Toady Nakamura