User:Toady Nakamura/LindenAIML
LindenAIML is inspired by Richard Wallace's AIML (Artificial Intelligence Markup Language). AIML exploits the fact that humans in conversation return to the same patterns of speech again and again, varying the words that determine a sentence's reference while keeping the mechanical structure largely the same. AIML replaces the variable portions of utterances with wildcards and matches patterns in speech on the invariant portions. For more information, read about ALICE. True AIML is an XML-compliant language and resembles HTML more than anything else. LindenAIML, while not XML-compliant at this writing, imports most of the commonly used features of AIML and supplements them with features created specifically for the Second Life platform.
The basic code for the LindenAIML interpreter is as follows:
//The following code was generated by Luciftias Neurocam and Azrael Baphomet.
//It is freely distributable under the terms of the GNU public license.
//Please report significant mods to either Azrael or Luciftias. Thank you.
string gName="filename";
key gQueryID;
list AIMList;
list REPLYList;
string data;
string newmessage;
integer gLine;
integer handle;
integer touched=0;
integer myParse(string message)
{
integer begin; //index of message string marking beginning of matched pattern
integer begin2;
integer begina; //as above
integer chat_channel=0;
integer check_line; //no longer implemented
integer end; //length of matched pattern, so ending index actually begin+end-1
integer end_tag;
integer end2;
integer end3;
integer enda;
integer error_cond=0; //use later
integer i=0; //loop (line) number
integer ind;
integer ind2;
integer j;
integer lenaimlist; //length of AIMList
integer lenclust;
integer poundindex;
integer reply_index=-1;
integer s_ind;
integer starindex; //beginning index of *ed expression in message
list cl; //individual words in clusters parsed to individual list entries
list cl_a;
list cl_b;
list cl2; //individual words in cluster2 blah blah blah
list cluster_a;
list cluster_b;
list cluster2;//cluster of post * words
list clusters; //first cluster of pre * words
list headmsglist;
list msglist; //string message as parsed to list, separated by " "
list new_reply_list;
list newmsglist;
list poundlist;
list prestarlist;
list starlist;
list startopoundlist; //grabs text between star and pound
list tailmsglist;
list temp_reply_list;
list unparsed_clusters;
string last_message; //not used yet
string newmsg;
string parsed_reply;
string poundstring;
string resp_line;
string starstring; //string corresponding to *ed characters
string string1;
string tempmessage="";
string unparsed_clusters_string;
string unparsed_reply;
//begin actual program
integer punc_index=llStringLength(message);
punc_index--;
if(llGetSubString(message,punc_index,punc_index)=="." || llGetSubString(message,punc_index,punc_index)=="!" || llGetSubString(message,punc_index,punc_index)=="?")
{
message=llDeleteSubString(message,punc_index,punc_index);
}
lenaimlist=llGetListLength(AIMList);
reply_index=-1;
while(reply_index==-1 && i< lenaimlist)//for(i=0;i<lenaimlist;i++)
{
clusters=[];
cluster2=[];
cl=[];
cl2=[];
//override loop: if this string occurs anywhere in input, ignore all other parsing and respond with template following override
if(llSubStringIndex(llList2String(AIMList,i),"<override>")!=-1)
{
//strip out <override> and </override>
ind=llSubStringIndex(llList2String(AIMList,i),"<override>");
string1=llDeleteSubString(llList2String(AIMList,i),ind,ind+9);
ind2=llSubStringIndex(string1,"</override>");
string override_string=llDeleteSubString(string1,ind2,ind2+10);//this now represents string to match to message
if(llSubStringIndex(message,override_string)!=-1)
{
reply_index=i;
}
}
//find clusters/cluster2 in AIMList entry
if(llSubStringIndex(llList2String(AIMList,i),"<pattern>")!=-1)
{
//strip out <pattern> and </pattern> from every line
ind=llSubStringIndex(llList2String(AIMList,i),"<pattern>");
string1=llDeleteSubString(llList2String(AIMList,i),ind,ind+8);
ind2=llSubStringIndex(string1,"</pattern>");
string unparsed_clusters_string=llDeleteSubString(string1,ind2,ind2+9);//this now represents string to match to message
if(llSubStringIndex(unparsed_clusters_string,"*")!=-1)// && llSubStringIndex(unparsed_clusters_string,"#")==-1) //parse only lines using wildcard this way. non-wildcard parse follows
{
unparsed_clusters=llParseString2List(unparsed_clusters_string,["*"],[]); //break into 2 clusters, before * and after *
lenclust=llGetListLength(unparsed_clusters);
//unparsed_clusters=llDeleteSubList(unparsed_clusters,lenclust,lenclust);//what does this do?
clusters=llList2List(unparsed_clusters,0,0); //Before * cluster
cluster2=llList2List(unparsed_clusters,1,1);//after * cluster
if(llSubStringIndex(unparsed_clusters_string,"*")==0)
{
cluster2=clusters;
clusters=[];
}
cl=llParseString2List( (string) clusters,[" "],[]); //breaks out individual words in clusters
cl2=llParseString2List( (string) cluster2,[" "],[]); //breaks out individual words in cluster2
msglist=llParseString2List(message,[" "],[]); //breaks up input message into list of individual words.
if(cl2==[])
cl2=["asdfasfads"]; //insert unmatcheable value into cl2
if(cl==[])
cl=["hgsjgh"];
//Case of cluster *
if(llListFindList(msglist,cl)!=-1 && llListFindList(msglist,cl2)==-1)
{
begin =llListFindList(msglist,cl); //find cl in msglist
end=llGetListLength(cl);
end--;
tailmsglist=llDeleteSubList(msglist,begin,begin+end); //return only * and post * words
starindex=llListFindList(msglist,tailmsglist); //locates *
end2=llGetListLength(msglist);
end2--;
//if(end>=0 && begin>=0 && starindex >=0 && end2>=0) //so no negative indexes
// newmsg=llDumpList2String(llListReplaceList(msglist,["*"],starindex,starindex+end2)," "); //replaces phrase with * in message
//what to replace * with in reply
starlist=llList2List(msglist,starindex,starindex+end2);
//check similarity of matched pattern with message string
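//list==list compares only the lengths in LSL, so compare the joined strings instead
if(llDumpList2String(llListReplaceList(msglist,["*"],starindex,starindex+end2)," ")==llDumpList2String(cl+["*"]," "))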
{
reply_index=i;
starstring=llDumpList2String(starlist," ");
}
}
//Case of * cluster2
if(llListFindList(msglist,cl)==-1 && llListFindList(msglist,cl2)!=-1)
{
begin=llListFindList(msglist,cl2);
end=llGetListLength(cl2);
end--;
headmsglist=llDeleteSubList(msglist,begin, begin+end);
starindex=llListFindList(msglist,headmsglist);
end2=llGetListLength(headmsglist);
end2--;
//if(end>=0 && begin >=0 && starindex>=0 && end2>=0)
// new
starlist=llList2List(msglist, starindex, starindex+end2);
starstring=llDumpList2String(starlist," ");
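//list==list compares only the lengths in LSL, so compare the joined strings instead
if(llDumpList2String(llListReplaceList(msglist,["*"],starindex,starindex+end2)," ")==llDumpList2String(["*"]+cl2," "))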
{
reply_index=i;
starstring=llDumpList2String(starlist," ");
}
}
//Case of clusters * cluster2
if(llListFindList(msglist,cl)!=-1 && llListFindList(msglist,cl2)!=-1)
{
//first delete clusters from message
begin=llListFindList(msglist, cl);
end=llGetListLength(cl);
end--;
headmsglist=llDeleteSubList(msglist,begin,begin+end);
//then delete cluster2 from tailmsglist
begina =llListFindList(headmsglist, cl2);
enda=llGetListLength(cl2);
enda--;
starlist=llDeleteSubList(headmsglist,begina,begina+enda);
starindex=llListFindList(msglist,starlist);
end2=llGetListLength(starlist);
end2--;
// starstring=llDumpList2String(starlist," ");
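//list==list compares only the lengths in LSL, so compare the joined strings instead
if(llDumpList2String(llListReplaceList(msglist,["*"],starindex,starindex+end2)," ")==llDumpList2String(cl+["*"]+cl2," "))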
{
reply_index=i;
starstring=llDumpList2String(starlist," ");
}
}
//case of cluster * cluster_a # cluster_b
if(llSubStringIndex(unparsed_clusters_string,"#")!=-1)
{
////first delete clusters from message
begin=llListFindList(msglist, cl);
end=llGetListLength(cl);
end--;
headmsglist=llDeleteSubList(msglist,begin,begin+end);
prestarlist=llList2List(msglist,begin,begin+end);
//parse cluster2 into cluster_a and cluster_b
startopoundlist=llParseString2List((string) cluster2, ["#"],[]);
cluster_a=llList2List(startopoundlist,0,0);
list cl_a=llParseString2List( (string) cluster_a,[" "],[]);
//then delete cl_a from tailmsglist
begina =llListFindList(headmsglist, cl_a);
enda=llGetListLength(cl_a);
enda--;
//replace starlist in msglist first;
integer endb=llGetListLength(headmsglist);
starlist=llDeleteSubList(headmsglist,begina,endb--);
starindex=llListFindList(msglist,starlist);
end2=llGetListLength(starlist);
end2--;
//newmsglist=llListReplaceList(tailmsglist,["*"],starindex,starindex+end2);
if(llGetListLength(startopoundlist)>1)
{
cluster_b=llList2List(startopoundlist,1,1);
cl_b=llParseString2List( (string) cluster_b,[" "],[]);
//find index of cl_b in msglist
integer clbstart=llListFindList(headmsglist,llParseString2List((string) llList2List(startopoundlist,1,1),[" "],[])); //delete all but this
integer lenmsglist=llGetListLength(headmsglist);
poundlist=llDeleteSubList(headmsglist,clbstart,lenmsglist--);
enda=llGetListLength(llParseString2List( (string) llList2List(startopoundlist,0,0),[" "],[]));
poundlist=llDeleteSubList(poundlist,0,enda--) ;
}
//integer endb=llGetListLength(headmsglist);
// starlist=llDeleteSubList(headmsglist,begina,endb--);
starindex=llListFindList(msglist,starlist);
//
integer starlength=llGetListLength(starlist);
starlength--;
newmsglist=llListReplaceList(msglist,["*"],starindex,starindex+starlength--);
integer poundindex=llListFindList(newmsglist,poundlist);
integer poundlength=llGetListLength(poundlist);
poundlength--;
newmsglist=llListReplaceList(newmsglist,["#"],poundindex,poundindex+poundlength--);
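//list==list compares only the lengths in LSL, so compare the joined strings instead
if(llDumpList2String(newmsglist," ")==llDumpList2String(cl+["*"]+cl_a+["#"]+cl_b," "))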
{
reply_index=i;
starstring=llDumpList2String(starlist," ");
poundstring=llDumpList2String(poundlist," ");
}
}
}
else
{
if(unparsed_clusters_string==message)
reply_index=i;
}
}
i++;
}
if(reply_index!=-1) //if a match exists
{
if(llSubStringIndex(llList2String(AIMList,reply_index+1),"<channel>")!=-1)
{
ind=llSubStringIndex(llList2String(AIMList,reply_index+1),"<channel>");
string1=llDeleteSubString(llList2String(AIMList,reply_index+1),ind,ind+8);
ind2=llSubStringIndex(string1,"</channel>");
string chat_channel_string=llDeleteSubString(string1,ind2,ind2+9);
chat_channel=(integer) chat_channel_string;
}
unparsed_reply=llList2String(REPLYList, reply_index);
end_tag=llSubStringIndex(unparsed_reply,"</template>");
end_tag--;
parsed_reply=llGetSubString(unparsed_reply,10,end_tag--); //from end of <template> to </template>
if(llSubStringIndex(parsed_reply, "<star/>")==-1)
{
if(llSubStringIndex(parsed_reply,"<srai")==-1)
{
llSay(chat_channel,parsed_reply);
reply_index=0;
}
else
{
error_cond=-1;
integer lenp=llStringLength(parsed_reply);
integer termin=lenp--;
termin--;termin--;termin--;termin--;termin--;termin--;termin--;termin--;
// integer starindex2=llSubStringIndex(parsed_reply,"<star/>");
// starindex2--;
//if(starindex2!=-1)
//{
// parsed_reply=llGetSubString(parsed_reply,6,starindex) +" " +starstring;
//}
//else
//{
newmessage=llGetSubString(parsed_reply,6,termin);
// }
}
}
else // if <star/> expression exists, insert antecedant
{
//if <pound/> expression exists
if(llSubStringIndex(parsed_reply,"<poun")!=-1)
{
temp_reply_list=llParseString2List(parsed_reply,[" "],[]);
s_ind=llListFindList(temp_reply_list,["<pound/>"]);
new_reply_list=llListReplaceList(temp_reply_list,[poundstring],s_ind,s_ind);
parsed_reply=llDumpList2String(new_reply_list," ");
//the reply itself is sent further below, once <star/> has been substituted as well
}
//else new_reply_list=llParseString2List(parsed_reply,[" "],[]);
if(llSubStringIndex(parsed_reply,"<srai")==-1)
{
temp_reply_list=llParseString2List(parsed_reply,[" "],[]);
integer s_ind2=llListFindList(temp_reply_list,["<star/>"]);
temp_reply_list=llListReplaceList(temp_reply_list,[starstring],s_ind2,s_ind2);
parsed_reply=llDumpList2String(temp_reply_list," ");
llSay(chat_channel,parsed_reply);
reply_index=0;
}
else
{
//strip out <srai> tags
integer srai_ind1=llSubStringIndex(parsed_reply,"<srai>");
integer srai_ind2=llSubStringIndex(parsed_reply,"</srai>");
//parsed_reply=llDeleteSubString(parsed_reply,srai_ind1,srai_ind1+5);
parsed_reply=llGetSubString(parsed_reply,srai_ind1+6,srai_ind2-=1);
//substitute the captured "#" text for the <pound/> tag (8 characters), if present
integer poundindex2=llSubStringIndex(parsed_reply,"<pound/>");
if(poundindex2!=-1)
{
parsed_reply=llDeleteSubString(parsed_reply,poundindex2,poundindex2+7);
parsed_reply=llInsertString(parsed_reply,poundindex2,poundstring);
}
//substitute the captured "*" text for the <star/> tag (7 characters), if present
integer starindex2=llSubStringIndex(parsed_reply,"<star/>");
if(starindex2!=-1)
{
parsed_reply=llDeleteSubString(parsed_reply,starindex2,starindex2+6);
parsed_reply=llInsertString(parsed_reply,starindex2,starstring);
}
//hand the reduced text back to the caller, which parses it as a new message
error_cond=-1;
newmessage=parsed_reply;
}
}
}
else
{
llSay(0,"I'm afraid I don't understand what you said...yet");
//email message to some address dedicated to receiving unmatched patterns. Uncomment line after inserting correct address
//llEmail("address@whatever.com", "LindenAIML: New Template Needed", "no match for: " +message); // send email to self
}
return error_cond;
}
default
{
state_entry()
{
llSetText("LindenAIML Concierge", <1,1,1>, 1.0);
gQueryID=llGetNotecardLine(gName,gLine);//request first line
gLine++; //increase line count
}
dataserver(key query_id,string data)
{
if(query_id==gQueryID)
{
if(data!=EOF)
{
if( llGetSubString(data,0,3)=="<pat" || llGetSubString(data,0,3)=="<ove" || llGetSubString(data,0,3)=="<cha")
{
if(gLine==0) //for now ignore all but pattern or template lines
{
AIMList=(list) [data];
}
else
{
AIMList=AIMList + [data];
}
}
if(llGetSubString(data,0,3)=="<tem" || llGetSubString(data,0,3)=="<cha" )
{
if(gLine==0) //for now ignore all but pattern or template lines
{
REPLYList=(list) [data];
}
else
{
REPLYList=REPLYList + [data];
}
}
gQueryID=llGetNotecardLine(gName,gLine); //request next line; no further requests once EOF is reached
gLine++;
}
}
}
touch_start(integer total_number)
{
if(touched==0)
{
handle=llListen(0,"",llGetOwner(),"");
llSay(0,"AIML on");
llGiveInventory(llDetectedKey(0), "LindenAIML");
llGiveInventory(llDetectedKey(0), "LAIMLDocs");
touched++;
}
else
{
llListenRemove(handle);
llSay(0,"AIML off");
touched=0;
}
}
listen(integer channel, string name, key id, string msg)
{
integer error_cond=myParse(msg);
if(error_cond==-1)
{
//message=newmessage;
myParse(newmessage);
}
}
}
In addition to the LindenAIML interpreter code above, one also needs a “dialogue template” written in LindenAIML designed to coordinate responses to queries made on Second Life chat.
The text below is intended to be incorporated as a notecard in an object containing the interpreter. This file, referred to as "filename" in the above code, contains the AIML tagged text to be used as a script for the interpreter. The syntax of LindenAIML is strict, as will be outlined below.
<category>
<pattern>How are you *</pattern>
<template><star/> is fine</template>
</category>
<category>
<pattern>Are you well * bot</pattern>
<template>This <star/> bot is great</template>
</category>
<category>
<pattern>Who is *</pattern>
<template>I don't know, who is <star/></template>
</category>
<category>
<override>fish</override>
<template>I hate fish</template>
</category>
<category>
<pattern>Are there too many * in this sim</pattern>
<template>I don't know much about <star/></template>
</category>
<category>
<pattern>Are you OK * bot</pattern>
<template><srai>Are you well <star/> bot</srai></template>
</category>
<category>
<pattern>Reply on channel 1</pattern>
<template>OK</template>
<channel>1</channel>
</category>
<category>
<pattern>The * in spain falls mainly # the plain</pattern>
<template>That confounded <star/> falls mainly <pound/> that blasted plain</template>
</category>
A brief explanation of the tags follows. As of this writing, the <category> tag is not used. In the future it will contain flags about topic and suchlike, but for now it is included simply to keep the LindenAIML file looking familiar to AIML users.
The <pattern> tags contain patterns to match against user utterances. The <template> tags contain the replies to those patterns. The <channel> tag (not required in every <category></category> entry) lets the user determine which channel the reply is sent on, so the bot can issue commands to other devices listening on those channels. Without a <channel> tag, the reply goes to channel 0, the public chat channel.
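As an illustration of the <channel> tag, here is a minimal sketch of a companion script for a second object that listens for the bot's reply. It assumes the sample category that replies "OK" on channel 1; the constant name BOT_CHANNEL and the reaction inside the listen event are hypothetical and not part of LindenAIML.

```lsl
// Hypothetical companion script for a separate object.
// Assumption: the bot's notecard contains <channel>1</channel> for this reply.
integer BOT_CHANNEL = 1;

default
{
    state_entry()
    {
        // Accept chat on BOT_CHANNEL from any speaker, with any text.
        llListen(BOT_CHANNEL, "", NULL_KEY, "");
    }

    listen(integer channel, string name, key id, string msg)
    {
        // In the sample notecard the template for this category is "OK".
        if (msg == "OK")
        {
            llOwnerSay("Received command from " + name);
        }
    }
}
```

Because the interpreter replies with llSay, the listening object has to be within normal chat range (20 meters) of the bot.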
<override> represents a special case of <pattern>. In the example, if the user says anything that contains the word "fish", the reply will be that in the <template></template> tags immediately following the <override></override> pair.
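The override test can be sketched on its own as follows. This is a simplified illustration rather than the interpreter's exact code; the helper name overrideMatches is hypothetical.

```lsl
// Hypothetical helper, not part of the interpreter above.
// Returns TRUE when the text inside <override>...</override> occurs
// anywhere in the message, FALSE otherwise.
integer overrideMatches(string line, string message)
{
    integer startTag = llSubStringIndex(line, "<override>");
    integer endTag = llSubStringIndex(line, "</override>");
    // "<override>" is 10 characters long; require non-empty text between the tags.
    if (startTag == -1 || endTag < startTag + 11) return FALSE;
    string trigger = llGetSubString(line, startTag + 10, endTag - 1);
    return llSubStringIndex(message, trigger) != -1;
}

default
{
    state_entry()
    {
        string line = "<override>fish</override>";
        llOwnerSay((string)overrideMatches(line, "I like fish"));     // says 1
        llOwnerSay((string)overrideMatches(line, "You are selfish")); // says 1 (substring match)
        llOwnerSay((string)overrideMatches(line, "I like cats"));     // says 0
    }
}
```

Note that this is a substring test, as in the interpreter, so "selfish" triggers the "fish" override as well.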
<srai></srai> is the "syntactic reduction" tag pair. If a pattern produces a template whose response is enclosed in syntactic reduction tags, that response is fed back to the algorithm as a new user message to parse. The upshot: a pattern whose template uses <srai> is treated as equivalent to the text inside the <srai> tag pair, which should be defined as a pattern elsewhere in the file (by convention, earlier in the file). Too many of these can slow operation considerably. Be warned.
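The listen event in the interpreter above re-parses a reduced message once and ignores the result of that second pass. One way to follow a short chain of reductions while keeping the cost bounded is sketched below. The cap MAX_HOPS and the stand-in myParse are assumptions for illustration only; the real myParse is the function listed earlier.

```lsl
// Hypothetical cap on chained <srai> reductions.
integer MAX_HOPS = 3;
string newmessage;

// Stand-in for the interpreter's myParse(): returns -1 and sets newmessage
// when the matched template is an <srai> reduction, 0 otherwise.
integer myParse(string message)
{
    if (message == "Are you OK silly bot")
    {
        newmessage = "Are you well silly bot"; // text the <srai> tags would yield
        return -1;
    }
    llOwnerSay("reply for: " + message);
    return 0;
}

default
{
    state_entry()
    {
        string current = "Are you OK silly bot";
        integer hops = 0;
        integer status = myParse(current);
        // LSL evaluates both sides of &&, so myParse is called outside the condition.
        while (status == -1 && hops < MAX_HOPS)
        {
            hops++;
            current = newmessage;
            status = myParse(current);
        }
    }
}
```

With the stand-in above, the first pass reduces the message and the second pass produces the reply, so the loop body runs once.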
The <star/> tag is a standalone tag that marks the insertion point for the phrase matched by the wildcard "*" in the pattern. So if I say "I hate you bot" and the pattern to match is "I * you bot", the template might be "Well, I <star/> you too.", resulting in the reply "Well, I hate you too."
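The substitution step can be sketched in isolation. The helper name fillStar is hypothetical, and the sketch works on characters rather than on space-separated words as the interpreter does, so treat it as an alternative illustration, not the interpreter's own method.

```lsl
// Hypothetical helper: replaces the first <star/> tag in a reply
// with the text captured by the "*" wildcard.
string fillStar(string reply, string captured)
{
    integer at = llSubStringIndex(reply, "<star/>");
    if (at == -1) return reply; // no tag, nothing to substitute
    // "<star/>" is 7 characters long.
    reply = llDeleteSubString(reply, at, at + 6);
    return llInsertString(reply, at, captured);
}

default
{
    state_entry()
    {
        // says: Well, I hate you too.
        llOwnerSay(fillStar("Well, I <star/> you too.", "hate"));
    }
}
```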
A second wildcard has been implemented, using the “#” character. Just as the text matched by “*” replaces the <star/> tag, the text matched by “#” replaces the <pound/> tag in the reply string. As of this writing the second wildcard only handles patterns of the form "string1 * string2 # string3". Updates to LindenAIML will also allow patterns such as "* string #".
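A two-wildcard capture for the "string1 * string2 # string3" form might be sketched as follows. The function name captureTwo is hypothetical, and the sketch searches character by character instead of using the interpreter's word lists, so it is an illustration under those assumptions rather than the interpreter's algorithm.

```lsl
// Hypothetical helper: returns [starText, poundText] when the message fits
// "string1 * string2 # string3", or an empty list when it does not.
list captureTwo(string pattern, string message)
{
    list parts = llParseString2List(pattern, ["*", "#"], []);
    if (llGetListLength(parts) != 3) return [];
    string head = llList2String(parts, 0);
    string mid = llList2String(parts, 1);
    string tail = llList2String(parts, 2);
    // The message must begin with the text before "*".
    if (llSubStringIndex(message, head) != 0) return [];
    string rest = llDeleteSubString(message, 0, llStringLength(head) - 1);
    integer midAt = llSubStringIndex(rest, mid);
    if (midAt < 1) return []; // middle text missing, or nothing captured by "*"
    string star = llGetSubString(rest, 0, midAt - 1);
    rest = llDeleteSubString(rest, 0, midAt + llStringLength(mid) - 1);
    integer tailAt = llSubStringIndex(rest, tail);
    if (tailAt < 1) return []; // final text missing, or nothing captured by "#"
    // The text after "#" must end the message.
    if (tailAt + llStringLength(tail) != llStringLength(rest)) return [];
    return [star, llGetSubString(rest, 0, tailAt - 1)];
}

default
{
    state_entry()
    {
        list found = captureTwo("The * in spain falls mainly # the plain",
            "The rain in spain falls mainly on the plain");
        llOwnerSay(llList2CSV(found)); // says: rain, on
    }
}
```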
Beware of punctuation immediately following <star/> or <pound/> tags; I have not yet integrated that kind of punctuation into the parsing procedures. However, end-of-utterance punctuation (a final ".", "!" or "?") is stripped from the user message, so it is not necessary to include punctuation at the end of patterns.
It is important to note that LindenAIML is sensitive to the format of the AIML file. Do not put extra spaces between tags, and keep the line format as shown above. For example, if you put a space between the <pattern> tag and the first word of the actual pattern, you run the risk of confusing the interpreter. Tags must also be lower case at this point. Every tag pair except <category></category> must open and close on the same line of text. In future versions, we hope to make LindenAIML more flexible stylistically.
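The formatting rules above can be checked before a notecard is used. The following is a minimal sketch of such a check; the function name lineLooksValid and the exact set of rules it enforces are assumptions drawn from the description above, not a feature of LindenAIML.

```lsl
// Hypothetical checker for a single notecard line.
// Returns TRUE when the line follows the layout of the sample notecard.
integer lineLooksValid(string line)
{
    // <category> tags stand alone on their own lines.
    if (line == "<category>" || line == "</category>") return TRUE;
    list tags = ["pattern", "template", "override", "channel"];
    integer count = llGetListLength(tags);
    integer i;
    for (i = 0; i < count; i++)
    {
        string tag = llList2String(tags, i);
        string openTag = "<" + tag + ">";
        // The lower-case opening tag must start the line.
        if (llSubStringIndex(line, openTag) == 0)
        {
            integer bodyStart = llStringLength(openTag);
            integer closeAt = llSubStringIndex(line, "</" + tag + ">");
            // The closing tag must follow on the same line, after some text.
            if (closeAt <= bodyStart) return FALSE;
            // No space between the opening tag and the first word.
            return llGetSubString(line, bodyStart, bodyStart) != " ";
        }
    }
    return FALSE;
}

default
{
    state_entry()
    {
        llOwnerSay((string)lineLooksValid("<pattern>How are you *</pattern>"));  // says 1
        llOwnerSay((string)lineLooksValid("<pattern> How are you *</pattern>")); // says 0
        llOwnerSay((string)lineLooksValid("<Pattern>Hi</Pattern>"));             // says 0
    }
}
```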
-Azrael Baphomet and Luciftias Neurocam --copied from [1] for the Linden Script Library. Visit my LSL wiki page for my library of scripts! Toady Nakamura