Formula Library
Library Overview
Introduction
The FORMULA library allows users to communicate with programs by inputting algebraic equations. The library parses and evaluates expressions read from the user. It also allows the application to add variables and functions to extend the formula environment.
The following is sample input that the FORMULA library accepts:
/* the factorial function */
fact(a) {
    if( a == 0 ) {
        return 1;
    } else
        return a * fact(a-1);
}
(x * 2 + y)/sin(z);
j++;
j[10] = fact(8);
list[0] = 'hello world';
i = 3.1415;
k[i] = j[i] - j[i-1];
FORMULA Grammar
Lexical conventions
When the input is broken into tokens, all white space is ignored, including newlines. User comments can be placed between /* ... */. The input from the user consists of identifiers, numbers, strings and operators.
Identifiers must start with a letter or the underscore character (_). Subsequent characters can be letters, numbers, or underscores. Upper and lower case are significant in identifier names.
Numbers are represented in decimal, unless preceded by a leading 0, in which case the number is taken to be in octal. A decimal point (.) appearing in a number creates a floating point literal.
12      /* decimal 12 */
012     /* octal 12 (decimal 10) */
12.5    /* float */
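The decimal/octal rule above is the same one the standard C function strtol() applies when called with base 0. The following is a minimal sketch of that rule in C, not the library's actual lexer; parse_int_literal is a hypothetical name used only for illustration.

```c
#include <stdlib.h>

/* Hypothetical helper (not part of the FORMULA library): convert an
 * integer literal to a long using the convention described above.
 * Base 0 makes strtol() treat a leading 0 as an octal prefix.
 * Note that base 0 also accepts a 0x prefix for hexadecimal, which
 * this manual does not mention. */
long parse_int_literal(const char *text)
{
    return strtol(text, NULL, 0);
}
```

With this helper, "12" yields 12 and "012" yields 10.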
Strings are created by enclosing the string in double or single quotes:
"This is a string\n"
'This is a string\n'
Lists
A list is a sequence of basic data types. Here is an example of creating a list:
[ 0, 1, 2, 3 ]
[ "dog", "cat", "horse", "human" ]
[ 999.25, 400.0, 1.2421, -999.25 ]
Lists are implemented as single-dimension arrays of a simple data type. Lists can be passed to functions, or assigned to variables and then treated like an array. See the examples at the end of this manual for examples of using lists.
Data types
Inside the FORMULA environment, variables can be one of the following basic types:
- Integer.
- Float.
- String.
Variables can also be single-dimension arrays of the basic types. Variables can be created from within a FORMULA program, or can be bound to variables in the application. These two types of variables will be referred to as internal and application variables.
Here are some rules on how variables and data types can be used:
- Internal variables are declared by first using them in an expression. (You do not have to explicitly declare variables).
- Internal variables start out as integers with a value of 0.
- Internal variables can change from one basic type to another without restriction.
- Internal variables can change from a basic type to an array. (The exception is parameters to functions).
- Arrays cannot change from an array type to a basic type.
- Internal arrays can grow in size as needed.
- Application variables cannot change type or change size.
- The only operations supported on an array are indexing the array, assigning the array, or passing the array to a function.
- The elements of an array are taken/assigned by using the [ ] operator. The first element of an array is indexed using 0.
- The characters within a string may be taken using the [ ] operator. Character positions start at 0.
- The + operator can be used on strings. The result is a concatenation of the two operands, which is itself a string. If the other operand is not a string, it will be converted to a string.
Operators
The FORMULA environment contains a large collection of operators for acting on integers, floats, and strings. There are no operators for acting on arrays. All of the standard C operators are supported. The Boolean operators &&, ||, and ! can also be represented by the keywords and, or and not. The relational operators ( < <= > >= == != ) have been overloaded to do string comparisons.
The assignment operator ( = ) will perform assignment on all data types including arrays.
Binary operators that accept numbers will handle a mixture of floats and integers. When both operands are integers the result will usually remain an integer.
The FORMULA operators are:

Operators        Description                          Associativity
---------        -----------                          -------------
* / %            Multiplication, division, modulus    left to right
+ -              Addition, subtraction                left to right
>> <<            Shift right, shift left              left to right
< <= > >=        Comparison operators                 left to right
== !=            Equals, not equals                   left to right
&                Bitwise AND                          left to right
^                Bitwise XOR                          left to right
|                Bitwise OR                           left to right
&& and           Logical AND                          left to right
|| or            Logical OR                           left to right
?:               Conditional operator                 left to right
= += -= ...      Assignment operators                 right to left

Precedence and Associativity of Binary Operators

Operators        Description
---------        -----------
! not            Logical NOT
~                Bitwise NOT
-                Negation
++               Increment
--               Decrement

Unary Operators

Operators        Description
---------        -----------
#                unused
@                unused
$                unused

Definable operators
All of the above operators can be redefined to suit the purpose of the programmer (for example, the operator ^ could be redefined to raise one number to the power of another). It is also possible to define operators using identifier names.
Language Constructs
When defining a function most of the standard flow control statements are supported. These statements are:
expression ;
while( expression ) statement
do statement while( expression );
if( expression ) statement [ else statement ]
for( expression ; expression ; expression ) statement
return expression;
break;
continue;
{ [ statement ] ... }
;
The format for defining a function is as follows:
foo(a, b, c, ...)
{
    statement ...
}
The parameters refer to the arguments passed when making the function call. Arguments of a basic type are passed by value. Arrays are passed by reference. Parameters are stored on the stack, so functions can be recursive.
Program input
Input parsed from the user has the following form:
[ function-definition ... ] [ expr ; expr ; ... ; expr ]
In other words, input consists of zero or more function definitions followed by zero or more semicolon-separated expressions. Typical input will consist of a single expr from the user.
The value of the user input is the value of the last expr in the expression list. An empty expression has a value of 0.
Built-in functions
The FORMULA environment provides a small set of commonly used math and string functions (which can be redefined).
- sin()
- cos()
- tan()
- asin()
- acos()
- atan()
- ln(). Natural log.
- log(). Base-10 log.
- sqrt()
- pow(m,n). Raise m to the power of n.
- abs(). Absolute value.
- rand(). Return a random value between 0 and 1.
- srand(). Seed the random number generator.
- length(). Returns length of a string, or number of elements in an array.
- substr(s,n,m). Returns a substring of the string s starting at position n and extending m characters.
Dictionaries
Dictionaries are a way of controlling how symbols are found. The dictionary stack defines a search order for how symbols are located in the symbol table. Every symbol that is created is added to the dictionary on the top of the stack. When looking up a symbol, the search proceeds from the topmost dictionary and works down.
[Figure: Dictionary Stacks]
In the Dictionary Stacks figure, there are three stacks A, B, and C. Each stack consists of pointers to the actual dictionaries. Suppose the FORMULA parser was looking for the symbol foo. In stack A, the symbol would be found in the dictionary log. In stack B, the symbol would be found in the dictionary user. When looking up the symbol sqrt using stack A, it would be found in dictionary log. But using stack B, it would be found in dictionary sys.
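The top-down search described above can be sketched in a few lines of C. This is a toy model under stated assumptions, not the library's implementation: the Dict type, the lookup function, and the contents given to the sys, log and user dictionaries are all hypothetical, chosen only so that the results agree with the text.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 8

/* Toy dictionary: a name plus a NULL-terminated list of symbols. */
typedef struct {
    const char *name;
    const char *symbols[MAX_SYMBOLS];
} Dict;

/* Search the stack from the top (stack[depth-1]) down to the bottom
 * (stack[0]). Returns the name of the first dictionary that holds
 * the symbol, or NULL if no dictionary on the stack holds it. */
const char *lookup(const Dict *const *stack, int depth, const char *symbol)
{
    int i, j;

    for (i = depth - 1; i >= 0; i--) {
        for (j = 0; j < MAX_SYMBOLS && stack[i]->symbols[j] != NULL; j++) {
            if (strcmp(stack[i]->symbols[j], symbol) == 0)
                return stack[i]->name;
        }
    }
    return NULL;
}

/* Assumed dictionary contents, for illustration only. */
static const Dict sys_dict  = { "sys",  { "sqrt", "sin", NULL } };
static const Dict log_dict  = { "log",  { "foo", "sqrt", NULL } };
static const Dict user_dict = { "user", { "foo", NULL } };

/* Stack A has log on top of sys; stack B has user on top of sys. */
static const Dict *const stack_a[] = { &sys_dict, &log_dict };
static const Dict *const stack_b[] = { &sys_dict, &user_dict };
```

Here lookup(stack_a, 2, "sqrt") finds the symbol in log, because log hides the definition in sys, while lookup(stack_b, 2, "sqrt") falls through to sys.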
There are library calls for manipulating the dictionary stack and creating new dictionaries. One dictionary always exists: the system dictionary (sys).
The purpose of dictionaries is to allow the FORMULA library to be used in multiple contexts by the same application. For example, in one use, the application may add several built-in functions for operating on well logs. Another part of the program might have functions for operating on rasters. With dictionaries these two different uses of the FORMULA library can be kept separate.
Name Binding
When the user refers to variables and functions in an expression, the binding occurs at compile time (not run-time), and therefore the state of the dictionary stack has no effect on the evaluation of a compiled expression. This is known as static binding.
Using the Library
To use the FORMULA library you must link with the -lformula option and include the file
The FORMULA API
The following is a detailed description of the routines that comprise the FORMULA library.
FORMULA_Init()
char *FORMULA_Init(void)
This function initializes the library. You must call this prior to any of the other functions.
RETURNS:
- NULL. The FORMULA library has been successfully started.
- An error string. If an error occurs, this function will return an error string describing the error.
FORMULA_Done()
void FORMULA_Done(void);
This call de-allocates all memory associated with the FORMULA library. The library can be restarted by another call to FORMULA_Init().
FORMULA_Error()
FORMULA_ERROR *FORMULA_Error(void)
If one of the compile functions returns an error, the application can call this function to fetch an error structure. The error structure contains the exact place in the input where the error occurred, plus a descriptive error message.
The FORMULA_ERROR structure is:
typedef struct {
    int cpos;        /* absolute character position in input */
    int lineno;      /* what line */
    int lpos;        /* what character on the line */
    char error[512]; /* error message */
} FORMULA_ERROR;
When the error does not correspond to any particular input line, lineno is set to 0. In that case the only valid information from the structure is error.
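As a sketch of how an application might report such an error, the helper below builds a one-line message and handles the lineno == 0 case separately. The name format_formula_error is hypothetical, and the structure is copied from the definition above only so that the example stands on its own; a real program would get it from the library's header.

```c
#include <stdio.h>

/* Copied from the manual so this sketch is self-contained. When
 * building against the real library, use its header instead. */
typedef struct {
    int cpos;        /* absolute character position in input */
    int lineno;      /* what line */
    int lpos;        /* what character on the line */
    char error[512]; /* error message */
} FORMULA_ERROR;

/* Hypothetical helper: write a printable message into buf. When
 * lineno is 0 only the error text is valid, so the position is
 * left out of the message. */
void format_formula_error(const FORMULA_ERROR *err, char *buf, size_t buflen)
{
    if (err->lineno == 0)
        snprintf(buf, buflen, "error: %s", err->error);
    else
        snprintf(buf, buflen, "line %d, character %d: %s",
                 err->lineno, err->lpos, err->error);
}
```

A caller would typically pass the pointer returned by FORMULA_Error() after one of the compile functions has returned NULL.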
FORMULA_CompileFile()
FORMULA *FORMULA_CompileFile(filename) char *filename;
This function opens filename and parses it for an expression. It then compiles the input into a byte-code form and returns a pointer to the byte-code structure.
If a parsing or compilation error occurs, NULL will be returned and the caller will have to call FORMULA_Error() to obtain an error message.
RETURNS:
- A byte-code pointer. The expression file was compiled successfully.
- NULL. An error occurred in parsing/compiling the expression.
FORMULA_Compile()
FORMULA *FORMULA_Compile(string) char *string;
This call is identical to FORMULA_CompileFile(), except that input is read directly from the string string. This call is handy when the expression has just been read in from the user.
RETURNS:
- A byte-code pointer. The expression string was compiled successfully.
- NULL. An error occurred in parsing/compiling the expression.
FORMULA_Eval()
char *FORMULA_Eval(frm, type, size, addr) FORMULA *frm; int type, *size; void *addr;
This is the function that evaluates the compiled expression. Evaluation is accomplished by executing a mini byte-code program (frm). After evaluation there is always a value to return. The value corresponds to the last expression evaluated (if there was more than one expression). The caller requests the type to be returned by using the type, size, and addr arguments.
type can be one of:
FormulaInt FormulaFloat FormulaDouble FormulaString FormulaArrayInt FormulaArrayFloat FormulaArrayDouble FormulaArrayString
type tells this function what type you want returned to you. Appropriate conversions will be done between numbers, floats and doubles. addr should be a pointer to enough memory to hold the requested type. In the case of type FormulaString, addr must point to enough storage to hold the returned string.
For arrays, size will be set to the number of elements in the array. NOTE: Returning arrays is currently not implemented.
RETURNS:
- NULL. The formula was evaluated without any run-time errors.
- An error string. A run-time error occurred, and the returned pointer describes the error.
Some possible reasons for failure might be:
- Using math operators with strings or arrays.
- An internal error of the system (e.g. stack overflow, out of memory).
- Converting a variable incorrectly.
- Accessing an array using a negative index.
- An application-added function returned an error.
- An application-added variable was modified incorrectly.
FORMULA_EvalGetType()
char *FORMULA_EvalGetType(frm, ret) FORMULA *frm; FORMULA_ARG *ret;
This does the same thing as FORMULA_Eval() above. However, instead of the caller requesting a type for the result to be converted into, the result's own type is stored in the FORMULA_ARG structure along with the value. Arrays are not returned by this function. Strings must be free'd with free().
FORMULA_Free()
void FORMULA_Free(frm) FORMULA *frm;
This call is used to free a compiled byte-code program.
FORMULA_AddFunc()
char *FORMULA_AddFunc(name, ret_type, nargs, your_func) char *name; int ret_type, nargs; char *(*your_func)();
This function allows the application to add a new function to the FORMULA environment. name is the name of the new function to add. It is added to the current dictionary on top of the search stack.
ret_type is one of the following:
FormulaInt FormulaDouble FormulaString FormulaArrayInt FormulaArrayFloat FormulaArrayDouble FormulaArrayString
ret_type describes the return type of your function. This is currently not being used, and your function is free to return different types at different times.
nargs is the number of arguments your function accepts. To accept a variable number of arguments, use FormulaVarArgs (-1).
Finally, you supply your_func, the callback function that you are adding to the FORMULA environment.
your_func is called as follows:
typedef struct {
    int type;
    int size;      /* How many elements, if array type */
    union {
        long i;    /* FormulaInt */
        float f;   /* FormulaFloat */
        double d;  /* FormulaDouble */
        char *s;   /* FormulaString */
        void *a;   /* FormulaArrayXXXXX */
    } u;
} FORMULA_ARG;
char *your_func(int nargs, FORMULA_ARG *args, FORMULA_ARG *ret)
{
}
nargs is the number of actual arguments passed to your function. This should equal what you specified when adding the function. args is an array of parameters that have been passed to your function. ret is a pointer to a structure of the same format as the arguments. You will fill the ret variable in order to return a value from your function. You should not attempt to modify any of the passed arguments.
Your function should return NULL if everything is ok, otherwise you can return an error string, which will cause the evaluation of the function to fail.
RETURNS:
- NULL. The function was successfully added.
- An error string.
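To make the your_func calling convention described above concrete, here is a minimal sketch of a callback that returns the largest of its numeric arguments. Everything named formula_max and arg_as_double is hypothetical. The FORMULA_ARG structure is copied from this section, and the type codes are placeholders with made-up values; the real ones come from the library's header.

```c
#include <stddef.h>

/* Placeholder type codes with invented values, present only so the
 * sketch compiles on its own. Use the library header in real code. */
enum { FormulaInt = 1, FormulaFloat, FormulaDouble, FormulaString };

/* FORMULA_ARG as given in this manual. */
typedef struct {
    int type;
    int size;
    union {
        long i;
        float f;
        double d;
        char *s;
        void *a;
    } u;
} FORMULA_ARG;

/* Hypothetical helper: read a numeric argument as a double.
 * Returns 0 on success, -1 if the argument is not a number. */
static int arg_as_double(const FORMULA_ARG *arg, double *out)
{
    switch (arg->type) {
    case FormulaInt:    *out = (double)arg->u.i; return 0;
    case FormulaFloat:  *out = (double)arg->u.f; return 0;
    case FormulaDouble: *out = arg->u.d;         return 0;
    default:            return -1;
    }
}

/* Callback for a hypothetical max(...) function. Returns NULL on
 * success or an error string, as the callback contract requires.
 * The arguments are only read, never modified. */
char *formula_max(int nargs, FORMULA_ARG *args, FORMULA_ARG *ret)
{
    double best, value;
    int i;

    if (nargs < 1)
        return "max: at least one argument is required";
    if (arg_as_double(&args[0], &best) != 0)
        return "max: arguments must be numbers";
    for (i = 1; i < nargs; i++) {
        if (arg_as_double(&args[i], &value) != 0)
            return "max: arguments must be numbers";
        if (value > best)
            best = value;
    }
    ret->type = FormulaDouble;
    ret->size = 0;              /* not an array */
    ret->u.d = best;
    return NULL;
}
```

Following the FORMULA_AddFunc() signature above, such a callback would presumably be registered with FORMULA_AddFunc("max", FormulaDouble, FormulaVarArgs, formula_max).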
FORMULA_DelFunc()
void FORMULA_DelFunc(name) char *name;
This call removes the function name from the FORMULA environment. The function is looked up using the current dictionary stack. Do not attempt to evaluate any compiled expressions that use the function that has been deleted.
FORMULA_AddVar()
char *FORMULA_AddVar(name, type, read_only, size, addr, callback) char *name; int type, read_only, size; void *addr; char *(*callback)();
This function allows the application to add new variables to the FORMULA environment. name is the name of the new variable to add. It is added into the current dictionary on the top of the search stack.
type is one of the following:
FormulaInt FormulaFloat FormulaDouble FormulaString FormulaArrayInt FormulaArrayFloat FormulaArrayDouble FormulaArrayString
type is the type of variable you are adding. This corresponds to the type of pointer you are passing in addr.
Notes on using strings:
If the variable is a string, then addr must be the address of a pointer to a string. The string should be allocated using malloc. When the user assigns a new string to this variable, the string will be free'd and a new string will be allocated. If the variable is read-only, then it is not required that the string be allocated using malloc.
If the variable is an array of strings (FormulaArrayString), then the object should consist of an array of pointers to strings. Each string must be allocated using malloc, unless the variable is read-only.
Notes on using arrays:
If the variable is an array, then addr should be the address of the 0th element of the array. Application arrays will not be required to change size. If the variable is an array, then size should refer to how many elements are in the array. size is not used for non-array variables.
read_only is a flag. If TRUE, a run-time error will occur if a user's expression attempts to modify the variable. Read-only strings do not have to be malloc'd.
callback is a function that gets called whenever the variable is about to be modified. This function will also be called when a read-only variable is accessed. The purpose of this function is to allow the application to keep tabs on any update that is to happen to the variable. callback is called as follows:
char *callback(index, newvalue, name, size, addr) int index; void *newvalue; char *name; int size; void *addr;
- index is the index of the element to be modified, if the variable being modified is an array. For non-arrays this value is undefined.
- newvalue is a pointer to the value to be assigned to the variable. This pointer must be cast to the appropriate type of your variable.
- name is the same name as passed into the FORMULA_AddVar() function.
- size is the same size as passed into the FORMULA_AddVar() function.
- addr is the same addr as passed into the FORMULA_AddVar() function.
NOTE: name, size, and addr are passed to the callback function so that the same callback can be used to manage several variables.
If your callback function returns a string, this will cause a run-time error to occur, with your error message as the error. If NULL is returned this indicates that the variable update is allowed.
RETURNS:
- NULL. The variable was successfully added.
- An error string.
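The variable callback described above can be used to veto an update. The following is a minimal sketch, assuming the variable was added as FormulaInt and that newvalue then points to a long (matching the long member of FORMULA_ARG); that storage type is an assumption and should be checked against the library header. The name reject_negative is hypothetical.

```c
#include <stddef.h>

/* Hypothetical update callback: refuse negative values.
 * Assumption: for a FormulaInt variable, newvalue points to a long.
 * The manual does not say what newvalue holds when the callback is
 * invoked for a read-only access, so a NULL pointer is allowed. */
char *reject_negative(int index, void *newvalue, char *name,
                      int size, void *addr)
{
    /* Unused here; they let one callback serve several variables. */
    (void)index;
    (void)name;
    (void)size;
    (void)addr;

    if (newvalue == NULL)
        return NULL;
    if (*(long *)newvalue < 0)
        return "value must not be negative";
    return NULL;    /* NULL allows the update */
}
```

Returning the string causes a run-time error in the user's expression, so the bound application variable is left unchanged.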
FORMULA_DelVar()
void FORMULA_DelVar(name) char *name;
This call removes the variable name from the FORMULA environment. The variable is looked up using the current dictionary stack. If the dictionary that contains this symbol is not on the search stack, then this variable will not get deleted.
FORMULA_AddBinaryOp()
char *FORMULA_AddBinaryOp(name, n, callback) char *name; int n; char *(*callback)();
This function allows the application to add a new binary operator to the FORMULA environment. name is the name of the new operator to add. It is added into the current dictionary on the top of the search stack. name must follow the naming conventions for an identifier, or match one of the operators.
n is a number 0 through 9. Up to 10 binary operators can be added to the FORMULA environment. There are three precedence levels supported for the operators:
Level    n             Same precedence as ...
-----    ----------    ----------------------
1        0, 1          Assignment operators
2        2, 3, 4, 5    == !=
3        6, 7, 8, 9    * / %
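The mapping from n to precedence level in the table above can be written as a small function. This is only a restatement of the table for illustration; binary_op_level is a hypothetical name, not a library call.

```c
/* Hypothetical helper: return the precedence level (1, 2 or 3) that
 * a binary operator slot n belongs to, or 0 if n is out of range. */
int binary_op_level(int n)
{
    if (n < 0 || n > 9)
        return 0;   /* only slots 0 through 9 exist */
    if (n <= 1)
        return 1;   /* same precedence as assignment operators */
    if (n <= 5)
        return 2;   /* same precedence as == and != */
    return 3;       /* same precedence as * / % */
}
```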
callback is a function that gets called whenever the operator occurs in an expression. callback is called as follows:
char *callback(n, args, result) int n; FORMULA_ARG *args; FORMULA_ARG *result;
- n is always 2.
- args[0] is the operand on the left-hand side of the operator.
- args[1] is the operand on the right-hand side of the operator.
- result is the result of the operator, to be set by the callback.
If your callback function returns a string, this will cause a run-time error to occur, with your error message as the error. If NULL is returned this indicates that the operation succeeded.
RETURNS:
- NULL. The operator was successfully added.
- An error string.
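As an illustration of the binary operator callback, the sketch below implements integer exponentiation, the kind of redefinition of ^ suggested in the Operators section. The name power_op is hypothetical, the type codes are placeholders with invented values, and overflow is not checked.

```c
#include <stddef.h>

/* Placeholder type codes with invented values, present only so the
 * sketch compiles on its own. Use the library header in real code. */
enum { FormulaInt = 1, FormulaFloat, FormulaDouble, FormulaString };

/* FORMULA_ARG as given in this manual. */
typedef struct {
    int type;
    int size;
    union {
        long i;
        float f;
        double d;
        char *s;
        void *a;
    } u;
} FORMULA_ARG;

/* Hypothetical binary operator callback: args[0] raised to the
 * power args[1], for integer operands only. */
char *power_op(int n, FORMULA_ARG *args, FORMULA_ARG *result)
{
    long base, exponent, value = 1;

    if (n != 2)
        return "power: expected two operands";
    if (args[0].type != FormulaInt || args[1].type != FormulaInt)
        return "power: operands must be integers";

    base = args[0].u.i;         /* left-hand operand */
    exponent = args[1].u.i;     /* right-hand operand */
    if (exponent < 0)
        return "power: exponent must not be negative";

    while (exponent-- > 0)
        value *= base;          /* no overflow check in this sketch */

    result->type = FormulaInt;
    result->size = 0;           /* not an array */
    result->u.i = value;
    return NULL;
}
```

Going by the signature above, registering it as FORMULA_AddBinaryOp("^", 6, power_op) would give the operator the same precedence as * / %.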
FORMULA_DelBinaryOp()
void FORMULA_DelBinaryOp(name) char *name;
This call removes the binary operator name from the FORMULA environment. The operator is looked up using the current dictionary stack. If the dictionary that contains this symbol is not on the search stack, then this operator will not get deleted.
FORMULA_AddUnaryOp()
char *FORMULA_AddUnaryOp(name, n, callback) char *name; int n; char *(*callback)();
This function allows the application to add new operators to the FORMULA environment. name is the name of the new unary operator to add. It is added into the current dictionary on the top of the search stack. Name must follow the naming conventions of an identifier, or match one of the existing operators.
n is a number from 0 through 9. Up to 10 new unary operators can be added. The new operator will have the same precedence as the other unary operators.
callback is a function that gets called, with the operand, whenever the operator is used in an expression. callback is called as follows:
char *callback(n, operand, result) int n; FORMULA_ARG *operand; FORMULA_ARG *result;
- n is always 1.
- operand is the operand on which the unary operator will act.
- result is the result of your operator, to be set by the callback.
If your callback function returns a string, this will cause a run-time error to occur, with your error message as the error. If NULL is returned this indicates that the operation succeeded.
RETURNS:
- NULL. The operator was successfully added.
- An error string.
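A unary operator callback follows the same pattern. The sketch below gives the unused # operator a meaning, the length of a string operand, purely as an illustration; whether "#" is accepted as a name by FORMULA_AddUnaryOp() is an assumption. The name length_op is hypothetical and the type codes are placeholders with invented values.

```c
#include <stddef.h>
#include <string.h>

/* Placeholder type codes with invented values, present only so the
 * sketch compiles on its own. Use the library header in real code. */
enum { FormulaInt = 1, FormulaFloat, FormulaDouble, FormulaString };

/* FORMULA_ARG as given in this manual. */
typedef struct {
    int type;
    int size;
    union {
        long i;
        float f;
        double d;
        char *s;
        void *a;
    } u;
} FORMULA_ARG;

/* Hypothetical unary operator callback: yields the length of a
 * string operand as an integer. */
char *length_op(int n, FORMULA_ARG *operand, FORMULA_ARG *result)
{
    if (n != 1)
        return "#: expected one operand";
    if (operand->type != FormulaString || operand->u.s == NULL)
        return "#: operand must be a string";

    result->type = FormulaInt;
    result->size = 0;           /* not an array */
    result->u.i = (long)strlen(operand->u.s);
    return NULL;
}
```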
FORMULA_DelUnaryOp()
void FORMULA_DelUnaryOp(name) char *name;
This call removes the unary operator name from the FORMULA environment. The operator is looked up using the current dictionary stack. If the dictionary that contains this symbol is not on the search stack, then this operator will not get deleted.
FORMULA_PushDict()
char *FORMULA_PushDict(dict_name) char *dict_name;
This call adds a dictionary to the top of the search stack. All symbols stored in this dictionary will be found first (i.e. symbols in this dictionary have the highest precedence).
RETURNS:
- NULL. The dictionary was put on the stack successfully.
- An error string. This describes the error that occurred.
FORMULA_PopDict()
void FORMULA_PopDict(void);
This call removes the topmost dictionary from the search stack. This call will never remove the sys dictionary.
FORMULA_ResetDict()
void FORMULA_ResetDict(void);
This function resets the dictionary search stack. After a call to this function the dictionary stack contains just the sys dictionary. This call can be used to configure the stack to a known state.
FORMULA_RemoveDict()
void FORMULA_RemoveDict(dict) char *dict;
This function removes all symbols and compiled code that have been added into this dictionary.
Implementation Examples
This chapter presents some examples of using the FORMULA library.
Well log computations
In geological applications, a common feature is to allow the user to build new curves based on existing curves. Formulas provide a very powerful way for the user to compose new curves based on other curves.
The following code fragment gives an example of doing well log computations:
float gr;
float gamma;
float au;
float answer;
char *expr;
FORMULA *frm;
int i;

/* Keep the curve variables in their own dictionary. */
FORMULA_PushDict("curves");

/* Bind the application variables; size is unused for non-arrays. */
FORMULA_AddVar("gr", FormulaFloat, TRUE, 0, &gr, NULL);
FORMULA_AddVar("au", FormulaFloat, TRUE, 0, &au, NULL);
FORMULA_AddVar("gamma", FormulaFloat, TRUE, 0, &gamma, NULL);

expr = "(gr + 10.0)/5000.0 * au";
frm = FORMULA_Compile(expr);
if (frm != NULL) {
    for(i=0; i<NSAMPLES; i++) {
        /* Load the current sample into the bound variables. */
        gr = gr_curve[i];
        au = au_curve[i];
        gamma = gamma_curve[i];
        FORMULA_Eval(frm, FormulaFloat, NULL, &answer);
        new_curve[i] = answer;
    }
    FORMULA_Free(frm);
}
FORMULA_PopDict();
In this example, the application first creates a dictionary "curves". Three floats are added to the FORMULA environment. Each float represents a variable that the user can use in an expression. Next, an expression is read from the user. In this case we have the hard-coded expression "(gr + 10.0)/5000.0 * au".
The expression is parsed and compiled using FORMULA_Compile(). Then, for each digital sample of the curve, we set the FORMULA variables to the current sample and evaluate the user's expression. The result of the expression forms a new sample, which is stored in the new curve.
Well Query
Another important area where expressions are useful is querying for data. Expressions allow the user to enter anything from simple to complex criteria, giving him the power to get exactly the data he wants. Consider the following example queries the user could input:
kb > 1200.0 and owner == "ESSO" and
num_core > 2
kb > 1200 and has_curve("GR") and has_core() and
formation("VKNG") > 800 and not (formation("VKNG") < 300)
(owner like ["AMOCO", "ESSO", "HUNTER" ]) and not has_curve("GR")
owner like "%AMOCO%"
status in [ "ABD", "OIL", "SUSP", "GAS" ]
Notice the functions has_curve, has_core and formation. These functions would be added by the application for doing extended operations on the general well record. Variables such as kb and owner would also be added by the application and bound to the values of the general well record (probably as read-only). Operators such as like and in have been added to give the formula environment a more SQL-like behavior.
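The matching behind a like operator such as the one in owner like "%AMOCO%" could be sketched as follows. This assumes the SQL convention that % matches any sequence of characters, including none, and it does not handle the SQL _ wildcard or the list form shown above. The name like_match is hypothetical; an application would call something like it from its own binary operator callback.

```c
/* Hypothetical matcher: returns 1 if text matches pattern, else 0.
 * In the pattern, % matches any run of characters (possibly empty);
 * every other character must match itself exactly. */
int like_match(const char *text, const char *pattern)
{
    if (*pattern == '\0')
        return *text == '\0';

    if (*pattern == '%') {
        /* Let % absorb 0, 1, 2, ... characters of text in turn. */
        for (;;) {
            if (like_match(text, pattern + 1))
                return 1;
            if (*text == '\0')
                return 0;
            text++;
        }
    }

    if (*text != '\0' && *text == *pattern)
        return like_match(text + 1, pattern + 1);
    return 0;
}
```

For example, like_match("PAN AMOCO LTD", "%AMOCO%") returns 1 and like_match("ESSO", "%AMOCO%") returns 0.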
Here are the rough steps for performing a query on a database:
- Setup the FORMULA environment and add any variables, functions and operators that the user is allowed to access.
- Read and compile a boolean expression from the user.
- For each well record in the database, do the following:
  - Bind variables from the database to the FORMULA variables.
  - Evaluate the formula using FORMULA_Eval().
  - If the result is TRUE, output the record.
- Free the formula.
Application Customization
Instead of parsing a configuration file, why not use the FORMULA library to parse it, and keep track of the values? Consider the following configuration file that an application might want to read:
timeout = 10;
retry = timeout * 2;
author = "Ken Stauffer";
print_options = "-Papple -v";
hour_max = 3*24;
disk_space[0] = "/pa";
disk_space[1] = "/pb";
disk_space[2] = "/home/v2y/pc";
The variables could even be bound to variables from the application, so that by parsing this file, the variables are automatically stored with the application.
Because the FORMULA environment allows the definition of complete functions, "hooks" could be added to your program. To let the user customize the application in complex ways, allow him to write his own function. Consider the following contrived example of a hook:
gen_horizon(horz, process, count) {
    if( process == "AMP" )
        return sprintf("%s.AMP.%d", horz, count);
    else
        return sprintf("%s.%s", horz, process);
}
In this example, a function gen_horizon() has been written as a hook. The application uses this hook whenever it needs to generate a new horizon name. The application would be shipped with a default function, but the user would be able to replace it with his own.
