
Lisp globals#

Functions#

This remacs documentation compares the C implementation of the atan function with the ported Rust version. Since emacs-ng isn't about porting the C code base, this example is only intended to show the differences. However, the facilities shown here can still be used by functions that are part of new features.

The first thing to look at is atan. It takes an optional second argument, which makes it interesting. The complicated mathematical bits, on the other hand, are handled by the standard library. This allows us to focus on the porting process without getting distracted by the math.

The Lisp values we are given as arguments are tagged pointers; in this case they are pointers to doubles. The code has to check the tag and follow the pointer to retrieve the real values. Note that this code invokes a C macro (called DEFUN) that reduces some of the boilerplate. The macro declares a static variable called Satan that holds the metadata the Lisp compiler will need in order to successfully call this function, such as the docstring and the pointer to the Fatan function, which is what the C implementation is named:

DEFUN ("atan", Fatan, Satan, 1, 2, 0,
       doc: /* Return the inverse tangent of the arguments.
If only one argument Y is given, return the inverse tangent of Y.
If two arguments Y and X are given, return the inverse tangent of Y
divided by X, i.e. the angle in radians between the vector (X, Y)
and the x-axis.  */)
  (Lisp_Object y, Lisp_Object x)
{
  double d = extract_float (y);
  if (NILP (x))
    d = atan (d);
  else
    {
      double d2 = extract_float (x);
      d = atan2 (d, d2);
    }
  return make_float (d);
}

extract_float checks the tag (signaling an "invalid argument" error if it's not the tag for a double), and returns the actual value. NILP checks to see if the tag indicates that this is a null value, indicating that the user didn't supply a second argument at all.
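The tag checks described above can be pictured with a minimal Rust sketch. Everything in it is a hypothetical stand-in: the real Lisp_Object is a tagged machine word rather than a Rust enum, and the names Value, extract_float, is_nil and atan_sketch exist only for this illustration.

```rust
/// Hypothetical stand-in for a tagged Lisp value. The real Lisp_Object
/// packs its tag into a machine word; an enum just makes the tag explicit.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Nil,
    Float(f64),
    Str(String),
}

/// Plays the role of extract_float: check the tag, then hand back the double.
pub fn extract_float(v: &Value) -> Result<f64, String> {
    match v {
        Value::Float(d) => Ok(*d),
        other => Err(format!("invalid argument: {:?}", other)),
    }
}

/// Plays the role of NILP: true when no value was supplied.
pub fn is_nil(v: &Value) -> bool {
    matches!(v, Value::Nil)
}

/// Same control flow as the C body of Fatan, written against the stand-ins.
pub fn atan_sketch(y: &Value, x: &Value) -> Result<Value, String> {
    let d = extract_float(y)?;
    let result = if is_nil(x) {
        d.atan()
    } else {
        d.atan2(extract_float(x)?)
    };
    Ok(Value::Float(result))
}
```

Calling atan_sketch with Value::Nil as the second argument takes the one-argument branch, and passing a non-float as the first argument yields an Err instead of a signaled Lisp error.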

Next, take a look at the current Rust implementation. It must also take an optional argument, and it also invokes a (Rust) macro to reduce the boilerplate of declaring the static data for the function. However, the Rust macro also takes care of all of the type conversions and checks that we need in order to handle the arguments and return value:

/// Return the inverse tangent of the arguments.
/// If only one argument Y is given, return the inverse tangent of Y.
/// If two arguments Y and X are given, return the inverse tangent of Y
/// divided by X, i.e. the angle in radians between the vector (X, Y)
/// and the x-axis
#[lisp_fn(min = "1")]
pub fn atan(y: EmacsDouble, x: Option<EmacsDouble>) -> EmacsDouble {
    match x {
        None => y.atan(),
        Some(x) => y.atan2(x)
    }
}

You can see that we don't have to check whether our arguments are of the correct type; the code generated by the lisp_fn macro does this for us. We also asked for the second argument to be an Option<EmacsDouble>. This is the Rust type for a value which is either a valid double or isn't specified at all. We use a match expression to handle both cases.

This code is so much better that it's hard to believe just how simple the implementation of the macro is. It just calls .into() on the arguments and the return value; the compiler does the rest when it dispatches this method call to the correct implementation.
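To make the .into() dispatch concrete, here is a small self-contained sketch of what a generated wrapper could look like. It is an illustration under assumptions, not the actual output of lisp_fn: Obj is a hypothetical stand-in for LispObject, f_atan is a made-up wrapper name, and a type mismatch panics here where the real conversions would signal a Lisp error.

```rust
/// Hypothetical stand-in for LispObject.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Obj {
    Nil,
    Float(f64),
}

// Required argument: a missing or mistyped value is an error.
// (Assumption: panic stands in for signaling a Lisp error.)
impl From<Obj> for f64 {
    fn from(o: Obj) -> f64 {
        match o {
            Obj::Float(d) => d,
            Obj::Nil => panic!("wrong type argument: expected a float"),
        }
    }
}

// Optional argument: nil becomes None.
impl From<Obj> for Option<f64> {
    fn from(o: Obj) -> Option<f64> {
        match o {
            Obj::Nil => None,
            Obj::Float(d) => Some(d),
        }
    }
}

// Return value: wrap the double back into an object.
impl From<f64> for Obj {
    fn from(d: f64) -> Obj {
        Obj::Float(d)
    }
}

/// The function as the author writes it, with plain Rust types.
fn atan(y: f64, x: Option<f64>) -> f64 {
    match x {
        None => y.atan(),
        Some(x) => y.atan2(x),
    }
}

/// Sketch of a wrapper: every conversion is a bare .into(), and the
/// parameter and return types of atan select the implementation.
pub fn f_atan(y: Obj, x: Obj) -> Obj {
    atan(y.into(), x.into()).into()
}
```

The wrapper never names a conversion explicitly. Changing a parameter of atan from f64 to Option<f64> is enough to switch the argument from required to optional.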

Attributes#

The lisp_fn attribute macro creates the necessary FFI functions and symbols automatically. It handles normal functions as well as functions that take an arbitrary number of arguments (functions with MANY as the maximum number of arguments on the C side).

It is used like this:

/// Return the same argument
#[lisp_fn(name = "same", c_name = "same", min = "1")]
fn same(obj: LispObject) -> LispObject {
    obj
}

Here the name argument specifies the symbol name that is going to be used in Emacs Lisp, c_name specifies the name for the Fsame and Ssame statics used in C, and min specifies the minimum number of arguments that can be passed to this function. The maximum number of arguments is calculated automatically from the function signature.

All three of these arguments are optional, and have sane defaults. Default for name is the Rust function name with _ replaced by -. Default for c_name is the Rust function name. Default for min is the number of Rust arguments, giving a function without optional arguments.
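The naming defaults can be expressed as two tiny helpers. These are hypothetical functions written for this page to illustrate the rules stated above; they are not part of the lisp_fn implementation.

```rust
/// Hypothetical helper: the default Lisp name is the Rust function name
/// with every "_" replaced by "-".
pub fn default_lisp_name(rust_fn_name: &str) -> String {
    rust_fn_name.replace('_', "-")
}

/// Hypothetical helper: the C-side statics are the c_name prefixed with
/// "F" (the function) and "S" (the structure holding its metadata).
pub fn c_symbols(c_name: &str) -> (String, String) {
    (format!("F{}", c_name), format!("S{}", c_name))
}
```

For a Rust function named syntax_table_p with no explicit arguments to the attribute, these rules give the Lisp name syntax-table-p and the statics Fsyntax_table_p and Ssyntax_table_p.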

In this example the attribute generates the Fsame function that is going to be called in C, and the Ssame structure that holds the function information. You still need to register the function with defsubr to make it visible in Emacs Lisp. To make a function visible to C you need to export it in the crate root (lib.rs) as follows:

use somemodule::Fsome;

Functions with a dynamic number of arguments (MANY)#

The attribute also handles the definition of functions that take an arbitrary number of arguments. You can still specify a minimum number of arguments with min.

They are created as follows:

/// Returns the first argument.
#[lisp_fn(min = "1")]
fn first(args: &mut [LispObject]) -> LispObject {
    args[0]
}
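The following sketch shows why min still matters for such a function: indexing args[0] is only safe if at least one argument was passed. It assumes that the arity check happens before the body runs. Obj and call_many are hypothetical names for this illustration and do not describe how the real dispatch is implemented.

```rust
/// Hypothetical stand-in for LispObject.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Obj(pub i64);

/// A MANY-style function: it receives all arguments as one slice.
pub fn first(args: &mut [Obj]) -> Obj {
    args[0]
}

/// Hypothetical caller that enforces the minimum argument count before
/// handing the slice to the function.
pub fn call_many(
    min: usize,
    args: &mut [Obj],
    f: fn(&mut [Obj]) -> Obj,
) -> Result<Obj, String> {
    if args.len() < min {
        return Err(format!(
            "wrong number of arguments: got {}, need at least {}",
            args.len(),
            min
        ));
    }
    Ok(f(args))
}
```

With min set to 1, an empty argument list is rejected before first ever touches args[0].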

Variables#

At the end of the C file where the DEFUN is defined there is a function called syms_of.... In this function the C code calls defsubr to set up the link between the C code and the Lisp engine. When porting a DEFUN from C, the defsubr call needs to be removed as well. For instance, if syntax-table-p is being ported then find the line like defsubr (&Ssyntax_table_p); and remove it. All Rust functions declared with lisp_fn have a defsubr line generated for them by the build, so there is nothing to do on the Rust side.

DEFSYM#

In C, the DEFSYM macro is used to create an entry in the Lisp symbol table. These are analogous to global variables in the C/Rust code. Like defsubr, you will most often see these in the syms_of... functions. When porting DEFUNs, check to see if there is a matching DEFSYM as well. If there is, remove it from the C code and add a line like this below the ported Rust code: def_lisp_sym!(Qsyntax_table_p, "syntax-table-p");.

Lisp Variables#

You may also be aware that the C code must quickly and frequently access the current value of a large number of Lisp variables. To make this possible, the C code stores these values in global variables. Yes, lots of global variables. These aren't file-local globals visible to only one translation unit; they have external linkage and are accessible across the whole program. We've started porting these to Rust as well.

DEFVAR_LISP ("post-self-insert-hook", Vpost_self_insert_hook,
    doc: /* Hook run at the end of `self-insert-command'.
    This is run after inserting the character.  */);
      Vpost_self_insert_hook = Qnil;

Like DEFUN, DEFVAR_LISP takes both a Lisp name and the C name. The C name becomes the name of the global variable, while the Lisp name is what gets used in Lisp source code. Setting the default value of this variable happens in a separate statement, which is fine.

/// Hook run at the end of `self-insert-command'.
/// This is run after inserting the character.
defvar_lisp!(Vpost_self_insert_hook, "post-self-insert-hook", Qnil);

The Rust version must still take both names (this could be simplified if we wrote this macro using a procedural macro), but it also takes a default value. As before, the docstring becomes a comment which all other Rust tooling will recognize.

You might be interested in how this is implemented as well:

#define DEFVAR_LISP(lname, vname, doc)      \
  do {                      \
    static struct Lisp_Objfwd o_fwd;        \
    defvar_lisp (&o_fwd, lname, &globals.f_ ## vname);      \
  } while (false)

The C macro is not very complicated, but there are two somewhat subtle points. It creates a static variable called o_fwd, of type Lisp_Objfwd, without an explicit initializer. This struct forwards to the variable's value, which is a Lisp_Object. The macro then calls the defvar_lisp function to initialize the fields of this struct, and also to register the variable in the Lisp runtime's global environment, making it accessible to Lisp code.

The first subtle point is that every invocation of this macro uses the same variable name, o_fwd. If the macro were expanded more than once inside the same scope, the declarations would collide. Instead, the macro body is wrapped inside a do { ... } while (false) block so that each expansion has a separate little scope to live in.
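The same scoping trick works in Rust, where a block can declare its own static item. The sketch below is only an illustration of the idea (two_slots and SLOT are hypothetical names): two blocks reuse one name, yet each block owns a separate static.

```rust
use std::sync::atomic::AtomicUsize;

/// Two statics with the same name, each confined to its own block,
/// in the spirit of the scope that wraps each expansion of DEFVAR_LISP.
pub fn two_slots() -> (&'static AtomicUsize, &'static AtomicUsize) {
    let a = {
        static SLOT: AtomicUsize = AtomicUsize::new(0);
        &SLOT
    };
    let b = {
        // Same name as above, but a different item in a different scope.
        static SLOT: AtomicUsize = AtomicUsize::new(0);
        &SLOT
    };
    (a, b)
}
```

Storing a value through one of the returned references leaves the other untouched, which shows that the two declarations really are independent storage locations.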

The other subtlety is that the Lisp_Objfwd struct actually only has a pointer to the value. We still have to allocate some storage for that value somewhere. We take the address of a field on something called globals here. That's the real storage location. This globals object is just a big global struct that holds all the global variables. One day when Emacs is really multi-threaded, there can be one of these per thread and a lot of the rest of the code will just work.

#[macro_export]
macro_rules! defvar_lisp {
    ($field_name:ident, $lisp_name:expr, $value:expr) => {{
        #[allow(unused_unsafe)]
        unsafe {
            use $crate::bindings::Lisp_Objfwd;

            static mut o_fwd: Lisp_Objfwd = Lisp_Objfwd {
                type_: $crate::bindings::Lisp_Fwd_Type::Lisp_Fwd_Obj,
                objvar: unsafe { &$crate::bindings::globals.$field_name as *const _ as *mut _ },
            };

            $crate::bindings::defvar_lisp(
                &o_fwd,
                concat!($lisp_name, "\0").as_ptr() as *const libc::c_char,
            );
            $crate::bindings::globals.$field_name = $value;
        }
    }};
}

The Rust version of this macro is rather longer. Primarily this is because it takes a lot more typing to get a proper uninitialized value in a Rust program. Some would argue that all of this typing is a bad thing, but this is very much an unsafe operation. We're basically promising very precisely that we know this value is uninitialized, and that it will be completely and correctly initialized by the end of this unsafe block.

We then call the same defvar_lisp function with the same values, so that the Lisp_Objfwd struct gets initialized and registered in exactly the same way as in the C code. We do have to take care to ensure that the Lisp name of the variable is a null-terminated string, though.
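The null-termination step can be tried in isolation with only the standard library. In this sketch the c_name macro and the lisp_name function are hypothetical helpers. The concat! call mirrors the one in the macro above, and CStr::from_bytes_with_nul checks that the result is a valid C string.

```rust
use std::ffi::CStr;

/// Hypothetical helper macro: append the terminating NUL at compile time,
/// the same way the defvar_lisp! macro does with concat!.
macro_rules! c_name {
    ($name:literal) => {
        concat!($name, "\0")
    };
}

/// Returns the Lisp name as a C string. from_bytes_with_nul fails unless
/// the bytes end in exactly one NUL and contain no interior NUL bytes.
pub fn lisp_name() -> &'static CStr {
    CStr::from_bytes_with_nul(c_name!("post-self-insert-hook").as_bytes())
        .expect("name must end in a single NUL byte")
}
```

Rust string literals carry their length and are not NUL-terminated, so passing "post-self-insert-hook".as_ptr() directly to C would let the C side read past the end of the string.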


Last update: May 2, 2024