Tutorial: Extending Elixir with C using NIF

Interoperability of Elixir

A critical aspect of a programming language is its interoperability with other languages: being able to play nice with others. Whether the goal is to reuse legacy code or to gain performance in numerical computations, calling C from Elixir is a common practice. The two most popular ways of doing so are NIFs (Native Implemented Functions) and ports, the latter often through Porcelain.

NIFs were introduced in Erlang/OTP R13B03. They are Erlang/Elixir functions written in C and loaded dynamically as a shared library, whereas ports are separate programs that run outside the BEAM VM and communicate with it via STDIN/STDOUT. NIFs tend to be simpler to write because they do not need to encode and decode messages on standard input and output; in certain scenarios this also makes them more efficient. However, a segmentation fault in the C code implementing a NIF crashes the entire BEAM VM, which makes ports the safer choice.
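To get a feel for the encoding work a port involves, here is a minimal sketch of the framing a port program would handle itself. It assumes the port is opened with the {:packet, 2} option, under which every message is preceded by its length as a 2-byte big-endian integer. The helper names encode_len and decode_len are made up for this illustration.

```c
#include <stdint.h>

/* Hypothetical helpers for a port opened with {:packet, 2}: every message
 * on STDIN/STDOUT is preceded by its length as a 2-byte big-endian integer. */

/* Write the length header for a payload of `len` bytes into `hdr`. */
void encode_len(uint16_t len, unsigned char hdr[2])
{
    hdr[0] = (unsigned char)((len >> 8) & 0xff); /* high byte first */
    hdr[1] = (unsigned char)(len & 0xff);        /* then low byte */
}

/* Read the payload length back from a 2-byte header. */
uint16_t decode_len(const unsigned char hdr[2])
{
    return (uint16_t)((hdr[0] << 8) | hdr[1]);
}
```

A port program would call decode_len on the two header bytes read from STDIN to learn how many payload bytes follow, and encode_len before writing a reply to STDOUT. A NIF skips this step because it receives Erlang terms directly.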

In this tutorial we will look at how to implement NIFs for our C library of choice, Libpostal, a C library that parses and normalizes street addresses from around the world. If you want to jump into the code right away, the full project is at https://github.com/SweetIQ/expostal .

Creating NIFs for Libpostal

We can start by creating a new Elixir project using mix.

mix new expostal

Currently, the recommended way of working with C NIFs in Elixir is to create Makefiles that get invoked by mix compile.

Project Setup

To make mix compile build the C NIFs, we can add the following module definition to mix.exs:

defmodule Mix.Tasks.Compile.Libpostal do
  def run(_) do
    if match?({:win32, _}, :os.type()) do
      # libpostal does not support Windows unfortunately.
      IO.warn("Windows is not supported.")
      exit(1)
    else
      File.mkdir_p("priv")
      {result, _error_code} = System.cmd("make", ["priv/parser.so"], stderr_to_stdout: true)
      IO.binwrite(result)
      {result, _error_code} = System.cmd("make", ["priv/expand.so"], stderr_to_stdout: true)
      IO.binwrite(result)
    end

    :ok
  end
end

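Depending on the C library you want to interoperate with and the platforms you develop and deploy on, you might need multiple Makefiles, each targeting a different operating system. In our case, since Libpostal does not run on Windows, we print a warning and exit. Note that Mix only runs this task if it is listed among the project's compilers, for example by adding :libpostal to the compilers option of the project configuration.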

Makefile

Next, we can create our Makefile, which compiles the NIFs defined in src/parser.c and src/expand.c into priv/parser.so and priv/expand.so respectively. With most C libraries, we would put everything inside a single dynamic library (i.e. just priv/your_library.so). But in our case, since Libpostal's expand function requires loading a machine learning model that the parser function does not need, it is best to keep them separate.

MIX = mix
CFLAGS += -g -O3 -ansi -pedantic -Wall -Wextra -Wno-unused-parameter

ERLANG_PATH = $(shell erl -eval 'io:format("~s", [lists:concat([code:root_dir(), "/erts-", erlang:system_info(version), "/include"])])' -s init stop -noshell)
CFLAGS += -I$(ERLANG_PATH)

# adjust these to match where your library is installed
CFLAGS += -I/usr/local/include -I/usr/include
CFLAGS += -std=gnu99 -Wno-unused-function
LDFLAGS += -L/usr/local/lib -L/usr/lib
# libraries go after the source files on the link line, so that linkers
# defaulting to --as-needed do not drop them
LDLIBS += -lpostal

ifeq ($(wildcard deps/libpostal),)
  LIBPOSTAL_PATH = ../libpostal
else
  LIBPOSTAL_PATH = deps/libpostal
endif

ifneq ($(OS),Windows_NT)
  CFLAGS += -fPIC

  ifeq ($(shell uname),Darwin)
    LDFLAGS += -dynamiclib -undefined dynamic_lookup
  endif
endif

.PHONY: all libpostal clean

all: libpostal

libpostal:
	$(MIX) compile

priv/parser.so: src/parser.c
	$(CC) $(CFLAGS) -shared $(LDFLAGS) -o $@ src/parser.c $(LDLIBS)

priv/expand.so: src/expand.c
	$(CC) $(CFLAGS) -shared $(LDFLAGS) -o $@ src/expand.c $(LDLIBS)

clean:
	$(MIX) clean
	$(RM) priv/*
If the C library you are working with is not installed system-wide (i.e. under /usr/local or /usr), or if you'd like to embed the C library within your project, check out how the hoedown project embeds its C dependency.

Deciding whether to embed the C library or require a system-wide installation is a controversial design decision. [Ω] From the developer of the Node.js binding for Libpostal:

Usually when dynamically linking to a native library, it’s either assumed that the library is installed separately, or that the dependency is included with the binding. Let’s call these the “lean repo” and the “fat repo” approaches respectively. node-postal is an example of a lean repo, whereas a fat repo would be something like node-snappy.

libpostal is a bit trickier than a library like Snappy because it’s not just software - there are also data/model files which need to be downloaded from the web…

I feel that the same argument applies to this Elixir binding.

Implementing NIFs

Now that the build process is set up, it is time to actually implement those Native Implemented Functions. For the Libpostal parser, our goal is to create an Elixir/Erlang function that calls libpostal_parse_address from the Libpostal C library. The signature of libpostal_parse_address, along with the types it uses, is as follows:

typedef struct libpostal_address_parser_response {
    size_t num_components;
    char **components;
    char **labels;
} libpostal_address_parser_response_t;

typedef struct libpostal_address_parser_options {
    char *language;
    char *country;
} libpostal_address_parser_options_t;

libpostal_address_parser_response_t *libpostal_parse_address(char *address, libpostal_address_parser_options_t options);

When passed an address, libpostal_parse_address returns the address components as a libpostal_address_parser_response_t structure. For example, when passed 845 Sherbrooke St W, Montreal, QC H3A 0G4 as the address together with the default options, the function returns:

num_components: 5,
components: ["845", "sherbrooke st w", "montreal", "qc", "h3a 0g4"],
labels: ["house_number", "road", "city", "state", "postalcode"]

This is not a very Elixir-esque way of returning values. In Elixir, a map from each label to its component represents the same data more naturally. We will see how to build one later.
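The two parallel arrays already form key-value pairs: labels[i] is the key for components[i]. As a minimal sketch of that relationship in plain C, the lookup below walks both arrays together. It uses a stand-in struct, parser_response_t, that merely mirrors the three fields shown above, and a made-up helper name, component_for, so it needs neither Libpostal nor Erlang to compile.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for libpostal_address_parser_response_t (same three fields). */
typedef struct {
    size_t num_components;
    char **components;
    char **labels;
} parser_response_t;

/* Return the component stored under `label`, or NULL if the label is absent.
 * This is the lookup a map gives us for free on the Elixir side. */
const char *component_for(const parser_response_t *r, const char *label)
{
    size_t i;
    for (i = 0; i < r->num_components; i++) {
        /* labels[i] names the component found at the same index */
        if (strcmp(r->labels[i], label) == 0) {
            return r->components[i];
        }
    }
    return NULL;
}
```

With a map on the Elixir side, the same lookup is simply a key access, which is why the NIF converts the struct instead of exposing it as two lists.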

Loading and unloading

In order for the BEAM VM to interact with C functions, we need to register them with the VM. The src/parser.c file starts with the following structure:

#include <stdio.h>   /* fprintf */
#include <stdlib.h>  /* free */
#include <string.h>  /* string helpers such as strndup and strlen */
#include <libpostal/libpostal.h>
#include <erl_nif.h>

static ERL_NIF_TERM
parse_address(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[]) {}

/* name as seen from Elixir, arity, C function implementing it */
static ErlNifFunc funcs[] = {
    {"parse_address", 1, parse_address}};

static int
load(ErlNifEnv *env, void **priv, ERL_NIF_TERM info) {}

static int
reload(ErlNifEnv *env, void **priv, ERL_NIF_TERM info) {}

static int
upgrade(ErlNifEnv *env, void **priv, void **old_priv, ERL_NIF_TERM info) {}

static void
unload(ErlNifEnv *env, void *priv) {}

ERL_NIF_INIT(Elixir.Expostal.Parser, funcs, &load, &reload, &upgrade, &unload)

A dynamic library implementing NIFs needs to register itself via the ERL_NIF_INIT macro, providing its module name, the functions to expose, and a series of callbacks (load, reload, upgrade, unload) that define the life cycle of the NIF library. [δ]

We are particularly interested in the load and unload functions. When the parser NIF library loads, we need to initialize Libpostal so that it loads a machine learning model shared by all threads in the process. We do that by calling the libpostal_setup and libpostal_setup_parser functions.

static int
load(ErlNifEnv *env, void **priv, ERL_NIF_TERM info)
{
    /* a non-zero return value tells the VM that loading failed */
    if (!libpostal_setup())
    {
        fprintf(stderr, "Error loading libpostal\n");
        return 1;
    }
    if (!libpostal_setup_parser())
    {
        fprintf(stderr, "Error loading libpostal parser\n");
        return 1;
    }

    return 0;
}

Similarly, we want to make sure we properly free up resources when the Erlang VM decides to unload the module.

static void
unload(ErlNifEnv *env, void *priv)
{
    libpostal_teardown();
    libpostal_teardown_parser();
}

The reload and upgrade functions are implemented as follows:

static int
reload(ErlNifEnv *env, void **priv, ERL_NIF_TERM info)
{
    return 0;
}

static int
upgrade(ErlNifEnv *env, void **priv, void **old_priv, ERL_NIF_TERM info)
{
    return load(env, priv, info);
}

Implementing the parse_address function as a NIF

Next, we can finally implement the parse_address function. As we saw earlier, libpostal_parse_address takes an address string as input and emits a custom struct that holds the components and labels. When a user calls parse_address in Elixir, we need to call libpostal_parse_address under the hood. The input, however, is not a C char* but an Elixir string (an Erlang binary), so we need to copy it into a NUL-terminated C string before passing it to libpostal_parse_address. The output of libpostal_parse_address is a C struct, but we want to return an Elixir/Erlang map, so that the user can enjoy the elegance of a modern programming language.
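The first of these conversions deserves a closer look, because an Erlang binary is a pointer plus a size and carries no terminating NUL byte. Here is a minimal sketch of the copy, written without erl_nif.h so that it compiles on its own. The helper name binary_to_cstring is made up for this illustration, and its two parameters stand in for the data and size fields of an ErlNifBinary.

```c
#include <stdlib.h>
#include <string.h>

/* Copy `size` bytes from a buffer that is NOT NUL-terminated (like the
 * data field of an ErlNifBinary) into a fresh NUL-terminated C string.
 * The caller owns the result and must free() it.
 * Returns NULL if the allocation fails. */
char *binary_to_cstring(const unsigned char *data, size_t size)
{
    char *str = malloc(size + 1); /* one extra byte for the terminator */
    if (str == NULL) {
        return NULL;
    }
    memcpy(str, data, size);
    str[size] = '\0';
    return str;
}
```

Passing the raw data pointer straight to a function that expects a C string would let it read past the end of the binary, which is why the NIF works on a copy.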

Enough said, here’s a carefully commented implementation:

static ERL_NIF_TERM
parse_address(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[])
{
    // we start from libpostal's default parser options
    libpostal_address_parser_options_t options = libpostal_get_address_parser_default_options();

    // we make an empty erlang/elixir map object
    ERL_NIF_TERM components = enif_make_new_map(env);

    // this is a placeholder for the address we want to read from the erlang caller
    ErlNifBinary address_bin;

    // we read the elixir string (erlang binary)
    //                                      |---- argv[0] means the first argument passed in
    if (!enif_inspect_iolist_as_binary(env, argv[0], &address_bin))
    {
        // we blame the user if address isn't a string
        return enif_make_badarg(env);
    }

    // the binary is not NUL-terminated, so we make a local NUL-terminated copy
    char *address = strndup((char *) address_bin.data, address_bin.size);
    if (address == NULL)
    {
        return enif_raise_exception(env, enif_make_atom(env, "enomem"));
    }

    // ask libpostal to parse it
    libpostal_address_parser_response_t *response = libpostal_parse_address(address, options);
    if (response == NULL)
    {
        free(address);
        return enif_raise_exception(env, enif_make_atom(env, "parse_failed"));
    }

    const char *component, *label;

    // here we convert the response into an erlang/elixir map by iterating over its components
    size_t i;
    for (i = 0; i < response->num_components; i++)
    {
        component = response->components[i];
        label = response->labels[i];

        ERL_NIF_TERM component_term;

        // convert the C string into an elixir string (erlang binary);
        // memcpy is used because the binary carries no NUL terminator
        size_t component_len = strlen(component);
        unsigned char *component_term_bin = enif_make_new_binary(env, component_len, &component_term);
        memcpy(component_term_bin, component, component_len);

        // insert it into the map, with its label converted to an atom as the key
        enif_make_map_put(env, components,
                          enif_make_atom(env, label),
                          component_term,
                          &components);
    }

    // remember to clean up when working with no-gc languages; address_bin came
    // from an inspect call, so it does not need to be released
    libpostal_address_parser_response_destroy(response);
    free(address);
    return components;
}

Implementing the parse_address function in Elixir

Now that we have the NIF implemented, we need to create its Elixir counterpart. The Elixir module needs to load the NIF as it initializes and define the signature of the NIF function, as shown below:

defmodule Expostal.Parser do
  @moduledoc """
  Address parsing module for OpenVenues' Libpostal.
  """

  @on_load {:init, 0}

  app = Mix.Project.config()[:app]

  # loading the NIF
  def init do
    path = :filename.join(:code.priv_dir(unquote(app)), 'parser')
    :ok = :erlang.load_nif(path, 0)
  end

  @doc """
  Parse given address into a map of address components

  ## Examples

      iex> Expostal.Parser.parse_address("845 Sherbrooke St W, Montreal, QC H3A 0G4")
      %{city: "montreal", house_number: "845",
        road: "sherbrooke st w", state: "qc",
        postalcode: "h3a 0g4"}

  """
  @spec parse_address(address :: String.t()) :: %{optional(atom) => String.t()}
  def parse_address(address)

  def parse_address(_) do
    # if the NIF can't be loaded, this function is called instead.
    exit(:nif_library_not_loaded)
  end
end

And there we go, libpostal’s parse_address function can now be invoked inside Elixir:

iex> Expostal.Parser.parse_address("845 Sherbrooke St W, Montreal, QC H3A 0G4")
%{city: "montreal", house_number: "845",
road: "sherbrooke st w", state: "qc",
postalcode: "h3a 0g4"}

Summary

This tutorial covered how we can create Elixir NIFs from scratch, using Expostal (an Elixir binding for Libpostal) as an example. A NIF serves as a bridge between C code and Elixir code, allowing you to call C functions from Elixir. The steps to create an Elixir NIF are as follows:

  1. Create a new project and set up a Mix compile task.
  2. Create a Makefile (or several, if multiple operating systems must be supported).
  3. Implement the NIFs in C.
  4. Implement the Elixir module counterpart (init and function definitions).

The entire experience is not that different from implementing a binding for Python or Node.js. But because Elixir/Erlang supports concurrent programming by design, one must pay more attention to thread safety when implementing NIFs. This is a challenge that Node.js binding implementors do not have to worry about, because of its single-threaded design.
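As a rough illustration of the thread-safety concern, one common approach is to serialize calls into a library that is not thread-safe behind a mutex. The sketch below uses a POSIX mutex and a made-up unsafe_library_call so that it stays self-contained. Inside a real NIF, the erl_nif API offers enif_mutex_create, enif_mutex_lock and enif_mutex_unlock for the same purpose. Whether Libpostal itself needs such a guard is not something this sketch assumes.

```c
#include <pthread.h>

/* Hypothetical stand-in for a C library call that is NOT thread-safe,
 * because it mutates shared state without any synchronization. */
static int shared_calls = 0;

static int unsafe_library_call(int x)
{
    shared_calls++;
    return x * 2;
}

/* One lock guarding every entry into the library. */
static pthread_mutex_t library_lock = PTHREAD_MUTEX_INITIALIZER;

/* Wrapper that lets only one thread at a time into the library. */
int guarded_library_call(int x)
{
    int result;
    pthread_mutex_lock(&library_lock);
    result = unsafe_library_call(x);
    pthread_mutex_unlock(&library_lock);
    return result;
}

/* Number of calls made so far, read under the same lock. */
int guarded_call_count(void)
{
    int count;
    pthread_mutex_lock(&library_lock);
    count = shared_calls;
    pthread_mutex_unlock(&library_lock);
    return count;
}
```

The trade-off is that callers now wait for each other, so a long-running call behind the lock also delays every other process that wants to use the NIF.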

If you wish to download the full source code, it is available on GitHub: https://github.com/SweetIQ/expostal . And star the project while you are at it!