The project proposes a distributed architecture that combines industry-mature machine-to-machine protocols (i.e., MQTT and CoAP) to enhance the scalability of gateways for efficient IoT-cloud integration. We have applied the approach in practice by extending the open-source gateway implementation available in the industry-oriented Kura framework for IoT.

We extend the Kura framework, which originally supported only MQTT, with CoAP support built on the Californium framework. This makes it easy to add any type of object and expose it externally as a resource accessible through REST-style methods, improving overall system scalability. Both protocols are open, lightweight, and typically used in constrained environments, with comparably high performance.

We optimize node management via hierarchical tree organization:

Architecture integrating MQTT and CoAP.

MQTT is used for inter-node communications inside the hierarchical tree organization to retrieve resource information and synchronize the tree management procedures, while CoAP is used for intra-node interactions where we need direct and very responsive lightweight communications, with low reliability constraints.

Some MQTT advantages:

  • Publish/subscribe model makes it easy to reach multiple nodes
  • Hierarchical topics allow observing the whole hierarchy by using wildcards
  • Reliability, i.e. TCP communications and differentiated QoS levels
  • NAT support
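
To illustrate the wildcard point above, the following minimal sketch matches an MQTT topic filter against a topic name. The class name `TopicFilter` and the `domain/group/node` topic layout are assumptions made for illustration, not names taken from the project. The matching rules follow the standard MQTT wildcards (`+` for exactly one level, `#` for the parent level and everything below it); `$`-prefixed system topics are ignored here.

```java
// Hypothetical helper: checks whether an MQTT topic filter matches a topic name.
final class TopicFilter {
    static boolean matches(String filter, String topic) {
        String[] f = filter.split("/", -1);
        String[] t = topic.split("/", -1);
        for (int i = 0; i < f.length; i++) {
            if (f[i].equals("#")) {
                // Multi-level wildcard: valid only as the last level of the filter,
                // where it matches the parent level and every level below it.
                return i == f.length - 1;
            }
            if (i >= t.length) {
                return false; // the filter has more levels than the topic
            }
            if (!f[i].equals("+") && !f[i].equals(t[i])) {
                return false; // a literal level differs
            }
        }
        return f.length == t.length;
    }
}
```

With such a layout, a node subscribed to `domainA/#` would observe every group and node below `domainA`, while `domainA/+/status` would select one topic per group.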

MQTT drawbacks:

  • TCP usage is better suited to resource-rich nodes
  • Persistent node-broker TCP session, causing unnecessary resource consumption

Some CoAP advantages:

  • High performance
  • Smaller packets
  • Minimal overhead
  • Low resource consumption

CoAP drawbacks:

  • Lack of advanced quality support, i.e. UDP-based
  • NAT tunneling and port forwarding issues
  • No support for aggregating IoT resources into a hierarchical organization

Our IoT gateway dynamically joins a multi-layer hierarchy, structured into three levels, where each node specifies the domain and group names used to identify it. Level0 includes the root node and enables inter-domain communication, while level1 and level2 include all the nodes belonging, respectively, to a specific domain or to a specific domain/sub-group.

Hierarchy levels
Hierarchical tree structure of our extended Kura gateways.


The internal node structure is composed of multiple bundles in order to provide a service with loosely coupled components, taking advantage of the Californium internal classes to create the CoAP support. The bundles we implemented are detailed in the following.

The CoAPTreeHandler (CTH) bundle dynamically manages the hierarchy and dispatches node requests: it provides hierarchy metadata to all nodes; returns the references needed to create a tree structure organized by domains/groups; manages multiple node replications; and handles children nodes in case of parent disconnections. Internally, CTH consists of CTHListener and CTHCollections. CTHListener exposes an interface to interact with CTH and is triggered by requests from nodes that require actions on the hierarchy. The CTHCollections class stores information about the overall hierarchy and consists of two collections optimized for lookup and node substitution: a Map-extension data structure that stores node paths and properties, and a data structure that maps parent-children associations using Guava, a library that simplifies collection updates, i.e., the dynamic creation of one-to-many associations.
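
The two collections described above can be sketched with JDK collections only. This is a minimal illustration under stated assumptions, not the project's CTHCollections class: the names `HierarchySketch`, `addNode`, `childrenOf`, and `reparentChildren` are hypothetical, and the plain `Map<String, Set<String>>` only approximates the one-to-many structure that the project builds with Guava.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: node properties by path, plus parent -> children associations.
final class HierarchySketch {
    private final Map<String, Map<String, String>> nodes = new HashMap<>();
    private final Map<String, Set<String>> children = new HashMap<>();

    void addNode(String path, String parentPath, Map<String, String> properties) {
        nodes.put(path, new HashMap<>(properties));
        if (parentPath != null) {
            // Create the one-to-many entry on first use, then add the child.
            children.computeIfAbsent(parentPath, k -> new LinkedHashSet<>()).add(path);
        }
    }

    Set<String> childrenOf(String parentPath) {
        return children.getOrDefault(parentPath, Collections.emptySet());
    }

    // On a parent disconnection, hand its children over to a substitute node.
    void reparentChildren(String lostParent, String substitute) {
        Set<String> orphans = children.remove(lostParent);
        nodes.remove(lostParent);
        for (Set<String> siblings : children.values()) {
            siblings.remove(lostParent); // detach the lost node from its own parent
        }
        if (orphans != null) {
            children.computeIfAbsent(substitute, k -> new LinkedHashSet<>()).addAll(orphans);
        }
    }
}
```

Keeping the parent-children map separate from the property map means a substitution only touches the association entries and never copies node properties.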

The MQTT broker is used to exchange both resource and control messages in order to synchronize the CoAP servers.

The Resource Directory (RD) is a data structure that stores endpoints and resources belonging to different domains, groups, or subgroups. It enables dynamic operations (e.g., registering, maintaining, looking up, and removing endpoint and resource descriptions) in order to support node disconnections dynamically and to discover every connected broker, so that REST requests can be sent even when the resource and the node are not directly connected. The RD is accessible through REST requests, and all results are returned in the CoRE Link Format.
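
A minimal in-memory sketch of the register, lookup, and remove operations is shown below, with lookup results rendered in CoRE Link Format (`<path>;attr="value"`, entries separated by commas). The class `DirectorySketch` and its method names are assumptions for illustration only; they are not the project's RD interface, and values are written without any escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory directory; not the project's RD implementation.
final class DirectorySketch {
    // endpoint name -> (resource path -> attributes)
    private final Map<String, Map<String, Map<String, String>>> endpoints = new LinkedHashMap<>();

    void register(String endpoint, String path, Map<String, String> attrs) {
        endpoints.computeIfAbsent(endpoint, k -> new LinkedHashMap<>())
                 .put(path, new LinkedHashMap<>(attrs));
    }

    void remove(String endpoint) {
        endpoints.remove(endpoint);
    }

    // Returns every resource whose "rt" attribute equals the given type,
    // serialized as a single CoRE Link Format string.
    String lookupByType(String rt) {
        StringBuilder sb = new StringBuilder();
        for (Map<String, Map<String, String>> resources : endpoints.values()) {
            for (Map.Entry<String, Map<String, String>> r : resources.entrySet()) {
                if (!rt.equals(r.getValue().get("rt"))) {
                    continue;
                }
                if (sb.length() > 0) {
                    sb.append(',');
                }
                sb.append('<').append(r.getKey()).append('>');
                for (Map.Entry<String, String> a : r.getValue().entrySet()) {
                    sb.append(';').append(a.getKey())
                      .append("=\"").append(a.getValue()).append('"');
                }
            }
        }
        return sb.toString();
    }
}
```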

The CoAP Server is a service dedicated to managing all the properties of the registered resources. It can coordinate with other servers to run callback methods in response to received messages. CoAP servers manage the endpoints and allow requests to be sent and received in order to exchange remote resources.

The Californium Server is a service that relies on Californium internal classes to run a centralized CoAP server that can perform REST operations without being interrupted by MQTT messages.

The Remote Query Resource (RQ) bundle supports resource lookup across the CoAP servers in the hierarchy: it retrieves information from the CoAP servers on other nodes by bridging a CoAP server request to the MQTT broker. The RQ and the CoAP servers are completely decoupled, so that a server can either call the RQ for queries on remote resources or use the CoAP protocol directly on local resources.

The CoAP Resource is a service dedicated to exposing a resource externally. It can perform server operations, adding, deleting or modifying its own attributes. The CoAP resource types used in the system are:

Example of ExtendedResource implementation:

import java.util.Timer;
import java.util.TimerTask;
import org.eclipse.californium.core.server.resources.CoapExchange;
import org.osgi.service.component.ComponentContext;

public class TemperatureResource extends ExtendedResource {
	private static String resourceName = "temperature";
	private String resourceType = "temp";
	private transient CoAPServer cs;
	private transient Timer timer = new Timer();

	public TemperatureResource() {
		this.addAttribute("position", "bologna");
		this.addAttribute("p1", "4000");
	}

	// Register the resource on the CoAP server and start the periodic task
	protected void activate(ComponentContext componentContext) {
		this.updateDomainAndGroup(cs); // If we want to add domain and group attributes
		cs.addResource(this, ExtendedResource.class);
		timer.scheduleAtFixedRate(new ValueTimer(), 5000, 5000); // first run after 5 s, then every 5 s
	}

	// Unregister the resource from the CoAP server
	protected void deactivate(ComponentContext componentContext) {
		cs.removeResource(this, ExtendedResource.class);
	}

	public void handleGET(CoapExchange exchange) { ... }

	protected void setCoAPServer(CoAPServer cs) { // associate a CoAP server
		this.cs = cs;
	}

	protected void unsetCoAPServer(CoAPServer cs) { // unlink the CoAP server
		this.cs = null; // clear the field, not the method parameter
	}

	class ValueTimer extends TimerTask { ... }
}

DTLS Support

We provide the CoAP server implementation with DTLS support to protect and limit access to the resources, in particular to actuators that can modify the environment, allowing only authenticated users to execute REST operations (e.g. POST, PUT, DELETE).

In particular, we use the existing Scandium subproject inside Californium, adding the CoAPDTLSOptions and CoAPDTLSBase classes, which include all the options and information about the DTLS Connector to be used jointly with the CoAP server.

We also add fields to the Kura WebUI to manage the usage of different keystores and truststores, by modifying the related paths and passwords.

MQTT Optimization

Object Serialization (OS). MQTT is payload-agnostic and supports only byte arrays as payload content; serialization is therefore performed very often during normal system operations: each CoAP message is serialized, passed to the Kura DataService class for MQTT communication, and de-serialized when responses are handled. In this first optimization we adopt serialization based on the Kryo framework. Kryo is an open-source object graph serialization framework for Java that offers good performance, efficiency, and an easy-to-use API. Kryo provides multiple serializers that outperform standard Java serialization.
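
The serialization step itself can be illustrated with the JDK's built-in serialization, that is, the baseline that Kryo is meant to replace. This is only a sketch: `ResourceMessage` and `PayloadCodec` are hypothetical names, and the Kura publish call that would carry the resulting byte array is deliberately omitted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical message type standing in for a CoAP message to be sent over MQTT.
final class ResourceMessage implements Serializable {
    private static final long serialVersionUID = 1L;
    final String path;
    final String payload;

    ResourceMessage(String path, String payload) {
        this.path = path;
        this.payload = payload;
    }
}

final class PayloadCodec {
    // Object -> byte[]: the only form an MQTT payload can take.
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    // byte[] -> Object: performed on the receiving side.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }
}
```

Because this round trip runs for every exchanged message, the per-object cost of the serializer directly affects the total time needed to serve a burst of requests.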

DataService (DS) and DataTransportService. The DataService class is the Kura component used to manage MQTT communication. It offers several configuration options and delegates to the DataTransportService the implementation of the transport protocol used to interact with the associated MQTT broker. When DataService receives a publish request, it stores the message in a DataStore and submits the message to the internal executor of DataTransportService. DataStore is a heavy storage structure that, when many messages are published concurrently, may even make the device unavailable because of its high CPU consumption. Therefore, to reduce the associated overhead, we have decided to use only the lower-level DataTransportService class, dropping two features that are irrelevant for our solution: message priority management and message storage on behalf of temporarily disconnected devices. In addition, our DataTransportService performs event propagation by accessing methods and client listeners of the MQTT Paho implementation.

RD Optimization

RD Parsing (RP). In order to reduce the use of regular expressions and to speed up parsing, we create every CoRE Link Format entry with the resource path in the first position of the RD attribute list. We adopt the Guava libraries, which provide better mechanisms for string management. Splitter is a specialized Guava class that contains string handling methods and makes string operations easier and much faster, with a consequent benefit for RD functionality and performance.
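
To show why a fixed position for the resource path helps, the sketch below parses one link entry with index lookups only, without regular expressions. It is an assumption-laden illustration: `LinkParser` is a hypothetical class, it uses `String.indexOf` from the JDK rather than Guava's Splitter, and it does not handle `;` characters inside quoted values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical regex-free parser for a single entry such as </temperature>;rt="temp".
final class LinkParser {
    // The path is always first, so it is simply the text between '<' and the first '>'.
    static String path(String link) {
        int end = link.indexOf('>');
        if (link.isEmpty() || link.charAt(0) != '<' || end < 0) {
            throw new IllegalArgumentException("not a link entry: " + link);
        }
        return link.substring(1, end);
    }

    // Attributes follow the path as ;name=value pairs.
    static Map<String, String> attributes(String link) {
        Map<String, String> attrs = new LinkedHashMap<>();
        int pos = path(link).length() + 2; // index right after the closing '>'
        while (pos < link.length()) {
            int next = link.indexOf(';', pos + 1);
            if (next < 0) {
                next = link.length();
            }
            String pair = link.substring(pos + 1, next); // skip the leading ';'
            int eq = pair.indexOf('=');
            if (!pair.isEmpty()) {
                String name = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1); // strip the quotes
                }
                attrs.put(name, value);
            }
            pos = next;
        }
        return attrs;
    }
}
```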

Requests Aggregation (RA). Since every node can send POST messages, it is likely that some nodes send information about multiple resources. Thus, we optimize message management by grouping multiple resources into a single CoRE Link Format payload, instead of creating a separate message for each resource request.

String Refactoring (SR). During normal operations, the system makes extensive use of string utilities. The Java Scanner class relies on regular expressions, which are relatively slow because they perform a large number of operations to recognize different patterns inside a string. We optimize string usage throughout the code, replacing the Scanner class with the Guava Splitter class for string handling and using StringBuilder for building strings instead of plain Java String.

In our tests we have used growing numbers of sequential MQTT requests and CoAP POST ones, distributed over a realistic testbed environment consisting of 10 nodes organized in a 4-layer tree, 20 service bundles per node, 2 devices per service bundle, and 10 sensors per device:

Test Environment
Deployment scenario example.

MQTT Optimization

We started the MQTT tests by sending 1000 requests between two RaspberryPi devices. Without optimization, the system stops well before the end of the execution, raising exceptions about too many stored messages and too many messages waiting for an ACK. With our object serialization optimization, we observed a significant decrease in the total time needed to serve all the requests (around 49%), but this is still insufficient to handle such a large peak without exceptions. By also adding our DataService optimization, we obtained another significant performance improvement, and the system no longer raised overload-related exceptions. After this preliminary experimentation, we evaluated the behavior of our prototype while further increasing the number of exchanged MQTT messages. We report the total time to complete MQTT transmissions in relation to the optimizations introduced. Note that, due to their limited capabilities and the high number of operations to perform, RaspberryPi nodes are sometimes affected by connection-lost errors with the MQTT broker; this has proven to be mainly due to their inability to send MQTT heartbeat messages in time to keep the associated connection alive.

MQTT performance

RD Optimization

Moving on to the RD (CoAP-centered) evaluation, we started the RD tests by sending 500 POST requests. Already with this number of requests, the default Californium configuration shows non-negligible performance issues; for instance, on a limited gateway such as the RaspberryPi, this load peak may frequently generate errors. Therefore, we applied our RD optimizations, both all together and in partial subsets; we report the related performance results in terms of total time to complete the POST requests on the RD, for the different subsets of optimizations introduced.

RD performance

Finally, we execute tests on the RD with DTLS support. DTLS may slow down the system when there are many new nodes, because every new node that sends a message must first perform the DTLS handshake to create the DTLS session. We consider both the normal interaction between Kura devices inside the hierarchy and the worst case, in which all communications come from external independent clients. In Figure 4 we illustrate the results for 500 requests.

RD with(out) DTLS performance

Resource Consumption

Beyond communication performance, we analyze resource usage, because other operations might be performed during communications and the device must remain responsive. We show the CPU and RAM usage on a RaspberryPi after loading 500 resources, with 4 attributes each, via POST REST calls.

Initial non-optimized CPU usage
Initial non-optimized memory usage

CPU usage is above 92% and memory usage reaches 80%, with a total of about 550 threads active at the same time. Once the RD contains a large number of resources, a resource-limited device such as the RaspberryPi becomes unresponsive or completely unusable. Every time the RD stores a resource, Californium associates a thread with that resource in order to monitor its validity time and delete the resource once it expires. With a high number of requests, this mechanism is not suitable because the number of threads keeps increasing, in proportion to the number of servers that execute a POST. Thus, we modify the Californium thread mechanism to use a single thread that refreshes the RD, deleting expired resources at regular time intervals. The update interval may vary in relation to the responsiveness we want to achieve and the resources available on the gateway: long intervals save resources but may lead to requests toward endpoints that are no longer available; short intervals make the device more responsive to endpoint changes but consume more resources. With this modification we save a large amount of resources thanks to the reduced thread usage, considering that for our purposes resource removal does not need to be strictly real-time and that, in the unlucky event of a request to an expired endpoint, we just perform a useless CoAP request, which is very lightweight. We show the final resource-related results, which also reflect the decrease to 35 active threads.
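
The single-thread expiry mechanism described above can be sketched as follows. This is a minimal illustration using the JDK's `ScheduledExecutorService`, not the modified Californium code: the class `ExpiringDirectory`, its method names, and the explicit `nowMillis` parameters (used so that the purge logic can be exercised deterministically) are all assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one periodic task purges expired entries,
// instead of one monitoring thread per stored resource.
final class ExpiringDirectory {
    private final Map<String, Long> expiryByPath = new ConcurrentHashMap<>();
    private final ScheduledExecutorService sweeper =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "rd-sweeper");
                t.setDaemon(true); // do not keep the JVM alive just for the sweeper
                return t;
            });

    // The sweep interval is the tuning knob: longer saves CPU, shorter reacts faster.
    ExpiringDirectory(long sweepIntervalMillis) {
        sweeper.scheduleAtFixedRate(() -> purgeExpired(System.currentTimeMillis()),
                sweepIntervalMillis, sweepIntervalMillis, TimeUnit.MILLISECONDS);
    }

    void register(String path, long lifetimeMillis, long nowMillis) {
        expiryByPath.put(path, nowMillis + lifetimeMillis);
    }

    // Removes every entry whose lifetime ended at or before nowMillis;
    // returns how many entries were removed (exact when no concurrent registration occurs).
    int purgeExpired(long nowMillis) {
        int before = expiryByPath.size();
        expiryByPath.values().removeIf(expiry -> expiry <= nowMillis);
        return before - expiryByPath.size();
    }

    boolean contains(String path) {
        return expiryByPath.containsKey(path);
    }

    void shutdown() {
        sweeper.shutdownNow();
    }
}
```

With this design the thread count stays constant regardless of how many resources are stored; the cost is that an expired entry may remain visible for up to one sweep interval.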

Final CPU usage
Final memory usage

    Alessandro Zanni   Ph.D. student in Computer Science

    Gianvito Morena   M.Sc. student in Computer Science

Our Mobile Middleware Research Group focuses on the study and development of support solutions for a broad range of scenarios in the context of mobile networking and distributed services. The activities range over a wide spectrum of subjects dealing with the creation of novel solutions that support the development of distributed services, and can be classified into the following categories: Cloud Computing, Mobile Computing, Context-based Systems, NGN & Future Internet Scenarios, and Sustainable Infrastructures for Smart Environments.

More information can be found here.

For any suggestions, comments or further details do not hesitate to contact me.

     alessandro.zanni3 AT

     Department of Computer Science Engineering (DISI), University of Bologna

     Via del Risorgimento 2, Bologna, Italy