Current graph analytics systems only allow you to analyse a graph as static snapshots in time. To perform any kind of temporal analysis, the data must be re-loaded and re-analysed for each individual snapshot. For example, to analyse the spread of a contagion through a graph, you would have to re-load the graph at every single point of change. This is an extremely time-consuming and costly process.
What if you could load and analyse the entire history temporally?
Given that the amount of available data is increasing exponentially, adding the historical dimension of time significantly increases the size of any dataset. This causes serious problems for existing systems.
What if there was a system that could deal with this in a scalable manner?
Data is evolving and changing all the time, with new points being added every second. In current systems, even slight changes or updates require a full re-ingestion. Given the size and scale of temporal data, this cost is far too great when real-time analytics is needed.
What if there was a system that was able to update and change data dynamically?
Raphtory is a distributed system that takes any source of data (either previously stored or a real-time stream) and creates a dynamic temporal graph that is partitioned over multiple machines. In addition to maintaining this model, graph analysis functions can be defined that will be executed across the cluster nodes, with access to the full history of the graph. Raphtory is designed with extensibility in mind: new types of data, as well as new analysis algorithms, can be added to the base project to support additional use cases.
Raphtory's core components for modelling and ingestion are Spouts, Graph Routers and Graph Partition Managers, shown on the left of the overview above. Spouts attach to a user-specified data source external to Raphtory. Tuples are pulled from this source and pushed into the system. These raw data tuples are received by the Graph Routers, which convert each one into graph updates via a user-defined parsing function, adding, removing or updating vertices and edges. Updates are then forwarded to the Graph Partition Manager handling the affected entity. By decoupling these processes, the same data may be modelled as many different graphs by connecting the same Spout to Routers with distinct parsing functions; alternatively, the same Router may be connected to several Spouts pulling from independent data sources, joining them into one graph.
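As a sketch of this pipeline, a Router's parsing function might turn a raw tuple into a list of graph updates as below. All names here (`GraphUpdate`, `parse_tuple`, the `"time,src,dst"` tuple format) are hypothetical illustrations, not Raphtory's actual API:

```python
# Hypothetical sketch of a Router parsing function.
# GraphUpdate and parse_tuple are illustrative names, not Raphtory's API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class GraphUpdate:
    kind: str                 # "vertex_add" or "edge_add"
    time: int                 # timestamp carried by the raw tuple
    src: int                  # vertex id (or edge source)
    dst: Optional[int] = None # edge destination, if any

def parse_tuple(raw: str) -> list[GraphUpdate]:
    """Convert a raw 'time,src,dst' tuple into graph updates:
    two vertex additions and one edge addition."""
    time, src, dst = (int(x) for x in raw.split(","))
    return [
        GraphUpdate("vertex_add", time, src),
        GraphUpdate("vertex_add", time, dst),
        GraphUpdate("edge_add", time, src, dst),
    ]
```

Because the parsing function is the only data-specific piece, swapping it out lets the same Spout feed entirely different graph models.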
Graph Partition Managers, as their name suggests, handle all operations on their partition: ingesting graph updates, synchronising with peers and performing analysis. As updates arrive via the pool of Graph Routers, the Manager creates entity objects as required and inserts each update into the history of the affected entity at the correct chronological position. This removes the need for centralised synchronisation, as updates and inter-manager messages may be executed in any arrival order whilst still producing the same history. Messages between Routers and Partition Managers are additionally watermarked, both to track the most recent update time (the live graph) and to determine up to which point the graph's history is synchronised and, therefore, safe to analyse.
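The key property above is that insertion is order-independent: placing each update at its chronological position means any arrival order produces the same final history. A minimal sketch of this idea (the class and method names are illustrative, not Raphtory's internals):

```python
# Hypothetical sketch of a per-entity history inside a Partition Manager.
# EntityHistory is an illustrative name, not Raphtory's actual code.
import bisect

class EntityHistory:
    """Keeps an entity's updates sorted by timestamp, so inserting
    updates in any arrival order yields the same final history."""

    def __init__(self):
        self._events = []  # sorted list of (time, payload) pairs

    def insert(self, time: int, payload: str) -> None:
        # Place the update at its correct chronological position,
        # regardless of when it actually arrived.
        bisect.insort(self._events, (time, payload))

    def history(self) -> list:
        return list(self._events)
```

With this structure, two Partition Managers receiving the same updates in different orders still converge on an identical history, which is why no central coordinator is needed for ingestion.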
2x winner of the ConceptionX Show and Tell
Several paid consultancy contracts