NetworkXTimeseries: A Python Library for Network Timeseries Data Structures

Several of my projects over the last few months have leaned in the direction of longitudinal studies. The question every sociologist asks is “how did we get here?”, so it makes sense to explore how things have been changing up to now. My conclusion is that if networks are a meaningful way to investigate the types of questions we are trying to answer, then we need to understand how these networks change over time.

This Python library is useful because it makes it easy to move from traditional networkx graph objects to simple time series dataframe representations in pandas and NumPy. I’ve already used it in several projects and I hope you can use it too!

networkxtimeseries Python Library GitHub

Challenges of Network Time Series

A network time series is simply a set of network data measured over time. Edges and vertices may appear, disappear, and reappear over time, and the values associated with them may also change. A time series is most easily conceived as a table with columns for signal components and rows for measurements, but what if each measurement is a network? A traditional adjacency list representation occupies at least two dimensions and has no precise way to account for the non-existence of nodes and edges. In contrast, a network conceived of as sets of nodes and edges maintains no continuity between time measurements, so it would be difficult to work with using traditional time series methods.

This library attempts to ease the transitions between each way of thinking about your network timeseries data. The time_measure method takes a user function to extract network attributes at the graph, node, or edge level; it places the results into a dataframe where the index is the time series and the columns are nodes or multi-indexed edges. Measured properties can also be applied back to the network time series as node or edge attributes using the setNodeAttrDF or setEdgeAttrDF methods. This makes it easy to do things like measure the eigenvector centrality of each node over time and set it as a node attribute over the whole timeseries at once. Extracted graph properties can also be plotted directly or analyzed using timeseries or signal processing techniques for noise removal, statistical estimation, or machine learning.

Conceptual Functionality

Working with this networkxtimeseries object is fairly straightforward. First give the object a time series list or array and, optionally, a set of nodes. Each time series value will simply be an index into the network at a given point in time. You can add edges, edge attributes, and node attributes as needed by indexing into the nts object: nts[t]. Use traditional networkx library methods to access attributes of these networks, like nts[t].edges(), nts[t].nodes(), or nts[t].add_node(u, name='hello'). The time_measure function takes a user function that accepts an entire graph and returns measurements at the graph, node, or edge level, depending on what is specified.
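To make the "one graph per time value" idea concrete, here is a minimal sketch that uses only plain networkx and a dictionary. It is not the networkxtimeseries API: the snapshots dictionary is a hypothetical stand-in for the nts object, and the node names and weights are made up for illustration.

```python
import networkx as nx

# Hypothetical stand-in for the nts object: one networkx graph per time value.
times = [0, 1, 2]
nodes = ["a", "b", "c"]

snapshots = {}
for t in times:
    g = nx.Graph()
    g.add_nodes_from(nodes)  # optional shared node set
    snapshots[t] = g

# Index by time, then use ordinary networkx methods on that snapshot.
snapshots[0].add_edge("a", "b", weight=1.0)
snapshots[1].add_edge("a", "b", weight=2.0)
snapshots[1].add_edge("b", "c", weight=0.5)
snapshots[2].add_node("d", name="hello")

# Edges appear and disappear from one snapshot to the next.
edge_counts = {t: snapshots[t].number_of_edges() for t in times}
```

Because each snapshot is an ordinary graph, everything in networkx remains available at each time step; the library's job is to manage the collection and the conversions to tables.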

Graph measurements might be node/edge counts, network constraint, average shortest path, average degree, or network diameter. In that case time_measure returns a dataframe indexed by time, with one column per measured attribute. The columns are determined automatically from the keys of the dictionary returned by the user function. Node measurement tables are similar but contain multi-indexed columns: first by node name and then by attribute name (again as returned by the user function). Edge measurement tables have columns indexed by the edge's 'from' node, then its 'to' node, then the attribute name.
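The table layouts described above can be sketched with plain networkx and pandas. This is an illustration of the shapes only, not the library's time_measure implementation; the helper names (measure_graph_level, measure_node_level) and the example measurement functions are assumptions made for this sketch.

```python
import networkx as nx
import pandas as pd

# Three small snapshots; nodes and edges come and go over time.
snapshots = {
    0: nx.Graph([("a", "b")]),
    1: nx.Graph([("a", "b"), ("b", "c")]),
    2: nx.Graph([("b", "c")]),
}

def graph_measures(g):
    # User function, graph level: the dict keys become column names.
    return {"nodes": g.number_of_nodes(), "edges": g.number_of_edges()}

def node_measures(g):
    # User function, node level: {node: {attribute: value}}.
    return {n: {"degree": d} for n, d in g.degree()}

def measure_graph_level(snapshots, func):
    # Rows are time values, columns are the measured attributes.
    rows = {t: func(g) for t, g in snapshots.items()}
    return pd.DataFrame.from_dict(rows, orient="index").sort_index()

def measure_node_level(snapshots, func):
    # Flatten each snapshot into {(node, attribute): value} so that the
    # columns become a two-level MultiIndex.
    rows = {}
    for t, g in snapshots.items():
        rows[t] = {
            (n, attr): value
            for n, attrs in func(g).items()
            for attr, value in attrs.items()
        }
    df = pd.DataFrame.from_dict(rows, orient="index")
    df.columns = pd.MultiIndex.from_tuples(
        list(df.columns), names=["node", "attribute"]
    )
    return df.sort_index().sort_index(axis=1)

graph_df = measure_graph_level(snapshots, graph_measures)
node_df = measure_node_level(snapshots, node_measures)
```

In this sketch a node that does not exist at a given time simply shows up as NaN in its columns, which is one way to represent non-existence in a table. An edge-level table would follow the same pattern with three-level keys of the form (from node, to node, attribute).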

In this way, the networkx time series is constructed using discrete math notation and functionality, while higher-level processing can be done after measurement extraction, in dataframes or signals using numpy/pandas. The setNodeAttrDF and setEdgeAttrDF methods then allow the processed dataframes to be applied back to the data structures themselves for further analysis.
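The round trip (measure, process in pandas, write back) might look roughly like the following. It uses only networkx and pandas; set_node_attr_from_df is a hypothetical helper written for this sketch and is not the library's setNodeAttrDF, whose exact signature is not shown here. The rolling mean is just an example of a processing step.

```python
import networkx as nx
import pandas as pd

snapshots = {
    0: nx.Graph([("a", "b")]),
    1: nx.Graph([("a", "b"), ("a", "c")]),
    2: nx.Graph([("a", "b")]),
}

# Measure: a time-by-node table of degrees (NaN where a node is absent).
degree = pd.DataFrame({t: dict(g.degree()) for t, g in snapshots.items()}).T

# Process: any pandas operation works here; a rolling mean is one example.
smooth = degree.rolling(window=2, min_periods=1).mean()

def set_node_attr_from_df(snapshots, df, name):
    # Hypothetical helper: write each row of df back onto the snapshot
    # for that time, skipping NaN values and nodes missing at that time.
    for t, row in df.iterrows():
        g = snapshots[t]
        values = {
            n: float(v) for n, v in row.items() if pd.notna(v) and n in g
        }
        nx.set_node_attributes(g, values, name)

set_node_attr_from_df(snapshots, smooth, "degree_smooth")
```

After this call, snapshots[1].nodes["a"]["degree_smooth"] holds the smoothed value for node a at time 1, so later graph-level analysis can use it like any other node attribute.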

Future Work

The next major development will occur in two areas: noise measurement and characterization, and dynamic network visualization.

Robust time-dynamic network visualization is, in some sense, the key element of this library. Analysis is important, but if any arbitrary network can be thrown into this structure and visualized, we have something. In my opinion, that would be a requirement before anyone might have interest in the library. I’m currently working on copying Java code for xgmml files in the Cytoscape DynNetwork plugin. So far it *sort of* works as intended, but has some small issues here and there. If anyone has other ideas about how to do this, let me know: dcornell [at] umail.ucsb.edu.

Noise measurement and characterization is important because it may not be evident how consistent the timeseries values are. More than just measuring the noise of individual node or edge signals, these metrics need to capture the suitability of a network changing over time as a whole. This could be important in situations where time resolution is a hyper-parameter, or when trying to fit some signal within the noise. Networks are more complicated than vector signals because measurements are about the relationships between signals and how they fit into an overall structural schema. Further research needs to go into this.

 
