This week is a double whammy of large consumer focused organizations looking at network dynamics! Very exciting times to have so many interesting approaches being published each week.
When you can get workstations with 1TB of RAM, dozens of cores, and multiple GPUs for most datasets do you really need a distributed machine learning solution?
From my personal experiences most people write low-throughput code and use parallelization as the wrong tool to speed things up. However once there is a decoupling between what is creating the data and what is consuming it, like a host that saves events to S3 and another that reads S3 and creates features, it does make sense to go parallel to improve networking throughput between the two nodes or cluster. The reasons for having the decoupling typically come from building high availability services which somewhat is in conflict with single node workstations for machine learning.
Twins Andrew and Adam Kelleher created a graph construction system called Pound which takes event data and creates a graph of nodes and edges which can be explored with network analysis tools. Of particular interest is understanding network diffusion of information cascades, visualized below by Adam Kelleher:
Quora is able to connect questions to experts who can answer them effectively. This explores a graph theoretic look at the Quora network dynamics by constructing a graph of users with an incremental algorithm to add new users and follow relationships. A sparsifying step is done to remove paths that take longer than some time threshold, and then longest path is computed. The author explored answer propagation dynamics through this metric finding some interesting results like that follower count for good answers significantly helps in the beginning of a post, as time goes on the low follow count users tend to converge.
An interesting discussion about how useful small data is; things like qualitative studies or human insight are extremely valuable when a computer does not easily have access to the data. The author presents several real world cases where small data is useful even with “big data” approaches.