# Concurrence Topology: A Tool for Describing High-Order Statistical Dependence

• Jun 9, 2014 · 6:15 PM

A number of people have asked for a talk on Topological Data Analysis so here it is, courtesy of Steve Ellis.

Data analytic methods possessing the following three features are desirable:
(1) The method describes "high-order dependence" among variables. (2) It does so with few preconceptions. And (3) it can handle at least dozens, maybe hundreds of variables.

However, approached naively, data analysis with these three features triggers a "combinatorial explosion": the output from the analysis can include thousands, maybe millions, of numbers. Few methods possess all three features while still avoiding the combinatorial explosion. Ellis has devised a data analytic method, which he calls "Concurrence Topology" (CT), that does so.
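To get a feel for the size of that explosion, here is a rough sketch that counts how many groups of variables a naive analysis would have to report on if it produced one number per group. The function name `count_groups` and the cutoff `max_order` are illustrative assumptions, not part of CT.

```python
from math import comb

def count_groups(n_vars, max_order):
    """Number of distinct groups of 2..max_order variables among n_vars."""
    return sum(comb(n_vars, k) for k in range(2, max_order + 1))

# One number per pair and per triple of 100 variables:
print(count_groups(100, 3))  # 166650
# Going up to groups of five already reaches the tens of millions:
print(count_groups(100, 5))
```

Even with a modest cutoff on group size, the count grows far faster than the number of variables, which is why a summary of only a few dozen holes is attractive.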

CT takes an apparently radically new approach to solving this problem. It starts by translating data into a "filtration", a series of "shapes". The shapes in the series are called "frames." A filtration is like a building. The frames are like floors of the building. But while the floors of a building are two-dimensional, the frames of a filtration can have dimension much higher than two.

A filtration can have holes that are like elevator shafts in a building. Such holes indicate relatively weak or negative association among the variables. CT uses computational algebraic topology to describe the pattern of holes. Normally, there are no more than a few dozen holes, so CT avoids the combinatorial explosion. Often one can identify small groups of variables that are closely associated with a given hole. This process facilitates interpretation of the hole.

A limitation of CT is that, so far, it only works with binary data. But quantitative data can always be binarized.
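One simple way to binarize quantitative data is to threshold each variable at its own median. This is only a sketch of one possible choice; the announcement does not say which binarization rule Ellis uses, and the function name `binarize_columns` is hypothetical.

```python
from statistics import median

def binarize_columns(rows):
    """Turn each value into 1 if it exceeds its column's median, else 0.

    `rows` is a list of observations, each a list with one value per variable.
    The median cutoff is an assumption made for illustration only.
    """
    columns = list(zip(*rows))
    cutoffs = [median(col) for col in columns]
    return [
        [1 if value > cutoff else 0 for value, cutoff in zip(row, cutoffs)]
        for row in rows
    ]

data = [
    [0.1, 5.0],
    [0.9, 1.0],
    [0.5, 3.0],
]
print(binarize_columns(data))  # [[0, 1], [1, 0], [0, 0]]
```

Any thresholding rule discards information, so the cutoff should be chosen with the application in mind.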

Steve Ellis wrote software in R (available upon request) implementing Concurrence Topology. A paper, written by Arno Klein and Ellis, introducing CT and demonstrating it on fMRI data has been accepted by a topology journal.

Pizza begins at 6:15; announcements, giveaways, and the talk start at 7, followed by a trip to the local bar.

• ##### Bruce E.

Here's an economical construction that may illustrate what a hole is. The actual construction is more elaborate.

We need to work in an artificial space that we might call concurrence space. In that space, each observable event gets its own dimension. So if we have 3 events, we might call them X, Y, and Z. In the brain firing data, each represents a lighting up of a particular location in the brain. But in concurrence space, each is simply plotted at position 1 on its own axis.

Suppose that we have the following correlations:
({X, Y}, t1)
({Y, Z}, t2)
({Z, X}, t3)

If we ignore time, we see that these form the boundary of a triangle. But the body of the triangle would not be filled in in our model because we don't have a correlation ({X, Y, Z}, t) for any t.

The homology analysis would allow us to observe this hole. We could say X-Y, Y-Z, and Z-X seem to be independent concurrences because we know that the three events never all occur together.
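The triangle example can be checked with a small homology computation. The sketch below is not Ellis's R software and ignores time entirely; it builds the simplicial complex from the listed concurrences and computes Betti numbers over GF(2), where the second entry counts one-dimensional holes. All function names are made up for this illustration.

```python
from itertools import combinations

def closure(simplices):
    """All nonempty faces of the given simplices, grouped by dimension."""
    faces = set()
    for s in simplices:
        s = tuple(sorted(s))
        for k in range(1, len(s) + 1):
            faces.update(combinations(s, k))
    by_dim = {}
    for f in faces:
        by_dim.setdefault(len(f) - 1, []).append(f)
    return {d: sorted(fs) for d, fs in by_dim.items()}

def rank_mod2(rows):
    """Rank over GF(2) of a matrix whose rows are stored as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def betti_numbers(simplices):
    """Betti numbers (mod 2) of the complex generated by `simplices`."""
    by_dim = closure(simplices)
    top = max(by_dim)
    ranks = {}  # ranks[d] = rank of the boundary map from d-simplices down
    for d in range(1, top + 1):
        index = {f: i for i, f in enumerate(by_dim[d - 1])}
        rows = []
        for s in by_dim[d]:
            mask = 0
            for face in combinations(s, d):  # the (d-1)-dimensional faces
                mask |= 1 << index[face]
            rows.append(mask)
        ranks[d] = rank_mod2(rows)
    return [
        len(by_dim[d]) - ranks.get(d, 0) - ranks.get(d + 1, 0)
        for d in range(top + 1)
    ]

hollow = [("X", "Y"), ("Y", "Z"), ("Z", "X")]   # pairwise concurrences only
solid = [("X", "Y", "Z")]                       # all three concur at once
print(betti_numbers(hollow))  # [1, 1]    -> one connected piece, one hole
print(betti_numbers(solid))   # [1, 0, 0] -> the hole is filled in
```

Adding the single concurrence ({X, Y, Z}, t) is what removes the hole, which matches the description above.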

1 · June 15, 2014

• ##### Steven E.

The issue of whether the features represented by different holes are independent is an interesting one, which I hope to discuss in my talk. It seems that "usually" holes are not independent, but I have developed a method to detect a signature of such independence when it does occur. (And apparently it sometimes does.)
Roughly speaking, a hole is a pattern in which a group of regions are not active at the same time (not even indirectly via concurrence with activity of other regions), but lots of smaller subgroups are active at the same time. (The "indirectly via concurrence with activity of other regions" part makes holes structures that involve all regions.)

June 4, 2014

• ##### Gustavo

Consider 3 variables x₁, x₂, x₃. Suppose there were no co-occurrence relationship between x₁ & x₂, between x₂ & x₃, and between x₃ & x₁. Can we not then say there is no co-occurrence among x₁, x₂, x₃?

June 13, 2014

• ##### Steven E.

Yes, a concurrence among any group of variables automatically implies a concurrence among any subset of the group. But note that a concurrence of three variables is represented by a solid triangle, not by a collection of lines. As I said in my talk, I think it is a mistake to try to view concurrence topology as a branch of graph theory.
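The subset property, and the difference between three lines and a solid triangle, can be illustrated with a short sketch. Each observation is modeled as the set of variables active at one time point; the function name `concurrences` and this data layout are assumptions for illustration, not the CT software's interface.

```python
from itertools import combinations

def concurrences(observations):
    """Every nonempty set of variables active together in some observation."""
    found = set()
    for active in observations:
        members = sorted(active)
        for k in range(1, len(members) + 1):
            found.update(frozenset(c) for c in combinations(members, k))
    return found

triple = frozenset({"x1", "x2", "x3"})

# One joint concurrence implies every pairwise concurrence (solid triangle).
joint = concurrences([{"x1", "x2", "x3"}])
print(triple in joint)                      # True
print(frozenset({"x1", "x2"}) in joint)     # True

# Three pairwise concurrences do not imply a joint one (only the three lines).
pairwise = concurrences([{"x1", "x2"}, {"x2", "x3"}, {"x3", "x1"}])
print(triple in pairwise)                   # False
```

A graph records only the pairs, so it cannot distinguish these two data sets; the collection of concurrent sets can.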

June 13, 2014

• ##### Saran L. S.

It was fine but I have a simple cell phone. I would have appreciated the written portion as a guide.

June 12, 2014

• ##### Bruce E.

Thank you for the slide deck. Interesting that the technique shows structural differences in real data sets.

June 9, 2014

• ##### Ritesh B.

Binary bottle ( R code )

June 9, 2014

• ##### Richard Y.

Unfortunately, I can't make it. Will slides or video be available after the talk?
Thank You
Richard

June 9, 2014

• ##### Richard Y.

I wish I was there. Thank you for looking into recording this.

June 9, 2014

• ##### Daniel C.

The slides are very thorough (see below), and I just posted the links to the GitHub repository.

June 9, 2014

• ##### Bruce E.

The technique introduced in Ayasdi's promo video seems to be to represent data relationships using a graph of nodes and edges. It's suggestive that they form nodes by looking at "groups" of data. Each node could represent, for example, (a) a group of values within a single variable, or (b) a pattern of correlation across variables. Then the graph that they build could be either (a) a refinement of the schema graph or (b) a new analysis graph showing relationships across the newly identified groups.

I do think that the brain-event correlation data is graph-like underneath. Each functionally valid composite event is a subgraph consisting of a number of localized firing foci (nodes) connected by a firing sequence (a set of edges). What I think needs to be teased apart is that multiple composite events may co-occur. So this is a factoring problem: to discover the meaningful subgraphs of a rich graph built from correlations of firing events.

June 9, 2014

• ##### Gustavo

Maybe not. By agglomerating connections you may be losing the resolution of the message transmitted. For example, some subnets may be sending messages to other subnets, but because of your low resolution you are averaging over messages that activate and those that deactivate the subnet it is "connected" to.

June 9, 2014

• ##### Tiffany L.

Family emergency, unable to attend. Enjoy all.

June 9, 2014

• ##### Michael W.

Is this akin to how Ayasdi (http://www.ayasdi.com/) is approaching their analytics?

June 8, 2014

• ##### Bruce E.

I most likely can't make it ... but I thought of an analogy.

This problem seems somewhat similar to the genetic sequencing problem, in which there are lots of little sequences that have to be matched up and added together to figure out the larger sequences they are part of.

In your problem, it sounds like you have observations that appear to form agglomerations of events, and underneath there are little groupings of genuinely correlated events that need to be identified. I wonder whether the techniques used in genetic sequencing could be reversed to derive the significant subsequences.

Do the genuinely correlated areas light up simultaneously as far as the observation method is concerned?

June 8, 2014

• ##### Bruce E.

In pattern recognition terms, it seems that finding the holes would isolate significant features (one complex feature per hole) that are, comparatively speaking, mutually independent. For the brain scan data, each hole would represent a combination of brain areas, and the different holes would represent combinations of brain areas that are not active concurrently. Is that correct?

June 4, 2014

• ##### andy

make this a free event!?

June 3, 2014

• ##### Jared L.

Hi Andy, years ago we decided to have food before the Meetup to encourage socializing, so the \$5 is a nominal amount to cover those costs, which can be quite significant.

June 3, 2014

### New York, NY

Founded Mar 12, 2009
