Visualizations and Updates

22 11 2011

Over the past month, most of my work has gone into the R visualization for Orgpedia. I made a visualization that lets you analyze various financial properties of US sectors (Sector and Financials):


The visualization itself uses Google Motion Charts, which is part of Google’s Visualization API. It is an interactive, multidimensional graph of a dataset of sectors and the mean of various financial properties across each sector’s companies. The data shown above is in millions of USD. The Motion Chart allows for really neat temporal analysis of data in various forms. Clicking the play button shows the change in properties from 2008 to 2011. There are also three different styles for viewing the data: bubbles (shown above), bar charts, and line graphs. These can be switched in the top corner.

The dataset behind the visualization was created in R. I wrote a SPARQL query that accesses Orgpedia’s datasets and pulls out the sectors of the US along with the companies in each sector and their stock tickers. Then I took these companies, pulled in their income statements from Google Finance, and went through each sector averaging various properties from its companies’ financial statements. The data manipulation in R took some getting used to, but now it’s very easy for me to transform data frames, matrices, and other objects in R. Once the dataset is created and cleaned of missing values, it’s just a matter of defining the properties of the Motion Chart and running it. It generates an HTML file with the graph and the data represented in JavaScript. All the data processing and manipulation takes around 15 minutes, mostly due to the large amount of data to be downloaded.
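The per-sector averaging step can be sketched roughly as follows. This is in Python rather than R, purely for illustration, and it is not the actual script: the record layout, the tickers, and the property names (`revenue`, `net_income`) are made-up assumptions, and missing values are simply skipped.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: one per company per year, values in millions USD.
# None marks a value that was missing from the downloaded statement.
records = [
    {"sector": "Energy", "ticker": "AAA", "year": 2008, "revenue": 100.0, "net_income": 10.0},
    {"sector": "Energy", "ticker": "BBB", "year": 2008, "revenue": 300.0, "net_income": None},
    {"sector": "Technology", "ticker": "CCC", "year": 2008, "revenue": 50.0, "net_income": 5.0},
]


def sector_means(rows, properties):
    """Average each property per (sector, year), skipping missing values."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        key = (row["sector"], row["year"])
        for prop in properties:
            value = row.get(prop)
            if value is not None:
                grouped[key][prop].append(value)
    # A property with no usable values in a group is left out of that group.
    return {
        key: {prop: mean(values) for prop, values in props.items()}
        for key, props in grouped.items()
    }


result = sector_means(records, ["revenue", "net_income"])
```

Each `(sector, year)` entry in `result` would then become one row of the table handed to the Motion Chart, with the year serving as the time dimension.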

Over the course of this past weekend I also worked on another visualization for Orgpedia with Alex, for a conference Eric and Xian were attending today. The visualization uses data from LittleSis.org about board members of various companies in the US and displays the members in a force graph that shows which board members sit on multiple boards (Board Members Network):

The graph visualization is done using the D3 visualization toolkit’s force-directed graph layout. Each node represents a board member. Clusters of nodes in the same color are members of the same board. The multicolored nodes represent board members who are on multiple boards. Mousing over a node shows the member’s name and the companies they work for. Clicking a node takes you to their LittleSis.org page. The graph shows many interesting relationships between various companies and board members, most notably Steven S Reinemund, who sits on 5 different boards.
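To give a rough idea of the data behind a graph like this, here is a minimal Python sketch that turns board memberships into a nodes-and-links structure. The company and member names are invented, and linking every pair of members who share a board is an assumption made for illustration; the real file may have been structured differently.

```python
import json
from itertools import combinations

# Hypothetical input: company name -> list of its board members.
boards = {
    "Company A": ["Alice", "Bob"],
    "Company B": ["Bob", "Carol"],
}


def build_graph(boards):
    """Build a {"nodes": [...], "links": [...]} dict from board memberships."""
    member_boards = {}
    for company, members in boards.items():
        for member in members:
            member_boards.setdefault(member, []).append(company)

    names = sorted(member_boards)
    index = {name: i for i, name in enumerate(names)}
    nodes = [{"name": name, "boards": member_boards[name]} for name in names]

    # Assumption: two members are linked when they sit on the same board.
    # A set avoids duplicate links for members who share several boards.
    link_pairs = set()
    for members in boards.values():
        for a, b in combinations(sorted(set(members)), 2):
            link_pairs.add((index[a], index[b]))
    links = [{"source": s, "target": t} for s, t in sorted(link_pairs)]
    return {"nodes": nodes, "links": links}


graph = build_graph(boards)
graph_json = json.dumps(graph, indent=2)
```

Here each link refers to its endpoints by their position in the `nodes` list, and the `boards` list on each node is what a tooltip or a multicolored node could be drawn from.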

Alex and I worked over Saturday aggregating all the information from LittleSis and creating a JSON file from the data. Initially LittleSis’s API was down for a couple of hours, so we made up some fake data and played around with formatting and structuring it into a usable JSON file in Python. Eventually the API came back up and we got actual data to work with. It took a lot of tweaking of the Python script to make sure all the data was turned into correctly parsed JSON, because there were a lot of edge cases in the names of people and companies. Eventually we got the script fully working and had a valid, and LARGE, JSON file of all the board members of the different companies and all the companies they worked for.

Then it was time for me to work on the visualization, which took me the rest of Saturday night into Sunday morning. The D3 framework is very elegant in the way it handles graphs and graphics. All it needs is a structured JSON file with nodes and links, plus defined graphical properties, and it can create a physics-based graph model. It does not scale very well, though, since it runs in JavaScript, so with the initial set of 2800 board members the graph just froze the browser. I designed some graph algorithms in Python to get the connected components of this large member graph and remove the components that were only one board strong. This brought the count down to 2400 nodes, which, while smaller, was still too large for D3. Oddly enough, most companies in the set of 314 companies we used were part of one giant connected graph of board members.

So I did some more analysis on the original companies and eventually decided to only use companies that had members sitting on 4 or more boards. This did shrink the set of nodes, but it left a very sparse set of around 200 nodes. Then I also randomly added companies that had board members on 2 or more boards, up to a certain limit on the number of companies. This generated a graph of around 400 nodes, which was a good size for the visualization. Then, on John’s advice, I scaled the size of each node by the number of boards its member sits on, and also color coded the nodes in a pie chart fashion so that people on multiple boards show all of their boards’ colors.
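The connected-components filtering could look something like the sketch below. It is a reconstruction under assumptions rather than the original script: the sample boards are invented, companies count as connected when they share at least one board member, and components are found with a plain breadth-first search.

```python
from collections import defaultdict, deque

# Hypothetical input: company name -> list of its board members.
boards = {
    "Company A": ["Alice", "Bob"],
    "Company B": ["Bob", "Carol"],
    "Company C": ["Dave", "Erin"],
}


def company_components(boards):
    """Group companies into components connected through shared members."""
    member_to_companies = defaultdict(set)
    for company, members in boards.items():
        for member in members:
            member_to_companies[member].add(company)

    seen = set()
    components = []
    for start in boards:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            company = queue.popleft()
            component.append(company)
            for member in boards[company]:
                for neighbor in member_to_companies[member]:
                    if neighbor not in seen:
                        seen.add(neighbor)
                        queue.append(neighbor)
        components.append(sorted(component))
    return components


def drop_single_board_components(boards):
    """Keep only companies whose component contains more than one board."""
    kept = {}
    for component in company_components(boards):
        if len(component) > 1:
            for company in component:
                kept[company] = boards[company]
    return kept


def board_counts(boards):
    """Count how many boards each member sits on."""
    counts = defaultdict(int)
    for members in boards.values():
        for member in set(members):
            counts[member] += 1
    return dict(counts)


filtered = drop_single_board_components(boards)
counts = board_counts(filtered)
```

In this toy data "Company C" shares no members with the others, so it forms a one-board component and is dropped. The per-member totals in `counts` are the kind of number that could drive the node size scaling described above.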

Due to a very heavy course load over the past month, I could mostly only work on one of my projects at the TWC, so I haven’t progressed as much as I wanted to on the wineagent web interface. I did get object deletion working correctly, though, and I also fixed the odd routing errors I had before. There’s only about a month left in the semester and I don’t think I’ll be able to finish the project, but I will be working on it over the winter break. What’s left is to finish the editing of the wine objects and some small object validation states, and then to create the form and model for dish objects. That should mostly reuse the same structure as the wines, so there shouldn’t be much trouble creating it.

One response

29 11 2011
OrgPedia Board Members Network « Alexei's TWC Blog

[…] has a blog post up here detailing some of what he did on the visualization side. And be sure to check out the final […]
