TWC Semester 2 Reflection

14 12 2011

This was a very busy and also fun semester. I worked on two main projects. The first was with the Wine Agent team and Evan Patton, continuing what I started last semester: the web interface for restaurants and wineries to enter data about wines and food. This semester I got a lot of work done on the backend of the system. The platform I’m using is Ruby on Rails, which uses ActiveRecord models backed by a relational database to represent objects. I had to create my own object model backed by a triple store instead. Evan put up a test Joseki triple store for me to work with. I created a SPARQL 1.1 updater module in Ruby that can send INSERT and DELETE queries to a SPARQL endpoint through simple POST requests. Then I had to convert my ActiveRecord Wine model into a custom triple store model. Most of the time was spent creating the right object members so that the Rails framework treats my Wine model as an ActiveRecord model, while in reality it is using HTTP requests and SPARQL queries to pull and push all of its data from a Joseki triple store. The cool part of the work was taking Ruby object members and automatically converting them into parts of a SPARQL query based on some prior metadata about what they represent.
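The updater idea can be sketched roughly as follows. This is a minimal illustration in Python rather than the Ruby used in the project, and the function names (term, build_update, post_update) and the example.org URIs are assumptions made up for the example, not the actual module. It assumes the endpoint accepts a form-encoded "update" parameter, as the SPARQL 1.1 Protocol describes.

```python
from urllib import parse, request


def term(value):
    """Render a value as a SPARQL term: <uri> for http(s) URIs, else a quoted literal."""
    if isinstance(value, str) and value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_update(operation, triples):
    """Build a SPARQL 1.1 Update string; operation is 'INSERT' or 'DELETE'."""
    if operation not in ("INSERT", "DELETE"):
        raise ValueError("operation must be INSERT or DELETE")
    body = "\n".join(f"  {term(s)} {term(p)} {term(o)} ." for s, p, o in triples)
    return f"{operation} DATA {{\n{body}\n}}"


def post_update(endpoint, update):
    """POST the update as a form-encoded 'update' parameter and return the HTTP status."""
    data = parse.urlencode({"update": update}).encode("utf-8")
    with request.urlopen(request.Request(endpoint, data=data)) as response:
        return response.status


# Hypothetical URIs, for illustration only.
wine = "http://example.org/wine/1"
triples = [
    (wine, "http://example.org/vocab#hasColor", "Red"),
    (wine, "http://example.org/vocab#hasMaker", "http://example.org/winery/7"),
]
insert_query = build_update("INSERT", triples)
```

Calling build_update("DELETE", triples) with the same triples gives the matching DELETE DATA request, so one triple generator can serve both uploads and deletions.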

When I had that set and finished I was amazed at the simple fact that this new Wine model was no longer constrained by a regular database! Its entities are subgraphs on the triple store, which means that even after an entity is created, extra metadata can be annotated onto it, and onto all other entities in the same fashion, with simple SPARQL queries. All the people who have their URIs attached to the wines are also on the semantic graph. This means that if you find a person you can immediately find all the wines they own, all in the same graph space! There’s no need to traverse multiple database tables; it’s all elegantly stored through semantic web technologies!

After I got the upload to the triple store working correctly I moved on to editing and deletion. Deletion was easy because I already had code that automatically generates a SPARQL query from a Wine entity; all I had to do was switch it to a different query structure based on DELETE from the SPARQL 1.1 Update specification. Editing was a little more troublesome because Rails had odd routing issues when trying to access specific entities. I eventually just made each Wine auto-generate a unique URI for access on the website, based on the Wine’s own URI. After that, editing ran, but not correctly, because I had not put enough validations on the data. Although it is the end of the semester I will still be working on this project over winter break and hopefully finish it. All that is left is the correct Wine validations and then mirroring the Wine model into a Dish model with its own forms. The Dish model should probably be easier because the vocabulary in its RDF file places looser restrictions on the Dish entity. The Dish form might cause a little trouble because it has a tree-like hierarchy that represents a menu, but that’s just more web design that I need to pick up. After the Wine and Dish models are set I just have to make the overall site much prettier and the project should be finished!
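One way to derive a route-safe identifier from an entity’s URI is sketched below. This is only an assumption about how such a scheme could work (URL-safe base64, in Python rather than Ruby), not the approach actually used in the Rails app; the function names and the example URI are hypothetical.

```python
import base64


def uri_to_slug(uri):
    """Encode a full URI into a string that is safe to use as one URL path segment."""
    raw = base64.urlsafe_b64encode(uri.encode("utf-8")).decode("ascii")
    return raw.rstrip("=")  # drop padding so the slug contains only A-Z, a-z, 0-9, - and _


def slug_to_uri(slug):
    """Reverse uri_to_slug by restoring the padding and decoding."""
    padded = slug + "=" * (-len(slug) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


# Hypothetical wine URI, for illustration only.
wine_uri = "http://example.org/wine/1#merlot"
slug = uri_to_slug(wine_uri)
```

Because the encoding is reversible, a route such as /wines/<slug> can recover the wine’s URI without a lookup table, and characters like "/" and "#" in the URI never reach the router.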

The second project I worked on over the semester was doing visualizations with Orgpedia and LittleSis data for Xian Li and John Erickson. I wrote up in detail how I made the visualizations in my last blog post. It got featured along with Alexei’s post on PlanetRDF! A quick summary is that I made two visualizations. The first was a mashup of data from Orgpedia and Google Finance, built with the Google Visualization toolkit in R, which is a great mathematical programming language with a great community and lots of libraries. I found the financial sectors of the US and their companies through a couple of SPARQL queries and then pulled in the financial information of the companies through a helpful R package called Quantmod. Then I put all the data together into one big dataset and passed it to the Motion Charts library, which generated an HTML page with all the data in a very awesome visual form. The final visualization shows a motion chart of US sectors, and you can switch between different financial properties to compare them by.

The second visualization was made by Alexei Bulazel and me. We pulled in and parsed data about board members of various companies using the LittleSis API. Then I turned the data into a graph, did some pruning, and pushed it onto a D3 force-directed graph visualization. After a bunch of visual tweaks, the final visualization showed a physics-based graph of a subset of company board members clustered together, with members who sit on multiple boards having multiple colors on their nodes. Doing the visualizations was really fun and much easier than I thought. I also picked up R along the way, and it is a tool I will definitely be using in the future for its ease of use and heavy community support.

This was a really fun and busy semester. I got a lot of work done here at the TWC and at the Cognitive Robotics Lab. I also picked up a lot of really neat tools and frameworks this semester that I will definitely use again on future projects. SPARQL is one of these tools and will be very helpful in the future when I need access to open datasets for data mining projects. I had a great experience working in the lab and received a lot of help from the people there. The TWED talks were also pretty awesome for seeing all the branches of semantic web technologies. I’d recommend anyone reading this to go check out the videos from TWED. I’m also very happy that today’s TWED talk deals with Semantic Web Agents, which was the initial reason I became interested in the research done at the TWC. What a perfect way to end the semester!


Visualizations and Updates

22 11 2011

Over the past month most of my work has been in doing the R visualization for Orgpedia. I made a visualization that lets you analyze various financial properties of the financial sectors in the US (Sector and Financials):

The visualization itself uses Google Motion Charts, which is part of Google’s Visualization API. It is an interactive multidimensional graph of a dataset of sectors and the mean of various financial properties across each sector’s companies. The data shown above is in millions of USD. The Motion Chart allows for really neat temporal analysis of data in various forms. Clicking the play button shows the change in properties from 2008 to 2011. There are also three different styles in which you can view the data: bubbles (shown above), bar charts, and line graphs. These can be switched in the top corner.

The dataset behind the visualization was created in R. I wrote a SPARQL query that accesses Orgpedia’s datasets and pulls out the sectors of the US along with the companies in each sector and their stock tickers. Then I pulled in these companies’ income statements from Google Finance, went through each sector, and averaged various properties from the financial statements of the sector’s companies. The data manipulation in R took some getting used to, but now it’s very easy for me to transform data frames, matrices, and other objects in R. After the dataset was created and cleaned of non-existent values, it’s just a matter of defining the properties of the Motion Chart and running it. It generates an HTML file with the graph and the data represented in JavaScript. All the data processing and manipulation takes around 15 minutes, mostly due to the large amount of data to be downloaded.
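The per-sector averaging step can be illustrated with a small sketch. The original work was done in R; this is a Python illustration with made-up tickers and numbers, and the row layout and the sector_means name are assumptions for the example. Missing values are skipped before averaging, mirroring the cleaning step described above.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows: one per company, with its sector and one financial property.
rows = [
    {"sector": "Energy", "ticker": "AAA", "revenue": 120.0},
    {"sector": "Energy", "ticker": "BBB", "revenue": 80.0},
    {"sector": "Energy", "ticker": "CCC", "revenue": None},  # value not available
    {"sector": "Technology", "ticker": "DDD", "revenue": 50.0},
]


def sector_means(rows, prop):
    """Average one property across each sector's companies, ignoring missing values."""
    by_sector = defaultdict(list)
    for row in rows:
        if row[prop] is not None:
            by_sector[row["sector"]].append(row[prop])
    return {sector: mean(values) for sector, values in by_sector.items()}


averages = sector_means(rows, "revenue")
```

Repeating this for each property and each year would give the one-row-per-sector-per-year table that a motion chart expects.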

Over the course of this past weekend I also worked on another visualization for Orgpedia with Alex, for a conference Eric and Xian were going to today. The visualization uses data from LittleSis about board members of various companies in the US and shows the members in a force-directed graph that reveals which board members are on multiple boards (Board Members Network):

The graph visualization is done using the D3 visualization toolkit’s force-directed graph layout. Each node represents a board member. Each cluster of same-colored nodes is a group of members on the same board. The multicolored nodes represent board members who are on multiple boards. Mousing over a node shows you their name and the companies they work for. Clicking a node takes you to their page. The graph shows many interesting relationships between various companies and board members. One notable example is Steven S Reinemund, who sits on 5 different boards.

Alex and I worked through Saturday aggregating all the information from LittleSis and creating a JSON file from the data. Initially LittleSis’s API was down for a couple of hours, so we made up some fake data and played around with formatting and structuring it into a usable JSON file in Python. Eventually the API came back up and we got actual data to work with. It took a lot of tweaking of the Python script to make sure all the data came out as correctly formed JSON, since there were a lot of edge cases in the names of people and companies. Eventually we got the script fully working and had a valid, and LARGE, JSON file of all the board members of the different companies and all the companies they worked for.

Then it was time for me to work on the visualization, which took me the rest of Saturday night into Sunday morning. The D3 framework is very elegant in the way it handles graphs and graphics. All it needs is a structured JSON file with nodes and links, plus some defined graphical properties, and it can create a physics-based graph model. It does not scale very well, though, since it runs in JavaScript in the browser, so with the initial set of 2800 board members the graph just froze the browser. I wrote some graph algorithms in Python to find the connected components of this large member graph and remove the components that were only one board strong. This brought the count down to 2400 nodes, which was smaller but still too large for D3. Oddly enough, most of the companies in the set of 314 we used were part of one giant connected graph of board members. So I did some more analysis on the original companies and eventually decided to only use companies with members who sit on the boards of 4 or more companies. This did lower the number of nodes, but it left a very sparse set of around 200 nodes. Then I also randomly added companies that had board members on 2 or more companies, up to a certain limit on the number of companies. This generated a graph of around 400 nodes, which was a good size for the visualization. Then, on John’s advice, I scaled the size of each node depending on how many boards its member was part of, and also color-coded the nodes in a pie chart fashion so that people on multiple boards had all of their boards’ colors.
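The pruning step described above (find connected components, drop the ones that are only one board strong, then emit nodes and links) might look something like the sketch below. It is not the original script: the board data is made up, the function names are hypothetical, and the output assumes a D3-style structure in which each link refers to its endpoints by index into the nodes list.

```python
import json
from collections import defaultdict

# Made-up board memberships: company -> board members.
boards = {
    "Acme": ["Ann", "Bob"],
    "Birch": ["Bob", "Cy"],
    "Cedar": ["Dee", "Eli"],
}


def components(boards):
    """Group companies into connected components linked by shared board members."""
    member_boards = defaultdict(set)
    for company, members in boards.items():
        for member in members:
            member_boards[member].add(company)
    seen, result = set(), []
    for start in boards:
        if start in seen:
            continue
        stack, group = [start], set()
        seen.add(start)
        while stack:  # depth-first walk over companies that share a member
            company = stack.pop()
            group.add(company)
            for member in boards[company]:
                for other in member_boards[member]:
                    if other not in seen:
                        seen.add(other)
                        stack.append(other)
        result.append(group)
    return result


def to_d3(boards, keep):
    """Build a nodes/links dict of members; links join members who share a board."""
    members = sorted({m for c in keep for m in boards[c]})
    index = {m: i for i, m in enumerate(members)}
    nodes = [{"name": m, "boards": sorted(c for c in keep if m in boards[c])}
             for m in members]
    links = []
    for company in sorted(keep):
        group = boards[company]
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                links.append({"source": index[group[i]], "target": index[group[j]]})
    return {"nodes": nodes, "links": links}


# Keep only components that span more than one board.
kept = set().union(*[g for g in components(boards) if len(g) > 1])
graph = to_d3(boards, kept)
graph_json = json.dumps(graph)
```

The "boards" list on each node is what a renderer could use to size the node and to split its color pie-chart style across the boards the member sits on.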

Due to a very heavy course load over the past month I could mostly only work on one of my projects at the TWC, so I haven’t progressed as much as I wanted to on the wineagent web interface. I did get object deletion working correctly, though, and I also fixed the odd routing errors I had before. There’s only about a month left in the semester and I don’t think I’ll be able to finish the project, but I will be working on it over the winter break. The only things left to do are finishing the editing of the wine objects and some small object validation states, and then creating the form and model for dish objects. That should mainly use the same structure as the wines, so there shouldn’t be much trouble creating it.

TWC Update

21 10 2011

So three weeks have passed and I’ve actually gotten quite a lot of work done. Where I left off in my last post was making a SPARQL 1.1 update module for Ruby. I got to work on that and made a simple update POSTer, and worked out a nice modular way to just pass it triples and the URI; it takes care of generating the query, sending it to the endpoint, and returning the response triples. Evan put up a test Joseki triple store for me to work on. I then changed the submit process on my wine creation form to convert the final wine Ruby object into a set of triples and push them up to the triple store. I accidentally messed up the syntax of the update, and Joseki’s update service hit an exception and halted, which stopped the Joseki server. Thankfully Evan got me administrative rights to restart the server, so after half a week lost to the server crash I got back to work. I fixed up my update syntax and it worked! I was amazed that I was taking a Ruby object, turning it into a small graph of triples, and then putting that onto a graph in a triple store somewhere else on the internet!

After I got the upload working, the next part was downloading the triples and turning them back into a Ruby wine object. This took a smidge longer than I hoped because I had to refactor some of the initial upload code. But I did get it working, and then I just stood back bewildered that I’m transforming a Ruby object into a graph, pushing it somewhere, pulling it back, and converting it again into a new Ruby object! The data is forever online and convertible into a new object model in any programming language! I absolutely love this idea of abstract data storage in graphs! Anyway, rant over. What I have left to do is getting editing and deleting to work correctly. I worked on editing a bunch, but I’m still stuck on some odd routing errors in Rails, so that’s a work in progress.
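The object-to-graph round trip can be sketched in a few lines. This is a Python illustration, not the project’s Ruby model: the Wine fields, the PREDICATES mapping, and the example.org namespace are all hypothetical, and the triple store is left out so that only the conversion in each direction is shown.

```python
from dataclasses import dataclass

VOCAB = "http://example.org/vocab#"  # hypothetical namespace
# Metadata mapping each object field to the predicate that represents it.
PREDICATES = {"color": VOCAB + "hasColor", "sugar": VOCAB + "hasSugar"}


@dataclass
class Wine:
    uri: str
    color: str
    sugar: str

    def to_triples(self):
        """Flatten the object into (subject, predicate, object) triples."""
        return [(self.uri, pred, getattr(self, field))
                for field, pred in PREDICATES.items()]

    @classmethod
    def from_triples(cls, triples):
        """Rebuild an object from triples that share one subject URI."""
        fields = {pred: field for field, pred in PREDICATES.items()}
        uri = triples[0][0]
        values = {fields[p]: o for s, p, o in triples if s == uri and p in fields}
        return cls(uri=uri, **values)


original = Wine("http://example.org/wine/1", "Red", "Dry")
restored = Wine.from_triples(original.to_triples())
```

Since the triples carry everything needed to rebuild the object, any other language with the same field-to-predicate mapping could reconstruct an equivalent model from the store.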

Along with the wineagent project I also volunteered to help Xian and the Orgpedia project by doing R visualizations with the corporate data. I also have to use R for my Advanced Experimental Stat class because it is based in SPSS and the teacher wants to transition the syllabus into R. Although I’ve been busy over the past week or so with school work, I have picked up R and started on the visualization. I’m currently doing a simple visualization of linear regressions of sector net profits over time. The only hitch I’m running into is figuring out which graphs have what data, but I’ve heard that there might be some Orgpedia hackathons, so that problem should go away. My dream visualization is analyzing company lobbying data, seeing whether there are any joint clusters of companies that started to fund certain lobbies at a given point in time, and then mashing up and correlating that with any big news about those companies at that time. But I think I should start out with some simple visualizations first. As a side note on R, I’ve found it to be a very useful mathematical/statistics language. There is also an IDE called RStudio which makes working in R way more productive and efficient.

What the semantic web is

30 09 2011

I think the semantic web is a great human achievement. We’re continually cataloging, labeling, and linking human knowledge in a way that is easily parseable by computers. I think this is a key step in creating autonomously learning intelligence. The semantic web is just a gigantic map of labeled data that can be applied to machine learning. We are not only rewriting our centuries of knowledge into another form, but that form of data is also very computable! It’s the halfway point to general intelligence. The other half is just building the software to autonomously parse this map and traverse it. For a human to gain the knowledge of the world, we first have to learn to perceive and read the knowledge and then learn how to connect and use it. With the semantic web we’ve created a very easy medium through which a machine can perceive knowledge, and now it’s the analysis that is left to be constructed. Once a human has a way of analyzing past knowledge, it can then be extrapolated in order to make inferences in new situations. That is why, with the semantic web as an initial stepping stone, we can then step onto harder mediums like text and speech in order to truly have a machine “understand” our physical mediums. And from that point, the possibilities are endless…

New Fall Semester!

26 09 2011

It’s been a month into the new semester and I’ve gotten a lot of work done on the Wine Agent Web Interface. I was working on the wine entrance form last semester and I left off with issues using AJAX and Rails. I worked on the issue more over the summer, got it working, and fixed some other random bugs in the Rails app itself. At the end of the summer I had the wine entrance form ready to go! Every time a user picks a new wine type in the form, the form dynamically updates itself using AJAX and refreshes the property values for the new wine, based on the property SPARQL queries I made last semester. I also did a lot of other work over the summer, like making a robotics ROS library for the Cognitive Robotics Lab and picking up some neat things like Lisp, CUDA and other neat comp sci stuff.

My work this semester deals with making Rails use a model based on a triple store rather than a SQL database. Transforming the Rails model is a very hacky process, because I have to make sure all the Rails controllers and validations are fully compatible with the new model. I’ve spent the last two to three weeks working on this. At this point I have most of the model transferred onto a triple store back end that is accessed through GET and POST SPARQL requests. I’m currently in the process of building a SPARQL 1.1 update module in Ruby so that the Rails Wine model can use it to store the wine data in the triple store. After that’s finished and working I just have to make a dish entrance form, which should be a permuted clone of the wine form, along with a dish model for Rails, and then make the site look a little prettier and it should be mostly all done!

TWC Reflection

6 05 2011

The semester working at the TWC has been really fun. I learned a lot about the power of semantic web technologies and the whole linked data movement. In the beginning of the semester I looked into the major semantic technologies like SPARQL, ontologies, RDF and such, and found the whole idea of structured data formats for machine-readable information to be really interesting. I especially like the power of SPARQL and its ability to query over large RDF graph patterns from any endpoint. I even made a neat little SPARQL query wrapper for Processing. Learning how to use SPARQL gave me a really powerful tool in my programming belt!

I then went on to join the Wine Agent team and was given the task of making the web interface for the wine agent. This would allow users to use the wine agent without having to use the mobile apps. Before this I had mostly made static web pages, so everything from this point on was a learning process. I initially got back into the web development grind and played around with jQuery a bit. In the end I had this really nice slidebox interface that was visually cool but had no actual content. I then had to make a login system for the site, and I got stuck at a multi-forked road over which framework I should pick up to continue the project. I eventually decided on Ruby on Rails because it seemed the simplest to pick up and overall easiest to work with.

I spent a week or two picking up Rails and what a good investment that was! Rails is an amazing platform for doing web development! It takes care of a lot of grunt coding for you by generating files from dynamic templates. Then all you have to do is go in and customize them as you see fit, so at first you can focus on the content of the site and don’t have to worry about all the nitty-gritty backend. It also has many plugins through Ruby Gems that allow easy installation and running of multiple add-on libraries. I will definitely be using Rails for any future personal web services I implement.

I started the base for the web interface and finally implemented a login system, among other things, and it was easy peasy! I built a wine model and a user model for the site so that users could log in and enter wines into a form, which would update the local database. Eventually I learned that the information isn’t going to live in a database but will be pushed through SPARQL 1.1 onto a server somewhere at TWC. I built up the form for the wine and then came the hard part: getting the information for its elements. Here’s where I spent more time than I wished figuring out how to query the Wine.rdf file in order to get all the wine properties and each wine’s restrictions. The general properties were pretty easy to get, but each wine’s restrictions took a lot more time. In the end I got it to work, though, and I can now access all the wine properties I want off of the graph using some pretty neat SPARQL queries I made. Currently I’m learning how to make AJAX work nicely with Rails so that I can make the wine form more dynamic and have it adjust to different wine types and restrictions. For future tasks I need to fully finish the dynamic form and make a SPARQL 1.1 module in Ruby to send the newly added wines to an outside server instead of my local machine. I will be continuing to work on this project over the summer in my free time.

Overall, doing research at the TWC has been a really fun experience and it has given me a perspective on the new frontiers of web technologies that I would have never gotten if I hadn’t joined the lab. The tools I picked up here give me a great resource to rely upon for most of my future projects!

Web Interface Update 2

6 05 2011

After a week or two of tinkering around with SPARQL queries on collections in an RDF file I finally found a way to get all the properties I needed! I can now access every single one of the wine properties for any wine in the RDF file! The many weeks I’ve spent tackling this have finally come to fruition!

Anyways, my current task is to make the wine form dynamic so that the user can choose a wine type and the form changes itself so that the rest of the properties are filled in with that wine’s restrictions. Evan pointed me to AJAX for this, so I started digging into AJAX in Rails. It turned out there was a perfect helper called observe_field that would let me make a remote AJAX call on a select box every time it changed! Sadly, it was dropped from core Rails in Rails 3. I did more research and found out that I can use an extra on_change option on my select box to make an AJAX call every time it changes. The interface itself is built out of partials, which are just small snippets of Ruby-embedded HTML that you can reuse in various places. I am just going to make the form partial update a partial nested inside it whenever the restrictions part of the form needs to change.

It’s the end of the semester, but I won’t be stopping my work on the interface. I got approval to continue on it over the summer on my own personal time. So there will probably be more blog posts over the summer.