Mining the Social Web

Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites
A book about using Python and the APIs provided by several popular social networks to extract information and trends.

Though the book is broader than it is deep in some areas, the author has pulled together a lot of information and made it a practical resource for mining information. It isn’t going to teach you how to do state-of-the-art text processing, and the author is up-front about that, but it will get you started with practical examples in a lot of areas.

Mining the Social Web is a book about how to extract information and trends using the Python language and the APIs provided by several popular social networks. There is a distinction between the social web of sites like Twitter and Facebook and the semantic social web provided by microformats and blogs. Microformats and blogs are typically left out of discussions of the social web, but this book covers both.

The book is organized into ten chapters. The first deals with setting up a Python development environment. The next covers microformats (a way of embedding semantic information, such as contact, geographic or event details, in regular HTML via class names). A chapter on analyzing email follows.
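
To make the class-name idea behind microformats concrete, here is a minimal sketch of my own (not code from the book) that pulls a few hCard-style properties out of ordinary HTML using only Python's standard library. The sample markup, the ClassTextExtractor class and the extract_properties function are hypothetical names made up for illustration; a real project would more likely use a dedicated parsing library.

```python
from html.parser import HTMLParser

# Hypothetical sample markup: an hCard-style contact embedded in ordinary HTML.
SAMPLE_HTML = """
<div class="vcard">
  <span class="fn">Jane Doe</span>
  <span class="org">Example Corp</span>
  <span class="locality">Springfield</span>
</div>
"""


class ClassTextExtractor(HTMLParser):
    """Collect the text of elements whose class attribute names a wanted property."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.current = None  # property name of the element being read, if any
        self.found = {}

    def handle_starttag(self, tag, attrs):
        # The class attribute may hold several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.wanted:
                self.current = name
                break

    def handle_data(self, data):
        # Record the first non-blank text seen inside a wanted element.
        if self.current and data.strip():
            self.found[self.current] = data.strip()
            self.current = None

    def handle_endtag(self, tag):
        # Simplification: assumes wanted elements contain no nested tags.
        self.current = None


def extract_properties(html, wanted=("fn", "org", "locality")):
    """Return a dict mapping each wanted class name to the text it marks up."""
    parser = ClassTextExtractor(wanted)
    parser.feed(html)
    return parser.found


if __name__ == "__main__":
    print(extract_properties(SAMPLE_HTML))
```

The point is simply that the semantic information travels in the class names (fn, org, locality), so a program can recover it without any special markup language.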

The APIs provided by the major social networks are covered in individual chapters. Chapters two and three are on Twitter, with another on LinkedIn and another on Google Buzz. Facebook is covered in chapter 9. Blogs and natural language processing are covered in chapter 7, while even email is given its own chapter near the start of the book. One thing readers unfamiliar with Python will notice is how many libraries are available to access all of these services easily. I’d suggest that had another language been used, the book would have been twice as big.

While each chapter is pretty much stand-alone, the reader should have some familiarity with data analysis or natural language processing methodologies. The book doesn’t teach Python, and readers are expected to know a bit about technologies and products like OAuth, CouchDB and Redis, as well as the MapReduce programming model and Python libraries like NumPy. The book is laid out with plenty of screenshots and lots of source code. A few examples specifically assume a Linux shell, but not many.

As you might guess, there are a lot of topics covered here, everything from using various APIs to data analysis tools. There simply aren’t enough pages to go into much depth, a fact the author is upfront about in the opening. As long as you are aware that the book is a starting point, rather than a recipe for state-of-the-art natural language processing, you’ll probably find plenty of useful information here.