Data Case Study on Arcadia, CA
About 2 years ago, an older house in the neighborhood sold for about $1 million. Within a year, the old house was torn down and a new one was built in its place. It went on to sell for around $4 million. The same thing happened just up the street, with a similar layout and selling price. A third is finishing construction and should be on the market very soon. I thought these were just a few isolated examples of investors buying up older houses and transforming them into mega houses until I drove around the southern suburbs of the city. There were so many new homes under construction, with the occasional 1-story home sandwiched between 2 towering mansions. It was pretty ridiculous and gave me the idea to do some digging.
I spent last Sunday afternoon on Zillow and Trulia, mining home listings from the sites. I ended up using the Trulia API because it could return all the listings in a particular city and time range. Zillow wasn’t able to return the data in this format (I read somewhere that it used to be possible but the feature was taken away because of some data contract). I made an IPython notebook showcasing my data and analysis. The surprising takeaway was that the average listing price was $1 million back in early 2012 and had increased to almost $2 million by Q2 2014. Note that the data is not fully representative, since Trulia listings are not comprehensive. The real numbers might differ a bit, but the trend is there.
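For anyone curious how a trend like this can be computed, here is a minimal sketch of averaging listing prices by quarter. It is not the code from my notebook: the function names and the sample rows are made up for illustration, and it assumes each listing has already been reduced to a (list date, list price) pair.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical sample rows (list date, list price in USD).
# These are invented for illustration and are NOT real Trulia data.
SAMPLE_LISTINGS = [
    (date(2012, 1, 15), 900_000),
    (date(2012, 2, 20), 1_100_000),
    (date(2014, 4, 5), 1_800_000),
    (date(2014, 6, 30), 2_200_000),
]


def quarter_label(d):
    """Return a label such as '2014Q2' for a date."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"


def average_price_by_quarter(listings):
    """Group (date, price) pairs by calendar quarter and average each group."""
    prices = defaultdict(list)
    for listed_on, price in listings:
        prices[quarter_label(listed_on)].append(price)
    # Labels like '2012Q1' sort chronologically as plain strings.
    return {q: mean(p) for q, p in sorted(prices.items())}


if __name__ == "__main__":
    for quarter, avg in average_price_by_quarter(SAMPLE_LISTINGS).items():
        print(f"{quarter}: ${avg:,.0f}")
```

A plain mean is sensitive to a handful of very expensive listings, so swapping in statistics.median would be a reasonable check on whether the trend holds.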
I also looked at the demographics from Census data. I had a few other ideas, but from a quick search, it looks pretty difficult to obtain the data needed to do the analysis. I hope federal, state, and local government data will be more readily available in the future.
To view the notebook on nbviewer: http://nbviewer.ipython.org/github/scku/Arcadia/blob/master/Arcadia.ipynb
Here’s the source repository for anyone who’s interested: https://github.com/scku/Arcadia