Data Science with Python - Getting Started (Installing the Required Tools)

February 16, 2016

Introduction - What is Data Science?

With the evolution of data storage technology and growing processing speeds, Data Science is trending and has already been named the most promising career by many different sources (for example, Harvard Business Review: https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/). Well, let’s get to what matters.

Data Science consists of using computing on a significant quantity of data to extract knowledge or to support decision making. Some practical examples to make this concrete:

  • Using user information (like age, gender, previously visited websites, etc.) to better target online ads, improving the click-through rate;
  • Filtering emails by classifying them as spam or non-spam;
  • Recommendation systems, like the one used by Netflix to suggest content that you are more likely to watch based on what you have already watched.

Data Science With Python

Python is a programming language commonly used for Data Science. This famous multi-purpose language is a particularly good fit for it when paired with the NumPy and Pandas packages.
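To give a first taste of what these two packages do, here is a minimal sketch. The visitor data is made up for illustration (it is not from any real dataset), and the variable names are my own; the only library calls used are `numpy.array`, `ndarray.mean`, `pandas.DataFrame` and `groupby`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ages of five site visitors and whether each one
# clicked an ad (1 = clicked, 0 = did not click).
ages = np.array([23, 35, 41, 23, 35])
clicked = np.array([1, 0, 1, 1, 0])

# NumPy operates on whole arrays at once, with no explicit loops.
mean_age = ages.mean()
click_rate = clicked.mean()

# Pandas wraps the same data in a labeled table (a DataFrame),
# which makes grouping and summarizing straightforward.
visitors = pd.DataFrame({"age": ages, "clicked": clicked})
rate_by_age = visitors.groupby("age")["clicked"].mean()

print("Mean age:", mean_age)
print("Overall click rate:", click_rate)
print(rate_by_age)
```

NumPy handles the raw numeric arrays, while Pandas adds labels and table-style operations on top of them.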

The easiest way I know to use these packages is through the Anaconda distribution. Anaconda is a Python distribution created by the company Continuum that already includes many of the most popular packages for Data Science, math and scientific computing. There are installers for Python 2 and Python 3, with versions for Windows, Linux and OS X. You can download it at: https://www.continuum.io/downloads

The installation is pretty simple. Anaconda will also install Spyder, an excellent IDE for Data Science with Python. You can also use any other IDE and run your code from the Anaconda Prompt, which will be available once Anaconda is installed.
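Once the installation finishes, a quick way to confirm that the packages are available is to try importing them. The script below is a minimal sketch of that check; the helper name `check_packages` is my own and not part of Anaconda, and it relies only on the standard library's `importlib`.

```python
import importlib
import sys


def check_packages(names):
    """Map each package name to its version string, or None if it cannot be imported."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The package is not installed in this Python environment.
            report[name] = None
            continue
        # Most packages expose __version__; fall back to "unknown" otherwise.
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    print("Python {}".format(sys.version.split()[0]))
    for name, version in sorted(check_packages(["numpy", "pandas"]).items()):
        status = version if version is not None else "NOT INSTALLED"
        print("{}: {}".format(name, status))
```

Save it to a file and run it with `python` from the Anaconda Prompt, or paste it into the Spyder console. If a package shows up as NOT INSTALLED, the script is probably running under a different Python than the one Anaconda installed.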

Below is how the Spyder IDE looks:

Spyder IDE working

When you open it for the first time, it will probably look a little different from that, because I changed it to a darker style. You can change styles in Tools > Preferences > Editor, under the option Syntax Color Scheme.

Using Spyder is really simple. By default, the upper right corner holds an object inspector, a variable inspector and a file inspector, divided into tabs. The lower right corner holds the console. Spyder is already integrated with IPython, but you can also use the default Python console if you want. On the left is the editor with the open files.

Python also has Data Science packages besides the ones in Anaconda, with more tools being developed every day. Eventually, when we talk about them and when we need to use them, I’ll teach you how to install them. Some examples are Theano and Lasagne, for Machine Learning, and Bokeh, for Data Visualization.

In the next posts I’ll talk more about Pandas, starting with the basics (what it’s all about, data manipulation, reading files, etc.), and then we can start to play around with more advanced stuff (data visualization, machine learning, etc.).

Stay tuned! :)