Data science and data scientists are all the rage right now in the information technology space. Every company wants one; every job candidate claims to be one. But what does that actually mean to companies and potential employees? I decided to take a course on data science to see if I could find out!
My co-worker, Gabriella Melki, recommended the Coursera Data Science specialization by the Johns Hopkins Bloomberg School of Public Health. The entire specialization contains a set of nine courses, but you can take each one individually. I started with the first course, called "The Data Scientist's Toolbox". Over the four-week timeframe, I was able to view lectures and complete the assignments at my own pace. Below are my thoughts on the course and what I learned about data science.
Week 1: Introduction to Data Science
Data science is about data, specifically using it to answer questions, and about science, following a method to discover an answer. A data scientist is the person who uses data to answer questions. Data scientists are such a hot commodity because people either can't find the data they need or have so much data that it is difficult to wade through it all. There are a variety of tools available to help with this task. A common tool is the R language, which is what the rest of the course used. The course teaches students to solve problems on their own, suggesting good places to research R functions (help.search, anyone?) and recommending online forums like Stack Overflow for help with confusing errors. The first week wrapped up with an overview of all the courses in the specialization and a few questions to make sure you were paying attention.
Week 2: Learning the Tools
I used to program about 10 years ago, but have focused more on SQL in recent years, so I had a bit of a learning curve when it came to the tools needed to work with R programs. Week 2 of the course walked the students through installing each of these tools and the basics of using them. Specifically, I learned about Git Bash, Git, GitHub, R packages, and RStudio. Although it seemed a little overwhelming at first, the course went step-by-step through each installation and explained the commands needed to use each tool. At the end of Week 2, we ran some of those commands to show that we had mastered them.
Week 3: Understanding Questions, Data, and Approach
As we learned in Week 1, data science is all about trying to answer questions with data. Based on the data you have and the answer you need, you may ask different questions and use different approaches. For example, you may just want to describe the data, make a prediction from it, or more. Data comes in all shapes and sizes: qualitative versus quantitative, large versus small, confounding versus predicting. The most important thing is to ask the right question first, design a logical experiment, and then investigate the data to find the answer. Be careful not to force your results through spurious correlations or data dredging! You don't want just any answer - you want the most accurate one. At the end of the third week, we completed a project to show our full understanding of the process and tools that we had learned over the past three weeks.
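To see why data dredging is risky, here is a minimal Python sketch (my own illustration, not something from the course, which used R). It compares one column of random noise against many other columns of random noise and keeps the strongest correlation it finds. The function names, sample sizes, and seed are all made up for the example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_samples=20, n_candidates=200, seed=0):
    """Search many pure-noise columns for the one that best 'explains' a noise target."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_samples)]
        r = pearson(target, candidate)
        if abs(r) > abs(best):
            best = r  # keep the most impressive-looking result, as a dredger would
    return best


if __name__ == "__main__":
    # Every column is random noise, so any "relationship" found here is spurious.
    print(strongest_chance_correlation())
```

With only 20 samples and 200 candidates, the winning correlation usually looks respectable even though nothing real is there, which is why the question and the experiment should come before the search.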
Week 4: Completing the Project
In Week 4, the last week of the course, we checked everyone else's work. This grading exercise was also interesting to me. They asked every student to check over the work of at least three other students. Then they assigned a grade based on those results. I'm guessing they have some way to verify the information - maybe kick out the highest/lowest values and average the rest? Or kick out the results from students whose scores vary the most? Sounds like a good data science problem. I'm not sure of the answer, but I completed the course with a 100%!
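My first guess (kick out the highest and lowest values and average the rest) is easy to sketch in Python. To be clear, this is only my guess and not how Coursera actually grades; the function name and the sample scores are made up for illustration.

```python
def trimmed_mean(scores):
    """Average the scores after dropping one highest and one lowest value.

    Hypothetical helper: this illustrates a guess, not Coursera's real method.
    """
    if not scores:
        raise ValueError("need at least one score")
    if len(scores) < 3:
        # Too few reviews to trim anything; fall back to a plain average.
        return sum(scores) / len(scores)
    ordered = sorted(scores)
    kept = ordered[1:-1]  # drop the single lowest and single highest score
    return sum(kept) / len(kept)


if __name__ == "__main__":
    peer_scores = [100, 95, 60, 98]  # made-up peer review scores
    print(trimmed_mean(peer_scores))  # prints 96.5
```

Dropping the extremes keeps one unusually harsh (or generous) reviewer from dragging the grade around, at the cost of ignoring some of the reviews.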
If you're interested in taking the data science course described above, visit here: https://www.coursera.org/course/datascitoolbox
Thursday, December 17, 2015
Is Data Science a Buzzword? aka: My first Coursera Course
Tuesday, November 10, 2015
Using Power BI Custom Visualizations
Power BI (https://powerbi.microsoft.com) is Microsoft’s tool that provides fast analysis and reporting to developers and business users. Microsoft releases new features for this tool every month, so this post may be out of date before it’s even published! One of the more recent releases includes the ability to create and publish custom visualizations for use by others.
Power BI Visuals Gallery
The Power BI Visuals Gallery is where you can publish, search, and download custom visuals for use in Power BI Desktop and the Power BI website. People in the community and Microsoft have published visualizations that enhance the dashboard experience and still interact with the other visualizations as though they came out-of-the-box! (On a side note, do we need to stop saying out-of-the-box now that everything is cloud-first…) The types of visuals run the gamut from charts and graphs to animations and slicers.
Searching for a Custom Visual
To start, go to the Power BI Visuals Gallery: https://app.powerbi.com/visuals.

It warns you that this isn’t supported by Microsoft, but if you like living on the edge, click “I agree”. (I chuckled a little that the visualization published by Microsoft is not supported by Microsoft, but I’m not complaining - I’m just happy to have this cool tool!)

The visualization will download as a *.pbiviz (Power BI Visualization) file to your desktop.
Installing a Custom Visual
Next, you need to install the custom visualization. This shows the install process for Power BI Desktop, but the process is similar for the Power BI website. Kick off the install through the File > Import menu or by clicking the ellipsis (…) in the Visualizations pane.


And hurray, you have installed the visual!
Verifying the Custom Visual
You can check that the visual is installed by looking at the Visualizations pane within Power BI Desktop, where the new visualization will be available for your use.

When you next open the report, you will get a message warning you about custom visuals again. Just select “Enable custom visuals”, and you are all set. Happy dashboarding!
Versions: Power BI Visuals Gallery on 11/2/2015, Power BI Desktop v2.28.4190.122
Tuesday, November 3, 2015
My Week at the PASS Summit 2015
- SSRS reports pinning to Power BI
- Auto-insights coming in Power BI
- DAX will have IntelliSense, comments, formatting, and new functions
- SSAS tabular will optimize DirectQuery querying
- Power BI will provide a new visualization every week
- Power BI can use data from on-premises multidimensional SSAS databases
- Incremental package deployment (no more "all-or-nothing")
- Ability to turn on an "optimize buffer size" option in packages
- Execution of R in SQL, which can be called through an SSIS package
- Ability to adjust how the ForEach Loop loops through files
- Information about the Summit: http://www.sqlpass.org/summit/2015/About.aspx
- Watch some sessions on PASSTV: http://www.sqlpass.org/summit/2015/Live.aspx
- Reporting Roadmap released during the Summit: http://blogs.technet.com/b/dataplatforminsider/archive/2015/10/29/microsoft-business-intelligence-our-reporting-roadmap.aspx
- Latest version of SQL Server 2016 CTP 3: http://www.microsoft.com/en-us/evalcenter/evaluate-sql-server-2016