Thursday, December 17, 2015

Is Data Science a Buzzword? aka: My first Coursera Course

Data science and data scientists are all the rage right now in the information technology space. Every company wants one; every job candidate claims to be one. But what does that actually mean to companies and potential employees? I decided to take a course on data science to see if I could find out!

My co-worker, Gabriella Melki, recommended the Coursera Data Science specialization by the Johns Hopkins Bloomberg School of Public Health. The entire specialization contains a set of 9 courses, but you can take each one individually. I started with the first course, called "The Data Scientist's Toolbox". Over the four-week timeframe, I was able to view lectures and perform the assignments at my own pace. I've listed below my thoughts on the course and what I learned about data science.

Week 1: Introduction to Data Science
Data science is about data, specifically about answering questions with it, and science, following a method to discover an answer. A data scientist is the person who uses data to answer questions. Data scientists are in such demand because people either can't find the data they need or have so much data that it is difficult to wade through it all. There are a variety of tools available to help with this task. A common tool is the R language, which is what the rest of the course used. The course teaches students to accomplish tasks on their own, suggesting good places to research R functions (help.search, anyone?) and recommending online forums like StackOverflow for help with confusing errors. The first week wrapped up with an overview of all the courses in the specialization and a few questions to make sure you were paying attention.

Week 2: Learning the Tools
I used to program about 10 years ago, but have focused more on SQL in recent years. So I had a bit of a learning curve when it came to the tools needed to work with R programs. Week 2 of the course walked the students through installing each of these tools and the basics of using them. Specifically, I learned about Git Bash, Git, GitHub, R packages, and RStudio. Although it seemed a little overwhelming at first, the course went step-by-step through each installation and explained the commands needed to use each tool. At the end of Week 2, we ran some of the commands to show that we had mastered them.

Week 3: Understanding Questions, Data, and Approach
As we learned in Week 1, data science is all about trying to answer questions with data. Based on the data you have and the answer you need, you may ask different questions and use different approaches. For example, you may just want to describe the data, to make a prediction about the information, or more. Data comes in all shapes and sizes: qualitative versus quantitative, large versus small, confounding versus predicting. The most important thing is to ask the right question first, design a logical experiment, and then investigate the data to find the answer. Be careful not to force your results through spurious correlations or data dredging! You don't want just any answer - you want the most accurate one. At the end of the third week, we completed a project to show our full understanding of the process and tools that we had learned over the past three weeks.
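To see why data dredging is risky, here is a minimal Python sketch (my own illustration, not something from the course). It assumes purely random data: one random "target" series and a pile of unrelated random "candidate" series. The function name best_chance_correlation and its parameters are hypothetical. The more candidates you dredge through, the stronger the best accidental correlation tends to look, even though nothing is actually related.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_chance_correlation(n_candidates, n_points, seed=0):
    """Strongest |correlation| between a random target and random candidates.

    Every series is independent noise, so any correlation found is spurious.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best


if __name__ == "__main__":
    # Compare testing a single candidate with dredging through a thousand.
    print("1 candidate:    ", best_chance_correlation(1, 10))
    print("1000 candidates:", best_chance_correlation(1000, 10))
```

With only 10 data points per series, searching through enough unrelated candidates will usually turn up one that looks impressively correlated, which is exactly why the question should come before the search.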

Week 4: Completing the Project
In Week 4, the last week of the course, we checked everyone else's work. This grading exercise was also interesting to me. They asked every student to check over the work of at least three other students. Then they assigned a grade based on those results. I'm guessing they have some way to verify the information - maybe kick out the highest/lowest values and average the rest? Or kick out the results from students whose scores vary the most? Sounds like a good data science problem. I'm not sure of the answer, but I completed the course with a 100%!
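For the curious, here is a minimal Python sketch of my first guess: drop the highest and lowest peer scores and average the rest. This is only my speculation, not how Coursera actually computes grades, and the function name trimmed_peer_grade is hypothetical.

```python
from statistics import mean


def trimmed_peer_grade(scores):
    """Average peer scores after dropping one highest and one lowest.

    Assumption: with fewer than three scores there is nothing sensible
    to trim, so fall back to a plain average.
    """
    if len(scores) < 3:
        return mean(scores)
    ordered = sorted(scores)
    # Slice off the first (lowest) and last (highest) score.
    return mean(ordered[1:-1])


if __name__ == "__main__":
    # One harsh reviewer (40) and one generous reviewer (100) are ignored.
    print(trimmed_peer_grade([100, 90, 95, 40]))  # prints 92.5
```

Trimming like this keeps a single outlying reviewer from swinging the grade, at the cost of throwing away two reviews per student.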

If you're interested in taking the data science course described above, visit here: https://www.coursera.org/course/datascitoolbox

Tuesday, November 10, 2015

Using Power BI Custom Visualizations

Power BI (https://powerbi.microsoft.com) is Microsoft’s tool that provides fast analysis and reporting to developers and business users.  Microsoft releases new features for this tool on a monthly basis, so this post may be out of date before it’s even published!  One of the more recent releases includes the ability to create and publish custom visualizations for use by others.

Power BI Visuals Gallery
The Power BI Visuals Gallery is where you can publish, search, and download custom visuals for use in Power BI Desktop and the Power BI website.  People in the community and Microsoft have published visualizations that enhance the dashboard experience and still interact with the other visualizations as though they came out of the box! (On a side note, do we need to stop saying out-of-the-box now that everything is cloud-first…)  The types of visuals run the gamut from charts and graphs to animations and slicers.

Searching for a Custom Visual
To start, go to the Power BI Visuals Gallery: https://app.powerbi.com/visuals.



For example, Microsoft released a custom visual that handles advanced slicing, known as the Chiclet Slicer.  Once you find that visual (or another that you like), click on it, and click the “Download Visual” button.



It warns you that this isn’t supported by Microsoft, but if you like living on the edge, click “I agree”. (I chuckled a little that the visualization published by Microsoft is not supported by Microsoft, but I’m not complaining - I’m just happy to have this cool tool!)



The visualization will download as a *.pbiviz (Power BI Visualization) file to your desktop.

Installing a Custom Visual
Next, you need to install the custom visualization.  This shows the install process for Power BI Desktop, but the process is similar for the Power BI website.  Kick off the install through the File > Import menu or by clicking the ellipsis (…) in the Visualizations pane.



Select your downloaded .pbiviz custom visualization, and you will be warned yet again to make sure you really, really, really want to import the custom visualization.  If so, click the “Import” button.



And hurray, you have installed the visual!

Verifying the Custom Visual
You can verify that the visual is installed by looking at the Visualizations pane within Power BI Desktop, where the new visualization now appears, available for your use.



When you next open the report, you will get a message warning you about custom visuals again. Just select “Enable custom visuals”, and you are all set.  Happy dashboarding!


Versions: Power BI Visuals Gallery on 11/2/2015, Power BI Desktop v2.28.4190.122

Tuesday, November 3, 2015

My Week at the PASS Summit 2015

Last week the SQL PASS organization held the annual PASS Summit in Seattle, Washington.  The PASS Summit is a week-long conference that brings thousands of SQL Server, Business Intelligence, and Business Analyst professionals together to learn all about best practices in use today and about new features coming in the next version.  I was able to attend by volunteering to help with PASS and by the good graces of my company and had an amazing week!  Many thanks go to all of the organizers, speakers, volunteers, and sponsors who put on another great event.

The conference has two full-day preconference sessions on Monday and Tuesday which can be purchased in addition to the conference.  These trainings are amazing and definitely worth your while if you want more training.  I ended up flying out on Tuesday to start with the main conference on Wednesday.  On Tuesday, I explored Seattle a bit, and even visited the Space Needle (my first time in all my years visiting Seattle!).  I checked into the conference, visited with some friends at the Denny Cherry Associates and SIOS #sqlkaraoke event, and went to bed early to prepare for the next day.

Wednesday

Wednesday, the first day of the conference, started with some great sessions.  James Phillips and team presented the Foundation Session on Microsoft Business Intelligence, which talked about the new business intelligence vision (consistency and modernization across all of the reporting tools).  Here are a few of my favorite features:
  1. SSRS reports pinning to Power BI
  2. Auto-insights coming in Power BI
  3. DAX will have intellisense, comments, formatting, and new functions
  4. SSAS tabular will optimize DirectQuery querying
  5. Power BI will provide a new visualization every week
  6. Power BI can use data from on-premises multidimensional SSAS databases

I attended a few more sessions and closed out the day by attending an executive meet and greet sponsored by Microsoft and the PragmaticWorks #sqlkaraoke event.

Thursday

Thursday was another great session day.  I attended a Microsoft SSIS Focus session on the new features coming in SSIS 2016.  Jimmy Wong presented many of the new features and took feedback from the group on their thoughts.  Some of the cool new things that will be coming include:
  1. Incremental package deployment (no more "all-or-nothing")
  2. Ability to turn on an "optimize buffer size" option in packages
  3. Execution of R in SQL, which can be called through an SSIS package
  4. Ability to adjust how the ForEach Loop loops through files

There were a few other sessions, and then I closed out the evening at the Community Appreciation Party at the EMP Museum (an amazing museum if you haven’t been there).

Friday

On Friday, I hosted a Birds of a Feather table, where people working on similar topics (in my case, it was business intelligence architecture and design) sit together during lunch to discuss the topic.  I had a great table with a split between more advanced professionals and newbies to BI and architecture.  We covered topics from choosing products to modeling to ETL performance tuning and more.  Thanks to everyone who attended!

Friday was the last day, so I chatted with some friends, explored Seattle more, and relaxed a bit before week 2 at the MVP Summit.  I can't post anything about it, but rest assured I'm sharing your feedback with Microsoft as much as I can :)

More information

For more information on PASS, the Summit, and announcements, visit the following links:
  1. Information about the Summit: http://www.sqlpass.org/summit/2015/About.aspx
  2. Watch some sessions on PASSTV: http://www.sqlpass.org/summit/2015/Live.aspx
  3. Reporting Roadmap released during the Summit: http://blogs.technet.com/b/dataplatforminsider/archive/2015/10/29/microsoft-business-intelligence-our-reporting-roadmap.aspx
  4. Latest version of SQL Server 2016 CTP 3: http://www.microsoft.com/en-us/evalcenter/evaluate-sql-server-2016