Thursday, December 17, 2015

Is Data Science a Buzzword? aka: My first Coursera Course

Data science and data scientists are all the rage right now in the information technology space. Every company wants one; every job candidate touts they are one. But what does that actually mean to companies and potential employees? I decided to take a course on data science to see if I could find out!

My co-worker, Gabriella Melki, recommended the Coursera Data Science specialization by the Johns Hopkins Bloomberg School of Public Health. The entire specialization contains a set of 9 courses, but you can take each one individually. I started with the first course, called "The Data Scientist's Toolbox". Over the four-week timeframe, I was able to view lectures and complete the assignments at my own pace. I've listed below my thoughts on the course and what I learned about data science.

Week 1: Introduction to Data Science
Data science is about data, specifically answering questions with it, and science, following a method to discover an answer. A data scientist is the person who uses data to answer questions. Data scientists are such a hot commodity because people either can't find the data they need or have so much data that it is difficult to wade through it all. There are a variety of tools available to help with this task. A common one is the R language, which is what the rest of the course used. The course teaches students to accomplish tasks on their own, such as suggesting good places to research R functions (help.search, anyone?) and recommending online forums like StackOverflow for help with confusing errors. The first week wrapped up with an overview of all the courses in the specialization and a few questions to make sure you were paying attention.

Week 2: Learning the Tools
I used to program about 10 years ago, but have focused more on SQL in recent years. So I had a bit of a learning curve when it came to the tools needed to work with R programs. Week 2 of the course walked the students through installing each of these tools and the basics of using them. Specifically, I learned about Git Bash, Git, GitHub, R packages, and RStudio. Although it seemed a little overwhelming at first, the course went step-by-step through each installation and explained the commands needed to use each tool. At the end of Week 2, we ran several of the commands to show our mastery of what was taught.

Week 3: Understanding Questions, Data, and Approach
As we learned in Week 1, data science is all about trying to answer questions with data. Based on the data you have and the answer you need, you may ask different questions and use different approaches. For example, you may just want to describe the data, make a prediction from it, or go further. Data comes in all shapes and sizes: qualitative versus quantitative, large versus small, confounding versus predicting. The most important thing is to ask the right question first, design a logical experiment, and then investigate the data to find the answer. Be careful not to force your results through spurious correlations or data dredging! You don't want just any answer - you want the most accurate one. At the end of the third week, we completed a project to show our full understanding of the process and tools that we had learned over the past three weeks.

Week 4: Completing the Project
In Week 4, the last week of the course, we checked each other's work. This grading exercise was also interesting to me. They asked every student to check over the work of at least three other students, and then they assigned a grade based on those results. I'm guessing they have some way to verify the information - maybe kick out the highest/lowest values and average the rest? Or kick out the students whose results show the most variance?  Sounds like a good data science problem.  I'm not sure of the answer, but I completed the course with a 100%!

If you're interested in taking the data science course described above, visit here: https://www.coursera.org/course/datascitoolbox

Tuesday, November 10, 2015

Using Power BI Custom Visualizations

Power BI (https://powerbi.microsoft.com) is Microsoft’s tool that provides fast analysis and reporting to developers and business users.  Microsoft releases features on a monthly basis to this tool, so this post may be out of date before it’s even published!  One of the more recent releases includes the ability to create and publish custom visualizations for use by others.

Power BI Visuals Gallery
The Power BI Visuals Gallery is where you can publish, search for, and download custom visuals for use in Power BI Desktop and the Power BI website.  People in the community and at Microsoft have published visualizations that enhance the dashboard experience and still interact with the other visualizations as though they came out of the box! (On a side note, do we need to stop saying out-of-the-box now that everything is cloud-first…)  The visuals run the gamut from charts and graphs to animations and slicers.

Searching for a Custom Visual
To start, go to the Power BI Visuals Gallery: https://app.powerbi.com/visuals.



For example, Microsoft released a custom visual that handles advanced slicing, known as the Chiclet Slicer.  Once you find that visual (or another that you like), click on it, and click the “Download Visual” button.



It warns you that this isn’t supported by Microsoft, but if you like living on the edge, click “I agree”. (I chuckled a little that the visualization published by Microsoft is not supported by Microsoft, but I’m not complaining - I’m just happy to have this cool tool!)



The visualization will download as a *.pbiviz (Power BI Visualization) file to your desktop.

Installing a Custom Visual
Next, you need to install the custom visualization.  This shows the install process for Power BI Desktop, but the process is similar for the Power BI website.  Kick off the install through the File > Import menu or by clicking the ellipsis (…) in the Visualizations pane.



Select your downloaded .pbiviz custom visualization, and you will be warned yet again to make sure you really, really, really want to import the custom visualization.  If so, click the “Import” button.



And hurray, you have installed the visual!

Verifying the Custom Visual
You can verify that the visual is installed by looking at the Visualizations pane within Power BI Desktop, where the new visualization now appears, available for your use.



When you next open the report, you will get a message warning you about custom visuals again. Just select “Enable custom visuals”, and you are all set.  Happy dashboarding!


Versions: Power BI Visuals Gallery on 11/2/2015, Power BI Desktop v2.28.4190.122

Tuesday, November 3, 2015

My Week at the PASS Summit 2015

Last week the SQL PASS organization held the annual PASS Summit in Seattle, Washington.  The PASS Summit is a week-long conference that brings thousands of SQL Server, Business Intelligence, and Business Analyst professionals together to learn all about best practices in use today and about new features coming in the next version.  Thanks to volunteering with PASS and the good graces of my company, I was able to attend, and I had an amazing week!  Many thanks go to all of the organizers, speakers, volunteers, and sponsors who put on another great event.

The week begins with two full-day preconference sessions on Monday and Tuesday, which can be purchased in addition to the main conference.  These sessions are amazing and definitely worth your while if you want deeper training.  I ended up flying out on Tuesday to start with the main conference on Wednesday.  On Tuesday, I explored Seattle a bit and even visited the Space Needle (my first time in all my years visiting Seattle!).  I checked into the conference, visited with some friends at the Denny Cherry Associates and SIOS #sqlkaraoke event, and went to bed early to prepare for the next day.

Wednesday

Wednesday, the first day of the conference, started with some great sessions.  James Phillips and team presented the Foundation Session on Microsoft Business Intelligence, which talked about the new business intelligence vision (consistency and modernization across all of the reporting tools).  Here are a few of my favorite features:
  1. SSRS reports pinning to Power BI
  2. Auto-insights coming in Power BI
  3. DAX will have intellisense, comments, formatting, and new functions
  4. SSAS tabular will optimize DirectQuery querying
  5. Power BI will provide a new visualization every week
  6. Power BI can use data from on-premises multidimensional SSAS databases

I attended a few more sessions and closed out the day by attending an executive meet and greet sponsored by Microsoft and the PragmaticWorks #sqlkaraoke event.

Thursday

Thursday was another great session day.  I attended a Microsoft SSIS Focus session on the new features coming in SSIS 2016.  Jimmy Wong presented many of the new features and took feedback from the group on their thoughts.  Some of the cool new things that will be coming include:
  1. Incremental package deployment (no more "all-or-nothing")
  2. Ability to turn on an "optimize buffer size" option in packages
  3. Execution of R in SQL, which can be called through an SSIS package
  4. Ability to adjust how the ForEach Loop loops through files

There were a few other sessions, and then I closed out the evening at the Community Appreciation Party at the EMP Museum (an amazing museum if you haven’t been there).

Friday

On Friday, I hosted a Birds of a Feather table, where people working on similar topics (in my case, business intelligence architecture and design) sit together during lunch to discuss the topic.  I had a great table with a split between more advanced professionals and newbies to BI and architecture.  We covered topics from choosing products to modeling to ETL performance tuning and more.  Thanks to everyone who attended!

Friday was the last day, so I chatted with some friends, explored Seattle more, and relaxed a bit before week 2 at the MVP Summit.  I can't post anything about it, but be sure I'm sharing your feedback with Microsoft as much as I can :)

More information

For more information on PASS, the Summit, and the announcements, visit the following links:
  1. Information about the Summit: http://www.sqlpass.org/summit/2015/About.aspx
  2. Watch some sessions on PASStv: http://www.sqlpass.org/summit/2015/Live.aspx
  3. Reporting Roadmap released during the Summit: http://blogs.technet.com/b/dataplatforminsider/archive/2015/10/29/microsoft-business-intelligence-our-reporting-roadmap.aspx
  4. Latest version of SQL Server 2016 CTP 3: http://www.microsoft.com/en-us/evalcenter/evaluate-sql-server-2016



Wednesday, September 3, 2014

Accidental SharePoint Designer 101

I fully profess to know little to nothing about SharePoint, but I occasionally get pulled into setting up little sites or adding web parts for some of my reporting and business intelligence work.  Each time, I have to relearn the start-up steps to create what is needed!  So I decided to record a few of my go-to places so I can remember next time.  I used SharePoint 2010 to document the below steps, but the directions may be applicable to other versions.  Also, these steps assume you have full control of your site.

Getting Started

The first step is to start editing the page rather than looking at it like an end user.  Do this by:

  1. Select the Page tab at the top of the screen
  2. Click the Edit Page button/drop down list
  3. Select the Edit Page option

PS. When you're done, do these same steps, except select the "Stop Editing" button.

Content Creation

You may need to create a document library, a list, or another type of container.  I like to create my content first because it makes the design portion easier later. To do this:

  1. On the Site Actions menu in the top left corner, select the appropriate content creation option:
    • New Document Library
    • New Site
    • More Options
  2. Specify the important info, and be sure you're happy with the view of everything

Web Part Addition

As long as you're in editing mode, this is straightforward: Just select the Add a Web Part link where you want your web part.

PS. This is where creating your content first comes in handy: you can select your existing content without having to pick the web part type.

Web Part Editing

To make any changes to the web part, such as its style, name, or options, you can do that in the page itself.  To do this:

  1. In editing mode, hover your mouse over the web part you want to modify
  2. On the right side of the toolbar (at the top of the web part, next to the title), click the down arrow
  3. Click the Edit Web Part menu option
  4. On the right side of the screen, make any setting changes you would like, and be sure to select the OK button at the bottom.

Color and Style

Next, we want the color and style to match either the rest of the system or our own color scheme! Change this by:

  1. On the Site Actions menu in the top left corner, select the Site Settings option
  2. Under the Look and Feel section, click the Site theme link
  3. Either inherit your parent site's theme, specify your own theme, or customize your color options

Left Navigation Menu

My usual goal is to make the page look less "SharePoint-y", which includes removing the items from the left navigation menu and making links to other sites or pages within the site.  You can change the menu by:

  1. On the Site Actions menu in the top left corner, select the Site Settings option
  2. Under the Look and Feel section, click the Navigation link
  3. Scroll down to Navigation Editing and Sorting and have fun playing!

Security

Hopefully, someone is going to use the page that you just put so much time and energy into!  So we need to give those people access.  Do this by:

  1. On the Site Actions menu in the top left corner, select the Site Permissions option
  2. Give access to either Windows or SharePoint groups and decide what permission they get

PS. It is important to have a good security plan in place, and hopefully you can work with your SharePoint administrator on this.

Good luck if you end up an "accidental SharePoint designer" like me!

Tuesday, July 29, 2014

Upgrading your SSIS Management Framework: Part 3

At this point, you understand the options for moving an SSIS framework to the latest version of SSIS, and you've upgraded the logging portion of the framework using a hybrid approach.  The final step in the framework upgrade is handling your configurations.  Let's walk through an existing configuration implementation and how you can upgrade it by combining your existing implementation with the standard SSIS framework.

Overview

A typical "old-school" configuration scheme is described in the SSIS PDS book or in this blog post here: http://jessicammoss.blogspot.com/2008/05/ssis-configuration-to-configuration-to.html.  Starting in SSIS 2012, the configuration scheme uses environments and parameters when using the Project Deployment Model, as discussed here: http://msdn.microsoft.com/en-us/library/hh213290(v=sql.110).aspx.

In both scenarios, the core ideas in a configuration scheme are:

  1. Provide the ability to move packages through environments without having to touch the packages
  2. Provide one location where connection strings / variables are stored, so in case a value changes, you don't have to change the value in multiple places
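
For reference, the classic configuration table that SSIS generates when you use the SQL Server configuration type (and that we'll mimic with a view later in this post) has just four columns.  A minimal sketch of its schema:

CREATE TABLE dbo.[SSIS Configurations]
(
    ConfigurationFilter NVARCHAR(255) NOT NULL, -- groups the entries for one package or environment
    ConfiguredValue NVARCHAR(255) NULL,         -- the value to apply at runtime
    PackagePath NVARCHAR(255) NOT NULL,         -- e.g. \Package.Variables[User::MyVar].Properties[Value]
    ConfiguredValueType NVARCHAR(20) NOT NULL   -- e.g. String, Int32
);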

Assumptions

To enable our hybrid approach, we will utilize the SSIS 2012+ catalog as the "master" version of the configuration values and modify the previous framework to use its data.  This example relies on the following assumptions; if your system is different, you may need to make some modifications to the implementation described here.

  1. Assumption for the old framework: All configuration entries modify Variables, rather than Connections or other object types.
  2. Assumption for the new framework: You have a CommonConfigurations environment on each server that holds your values for your development, test, and production servers, as well as an environment for each package that would have its own package level values.
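
If you don't have that environment yet, the built-in catalog stored procedures can create it.  Here's a minimal sketch - the folder name ETL, the variable ServerName, and its value are placeholders for your own setup:

EXEC SSISDB.catalog.create_environment
    @folder_name = N'ETL',
    @environment_name = N'CommonConfigurations';

EXEC SSISDB.catalog.create_environment_variable
    @folder_name = N'ETL',
    @environment_name = N'CommonConfigurations',
    @variable_name = N'ServerName',     -- placeholder shared variable
    @data_type = N'String',
    @sensitive = 0,
    @value = N'MyDevServer',            -- this server's value for the variable
    @description = N'Database server for this environment';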

Configurations

To tie the two systems together, we will use the environments stored in the 2012+ catalog to feed values to the earlier framework.  The earlier framework retrieves all of its information from a table called dbo.[SSIS Configurations], so we can simply replace that table with a view that points to the new catalog!

Start by renaming your old table to [SSIS Configurations Old]:

EXEC sp_rename 'dbo.[SSIS Configurations]', 'SSIS Configurations Old';
GO


Next, create a view named [SSIS Configurations] that reads from the SSIS catalog:



CREATE VIEW dbo.[SSIS Configurations]
AS
    -- Present catalog environment variables in the shape of the classic
    -- SSIS Configurations table so existing packages keep working unchanged
    SELECT CAST(e.name AS NVARCHAR(255)) AS ConfigurationFilter
        , CAST(ev.value AS NVARCHAR(255)) AS ConfiguredValue
        , CAST('\Package.Variables[User::' + ev.name + '].Properties[Value]'
                AS NVARCHAR(255)) AS PackagePath
        , CAST(ev.type AS NVARCHAR(20)) AS ConfiguredValueType
    FROM [SSISDB].[catalog].[environment_variables] ev
    LEFT JOIN [SSISDB].[catalog].[environments] e
        ON ev.environment_id = e.environment_id;
GO


Compare the output from SSIS Configurations and SSIS Configurations Old, and add any missing variables to the environment in the SSIS catalog; one way to do that comparison is sketched below.  From now on, you will add new configurations only to the new catalog.  You now have one place to keep the "master" values, so you only ever change them there!
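
Here is a minimal comparison sketch using EXCEPT, assuming the table and view names above; it returns any old entries not yet represented in the catalog:

SELECT ConfigurationFilter, PackagePath
FROM dbo.[SSIS Configurations Old]
EXCEPT
SELECT ConfigurationFilter, PackagePath
FROM dbo.[SSIS Configurations];
-- Any rows returned are variables you still need to add to the catalog environments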



Good luck!



Keep in mind that based on your framework implementation, not all of this may be applicable.  However, I hope that it gives you something to think about as you are evaluating your framework upgrade options!

Friday, June 27, 2014

Upgrading your SSIS Management Framework: Part 2

Based on Part 1 of Upgrading your SSIS Management Framework, you’ve decided to go with a hybrid approach for your framework.  The hybrid approach will use some components of the custom framework (this post will use the framework provided in SSIS PDS, but the concepts are applicable to any custom framework) while also utilizing the standard SSIS framework.  This allows you to tie your existing package ecosystem to the latest and greatest built-in framework. Let’s talk through an overview of what we’re going to do and then explain each of the steps needed to implement it.

Overview

When it comes down to it, we need to accomplish two main things for this hybrid approach: tie our logging tables together and tie our configuration tables together.  For logging, each system has its own important identifier (ID) that can get you to anything else in the system: in the custom SSIS framework it is the PackageLogID, and in the standard SSIS framework it is the ServerExecutionID.  The work you need to do is map these two executions together.  For configurations, you want one place to modify the connection strings and common variables used by all of your packages.

Logging

Let’s start with logging.  As previously mentioned, our goal here is to tie the two logging systems together.  Since we don’t want to modify the standard SSIS framework (that would defeat the purpose of moving to that framework!), we’ll make our modifications in the existing framework.  However, we want to be sure not to change anything in the packages themselves because that would cause a lot of rework.  Fortunately, we can do this in the table and stored procedure that the framework utilizes.

Begin by adding a new column to your main table that contains package executions, such as:

IF NOT EXISTS (SELECT * FROM sys.columns
               WHERE [name] = N'ServerExecutionID'
                   AND [object_id] = OBJECT_ID(N'dbo.PackageLog'))
BEGIN
    -- Holds the matching execution_id from SSISDB.catalog.executions
    ALTER TABLE [dbo].[PackageLog]
    ADD [ServerExecutionID] BIGINT NULL;
END


Next, you will modify the stored procedure to populate the field you just added:



UPDATE dbo.PackageLog
SET ServerExecutionID = (
    -- Find the catalog execution currently running this package
    SELECT MAX(execution_id)
    FROM SSISDB.catalog.executions
    WHERE status = 2               -- 2 = running
        AND end_time IS NULL
        AND package_name = @PackageName )
WHERE PackageLogID = @PackageLogID -- only update the row for this execution
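
Once a package runs under the modified procedure, you can spot-check that the mapping is being captured with a quick sanity-check query:

SELECT TOP (5) PackageLogID, ServerExecutionID, StartDateTime
FROM dbo.PackageLog
ORDER BY PackageLogID DESC;  -- the most recent runs should now have a ServerExecutionID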


With the ServerExecutionID from the standard framework now recorded in the custom framework's log, you can modify your reporting queries to utilize the new column.  For example, I modified the existing standard framework query for the “All Executions” report to include information from the old framework in the following query:



SELECT TOP(10)
    a.[execution_id],
    CAST(a.[start_time] AS smalldatetime) AS shortStartTime,
    CONVERT(FLOAT, DATEDIFF(millisecond, a.[start_time], ISNULL(a.[end_time], SYSDATETIMEOFFSET())))/1000 AS duration
FROM [catalog].[executions] a
INNER JOIN [catalog].[executions] b ON
    a.[package_name] = b.[package_name] AND
    a.[project_name] = b.[project_name] AND
    a.[folder_name] = b.[folder_name]
WHERE b.[execution_id] = ? AND
    a.[status] = 7
UNION ALL
SELECT a.PackageLogID,
    CAST(a.[StartDateTime] AS smalldatetime) AS shortStartTime,
    CONVERT(FLOAT, DATEDIFF(millisecond, a.[StartDateTime], ISNULL(a.EndDateTime, SYSDATETIMEOFFSET())))/1000 AS duration
FROM SSIS_PDS.dbo.PackageLog a
INNER JOIN SSIS_PDS.dbo.PackageVersion b ON a.PackageVersionID=b.PackageVersionID
INNER JOIN SSIS_PDS.dbo.PackageVersion c ON b.PackageID=c.PackageID
INNER JOIN SSIS_PDS.dbo.PackageLog d ON c.PackageVersionID=d.PackageVersionID
WHERE d.ServerExecutionID = ?
     AND a.ServerExecutionID IS NULL
ORDER BY shortStartTime DESC


In this way, you can see all executions together, and you can use similar queries to tie any executions from the old framework to the new one!
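
For example, a simple join on the new column lines up each custom-framework run with its catalog execution - a minimal sketch, assuming the SSIS_PDS database name used above:

SELECT pl.PackageLogID,
    pl.StartDateTime,
    e.execution_id,
    e.[status]    -- catalog status codes: 7 = succeeded, 4 = failed
FROM SSIS_PDS.dbo.PackageLog pl
INNER JOIN SSISDB.[catalog].[executions] e
    ON pl.ServerExecutionID = e.execution_id;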



Configurations



Next week, we’ll look at configurations and how to manage those in both frameworks.

Tuesday, June 17, 2014

Upgrading your SSIS Management Framework: Part 1

Background

Before SQL Server 2012, SQL Server Integration Services (SSIS) had no built-in logging, auditing, or configuration framework.  All of the pieces were available to build your own, but everyone ended up doing that just a little bit differently.  Most of us consultants came up with our own variation to implement at client sites and ensure that all of those functions were at least consistent within a single client.  I'm especially proud of the framework that Rushabh Mehta and I developed, which is published in Microsoft SQL Server Integration Services: Problem, Design, Solution (SSIS PDS) and implemented by many others as well.

Along came SSIS 2012, when Microsoft realized this spread of slightly different frameworks was happening and thought, "how cool would it be if we could standardize the framework so ALL our clients have the same one?"  This would not only reduce the initial development time of the framework, but also ensure that upgrades and future maintenance go smoothly. Coinciding with (or perhaps due to) this decision, Microsoft moved the execution of SSIS packages to run inside of SQL Server.  Having a consistent framework is wonderful, and I'm a big fan of using the built-in one.

Upgrade Options

However, what do you do if you've already implemented a custom framework, similar to one in SSIS PDS?  There are a couple of options:

  1. Move lock, stock, and barrel to the new framework. To do this, you would have to upgrade all existing packages, remove the components that logged to the framework, and change your configurations schema to use parameters.  This means that you would never have to worry about upgrading your framework again because Microsoft will take care of it.  On the other hand, you would lose ties to your existing log records.  If you have a small number of packages or you have only used a custom framework for a short amount of time, I would recommend this option.
  2. Stick with what you've got. This option is the easiest to implement, but provides the least value.  The new SSIS framework contains much more logging than most custom frameworks, and when a package fails and you don't know why, mo' logging = mo' better.  I do not recommend this option.
  3. Use a hybrid approach, which has the potential to phase out the custom framework in the future.  This option requires a little bit of up-front work, but will be maintainable (and hopefully enhanced!) in future SSIS versions.

Okay, let’s do it!

Next week, we'll look at how to implement the hybrid approach based on the PDS framework.