Sunday, July 03, 2016

NDepend Pro review

Background

NDepend is a tool for analysing code along several dimensions: code complexity, code coverage, static code analysis similar to what StyleCop does, dependency management, and many other useful features. A typical enterprise application uses several different tools to achieve these things. I have worked mostly with the .NET Framework in my professional experience, and Microsoft Visual Studio is the default option for many of these tasks. I have also used other tools, such as NCover for measuring code coverage. Back in 2011 I used NDepend specifically to measure the cyclomatic complexity of different modules in our application.

How did I come across NDepend Pro?

Recently I was approached by the NDepend developer to try the Pro version and evaluate its features. This post is about my experience of using NDepend again (after a gap of almost 5 years) and seeing how it has evolved over that time.

I started with a very small codebase which I had developed for simulating the Martingale theory. The code is available on GitHub. The codebase is very simple. It consists of a library which computes the amount for a particular trade based on the payout percentage. There is a set of unit tests which exercise the functions of the library. There is also a console application which acts as the client for invoking the library functions.
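To make the idea concrete, here is a minimal Python sketch of one common martingale-style staking rule. It is not the library from the repository: the function names `next_stake` and `stake_sequence`, their parameters, and the recovery rule itself (size the next trade so that a single win recovers all earlier losses plus a target profit) are assumptions made purely for illustration.

```python
from typing import List


def next_stake(losses_so_far: float, target_profit: float, payout_pct: float) -> float:
    """Stake needed so one winning trade recovers prior losses plus the target profit.

    payout_pct is the profit paid on a winning trade as a percentage of the stake
    (for example, 80 means a stake of 10.00 wins 8.00). Hypothetical helper.
    """
    if payout_pct <= 0:
        raise ValueError("payout_pct must be positive")
    return round((losses_so_far + target_profit) / (payout_pct / 100.0), 2)


def stake_sequence(target_profit: float, payout_pct: float, losing_trades: int) -> List[float]:
    """Stakes placed during a run of consecutive losing trades, starting with no losses."""
    stakes: List[float] = []
    losses = 0.0
    for _ in range(losing_trades):
        stake = next_stake(losses, target_profit, payout_pct)
        stakes.append(stake)
        losses += stake  # a losing trade forfeits the whole stake
    return stakes


if __name__ == "__main__":
    # With a 100% payout the rule reduces to the classic doubling sequence.
    print(stake_sequence(target_profit=1.0, payout_pct=100.0, losing_trades=4))
```

With a payout below 100% the stakes grow faster than doubling, which is exactly the kind of behaviour a simulator like this is meant to expose.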

Analysis results

Below is the output of the analysis done using NDepend Pro. Let's look at some of the details available via the different tabs.

Dashboard

NDepend Analysis Results

The summary on the dashboard breaks down into the following categories:

  1. # Lines of code
  2. # Types
  3. Comment %
  4. Method complexity
  5. Code Coverage by Tests (Allows you to import code coverage data from other tools)
  6. Third-party usage
  7. Code Rules

I personally find the Code Rules section very useful, as it gives you hyperlinks to drill down further into the details.

Dependency Graph

dependency graph

This graph gives a visual representation of the relationships between different assemblies. The part I like best is the interactivity of the graph. You can hover over the nodes and the affected nodes are dynamically highlighted in different colors. In the above example we have only 3 namespaces, but you can imagine how useful this can be in a real project with hundreds of classes and many namespaces involved.

There are multiple options to customize the way the Dependency Graph is represented. The default is based on the number of lines of code, but you can change it to any of the options shown below.

Dependency graph options

Dependency Matrix

The Dependency Matrix gives a matrix view of how different assemblies depend on one another. I find this feature very helpful, as it shows at a glance the links between the different assemblies in the application. The tools which I have used in the past, like Visual Studio and NCover, do not provide such a feature.

Dependency Matrix

A well-designed application will have a good distribution of classes across different assemblies. You will also be able to see what the impact of replacing one thing with another could be. Let's take an example. Assume you use a third-party component like Infragistics in your application and for some reason you wish to replace it with something else. Using the dependency matrix you could find out which assemblies depend on Infragistics.

There are multiple options available via the context menu which give in-depth analysis of the code. I have not yet explored these options.
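The idea behind such a matrix can be sketched in a few lines of Python. This is only an illustration of the concept, not how NDepend computes or stores its matrix; the assembly names (`Ui`, `Reports`, `Core`) and both helper functions are hypothetical.

```python
def build_matrix(references):
    """Return (names, matrix) where matrix[i][j] == 1 if names[i] references names[j]."""
    names = sorted(set(references) | {d for deps in references.values() for d in deps})
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for source, deps in references.items():
        for target in deps:
            matrix[index[source]][index[target]] = 1
    return names, matrix


def dependents_of(references, component):
    """List the assemblies that reference the given component directly."""
    return sorted(name for name, deps in references.items() if component in deps)


# Hypothetical application: two assemblies use a third-party UI component.
references = {
    "Ui": {"Core", "Infragistics"},
    "Reports": {"Core", "Infragistics"},
    "Core": set(),
}

if __name__ == "__main__":
    names, matrix = build_matrix(references)
    for name, row in zip(names, matrix):
        print(f"{name:<14}", row)
    print("Affected by replacing Infragistics:", dependents_of(references, "Infragistics"))
```

Reading down the column of a component answers the replacement question from the example above: every row with a 1 in that column is an assembly that would be affected.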

Metrics Heatmap

Metrics Heatmap

The Heatmap is a feature which shows how classes are spread across different namespaces based on their cyclomatic complexity. The default measure of cyclomatic complexity can be changed to various other options like IL Cyclomatic Complexity, Lines of Code, Percentage Comments, etc.

For a small codebase like my sample Martingale theory tester, this analysis using NDepend is quite interesting. To make full use of the wonderful features the tool provides, I intend to use it on a much larger codebase. I was recently referring to the CQRS Journey code from the Microsoft patterns & practices team. Let me see what I can discover from this decent-sized codebase using NDepend. I will keep the readers of this blog posted with the details in future posts.

Conclusion

I have always been a fan of code quality tools. NDepend has a lot to offer in this area. I particularly liked the Dashboard and the Dependency Matrix, along with the Dependency Graph. I have only scratched the surface and I am excited to try the other features offered by the tool. The feature I am interested in exploring more in future posts is the benchmarking of a codebase. Based on my past experience and whatever little I have seen so far of the latest version, I would highly recommend NDepend for code analysis. I personally like the integrated nature of the tool, which brings so many aspects of code quality together in one place. You can choose to run it as a standalone application, which is what I did in this instance, or you can integrate it with the Visual Studio IDE. I also like the fact that it integrates nicely into the build process, which is a must in today's world. Interoperability with other tools like TFS, TeamCity and SonarQube is another benefit.

I personally like the options offered to customize the default settings and configurations. The rules, for example, are mainly derived from what Visual Studio uses by default. You can always choose to filter out the rules that are not relevant to your analysis. Another nice feature is the LINQ-based code query language offered by NDepend (CQLinq). This gives you a great ability to explore your code using queries.

There is so much to explore in NDepend that it is impossible to do it in one blog post. One feature which I did not cover in this post is the Queries and Rules Explorer. In my opinion it deserves a dedicated post. I will try to cover some of these features in the future. Until next time, Happy Programming.

Monday, January 04, 2016

Configure Standalone Spark on Windows 10

Background

It's been almost 2 years since I wrote a blog post. Hopefully the next ones will be much more frequent. This post is about my experience of setting up Spark as a standalone instance on a Windows 10 64-bit machine. I got back to a bit of programming after a long gap, and it was quite evident from how much I struggled in configuring the system. Someone else coming from a .NET background and new to the Java way of working might face the same difficulties that I faced over the day it took to get Spark up and running.

 

What is Spark

Spark is an execution engine which is gaining popularity due to its ability to perform in-memory parallel processing. It claims to be up to 100 times faster than Hadoop MapReduce processing. It also fits well into the distributed computing paradigm of the big data world. One of the positives of Spark is that it can be run in standalone mode without having to set up nodes in a cluster. This also means that we do not need to set up a Hadoop cluster to get started with Spark. Spark is written in Scala and supports Scala, Java, Python and R as of writing this post in January 2016. Currently it is one of the most popular projects among the different tools used as part of the Hadoop ecosystem.

 

What is the problem with installing Spark in standalone mode on a Windows machine?

I started by downloading a copy of the Spark 1.5.2 distribution (Nov 9 2015) from the Apache website. I chose the version which is pre-built for Hadoop 2.6 and later. If you prefer, you can also download the source code and build the whole package. After extracting the contents of the downloaded file, I tried running the spark-shell command from the command prompt. If everything is installed successfully, we should get a Scala shell to execute our commands. Unfortunately, on a Windows 10 64-bit machine Spark does not start cleanly. This seems to be a known issue, as there are multiple resources on the internet which talk about it.

When the spark-shell command is executed, multiple errors are reported on the console. The error which I received showed problems with the creation of the SQLContext. There was a big stack trace which was difficult to understand.

Personally, this is one thing which I do not like about Java. In my past experience I always found it very difficult to debug issues, as the error messages pointed to something which was not necessarily the real source of the problem. I wish Java-based tools and applications would be easier to deploy in future. In one sense it is good that this makes us aware of many of the internals, but on the other hand sometimes you just want to install the stuff and get started without spending days configuring it.

I was referring to the Pluralsight course on Apache Spark fundamentals. The getting started and installation module of the course was helpful as the first step towards resolving the issue. As suggested in the course, I changed the verbosity of Spark's output from INFO to ERROR, and the amount of information on the console reduced a lot. With this change I was immediately able to see the error related to the missing Winutils, a utility required specifically on Windows systems. This is reported as issue SPARK-2356 in the Spark issue list.
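The verbosity change can be sketched as a small Python helper. The assumptions here: Spark 1.x distributions ship a `conf/log4j.properties.template` file whose root logger line reads `log4j.rootCategory=INFO, console`, and the edited copy is saved as `conf/log4j.properties`. The helper name `set_root_log_level` is hypothetical, and the course may well have made the same change by hand in a text editor.

```python
import re


def set_root_log_level(properties_text: str, level: str = "ERROR") -> str:
    """Return the properties text with the level on the log4j.rootCategory line replaced."""
    pattern = re.compile(r"^(log4j\.rootCategory\s*=\s*)\w+", flags=re.MULTILINE)
    updated, count = pattern.subn(lambda m: m.group(1) + level, properties_text)
    if count == 0:
        raise ValueError("no log4j.rootCategory line found")
    return updated


if __name__ == "__main__":
    # Assumed content of the relevant line in the template file.
    template = "# Set everything to be logged to the console\nlog4j.rootCategory=INFO, console\n"
    print(set_root_log_level(template))
```

To apply it for real, read the template from the Spark `conf` folder, pass its text through the helper, and write the result to `log4j.properties` in the same folder.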

After copying the Winutils.exe file from the Pluralsight course into the Spark installation's bin folder, I started getting a permissions error for the tmp/hive folder. As recommended in different online posts, I tried changing the permissions using chmod and setting them to 777. This did not seem to fix the issue. I tried running the command with administrative privileges. Still no luck. I updated the PATH environment variable to point to the Spark\bin directory. As suggested, I added SPARK_HOME and HADOOP_HOME to the environment variables. Initially I had put the Winutils.exe file in the Spark\bin folder. I moved it out to a dedicated directory named Winutils and updated the HADOOP_HOME environment variable to point to this directory. Still no luck.

As many people had experienced the same problem with the latest version of Spark, 1.5.2, I thought of trying an older version. Even in 1.5.1 I had the same issue. I went back to the 1.4.2 version released in November 2014, and that seemed to create the SQLContext correctly. But that version is more than a year old, so there was no point in sticking to an outdated version.

At this stage I was contemplating the option of getting the source code and building it from scratch. Having read in multiple posts about setting the JAVA_HOME environment variable, I thought of trying this approach. I downloaded the Java 7 SDK and created the environment variable to point to the location where the JDK was installed. Even this did not solve the problem.

 

Use the right version of Winutils

As a last option, I decided to download Winutils.exe from a different source. In the downloaded contents I got Winutils and some other DLLs as well, like hadoop.dll, as shown in the figure below.

Winutils with hadoop dlls

After putting these contents in the Winutils directory and running the spark-shell command, everything was in place and the SQLContext was successfully created.

I am not really sure which step fixed the issue. Was it the JDK and the setting of the JAVA_HOME environment variable? Or was it the updated Winutils.exe along with the other DLLs? All this setup was quite time consuming. I hope this is helpful for people trying to set up a standalone instance of Spark on Windows 10 machines.

While I was trying to get Spark up and running, I found the following links, which might be helpful in case you face similar issues.

The last one was really helpful; from it I took the idea of moving Winutils.exe into a separate folder and of installing the JDK and Scala. Setting the Scala environment variables was not required, however, as I was able to get the Scala prompt without installing Scala.

Conclusion

The following are the steps I followed for installing a standalone instance of Spark on a Windows 10 64-bit machine:

  • Install the JDK (version 6 or higher)
  • Download the Spark distribution
  • Download the correct version of Winutils.exe along with its accompanying DLLs
  • Set the environment variables JAVA_HOME, SPARK_HOME and HADOOP_HOME

Note: When running the chmod command to set 777 permissions on the tmp/hive directory, make sure to run the command prompt with administrative privileges.
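Since it is easy to lose track of which variable points where, a small preflight check can help before launching spark-shell. The Python sketch below is an illustration rather than part of any Spark tooling: the function name `check_spark_env` is hypothetical, and the expected location of the utility (a `bin` folder directly under HADOOP_HOME) is an assumption based on the path that appears in the `null\bin\winutils.exe` error message associated with SPARK-2356.

```python
import os
from pathlib import Path

REQUIRED_VARS = ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME")


def check_spark_env(env=None):
    """Return a list of human-readable problems; an empty list means nothing was found."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{name} points to a missing directory: {value}")

    # Assumed layout: winutils.exe lives in the bin folder under HADOOP_HOME.
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home and Path(hadoop_home).is_dir():
        winutils = Path(hadoop_home) / "bin" / "winutils.exe"
        if not winutils.is_file():
            problems.append(f"winutils.exe not found at {winutils}")
    return problems


if __name__ == "__main__":
    for message in check_spark_env() or ["environment looks consistent"]:
        print(message)
```

Passing a plain dictionary instead of the real environment makes it easy to try out different combinations of settings without touching the system configuration.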
