Debugging in production

Abstract
This article shows how we can create a memory dump and which basic tools we can use to analyze it. Dump files give us access to information we could not normally reach; some data can only be obtained through these dumps and not by other means (such as Visual Studio debugging).
We could say that these tools are very powerful, but they are also rather difficult to use, as they require a fairly high degree of knowledge.

How many times has it happened to you to have a problem in production or in the testing environment that you are unable to reproduce on the development machine? When this happens, things can go off track, and we try out different ways of remote debugging. Without even knowing it, helpful tools can be right at hand, but we ignore them or simply don't know how to use them.
In this article I will present different ways in which we can debug without having to use Visual Studio.
Why not use Visual Studio?
Though Visual Studio is an extremely good product that helps us discover bugs and debug, it won't be of great help to us in production. The moment we have a bug in production, the rules of the game change: the application is compiled for release, and debugging it with Visual Studio is no longer an option.
When do we need these tools?
The moment we cannot reproduce the problem on our development machines. No matter what we do, we are not able to reproduce the problem we are dealing with, and without a reproduction it's like looking for a needle in a haystack.
If by chance the problem appears, but we do not have a reproduction scenario, we find ourselves back in the situation mentioned above.
Another case is when the memory occupied by our application grows over time, and the phenomenon appears only on the production machines. We can only guess what the problem is without knowing the exact cause, which is why we may end up "fixing" totally different code areas.
What solutions do we have?
Generally, there are two possibilities at hand. The first one is based entirely on logs. Through logs we can identify the areas of the application that do not behave appropriately. But using logs can be double-edged: you need to know exactly what must appear in the logs and how often. Otherwise, you may end up with thousands of pages of useless, almost impossible to analyze logs. And if we log too much, we may be surprised to find that the behavior of the application itself changes.
When it is possible, we can ship the PDB files to the production machine. This way we will have access to the entire stack trace generated by an exception.
Logs can be of great help in solving different problems that emerge in production. But even if logs are very useful, they won't help us every time. There are problems that are extremely difficult to identify using logs; a deadlock, for example, would be almost impossible to pinpoint by means of logs alone.
Another alternative available to us is creating memory dumps and analyzing them.
What is a memory dump?
A memory dump is a snapshot of the process at a certain moment. Besides information regarding the allocation of memory, a snapshot also contains information on the state of different threads, objects and code. With this we can learn very valuable things about the running process. The snapshot represents the image of the memory in 32-bit or 64-bit format, depending on the system.
Generally, there are two types of memory dump. The first one is the minidump. This is the simplest memory dump that can be made, containing only basic information: the stack, the state of the process, the calls being made and so on.

The second type of memory dump is the full dump. It contains all the information that can be obtained, including a snapshot of the memory. It takes much longer to obtain a full dump compared to a minidump, and the dump file itself is much bigger.
How can we generate a memory dump?
There are different applications that allow us to do this; some of them can even generate a dump automatically, based on different triggers.
If we need to generate a memory dump, the easiest solution is the Task Manager: all we have to do is right-click a process and select "Create dump file". We can do the same thing using Visual Studio or "adplus.exe", the latter being part of the Debugging Tools for Windows package.
In the following example, we instruct adplus to create a memory dump of the process right away:
adplus -hang -o C:\myDump -pn MyApp.exe
The -pn option specifies the name of the process for which we wish to create the dump. If we want the dump to be created automatically when the process crashes, we can use the -crash option:
adplus -crash -o C:\myDump -pn MyApp.exe
adplus -crash -o C:\myDump -sc MyApp.exe
If we need to create a dump automatically, besides "adplus.exe" we can use DebugDiag and "clrdump.dll". The three options for creating a dump automatically are rather similar. DebugDiag, for example, allows us to configure a rule so that a memory dump is generated automatically the moment CPU usage stays above X% for a certain time span.
Besides these tools there are many others on the market. Depending on your requirements, you can use any tool of this type.
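One such tool is Sysinternals ProcDump. As a sketch (ProcDump is not discussed above, so treat the exact switches as something to verify against its documentation), the following command writes up to two full dumps when the CPU usage of MyApp.exe stays above 90% for 10 seconds:
procdump -ma -c 90 -s 10 -n 2 MyApp.exe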
How do we analyze a dump?
The native debugger for a dump is Windbg. This is a powerful tool through which one can obtain very valuable information. Its only problem is that it is not very user-friendly. We will see a little later what the alternatives to Windbg are; keep in mind that in almost all cases these alternatives use Windbg behind the scenes and simply display a friendlier, more useful interface.
An alternative to Windbg is Visual Studio 2010 or any more recent version. Beginning with Visual Studio 2010, we have the possibility to analyze dumps of .NET 4.0+ applications. What we can do in Visual Studio is not as advanced as what Windbg allows, but it is generally enough.
Windbg

The first step after opening Windbg is to load a dump (CTRL+D). Once loaded, a dump can be inspected in different ways; for example, we can analyze the threads, the memory, the allocated resources and so on.
In order to do more, for instance to visualize and analyze the managed code, we need to load additional extensions such as Son of Strike (SOS) or Son of Strike Extension (SOSEX). These two libraries open new doors for us, as they can interpret the data in the dump in an extremely useful way.
Son of Strike (SOS)
SOS allows us to inspect the process itself. It gives us access to the objects, the threads and the information from the garbage collector; we can even see the names of variables and their values.
One must know that all the information accessible this way belongs to the managed memory. SOS is therefore tightly coupled to the CLR and its version: when we load the SOS module, we must make sure we load the one corresponding to the .NET version of our application.
.loadby sos mscorwks
.loadby sos clr
In the first example above, we load the SOS module for .NET 3.5 and earlier; in the second, we load the one for .NET 4.0 and later.
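If we are not sure which runtime a dump was taken against, one quick check (a Windbg idiom, not part of the original steps) is to list the loaded CLR modules; whichever of the two appears tells us which .loadby line applies:
lm m mscorwks
lm m clr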
All the SOS commands start with "!". The basic command is "!help". If we wish to see the list of threads, we can use the "!threads" command, whose output is similar to the following:
0:000> !threads
ThreadCount: 5
UnstartedThread: 0
BackgroundThread: 2
PendingThread: 0
DeadThread: 0
Hosted Runtime: no
                        Lock
  ID  OSID  ThreadOBJ   Count  Apt  Exception
…
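A useful complement to "!threads" (an idiom worth mentioning, though not shown in the output above) is asking Windbg to run the SOS "!clrstack" command on every thread at once, which gives a quick overview of what each managed thread is doing:
~*e !clrstack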
Debugging a crash
So far we have seen that there are many tools available for creating and analyzing a dump. Now it is time to see what we have to do in order to analyze a crash.

1. Launch the process.
2. Before it crashes, instruct adplus to create a dump the moment the process crashes:
   adplus -crash -pn [processName]
3. Launch Windbg (after the crash):
   3.1. Load the dump
   3.2. Load SOS
   3.3. !threads (to see which thread has crashed)
   3.4. !PrintException (on the crashed thread, to see the exception)
   3.5. !clrstack (to see the call stack)
   3.6. !clrstack -a (to see the stack together with the parameters)
   3.7. !DumpHeap -type Exception (lists all the exception objects on the managed heap)

One must know that the results depend on how the application was compiled, for instance whether code optimization was performed during compilation. Moreover, the exception list we get may be quite long, because commands such as !DumpHeap return all the exceptions encountered, even pre-created ones such as ThreadAbortException.
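To practice these steps, a minimal crashing application is enough. The following console program is purely illustrative (all names in it are my own); it throws an unhandled exception on a worker thread, which terminates the process:

using System;
using System.Threading;

class CrashDemo
{
    static void Main()
    {
        var worker = new Thread(() =>
        {
            // Leave enough time to attach adplus -crash to the process.
            Thread.Sleep(10000);
            throw new InvalidOperationException("Crash from the worker thread");
        });
        worker.Start();
        worker.Join();
    }
}

After the crash, !threads should mark the worker thread with the exception, and !PrintException on that thread should display the InvalidOperationException together with its message.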
How do we identify a deadlock?
A deadlock emerges when two or more threads are each waiting for a resource held by another. In these cases, part of the application, if not the entire application, becomes blocked.
In this case, the first step is to create a dump using the command:
adplus -hang -o C:\myDump -pn [processName]
Then we need to analyze the stack trace of each thread and see whether it is blocked (Monitor.Enter, ReaderWriterLock and so on). Once we have identified these threads, we can find the resources used by each thread, together with the thread that keeps each resource blocked.
For these final steps, the "!syncblk" command comes to our help: it lists the synchronization blocks together with the threads that own them.
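To see !syncblk at work, the classic lock-ordering deadlock is easy to reproduce. The sketch below is illustrative only (the two lock objects and the thread bodies are my own); each thread takes one lock and then waits for the other, so the process hangs:

using System;
using System.Threading;

class DeadlockDemo
{
    static readonly object LockA = new object();
    static readonly object LockB = new object();

    static void Main()
    {
        // Each thread acquires one lock, then waits for the other: a guaranteed deadlock.
        var t1 = new Thread(() => { lock (LockA) { Thread.Sleep(1000); lock (LockB) { } } });
        var t2 = new Thread(() => { lock (LockB) { Thread.Sleep(1000); lock (LockA) { } } });
        t1.Start(); t2.Start();
        t1.Join(); t2.Join(); // blocks forever once the deadlock occurs
    }
}

Once the process hangs, create a dump with adplus -hang, load SOS and run !syncblk: each Monitor should appear as owned by one thread while the other thread waits for it.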

This article was written by Radu Vunvulea for Today Software Magazine.
