Sameer's Data Island: 2010

Sunday, August 15, 2010

Concatenating the pdf files into a single consolidated pdf file

Hi folks,
I’m back with something helpful for anyone looking to concatenate PDF files into a single consolidated PDF file. This piece of code also shows how to use the open source iTextSharp library. You can get the library from the link below.
iTextSharp Library
OK, let’s go step by step:
1) Download the library.
2) Create a Console application.
3) Add the reference to itextsharp.dll assembly.
4) Add the code given below to the appropriate files and add these files to your project.
5) Build this application.
6) Now you are set to go.
What does this program do?
It is a command-line application that takes the output PDF file name (along with its path) as the first argument, followed by the list of source PDF file names (along with the path for each file). On successful execution, all the source PDF documents are consolidated into the output PDF file.

Program.cs



using System;

namespace ConcatenatePDF
{
    /// <summary>
    /// Command-line entry point for the PDF concatenation tool.
    /// </summary>
    class Program
    {
        /// <summary>
        /// Merges the source PDF files into a single output PDF file.
        /// </summary>
        /// <param name="args">args[0] is the output file; the remaining arguments are the source files.</param>
        static void Main(string[] args)
        {
            if (args.Length > 1)
            {
                try
                {
                    // Everything after the first argument is a source file.
                    string[] sourceFiles = new string[args.Length - 1];
                    Array.Copy(args, 1, sourceFiles, 0, sourceFiles.Length);
                    string strMsg = PDFOperations.ConcatenatePDF(sourceFiles, args[0]);
                    Console.WriteLine(strMsg);
                    Console.ReadKey();
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.Message);
                }
            }
            else
            {
                Console.WriteLine("Usage: <output pdf> <source pdf> [<source pdf> ...]");
            }
        }
    }
}

PDFOperations.cs




#region Using Directives
using System;
using System.IO;
using System.Security;
using System.Security.Permissions;
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf;
#endregion

namespace ConcatenatePDF
{
    /// <summary>
    /// PDFOperations holds methods for working with PDF files. As of now it has
    /// one method, which concatenates PDF files into a single PDF file.
    /// </summary>
    public class PDFOperations
    {
        /// <summary>
        /// Concatenates more than one PDF file into a single PDF file.
        /// </summary>
        /// <param name="sourcePdfFiles">File names, along with their paths, of the PDF files to be concatenated.</param>
        /// <param name="destinationPdfFile">Name of the output file along with its path.</param>
        /// <returns>A message listing missing source files, security errors, and whether the output file was created.</returns>
        public static string ConcatenatePDF(string[] sourcePdfFiles, string destinationPdfFile)
        {
            StringBuilder stringBuilder = new StringBuilder();
            Document pdfDoc = new Document();
            FileIOPermission f2 = new FileIOPermission(FileIOPermissionAccess.Write | FileIOPermissionAccess.Read, destinationPdfFile);
            try
            {
                f2.Demand();
                // FileMode.CreateNew throws an IOException if the output file already exists;
                // that exception is left for the caller to handle.
                PdfCopy copy = new PdfCopy(pdfDoc, new FileStream(destinationPdfFile, FileMode.CreateNew));
                pdfDoc.Open();
                foreach (string file in sourcePdfFiles)
                {
                    if (File.Exists(file))
                    {
                        PdfReader reader = new PdfReader(file);
                        // iTextSharp page numbers start at 1.
                        for (int page = 1; page <= reader.NumberOfPages; page++)
                        {
                            copy.AddPage(copy.GetImportedPage(reader, page));
                        }
                        reader.Close();
                    }
                    else
                    {
                        stringBuilder.AppendFormat("{0}: This file does not exist.\n", file);
                    }
                }
                // Closing the document also closes the PdfCopy writer and the output stream.
                pdfDoc.Close();
                if (File.Exists(destinationPdfFile))
                {
                    stringBuilder.AppendFormat("{0}: created successfully.\n", destinationPdfFile);
                }
            }
            catch (UnauthorizedAccessException s)
            {
                stringBuilder.AppendFormat("Error occurred: {0}\n", s.Message);
            }
            catch (SecurityException s)
            {
                stringBuilder.AppendFormat("Error occurred: {0}\n", s.Message);
            }
            return stringBuilder.ToString();
        }
    }
}

Please share your feedback, and get back to me if anything needs clarification.

Thursday, August 5, 2010

Convert the value ( i.e. Duration in Seconds ) to HH:MM:SS

/// <summary>
/// Convert the value (i.e. duration in seconds) to HH:MM:SS.
/// Ex: 5430 seconds is equal to 01 hour 30 min 30 sec, therefore
/// the output is 01:30:30.
/// </summary>
/// <param name="intDurationInSec">Duration in seconds.</param>
/// <returns>The duration formatted as HH:MM:SS.</returns>
private string ConvertDurationFormat(int intDurationInSec)
{
    const int iUnitOfSec = 60;
    int iSec = intDurationInSec % iUnitOfSec;
    // Integer division truncates, so no further conversion is needed.
    int iRemainder = intDurationInSec / iUnitOfSec;
    int iMin = iRemainder % iUnitOfSec;
    int iHour = iRemainder / iUnitOfSec;
    return String.Format("{0:00}:{1:00}:{2:00}", iHour, iMin, iSec);
}
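
The same conversion can also be sketched with the framework's TimeSpan type instead of manual modulo arithmetic. This is only an illustration: the class name DurationDemo and the method name Format are made up for this example, and it assumes the input is not negative. Using TotalHours rather than Hours keeps durations of 24 hours or more from wrapping around.

```csharp
using System;

public static class DurationDemo
{
    // Hypothetical helper: formats a non-negative number of seconds as HH:MM:SS.
    public static string Format(int seconds)
    {
        TimeSpan t = TimeSpan.FromSeconds(seconds);
        // TotalHours counts whole days too, so 90000 seconds gives 25 hours, not 1.
        return String.Format("{0:00}:{1:00}:{2:00}", (int)t.TotalHours, t.Minutes, t.Seconds);
    }

    public static void Main()
    {
        Console.WriteLine(Format(5430));   // prints 01:30:30
        Console.WriteLine(Format(90000));  // prints 25:00:00
    }
}
```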

Sunday, April 25, 2010

Day One Great Indian Developer Summit 2010

Topic: Service Oriented Application: The “Dublin” way by Bijoy Singhal

A little about Bijoy Singhal: he is an evangelist with Microsoft India who works on evangelizing development technologies.

The session started with an explanation of why we need workflows, after which Bijoy explained the developer's tasks when building a workflow. They are as follows:

Write workflow based service.
Host and Deploy the Service.
Persist, monitor and manage the Service.

Then he explained Windows Server AppFabric (a.k.a. Dublin), which is a set of integrated technologies that make it easier to build, scale and manage Web and composite applications that run on IIS. He also told us where we can get Windows Server AppFabric:

Download the Beta 2 of Windows Server AppFabric and the installation notes.

Bijoy also spoke about Velocity (a high-performance, distributed, highly scalable in-memory cache), the Service Bus and Access Control Services.

He added: “By bringing in the capabilities formerly in code-name Dublin, AppFabric provides host capabilities for .NET 4.0 Windows Workflow Foundation and Windows Communication Foundation services, with extended management through the IIS manager, plus the ability to do service monitoring. This is much needed functionality as V1 (V3.0) and V2 (3.5) of both WCF and WF didn't ship with much tooling - you had to roll out your own for hosting, monitoring, management and so forth.”

On the whole, I only picked up a few concepts. I believe I will not be comfortable in this area until I build an application and host it myself.

Still, it is interesting stuff that I would like to learn. Till then, see you.

Tuesday, April 20, 2010

Day One of Great Indian Developer Summit 2010

Hi, I’m back with updates about the Great Indian Developer Summit 2010.

- Day 1 GIDS.NET

Day One was dedicated to .NET technology, and we had a lot of sessions on it. The .NET Framework areas covered in the Day 1 sessions were ASP.NET, ADO.NET Entity Framework, WCF, WF, cloud application development, AJAX, etc.

The first session that I attended was "Business Intelligence Design Patterns - BI Made Easy" by Stephen Forte, a Microsoft Regional Director from New York.

The day started with a welcome speech by the host of the GIDS program (sorry, I don’t know her name), after which Stephen took charge and had the audience dive into the world of Business Intelligence design patterns.

He explained the advantages of the Business Intelligence design pattern by explaining the advantages of BI and the ETL process.

These are the points that were discussed during the session:

For any transactional database, optimizing data entry is the most important area.
Online banking is the best example of the Business Intelligence design pattern.
Deciding the schedule for updating data is important while working with Business Intelligence design patterns.
Deciding which details need to be transferred to the data warehouse is also important. These are called facts.
The processes involved in building the data warehouse are:

o Summarization of Database

o Transformation Process

o Running reports against the data warehouse.

Some of the terminology involved in the data warehouse paradigm:

o Fact table – captures the data at the detail level

o Measures – numeric facts, each a column in a fact table

o Dimension table – a structure that categorizes the data

There are two models (schemas) involved in the data warehouse:

o Star Schema

o Snowflake schema

In a star schema, these are the points to be aware of:

o Fact tables are in third normal form (3NF)

o Dimension tables are denormalized (2NF) and contain hierarchies and partitions

o Star Schema diagram depicted below

The snowflake schema is derived from the star schema, but with normalized dimension tables.
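
To make the fact, measure and dimension terms concrete, here is a minimal in-memory sketch of a star-style lookup. It was not part of the session: the types ProductDim and SalesFact, the method TotalByCategory and the sample values are all hypothetical. Each fact row carries a key into the dimension table, and the measure (Amount) is summed per dimension attribute (Category).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical dimension row: categorizes the data.
public class ProductDim
{
    public int ProductKey;
    public string Category;
}

// Hypothetical fact row: one sale at the detail level; Amount is the measure.
public class SalesFact
{
    public int ProductKey;
    public decimal Amount;
}

public static class StarSchemaDemo
{
    // Joins each fact to its dimension row, then sums the measure per category.
    public static Dictionary<string, decimal> TotalByCategory(
        IEnumerable<SalesFact> facts, IEnumerable<ProductDim> products)
    {
        return facts
            .Join(products, f => f.ProductKey, p => p.ProductKey,
                  (f, p) => new { p.Category, f.Amount })
            .GroupBy(x => x.Category)
            .ToDictionary(g => g.Key, g => g.Sum(x => x.Amount));
    }

    public static void Main()
    {
        var products = new List<ProductDim>
        {
            new ProductDim { ProductKey = 1, Category = "Books" },
            new ProductDim { ProductKey = 2, Category = "Music" }
        };
        var facts = new List<SalesFact>
        {
            new SalesFact { ProductKey = 1, Amount = 10m },
            new SalesFact { ProductKey = 1, Amount = 5m },
            new SalesFact { ProductKey = 2, Amount = 7m }
        };
        foreach (var pair in TotalByCategory(facts, products))
        {
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
        }
    }
}
```

In a snowflake variant, Category would move out of ProductDim into its own table and be reached through one more join.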

Monday, April 19, 2010

SQL query for identifying the 20 slowest queries on your server

Hi All,

I was reading a book from Manning Publications titled "SQL Server DMVs in Action" and came across this useful query for finding the 20 slowest queries on your server, which is very useful for people facing database performance issues. I thought I would post it on my blog as a handy resource.

All the credit goes to the author of this fine book.

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

SELECT TOP 20
    CAST(total_elapsed_time / 1000000.0 AS DECIMAL(28, 2))     -- #A Get query duration
        AS [Total Elapsed Duration (s)]
    , execution_count
    , SUBSTRING (qt.text, (qs.statement_start_offset/2) + 1,    -- #B Extract SQL statement
        ((CASE WHEN qs.statement_end_offset = -1
               THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2
               ELSE qs.statement_end_offset
          END - qs.statement_start_offset)/2) + 1) AS [Individual Query]
    , qt.text AS [Parent Query]
    , DB_NAME(qt.dbid) AS DatabaseName
    , qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
INNER JOIN sys.dm_exec_cached_plans AS cp
    ON qs.plan_handle = cp.plan_handle
ORDER BY total_elapsed_time DESC                                -- #C Sort by slowest queries

Explanation from the book mentioned above :

The DMV sys.dm_exec_query_stats contains details of various metrics that relate to an individual SQL statement (within a batch). These metrics include query duration (total_elapsed_time) and the number of times the query has executed (execution_count). Additionally, it records details of the offsets of the individual query within the parent query. To get details of the parent query and the individual query, the offset parameters are passed to the DMF sys.dm_exec_sql_text.

The CROSS APPLY statement can be thought of as a join to a table function that, in this case, takes a parameter. Here the parameter is the id of the cached plan that contains the textual representation of the query. The query’s cached plan is also output, as XML. The results are sorted by the total_elapsed_time. To limit the amount of output, only the slowest 20 queries are reported on.

The results show the cumulative impact of individual queries, within a batch or stored procedure. Knowing the slowest queries will allow you to make targeted improvements, confident in the knowledge that any improvement to these queries will have the biggest impact on performance.

The cached plan is probably the primary resource for discovering why the query is running slowly, and often will give an insight into how the query can be improved. The NULL values in the DatabaseName column mean the query was run either ad hoc or using prepared SQL (i.e. not as a stored procedure). This itself can be interesting since it indicates areas where stored procedures are not being re-used, and possible areas of security concern. Later, an improved version of this query will get the underlying database name for the ad-hoc or prepared SQL queries from another DMV source.
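
The offset arithmetic in the SUBSTRING expression is easier to follow outside SQL Server. Below is a minimal sketch that mirrors it. The class StatementExtractor, the method Extract and the sample offsets are hypothetical and chosen for illustration only. It assumes, as the query does, that offsets are byte positions into Unicode text (2 bytes per character) and that an end offset of -1 means "to the end of the batch". SUBSTRING is 1-based while C# strings are 0-based, so the "+ 1" on the start position drops out here.

```csharp
using System;

public static class StatementExtractor
{
    // Hypothetical helper mirroring the SUBSTRING arithmetic of the query.
    public static string Extract(string batchText, int startOffset, int endOffset)
    {
        int start = startOffset / 2;                        // bytes to characters
        int end = endOffset == -1 ? batchText.Length * 2    // -1: run to end of batch
                                  : endOffset;
        int length = (end - startOffset) / 2 + 1;
        // SUBSTRING silently stops at the end of the text; C# needs an explicit clamp.
        length = Math.Min(length, batchText.Length - start);
        return batchText.Substring(start, length);
    }

    public static void Main()
    {
        string batch = "SELECT 1; SELECT 2;";
        Console.WriteLine(Extract(batch, 0, 16));   // prints SELECT 1;
        Console.WriteLine(Extract(batch, 20, -1));  // prints SELECT 2;
    }
}
```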

Take care, see you soon.

Sunday, April 18, 2010

GIDS 2010 Event - 20th April 2010

Hi All,

I will be attending the Great Indian Developer Summit 2010 on 20th April 2010.
A lot of technical sessions will be held during the event, but I will be attending these sessions:

Business Intelligence Design Pattern: BI Made Easy
Service Oriented Application: The "Dublin" Way
Overview of Cloud Computing and Introduction to Windows Azure
Lights on the Cloud
Migrating Your Application to Windows Azure
Building a 3-tier Application with ASP.NET, WCF, RIA Services and ADO.NET Entity Framework
Advanced T-SQL Querying and Programming Inside SQL Server
Testing with Dependencies

These are 50-minute sessions. I will post the details discussed in the sessions for each of the topics above.

Till then, take care.