Transforming TXT Files into XML Using Linq to Xml (XLinq)

Some weeks ago I started working on a small code sample to demonstrate the Xml
transformation capabilities of Linq to Xml (aka XLinq). The code was originally
intended for a demo at a major developers’ conference here in Brazil (
http://www.baboo.com.br/absolutenm/templates/content.asp?articleid=25189&zoneid=224).
I decided to post the sample here along with some explanations, mainly because it
seemed to catch the attention of some of the people who attended my session. So please
bear with me and give yourself a chance to fall in love with this wonderful technology,
as I did as soon as I started working with it.



 

Our goal here is to take a group of IIS (Internet Information Services) log files
and extract some analytical page access information from them. IIS stores its log
files in the %WINDIR%\System32\Logfiles\W3SVC1 folder, as the following picture shows:

 



 

Each file stores information about
the requests received by IIS for a given web site on a given day. The content of each
file looks something like this:

 

 

The lines beginning with the # character are just comments. Every other line
represents a single hit on a web server resource and specifies the time the request
occurred, the IP address of the requesting computer, the HTTP method used (GET, POST,
etc.), the resource location, and finally the HTTP status code of the response (200
for a successful request, 404 for a page not found, etc.).


 

Our intent is to read every line of every file in the directory and count the number
of successful requests for each resource, producing an Xml document similar to the
one shown below:

 

 

 

 

Well, it seems like a lot of work, right? Yes, it is! But since we are going to use
LINQ to accomplish our goal, most of the complexity will be abstracted away from us,
the developers. We are going to use query semantics that make the code much simpler
than its procedural counterpart would be. Besides that, the Linq to Xml API makes the
transformation to Xml a very natural task.


 

Before showing you the code, let me say that although my language of choice is C#,
I decided to write this transformation in VB 9 for the purposes of this demo. The
main motivation was that VB 9 will support Xml literals and Xml axis members, which
still have no counterpart in C# 3.0. Maybe those concepts will be incorporated into
C# 3.0 as well, but that decision is up to Microsoft, and there is no definitive
position so far. That said, let’s see the code:

    Private Function GenerateXmlLog() As XElement

        ' Phase 1: turn every log file in the directory into a <Date> element.
        Dim xmlContent As XElement = _
            <IISLog>
                <%= _
                From logFile In New DirectoryInfo(Me.LogFilesDirectory).GetFiles() _
                    Select GetXmlFromLogFile(logFile) _
                %>
            </IISLog>

        ' Phase 2: keep the successful requests (status 200) and count the hits per Url.
        Dim summary As XElement = _
            <Summary>
                <%= _
                From entry In xmlContent...<Entry> _
                    Where entry.<Status>.Value = "200" _
                    Group By entry.<Url>.Value _
                    Select _
                        <Entry>
                            <Url><%= It.Key %></Url><Hits><%= Count(It) %></Hits>
                        </Entry> _
                %>
            </Summary>

        Return summary

    End Function


 

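    Private Function GetXmlFromLogFile(ByVal logFile As FileInfo) As XElement

        ' Read the whole file, making sure the reader is closed afterwards.
        Dim fileContent As String
        Using sr As New StreamReader(logFile.FullName)
            fileContent = sr.ReadToEnd()
        End Using

        ' Split on the full line terminator and drop empty lines. Passing
        ' Environment.NewLine directly does not split on the two-character
        ' sequence as a unit.
        Dim lines = fileContent.Split(New String() {Environment.NewLine}, _
                                      StringSplitOptions.RemoveEmptyEntries)

        ' Build one <Entry> per request, skipping the lines that start with #.
        Dim logIis = _
            <Date Id=<%= logFile.CreationTime() %>>
                <%= _
                From line In lines _
                    Where Not line.StartsWith("#") _
                    Select _
                        <Entry>
                            <Time><%= line.Split().Skip(0).Take(1) %></Time>
                            <Ip><%= line.Split().Skip(1).Take(1) %></Ip>
                            <Url><%= line.Split().Skip(3).Take(1) %></Url>
                            <Status><%= line.Split().Skip(4).Take(1) %></Status>
                        </Entry> _
                %>
            </Date>

        Return logIis

    End Function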

 

This code is all we need to get the work done. Impressive, isn’t it? The magic lies
in the set semantics we are employing here by means of the Language Integrated Query
features. The Xml literal features of VB 9 also make creating the final Xml document
easier. In a future post I will show how this code could be written in C# 3.0, which
has some conceptual differences that were brilliantly pointed out by Anders Hejlsberg
and Amanda Silver in this thread that I started on the XLinq MSDN Forum:
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=574140&SiteID=1.


 

Notice that the code above accomplishes its task in two phases. The first phase
creates a plain Xml document from the TXT files and stores it in the xmlContent
variable, which is of type XElement. The second phase takes this Xml fragment and
transforms it into another document (summary), which contains the summary information
that makes up the final Xml format.
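
For readers who think in C#, here is a rough sketch of the same two phases using the functional construction style of Linq to Xml instead of Xml literals. It is only an illustration under some assumptions, not the C# version promised for a future post: the class and method names (TwoPhaseSketch, BuildLog, Summarize) are made up for this example, the input is a sequence of lines rather than a directory of files, and the field positions follow the log format described earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class TwoPhaseSketch
{
    // Phase 1 (hypothetical helper): raw log lines -> <IISLog> with one <Entry> per request.
    public static XElement BuildLog(IEnumerable<string> lines)
    {
        return new XElement("IISLog",
            from line in lines
            where !line.StartsWith("#", StringComparison.Ordinal)   // skip comment lines
            let fields = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            where fields.Length >= 5                                // skip blank or short lines
            select new XElement("Entry",
                new XElement("Time", fields[0]),
                new XElement("Ip", fields[1]),
                new XElement("Url", fields[3]),
                new XElement("Status", fields[4])));
    }

    // Phase 2: <IISLog> -> <Summary> with the number of successful hits per Url.
    public static XElement Summarize(XElement log)
    {
        return new XElement("Summary",
            from entry in log.Descendants("Entry")
            where (string)entry.Element("Status") == "200"
            group entry by (string)entry.Element("Url") into hits
            select new XElement("Entry",
                new XElement("Url", hits.Key),
                new XElement("Hits", hits.Count())));
    }
}
```

Calling TwoPhaseSketch.Summarize(TwoPhaseSketch.BuildLog(lines)) on a handful of lines should yield a Summary element with one Entry per distinct Url that was served with status 200.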


 

At the end of the demo, I showed a Windows Forms application written in C# that
queries the Xml document created by the previous code and plots a bar chart of the
selected files and their hit counts. For example, the following queries would produce
the chart shown in the figure below:

 

    var successfulRequests =
       from entry in log.Elements("Entry")
       where entry.Element("Url").Value.EndsWith("aspx") &&
             !entry.Element("Url").Value
                   .EndsWith("login.aspx", StringComparison.CurrentCultureIgnoreCase)
       select new
       {
          Url = entry.Element("Url").Value,
          Hits = (int) entry.Element("Hits")
       };

    // Order by hits, highest first, so that Take(10) keeps the most accessed pages.
    var top10 = (from request in successfulRequests
                 orderby request.Hits descending
                 select request).Take(10);


 




 

 

The first Linq query above gets all the aspx pages from the Xml document, excluding
the Login.aspx page. The result of this query is then used in the second query, which
retrieves only the 10 most accessed pages. This result set is finally plotted on the
chart. Notice that the first query projects an anonymous type that has two properties:
Url and Hits. This clearly shows the flexibility we will have with Linq and Linq to
Xml in the near future, when this technology finally gets released.
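
To try the idea without the Windows Forms application, something like the following self-contained sketch could be run against a small hand-written summary document. The TopPagesSketch class, the TopPages method and the "url=hits" output format are assumptions made for this illustration only; the sketch also uses an ordinal, case-insensitive comparison for both suffix checks, which is a slightly different choice from the query above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class TopPagesSketch
{
    // Returns up to 'count' "url=hits" strings for aspx pages other than
    // login.aspx, with the most accessed pages first.
    public static List<string> TopPages(XElement summary, int count)
    {
        // Project each <Entry> into an anonymous type with Url and Hits.
        var requests =
            from entry in summary.Elements("Entry")
            let url = (string)entry.Element("Url")
            where url.EndsWith("aspx", StringComparison.OrdinalIgnoreCase) &&
                  !url.EndsWith("login.aspx", StringComparison.OrdinalIgnoreCase)
            select new { Url = url, Hits = (int)entry.Element("Hits") };

        // Highest hit counts first, then keep only the first 'count' results.
        return requests
            .OrderByDescending(r => r.Hits)
            .Take(count)
            .Select(r => r.Url + "=" + r.Hits)
            .ToList();
    }
}
```

The anonymous type never leaves the method here, which is why the sketch flattens the result into strings before returning it; inside a single method, as in the demo application, the anonymous objects can be consumed directly.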

 

The chart in the figure was created using pure GDI+ code. Shame on me, because I
didn’t have enough WPF skill to build it in WPF. Maybe in the future I can take some
time to do that.


 

That’s all for this post. I hope I have sparked your interest in Linq and Linq to
Xml, and that I have shown an interesting example of how these technologies will
change the way we write (and read) code in the future.


 

Thanks for your time!