LINQ Series – Part 1



 

In the first part of this series, we gave a brief introduction to Linq and saw how to
set up the development environment to start working with this wonderful technology.
In this post, which I call "part 1", I would like to do a small comprehension exercise
that, I believe, can shed light on some of the motivations behind Linq. So here is
the question: how would you write to the console all the even numbers in an arbitrary
array of integers?



 

I believe the most basic and straightforward answer to this question, and clearly a
valid one, is to use a for (or foreach) loop that iterates over all the elements of
the array, tests each one to check whether it is even and, if so, writes it to the
console. The code below shows how this can be done:


 

    int[] numbers = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

    foreach (int number in numbers)
    {
        if (number % 2 == 0) Console.WriteLine(number);
    }

 

This code is perfectly valid for versions 1.1 and 2.0 of the .NET Framework and
accomplishes the proposed task. But let's try to shift the procedural, iterative focus
of the code above toward something more declarative. What does that mean? Is there a
way to make the intent of our code clearer? Let's explore a method introduced in the
Array class in .NET 2.0 called ForEach<T>.



 

    int[] numbers = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

    Array.ForEach
    (
        numbers,
        delegate(int number)
        {
            if (number % 2 == 0) Console.WriteLine(number);
        }
    );



 

Array.ForEach<T>() is a static generic method that can be used with arrays of any
type. Here is how its signature is defined in the System.Array class:



 

    public static void ForEach<T>(T[] array, Action<T> action);



 

The Action<T> type of the second parameter is a delegate defined as follows:



 

    public delegate void Action<T>(T obj);


 

The ForEach method iterates over all the elements of the array it receives and, for
each element, calls the method referenced by the delegate, passing the element as the
argument. Put simply, the method referenced by the delegate is called once for each
item in the array.
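To make the mechanics concrete, here is a minimal sketch of what a method like ForEach might do internally. It is not the actual System.Array implementation; MyForEach, CollectEvens and the ForEachSketch class are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ForEachSketch
{
    // Hypothetical stand-in for Array.ForEach<T>: invokes 'action' once per element, in order.
    public static void MyForEach<T>(T[] array, Action<T> action)
    {
        if (array == null) throw new ArgumentNullException("array");
        if (action == null) throw new ArgumentNullException("action");

        for (int i = 0; i < array.Length; i++)
        {
            action(array[i]);
        }
    }

    // Collects the even numbers by passing an anonymous method as the delegate.
    public static List<int> CollectEvens(int[] numbers)
    {
        List<int> evens = new List<int>();
        MyForEach(numbers, delegate(int number)
        {
            // The anonymous method captures the local variable 'evens'.
            if (number % 2 == 0) evens.Add(number);
        });
        return evens;
    }

    public static void Main()
    {
        int[] numbers = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        foreach (int even in CollectEvens(numbers))
        {
            Console.WriteLine(even);
        }
    }
}
```

Running Main writes 0, 2, 4, 6 and 8, one per line. The point of the sketch is that the loop lives inside the helper, while the caller supplies only the per-element behavior.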



 

In our example, the delegate is defined anonymously, that is, no separate method was
created for this purpose. The code to be executed was simply passed to the ForEach
operation. This feature, called anonymous delegates (anonymous methods in C# 2.0), was
introduced with .NET 2.0 and is one of the great enablers of the design of Linq, as we
will see in future posts.



 

Although anonymous delegates are an extremely interesting feature, the code above seems
to have become much more obscure than the code that used the traditional foreach loop.
For that reason, let's take a step forward and see how this code could be written using
language integrated query (Linq, Language Integrated Query), which will be incorporated
into C# 3.0 and VB 9:



 

    int[] numeros = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

    IEnumerable<int> numerosPares =
        from numero in numeros
        where numero % 2 == 0
        select numero;

    foreach (int par in numerosPares)
    {
        Console.WriteLine(par);
    }



 

Although we wrote more lines than in the original code, the set-manipulation semantics
provided by the Linq query make it clear that we are querying the even numbers of an
original set (the numeros array). The result of the query is kept in a variable of type
IEnumerable<int> named numerosPares. To write these numbers to the console, we simply
use a traditional foreach construct.
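As a complement, the sketch below shows the same query written as a direct call to the Where operator with a lambda expression, which is roughly what the compiler turns the query expression into. It assumes the released System.Linq namespace (.NET 3.5 and later) rather than the System.Query assembly of the LINQ Preview, and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq; // assumption: released namespace; the LINQ Preview shipped System.Query instead

public static class EvenNumbersSketch
{
    // Query-expression form, as in the post.
    public static IEnumerable<int> EvensQuerySyntax(int[] numeros)
    {
        return from numero in numeros
               where numero % 2 == 0
               select numero;
    }

    // Method-call form: the same filter expressed through the Where operator.
    public static IEnumerable<int> EvensMethodSyntax(int[] numeros)
    {
        return numeros.Where(numero => numero % 2 == 0);
    }

    public static void Main()
    {
        int[] numeros = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

        // The filter only runs here, while the sequence is being enumerated.
        foreach (int par in EvensMethodSyntax(numeros))
        {
            Console.WriteLine(par);
        }
    }
}
```

Both methods return a lazily evaluated sequence: nothing is filtered until the foreach loop starts pulling values from it.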



 

The code above can be typed into any project that uses one of the LINQ Preview project
templates, as we saw in the previous part of this series. In fact, the code depends on
the System.Query.dll assembly, which is referenced automatically when we use one of
these templates.



 

Transforming TXT Files into XML Using Linq to Xml (XLinq)

Some weeks ago I started working on a little code sample to demonstrate the Xml
transformation capabilities of Linq to Xml (aka XLinq). The code was originally intended
for a demo at a major developer conference here in Brazil (
http://www.baboo.com.br/absolutenm/templates/content.asp?articleid=25189&zoneid=224).
I decided to post the sample here along with some explanations, mainly because it
seemed to catch the attention of some folks who attended my session. So please bear
with me and give yourself a chance to fall in love with this wonderful technology, as
I did as soon as I started working with it.



 

Our goal here is to take a group of log files from IIS (Internet Information Services)
and extract some analytical page access information from them. The log files generated
by IIS are stored in the %WINDIR%\System32\Logfiles\W3SVC1 directory, as the following
picture shows:

 



 

Each file stores information about
the requests received by IIS for a given web site on a given day. The content of each
file looks something like this:

 

 

The lines beginning with the # character are just comments. Every other line represents
a specific hit on a web server resource, specifying the time the request occurred, the
IP address of the requesting computer, the HTTP method used (GET, POST, etc.), the
resource location and finally the HTTP status code of the request (200 for a successful
request, 404 for a page not found, etc.).
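The line layout just described can be sketched in C# as follows. This is only an illustration: the sample line, the LogEntry type and the Parse method are hypothetical, and the field positions (time, IP, method, URL, status) are assumed from the description above. In real IIS logs the field order is determined by the #Fields comment line of each file.

```csharp
using System;

// Hypothetical container for the five fields described in the text.
public sealed class LogEntry
{
    public string Time;
    public string Ip;
    public string Method;
    public string Url;
    public string Status;
}

public static class LogLineSketch
{
    // Returns null for comment lines (starting with '#') and for lines with too few fields.
    public static LogEntry Parse(string line)
    {
        if (string.IsNullOrEmpty(line) || line.StartsWith("#", StringComparison.Ordinal))
        {
            return null;
        }

        // Fields are assumed to be separated by single spaces.
        string[] fields = line.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (fields.Length < 5)
        {
            return null;
        }

        LogEntry entry = new LogEntry();
        entry.Time = fields[0];
        entry.Ip = fields[1];
        entry.Method = fields[2];
        entry.Url = fields[3];
        entry.Status = fields[4];
        return entry;
    }

    public static void Main()
    {
        // Made-up sample line, not taken from a real log file.
        LogEntry entry = Parse("12:30:45 192.168.0.10 GET /default.aspx 200");
        Console.WriteLine(entry.Url + " -> " + entry.Status);
    }
}
```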


 

Our intent is to read all the lines of every file in the directory and count the number
of successful hits on each requested resource, producing an Xml document similar to the
one shown below:

 

 

 

 

Well, it seems like a lot of work, right? Yes, it is. But since we are going to use LINQ
to accomplish our goal, most of the complexity will be abstracted away from us, the
developers. We are going to employ query semantics that make the code much simpler than
its procedural counterpart would be. Besides that, the Linq to Xml API will make the
transformation to the Xml format a very natural task.


 

Before showing you the code, let me say that although my language of choice is C#, I
decided, for the purpose of this demo, to write this transformation using VB 9. The main
motivation behind that decision was the fact that VB 9 will support the concepts of Xml
literals and Xml axis members, which still have no counterpart in C# 3.0. Maybe those
concepts will be incorporated into C# 3.0 as well, but the decision is up to Microsoft
and we don't have a definitive position so far. That said, let's see the code:


    Private Function GenerateXmlLog() As XElement

        Dim xmlContent As XElement = _
            <IISLog>
                <%= _
                From logFile In New DirectoryInfo(Me.LogFilesDirectory).GetFiles() _
                    Select GetXmlFromLogFile(logFile) _
                %>
            </IISLog>

        Dim summary As XElement = _
            <Summary>
                <%= _
                From entry In xmlContent...<Entry> _
                    Where entry.<Status>.Value = "200" _
                    Group By entry.<Url>.Value _
                    Select _
                        <Entry>
                            <Url><%= It.Key %></Url><Hits><%= Count(It) %></Hits>
                        </Entry> _
                %>
            </Summary>

        Return summary

    End Function


 

    Private Function GetXmlFromLogFile(ByVal logFile As FileInfo) As XElement

        Dim sr As StreamReader = New StreamReader(logFile.FullName)
        Dim fileContent As String = sr.ReadToEnd()

        Dim logIis = _
            <Date Id=<%= logFile.CreationTime() %>>
                <%= _
                From line In fileContent.Split(Environment.NewLine) _
                    Where Not line.StartsWith("#") _
                    Select _
                        <Entry>
                            <Time>
                                <%= line.Split().Skip(0).Take(1) %>
                            </Time>
                            <Ip>
                                <%= line.Split().Skip(1).Take(1) %>
                            </Ip>
                            <Url>
                                <%= line.Split().Skip(3).Take(1) %>
                            </Url>
                            <Status>
                                <%= line.Split().Skip(4).Take(1) %>
                            </Status>
                        </Entry> _
                %>
            </Date>

        Return logIis

    End Function

 

This code is all we need to get the work done. Impressive, isn't it? The magic lies in
the set semantics we are employing here by means of the Language Integrated Query
features. The creation of the final Xml document is also made easier by the Xml literal
features of VB 9. In a future post I will show how this code could be written in C# 3.0,
which has some conceptual differences that were brilliantly pointed out by Anders
Hejlsberg and Amanda Silver in this thread that I started on the XLinq MSDN Forum:
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=574140&SiteID=1.


 

Notice that the code above accomplishes its task in two phases. The first phase creates
a plain Xml document from the TXT files and stores it in the xmlContent variable, which
is of type XElement. The second phase takes this Xml fragment and transforms it into
another document (summary), this time containing the summary information that makes up
the final Xml format.
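For readers who think in C#, here is a rough sketch of what the second phase could look like without Xml literals. It is not the C# version promised for a future post. It assumes the released System.Xml.Linq API rather than the preview bits used in the demo, builds a tiny hand-made document in place of the first phase, and uses hypothetical names (SummarySketch, Summarize, Entry).

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SummarySketch
{
    // Phase 2: keep the successful entries, group them by URL and count each group.
    public static XElement Summarize(XElement xmlContent)
    {
        return new XElement("Summary",
            from entry in xmlContent.Descendants("Entry")
            where (string)entry.Element("Status") == "200"
            group entry by (string)entry.Element("Url") into hits
            select new XElement("Entry",
                new XElement("Url", hits.Key),
                new XElement("Hits", hits.Count())));
    }

    // Helper that builds one <Entry> element with only the fields phase 2 needs.
    public static XElement Entry(string url, string status)
    {
        return new XElement("Entry",
            new XElement("Url", url),
            new XElement("Status", status));
    }

    public static void Main()
    {
        // Stand-in for phase 1: made-up data instead of real log files.
        XElement xmlContent = new XElement("IISLog",
            new XElement("Date", new XAttribute("Id", "2006-07-01"),
                Entry("/default.aspx", "200"),
                Entry("/default.aspx", "200"),
                Entry("/missing.aspx", "404")));

        Console.WriteLine(Summarize(xmlContent));
    }
}
```

With this sample data the summary contains a single Entry for /default.aspx with a Hits value of 2, since the 404 request is filtered out before grouping.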


 

At the end of the demo, I showed a Windows Forms application written in C# that queries
the Xml document created by the previous code and plots a bar chart with the selected
files and their hit counts. For example, the following query would produce a chart like
the one shown in the figure below:

 

var successfulRequests =
   from entry in log.Elements("Entry")
   where entry.Element("Url").Value.EndsWith("aspx") &&
         !entry.Element("Url").Value
            .EndsWith("login.aspx", StringComparison.CurrentCultureIgnoreCase)
   select new
   {
      Url = entry.Element("Url").Value,
      Hits = (int) entry.Element("Hits")
   };

// Descending order, so that Take(10) keeps the most accessed pages.
var top10 = (from request in successfulRequests
             orderby request.Hits descending
             select request).Take(10);


 




 

 

The first Linq query above gets all the aspx pages from the Xml document, excluding the
login.aspx page. After that, the result of this query is used in the second query, where
only the 10 most accessed pages are retrieved. This result set is finally plotted onto
the chart. Notice that the first query projects an anonymous type that has two
properties: Url and Hits. This clearly shows the flexibility we will have when using
Linq and Linq to Xml in the near future, when this technology finally gets released.

 

The chart in the figure was created using pure GDI+ code. Shame on me, because I didn't
have enough skill to write it in WPF. Maybe in the future I can take some time and do
that.


 

That's all for this post. I hope it has sparked your interest in Linq and Linq to Xml,
and that I managed to show an interesting example of how these technologies will change
the way we write (and read) code in the future.


 

Thanks for your time!