Say Perl

Worldwide Perl Blogging

311 feeds, 9627 posts in total.

A bit of R for breathing room when grading exams

30 January, 11:08, by manu, machine translated from French

Yes, the title of this post definitely falls into the "lame pun" category.

With exam season for the students inevitably comes grading season for the teacher. Until now, I practised "calculator grading": after marking the papers, I took my trusty calculator and the pile of exams, added up the points obtained on the different questions, then divided the total by the divisor needed to get a score out of 20.

This method is probably used by many teachers, but it does not really satisfy me. The total number of points obtained by each student is interesting, but I would also like to have the results per question [1], or even to establish correlations between certain variables.

A solution chosen by many teachers today is to use a spreadsheet such as Excel or OpenOffice.org Calc. We enter the data, build the formulas and, hey presto, we have the results. The only problem from my point of view (and yes, I am quibbling) is that the spreadsheet has to be rebuilt every time:

  1. with the data, and it is hard to do otherwise!
  2. with the formulas; those, on the other hand, could be abstracted.

And to abstract formulas, why not write a script? I could do it with Perl, or any other scripting language (Python, Ruby, etc.), but since my goal is to compute statistics on the data, I might as well use a scripting language specialised in data processing. I mean R. Quite a few books on data analysis and on R have come out lately. So I had the chance to read some introductions to the language and, without being an expert, I wrote the following script.

This code is fairly trivial and probably not idiomatic for advanced programmers. But it works, and I ask for nothing better than to improve it [2].

The first line, #!/usr/bin/env Rscript, is the classic script shebang. In R's case, the Rscript program exists precisely to pass the right arguments to the R interpreter.

The script receives arguments, so I have to retrieve them. I kept argument handling basic and used the function commandArgs(TRUE), which returns the arguments. The order I chose for them is arbitrary. In my case:

  1. the first argument must be the name of the CSV file;
  2. the second argument must be the total number of points of the exam.

The first operation is to calculate the divisor needed to obtain the final score.

The second operation is to "load" the data.

Then I build a second data structure to hold the results, and I loop over the data to process it.

Finally, I display the class average and the data structure.

And the job is done!
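
The original R script is not reproduced on this page. As a rough illustration only, the steps described above (compute the divisor, load the data, loop over it, print the results) can be sketched in Perl, the other language the post mentions. The CSV layout assumed here (a header row, then one comma-separated row per student with the name first and one column of points per question) and the name grade_rows are assumptions, not the author's.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: turn rows of "name,q1,q2,..." into scores out of 20.
# $total is the maximum number of points of the exam (second argument).
sub grade_rows {
    my ( $total, @rows ) = @_;
    my $divisor = $total / 20;    # divisor needed to get a score out of 20
    my %score;
    for my $row (@rows) {
        my ( $name, @points ) = split /,/, $row;
        $score{$name} = sum( 0, @points ) / $divisor;
    }
    return \%score;
}

# Command-line use (assumed): grade.pl results.csv 40
if ( @ARGV == 2 ) {
    my ( $csv_file, $total ) = @ARGV;
    open my $fh, '<', $csv_file or die "Cannot open $csv_file: $!\n";
    my $header = <$fh>;           # assumed header row, skipped
    chomp( my @rows = grep { /\S/ } <$fh> );
    close $fh;

    my $score = grade_rows( $total, @rows );
    die "No data in $csv_file\n" unless %$score;

    printf "%-20s %5.2f\n", $_, $score->{$_} for sort keys %$score;
    printf "Class average: %.2f\n",
      sum( values %$score ) / scalar keys %$score;
}
```

Keeping the computation in a function separate from the file handling makes it easy to add per-question statistics later without touching the input code.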

As for improvements, here is what crossed my mind:

  1. handling CSV variants (different separators, quoting characters, etc.);
  2. handling a "remarks" column and a "practical work" column;
  3. producing a final result that takes into account the fact that the exam counts for 60% of the final grade and practical work for 40%;
  4. using getopt to handle the script's arguments;
  5. writing a CSV file containing the computed data rather than simply displaying it.
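
Improvement 3 amounts to a weighted average. A minimal Perl sketch, assuming both marks are already expressed out of 20 (the name final_grade and the sample marks are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: the exam counts for 60%, practical work for 40%.
# Both marks are assumed to be out of 20 already.
sub final_grade {
    my ( $exam, $practical ) = @_;
    return 0.6 * $exam + 0.4 * $practical;
}

printf "%.1f\n", final_grade( 12, 16 );    # prints 13.6
```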

Footnotes:

[1] This lets me spot the difficult questions, which may point to a topic that I explained less well.

[2] So if you have any suggestions, they are welcome.

Modern Perl

25 October, 23:34, machine translated from French

A different exercise this time, since this is a book review.

A few days ago I received a copy of the book Modern Perl, released on 29 October by Pearson. It is interesting in more ways than one: it is an original book, not a translation of yet another book on Perl; it focuses on so-called "modern" Perl, in other words tools like Moose, DBIx::Class, etc.; and it is written by people involved in the Perl community (they are authors of CPAN modules and organise Perl conferences).

I would like to salute the authors of the book in passing: Maddingue, BooK, jq and dams.

The first pleasant surprise is the size of the book: pocket format. I find it convenient to carry in a bag (a plus for a metro user) and to leave lying on the desk without it taking up space. The second pleasant surprise is the number of topics covered: a solid introduction to the basics of Perl, object-oriented programming, regular expressions, databases, XML manipulation, and tools for working with the web.

One very positive point is the part on object-oriented programming. The authors chose to present Moose as "the" object model to use. They do not cover Perl's basic object model at all (no explanation of bless, etc.), but that seems a wise choice in the context of learning Perl. Readers coming from another object-oriented language immediately find foundations they know (accessors, methods, inheritance, etc.) and possibly new paradigms such as roles. Readers who already know Perl are spared the repetition.

Not being particularly fond of SQL, I was pleased to find a good introduction to DBIx::Class, which complements the section on DBI. So-called "NoSQL" databases are also presented, including CouchDB.

The last part is devoted to the web, presenting tools for manipulating HTML, retrieving content, or automating behaviour on pages with WWW::Mechanize.

Moreover, if after reading this book the idea occurs to you to retrieve content from web pages with regular expressions, you probably read the book upside down, or it is sheer provocation towards the authors.

And a negative point, you ask? Of course there is one. To my great regret, the book never mentions Dancer. Too bad!

Overall it is a good book for anyone who wants to discover Perl in 2010. All the tools one is expected to use every day are presented. The organisation of the book and its many examples will make it practical for beginners. I am thinking of ordering a few copies for work, to make them available to our (future) students.

Modern Perl is published by Pearson, released October 29. ISBN: 978-2-7440-2419-1 (22 €)

Modern Perl, the survival guide for the Perl programmer

22 October, 17:41, by sukria, machine translated from French

On 29 October, Pearson is publishing a newcomer in its "Survival Guide" series: Modern Perl.

This book, co-written by Sébastien Aperghis-Tramoni, Philippe Bruhat, Damien Krotkine and Jérôme Quelin, aims to present the most current practices in Perl. I received a copy, so let me share a first look with you.

A quick glance at the table of contents sets the tone: this book will guide the Perl developer, whatever their level. The first part is clearly an introduction to the language, in which all the basics are explained.
A beginner can therefore turn to the first part of the book to learn to program in Perl, while an experienced developer will probably go straight to the following sections.

While this first part is clearly aimed at beginners, the following ones will certainly interest more experienced Perl developers, because they address all the important topics that come up in the daily life of a programmer: object-oriented methodology (with Moose), data management (SQL, files, serialisation), date handling, event-driven programming (POE), parsing XML documents, HTML manipulation and network requests (LWP).

By both its format and its content, this book looks like the perfect companion for the Perl programmer, in the same way that small French-English pocket dictionaries are for the high-school student in English class.

Another title could well have been "Best of CPAN: a thematic guide to the essentials for the programmer". For that is ultimately what it is: an informed and precise selection of the best of CPAN for solving common problems.

Modern Perl is published by Pearson, released October 29. ISBN: 978-2-7440-2419-1 (22 €)

A small toolbox for podcasts

12 September, 21:02, by manu, machine translated from French

The tool of the day has probably already been written and rewritten by three quarters of the programmers on the planet, but hey, it is part of my little weekly workout.

So I needed a tool to download the files of a podcast from the command line. Nothing complicated, and no existing tool that I knew of did the job (and I admit I did not look very hard!).

Here's the script:

#!/usr/bin/env perl
use strict;
use warnings;

package App::Podcast;

use Getopt::Long;
use LWP::UserAgent;
use XML::LibXML;
use File::Spec;
use File::Basename;
use Carp;
use Term::ANSIColor;

# Modulino: run as a program when executed directly, stay quiet when loaded.
__PACKAGE__->run() unless caller();

sub new {
    my ( $class, %args ) = @_;
    my $self = {};

    $self->{url}        = delete $args{url};
    $self->{output_dir} = delete $args{output_dir} || './';
    $self->{ua}         = LWP::UserAgent->new(
        agent         => 'Mozilla/5.0',
        show_progress => 1,
        cookie_jar    => {},
    );

    bless $self, $class;
}

sub run {
    my $config = {};
    GetOptions( $config, 'url=s', 'output_dir=s' );
    die _usage() unless _validate($config);

    my $podcast = App::Podcast->new(
        url        => $config->{url},
        output_dir => $config->{output_dir},
    );

    $podcast->download_enclosures();
}

# Fetch the feed and return the URLs found in its <enclosure> elements.
sub get_enclosures {
    my ($self) = @_;

    my $res = $self->{ua}->get( $self->{url} );
    if ( $res->is_success ) {
        my $xml       = $res->decoded_content;
        my $feed      = XML::LibXML->load_xml( string => $xml );
        my @nodes     = $feed->findnodes('//enclosure/@url');
        my @encl_urls = map { $_->value } @nodes;
        return @encl_urls;
    }
    else {
        carp $self->{url}, ' ', $res->status_line, "\n";
        return;
    }
}

# Download every enclosure that is not already present in the output directory.
sub download_enclosures {
    my ($self) = @_;

    my @encl_urls = $self->get_enclosures();
    for (@encl_urls) {
        my $output_filename = $self->_get_filename($_);
        if ( not -e $output_filename ) {
            my $res =
              $self->{ua}->get( $_, ':content_file' => $output_filename );

            if ( $res->is_success ) {
                print colored(
                    ['green'], basename($output_filename) . " is downloaded."
                ), "\n";
            }
            else {
                print colored(
                    ['red'],
                    basename($output_filename)
                      . " cannot be downloaded: "
                      . $res->status_line
                ), "\n";
            }
        }
        else {
            print colored( ['green'],
                basename($output_filename) . " is already downloaded." ),
              "\n";
        }
    }
}

sub _get_filename {
    my ( $self, $url ) = @_;

    return File::Spec->catfile( $self->{output_dir}, basename($url) );
}

sub _validate {
    my $config = shift;
    if ( exists $config->{url} and exists $config->{output_dir} ) {
        return 1;
    }
    else {
        return 0;
    }
}

sub _usage {
    return "Usage: $0 --url http://path.to/rss.xml --output_dir ./\n";
}

1;

The script is a modulino, mainly because I consider this the first version of the tool. I already have some ideas for features to add!

The algorithm is quite simple:

  1. the program takes two arguments: the URL of the podcast and the directory where the files must be downloaded;
  2. we retrieve the contents of the podcast URL;
  3. we use an XPath expression, //enclosure/@url, to retrieve the URLs of the individual files;
  4. we download the various files.

Simple, effective, and I now have a command line tool for my podcasts!
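
The download step depends on turning each enclosure URL into a local path and skipping files that already exist. Here is a small sketch of that part alone, using only core modules; local_path and needs_download are hypothetical names, and the URL is a placeholder.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Hypothetical helper: local path for an enclosure URL inside $output_dir.
sub local_path {
    my ( $output_dir, $url ) = @_;
    return File::Spec->catfile( $output_dir, basename($url) );
}

# Hypothetical helper: true when the file has not been fetched yet.
sub needs_download {
    my ( $output_dir, $url ) = @_;
    my $path = local_path( $output_dir, $url );
    return !-e $path;
}

my $url = 'http://example.org/podcast/episode-01.mp3';    # placeholder URL
print local_path( 'podcasts', $url ), "\n";
print needs_download( 'podcasts', $url ) ? "to download\n" : "already there\n";
```

Note that basename() keeps everything after the last slash, so a URL carrying a query string would end up with that query string in the file name.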

Take the best with you...

10 September, 21:23, by manu, machine translated from French

Like many Perl Mongers, I am an absolute fan of CPAN. Yes, it contains the worst as well as the best, but I must confess that I can easily live with the worst: on the one hand by not installing those modules (as far as possible), and on the other because searching CPAN often saves me time.

And I find it so useful that I am even willing to sacrifice some disk space to carry it with me all the time. This is done with the minicpan script, available on... CPAN. The tool relies on a simple configuration file:

remote: http://ftp.belnet.be/mirror/ftp.cpan.org/
local: /home/manu/toolbox/minicpan
exact_mirror: 1

This gives me plenty of reading material during my travels but also, more seriously, lets me install new modules even when I have no internet connection.

The cpan configuration then has to be changed so that it uses the local resources rather than the remote ones. Here is a brief reminder of the cpan commands that are useful in this situation:

  • o conf urllist displays the URLs that cpan uses to find and install modules;
  • o conf urllist unshift file:///home/manu/toolbox/minicpan adds a URL to the top of the list; since cpan uses the list in order, this gives priority to a resource;
  • o conf urllist shift removes the first element of the list;
  • I will let you guess what o conf urllist pop and o conf urllist push do!
  • Obviously, do not forget to use o conf commit to save the configuration for the next call to cpan, unless you ran o conf auto_commit 1 beforehand.
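
The urllist verbs behave like Perl's array functions of the same name, applied to the list of mirrors. The sketch below only illustrates that ordering on a plain array; it does not touch the cpan configuration, and the two URLs are simply the ones used in this post.

```perl
use strict;
use warnings;

# A plain array standing in for cpan's urllist (illustration only).
my @urllist = ('http://ftp.belnet.be/mirror/ftp.cpan.org/');

# "o conf urllist unshift ..." puts the local mirror first, so it is tried first.
unshift @urllist, 'file:///home/manu/toolbox/minicpan';
print "$_\n" for @urllist;

# "o conf urllist shift" removes that first entry again.
my $removed = shift @urllist;
print "removed: $removed\n";
```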

Carrying CPAN around is one thing, but it is also rather nice to be able to query the repository through a friendlier interface than locate, find and grep. That is why I installed CPAN::Mini::Webserver, which makes my minicpan accessible through a web interface, as I already mentioned earlier.

Visualization of a CQL query

9 September, 21:45, by manu, machine translated from French

Bridging Instapaper and emacs

8 September, 21:44, by manu, machine translated from French

I recently discovered a tool that is very useful day to day, Instapaper. As its tagline says: "A simple tool to save web pages for reading later". Like many such web tools, it offers a bookmarklet to integrate it into the web browser, but what about when you are not using a browser? Well, like many such web tools (but not all), it provides an API to interact with the web service.

As I often use Twitter to keep up with what is going on, and I use twittering-mode to browse my tweets, I needed a tool to link emacs and Instapaper. I am still very much a beginner with elisp, so I preferred to do the job in Perl.

The first step was to create a module: WebService::Instapaper. The official API documentation dictates the broad outline of the module, and the code is pretty straightforward:

use strict;
use warnings;

package WebService::Instapaper;

use LWP::UserAgent;
use HTTP::Request;
use Carp;
use URI;
use URI::QueryParam;

=head1 VERSION

=cut

# ABSTRACT: turns baubles into trinkets

our $VERSION = '0.01';

use constant INSTAPAPER_URI => 'https://www.instapaper.com/api/add';

sub new {
    my ( $class, %attr ) = @_;

    my $self = {};
    $self->{username} = $attr{username} || "";
    $self->{password} = $attr{password} || "";

    $self->{_ua} = LWP::UserAgent->new(
        agent      => 'WebService::Instapaper/0.1',
        cookie_jar => {},
        timeout    => 30,
    );

    bless $self, $class;
}

sub username {
    my ( $self, $username ) = @_;

    $self->{username} = $username;
}

sub password {
    my ( $self, $password ) = @_;

    $self->{password} = $password;
}

sub credentials {
    my ( $self, $username, $password ) = @_;

    $self->username($username);
    $self->password($password);
}

# Add a URL to the account; returns 1 when the API answers 201, 0 otherwise.
sub add {
    my ( $self, %params ) = @_;

    croak "No username!" unless $self->{username} ne "";
    croak "No URI to read_later!" unless exists $params{url};

    my $uri = URI->new(INSTAPAPER_URI);
    $uri->query_param( url => $params{url} );

    my $request = HTTP::Request->new( GET => $uri );
    $request->authorization_basic( $self->{username}, $self->{password} );

    my $response = $self->{_ua}->request($request);

    $self->{_last_code}    = $response->code();
    $self->{_last_message} = $response->message();

    if ( $response->code() == 201 ) {
        return 1;
    }
    else {
        return 0;
    }
}

# Message of the last response, if any request has been made.
sub message {
    my ($self) = @_;

    return $self->{_last_message} if exists $self->{_last_message};
    return;
}

# Status code of the last response, if any request has been made.
sub code {
    my ($self) = @_;

    return $self->{_last_code} if exists $self->{_last_code};
    return;
}

1;

The module is not yet available on CPAN, but the git repository is available on GitHub.

With this module written, I then wrote a script to add links to my Instapaper account. Here's the code:

#!/usr/bin/env perl

use strict;
use warnings;

use WebService::Instapaper;

use Getopt::Long;
use Config::INI::Reader;
use File::Spec;

my $config = { config => File::Spec->catfile( $ENV{HOME}, '.instapaperrc' ) };

my $ini_config = Config::INI::Reader->read_file( $config->{config} );

my $reader = WebService::Instapaper->new(
    username => $ini_config->{Instapaper}->{username},
    password => $ini_config->{Instapaper}->{password},
);

GetOptions( $config, 'url=s', 'config=s' );

die "Usage: $0 --url http://www.perl.org\n" unless exists $config->{url};

if ( $reader->add( url => $config->{url} ) ) {
    print "The

Naïve analysis of network traffic

7 September, 21:19, by manu, machine translated from French

A few months ago, I had the opportunity to write a tool that scans a network and reports the IP addresses that are most active in terms of bandwidth consumption (in fact, it lists every IP address seen and, for each one, the number of bytes associated with it). It is therefore a relatively naive tool, and its only goal was to give me a general idea of what was happening on the network.

Here's the script:

#!/usr/bin/env perl

use strict;
use warnings;

use Net::Pcap;
use NetPacket::Ethernet qw(:ALL);
use NetPacket::IP;
use DB_File;

my ( $err, $netaddr, $netmask );
my $dev           = shift || 'eth0';
my $stat_filename = shift || './stats.db';

# Persist the counters in a file so the results can be reused later.
tie my %stats, 'DB_File', $stat_filename, O_RDWR | O_CREAT, 0666, $DB_HASH;
$SIG{INT} = \&clean;

die "You need to be root to run this script...\n" if $> != 0;

Net::Pcap::lookupnet( $dev, \$netaddr, \$netmask, \$err )
  and die "lookupnet failed for $dev ($err)\n";
my $object = Net::Pcap::open_live( $dev, 1024, 1, 0, \$err );

Net::Pcap::loop( $object, -1, \&callback, [ $netaddr, $netmask ] );

# Called for every captured packet: add its length to both endpoints.
sub callback {
    my ( $user_data, $header, $raw_packet ) = @_;
    my ( $netaddr, $netmask ) = @{$user_data};

    my $packet = NetPacket::Ethernet->decode($raw_packet);
    if ( $packet->{type} == ETH_TYPE_IP ) {
        my $ip = NetPacket::IP->decode( eth_strip($raw_packet) );
        $stats{ $ip->{src_ip} }  = 0 if not exists $stats{ $ip->{src_ip} };
        $stats{ $ip->{dest_ip} } = 0 if not exists $stats{ $ip->{dest_ip} };

        $stats{ $ip->{src_ip} }  += $ip->{len};
        $stats{ $ip->{dest_ip} } += $ip->{len};
    }
}

# INT handler: print the counters in ascending order, then shut down.
sub clean {
    my ($signal) = @_;

    foreach my $key ( sort { $stats{$a} <=> $stats{$b} } keys %stats ) {
        print $key, ':', $stats{$key}, "\n";
    }

    untie %stats;
    Net::Pcap::close($object);
    exit 0;    # leave the program once the results are displayed
}

So, this script uses the pcap library via Net::Pcap to observe what is happening on the network, and NetPacket to decode the packets of interest. I use DB_File to tie a hash to a file (which lets me store the analysis so that the results can be reused later). Finally, I catch the INT signal to display the results of the analysis and exit the program cleanly.
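
The bookkeeping at the heart of the script is a hash keyed by IP address. The following sketch isolates it from the capture code so that it can be run without root privileges or pcap; the addresses and lengths are made-up sample values, and count_packet and report are hypothetical names.

```perl
use strict;
use warnings;

my %stats;

# Hypothetical helper: add a packet's length to both endpoints' counters.
sub count_packet {
    my ( $src_ip, $dest_ip, $len ) = @_;
    $stats{$_} += $len for $src_ip, $dest_ip;    # missing keys start at 0
    return;
}

# Hypothetical helper: "ip:bytes" lines, least active address first.
sub report {
    return map { "$_:$stats{$_}" }
      sort { $stats{$a} <=> $stats{$b} or $a cmp $b } keys %stats;
}

# Made-up sample traffic.
count_packet( '192.0.2.1', '192.0.2.2', 100 );
count_packet( '192.0.2.1', '192.0.2.3', 50 );

print "$_\n" for report();
```

Because every packet is counted once for its source and once for its destination, the totals measure traffic seen per address, not traffic sent.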

In short, nothing exceptional, but still useful.

Clipboard & Linux: managing multiple clipboards

6 September, 21:59, by manu, machine translated from French

Yesterday, I talked about Clipboard, a module for accessing the clipboard of your operating system. Today, I will complete that information with a few subtleties about using Clipboard on Linux.

While Clipboard uses native tools on Windows and Mac OS X, on Linux it relies on a small utility. This utility, xclip, manipulates the contents of the X11 clipboards, and yes, you read correctly, I did write "clipboards" in the plural. Under X11 there are several of them, which is not without its problems at times.

The xclip documentation tells us that three clipboards are available:

  1. primary, used by default: when you select text in X11, it is automatically copied to this clipboard, and you can paste it with a click of the third mouse button (and if your mouse has no third button, you can emulate it, for example by clicking the left and right buttons simultaneously);
  2. secondary, whose existence follows logically from that of a primary;
  3. clipboard, which is used by your window manager and is thus, for example in Gnome, what you copy to with Ctrl-c and paste from with Ctrl-v.

In one of the tools I developed, which reads the clipboard to find the information to process, I preferred to use clipboard rather than primary (I use this script daily, the machines do not always have an external mouse, and the left-click/right-click combination is sometimes irritating on some trackpads). So I read the source of the module and found that handling of the different clipboards was already in place:

  • a method paste_from_selection() to read from a given clipboard;
  • a method copy_to_selection() to copy information to a given clipboard.

So I could work as follows:

  #!/usr/bin/env perl

  use strict;
  use warnings;

  use Clipboard;

  my $text = Clipboard::Xclip->paste_from_selection('clipboard');    # retrieve the text from the clipboard

  Clipboard::Xclip->copy_to_selection( 'clipboard', 'blah blah blah' );    # put text into the clipboard

Unfortunately, I also use these scripts on Windows and Mac OS X, so direct calls to paste_from_selection() or copy_to_selection() were out of the question. The options were therefore to test $^O to detect my runtime environment and adapt the behaviour of my program accordingly, or else to use the following hack.
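
The first option, testing $^O, could look like the sketch below. It is an assumption about how such a test might be written, not code from the post; selection_for is a hypothetical helper, and the list of operating systems is deliberately minimal.

```perl
use strict;
use warnings;

# Hypothetical helper: which X11 selection to ask for on a given OS name,
# or undef when the platform has a single clipboard.
sub selection_for {
    my ($os) = @_;
    return 'clipboard' if $os eq 'linux';    # X11: prefer the Ctrl-c/Ctrl-v one
    return;                                   # MSWin32, darwin, ...
}

my $selection = selection_for($^O);
print defined $selection
  ? "use paste_from_selection('$selection')\n"
  : "use the plain Clipboard->paste()\n";
```

The drawback is that every call site has to branch on the result, which is what the hack avoids.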

In Clipboard::Xclip, another method is defined to choose the preferred clipboard. This method is called favorite_selection(), and it is defined as follows:

  sub favorite_selection { my $self = shift; ( $self->all_selections )[0] }    # returns the first available clipboard, i.e. primary

  sub all_selections { qw(primary secondary clipboard buffer) }    # list of available clipboards

All I have to do is redefine favorite_selection() so that it picks my preferred clipboard. Here is the code:

  undef &Clipboard::Xclip::favorite_selection;    # "delete" the existing definition of favorite_selection
  *Clipboard::Xclip::favorite_selection = sub { 'clipboard' };    # redefine the method

And now I can continue to use Clipboard->paste() and Clipboard->copy() without worrying about the rest!
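
The same technique can be tried without xclip installed. The sketch below applies it to a stand-in package, My::Selections, which is hypothetical and only mimics the two methods discussed in this post. It silences the "redefined" warning with no warnings 'redefine' instead of the undef used above; that is a substitution of mine, not the author's approach.

```perl
use strict;
use warnings;

# Stand-in package (hypothetical), mimicking the two methods discussed here.
package My::Selections;
sub all_selections     { qw(primary secondary clipboard buffer) }
sub favorite_selection { my $self = shift; ( $self->all_selections )[0] }

package main;

print My::Selections->favorite_selection, "\n";    # the default choice

{
    no warnings 'redefine';    # we are replacing the sub on purpose
    *My::Selections::favorite_selection = sub { 'clipboard' };
}

print My::Selections->favorite_selection, "\n";    # now the overridden choice
```

Assigning a code reference to the glob replaces the method for every caller in the process, so this kind of override is best done once, early in the program.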

Taking information where it is

5 September, 11:00, by manu, machine translated from French

While some have defined computer programs as the sum of an algorithm and a data structure, it is clear that programs are not very useful if we cannot give them information (via input) and cannot get the results of the processing back (via output).

As most of the programs I write in Perl fall into the category of tools meant to make my life easier, the concept of input/output is particularly important. For example, I have a tool that transforms an ISBN into a citation in BibTeX format (actually, I have several versions of this tool, but never mind!). I have to send the ISBN to my program and get a BibTeX record back. The ideal is to work through the clipboard: I copy the ISBN, I run the program and, finally, I can paste my BibTeX record wherever I see fit.

To access the clipboard from Perl, we have a module named Clipboard. This module is extremely simple to use, and extremely useful since it is cross-platform (it can transparently use the clipboard on Windows, Mac OS X and Linux).

Here is an example of use:

#!/usr/bin/env perl

use strict;
use warnings;

use Clipboard;

print Clipboard->paste;    # read what has been copied to the clipboard

Clipboard->copy( scalar localtime );    # copy the current date to the clipboard

No sorcery there! And my programs can now integrate easily into my everyday computing.

Understanding the question

4 September, 15:13, by manu, machine translated from French

Regularly, I put myself through a kind of mental gymnastics. This workout consists of finding a topic that interests me and working on it briefly. If the topic is computing-related, that means programming a naive implementation; if not, writing a short summary. The goal of this workout is to get me to manipulate concepts, known or not, in order to absorb and understand them better.

Yesterday's workout began with a reflection on parsing queries such as those submitted to search engines, and submitting them to an SQL engine. The subject being vast enough, I focused primarily on parsing a query. I naturally went to CPAN to look at the existing modules and read the current implementations. I looked more specifically at two modules:

  1. Search::QueryParser, written by Laurent Dami, and
  2. CQL::Parser, written by Brian Cassidy.

Both modules parse queries, but the form of the queries differs: the first parses "Google-style" queries, while the second specialises in CQL queries (CQL meaning Common Query Language). The implementation details also differ:

  • Search::QueryParser does not use an external tokenizer and works as follows: a loop takes the query as its parameter, and successive calls to s/// shorten the query until it has been fully scanned. I like this approach for quickly building a simple query parser. On the other hand, if the query language becomes more complex, I fear it becomes a bit laborious to maintain;
  • CQL::Parser, for its part, relies on an external tokenizer: String::Tokenizer. It turns a string into an array; it is then "enough" to walk this array to query the information system of our choice (DBMS, search engine, etc.). I quickly read the documentation for String::Tokenizer, and it is definitely a module that I will add to my Top 100.
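
The loop-and-s/// approach described in the first bullet can be sketched as follows. This is not the code of Search::QueryParser, only a minimal illustration of the idea on a toy grammar (bare words, quoted phrases, and an optional + or - prefix); tokenize is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical tokenizer: each s/// removes one token from the front of
# the query, so the string shrinks until nothing is left.
sub tokenize {
    my ($query) = @_;
    my @tokens;
    while ( $query =~ /\S/ ) {
        $query =~ s/^\s+//;                      # skip leading blanks
        my $prefix = $query =~ s/^([+-])// ? $1 : '';
        if ( $query =~ s/^"([^"]*)"// ) {        # quoted phrase
            push @tokens, { prefix => $prefix, value => $1 };
        }
        elsif ( $query =~ s/^([^\s"]+)// ) {     # bare word
            push @tokens, { prefix => $prefix, value => $1 };
        }
        else {
            die "Cannot parse query at: $query\n";
        }
    }
    return @tokens;
}

print "$_->{prefix}|$_->{value}\n"
  for tokenize('+perl -"machine translation" cpan');
```

Every pass through the loop either consumes at least one character or dies, which is what guarantees termination; each new syntax rule means another s/// branch, which is where the maintenance cost mentioned above comes from.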

As for the naive implementation, I mainly played with the method adopted by Search::QueryParser. I simply build a tree from an arithmetic expression. And it is so naive that I do not even validate the arithmetic expression. To build the tree, I used another module from my Top 100, namely Tree::DAG_Node. Here is the resulting code:

#!/usr/bin/env perl

use strict;
use warnings;
use diagnostics;

use Tree::DAG_Node;

my $query = shift || '+ 1 2';
my $root;
my $lastnode

Am I connected or not?

7 August, 16:56, by manu, machine translated from French

After a lengthy absence (the end of school holidays, but my thesis kept me away from this blog), I will slowly resume writing articles for it.

Until now, this blog has mainly been devoted to Perl, but the approaching school year makes me want to broaden it to the topics covered in the courses I teach. It will remain largely about computing (and, more generally, computerised documentation), but with more of an information-science focus.

To restart this blog, I will stick to past habits, namely a short note on Perl. Today, I will talk about a module that is useful when you want to do web-related processing: LWP::Online. This module gives a very simple answer to the question "Am I connected to the web?".

Its use is as simple as its implementation: you import a function, online(), which returns a boolean value.

A simple script would look like this:

#!/usr/bin/env perl

use strict;
use warnings;

use LWP::Online qw(online);    # import the online() function into our script

if ( online() ) {
    print "We're connected\n";
}
else {
    print "We are not connected\n";
}

The module also provides a function offline() which returns true when we are not connected to the Web.

Automatic discovery of RSS feeds

5 May, 20:27, by manu, machine translated from French

I was confronted with an interesting problem this morning. The problem was the following:

Given a list of sites, we wanted to obtain their RSS feeds.

In terms of solutions, there were not many choices:

  1. do the job manually;
  2. develop a more automated solution.

My heart obviously leaned towards the automated solution. I developed a small tool to retrieve the RSS feeds of a site. The tool was very simple because it is based on the find_feeds() function of the XML::Feed module. Here is the one-liner version:

 perl -MXML::Feed -MData::Dump -e 'dd(XML::Feed->find_feeds(shift))' http://lesoir.be

I then changed my script to handle a list of links and return a list of RSS feeds. I also added some small statistics to know the number of sites processed and which ones do not offer RSS feeds (or more precisely, the sites for which XML::Feed failed to retrieve RSS feeds). Here's the final script:

#!/usr/bin/env perl

use strict;
use warnings;

use YAML;
use Getopt::Long;
use XML::Feed;

my $config = {};

GetOptions( $config, "input=s", "output=s" );

die _usage() unless _validate($config);

my $sites = YAML::LoadFile( $config->{input} );
my @feeds;
my @with_feeds;
my @no_feeds;

# Sort each site into "has feeds" or "no feeds" and collect the feed URLs.
foreach my $site ( @{$sites} ) {
    my @site_feeds = XML::Feed->find_feeds($site);
    if ( scalar(@site_feeds) > 0 ) {
        push @feeds,      @site_feeds;
        push @with_feeds, $site;
    }
    else {
        push @no_feeds, $site;
    }
}

YAML::DumpFile( $config->{output}, \

Migrating databases Winisis - First Step

27 April, 11:38, by manu, machine translated from French

For several years now, I have regularly been contacted by email with requests for the tools I use to migrate Winisis databases. Unfortunately, I never took the time to write proper documentation, which is a pity. I will lay the foundations of that documentation on this blog, and I hope to take the time in the future to develop something more complete.

Before turning to the tools themselves, here is the context in which they were developed. As part of a training programme organised jointly by the Free University of Brussels (ULB) and the University Commission for Development (CUD), I taught a course called "Integrated Library Management". The trainees had to carry out a project, and one of them wanted to migrate from Winisis to Koha (the software I used to illustrate my course). So I looked into the issue. The objective was to migrate the ISIS database to MARC21.

ISIS databases are based on a schema specific to each user (Winisis is not an ILS but a DBMS; one specialised in bibliographic data, admittedly, but a DBMS all the same, so it resembles Microsoft Access more than Koha). The first step is therefore to determine the structure of the database in order to establish a mapping table to the desired MARC format (MARC21, UNIMARC, or any other variant). Today, I will cover only this phase and the tool developed for it. The other steps will follow in the coming days.

The structure of an ISIS database, called the "Field Definition Table" in ISIS jargon, is stored in a text file with the extension ".fdt". By analysing this file, one can easily determine the structure of the database. Biblio::Isis is the module of choice for manipulating ISIS databases; it provides a function to read the field definition table, and the following code is based on this module:

#!/usr/bin/env perl

use strict;
use warnings;

use Getopt::Long;
use File::Find::Rule;
use Data::Dumper;
use Encode qw(from_to);
use Spreadsheet::WriteExcel;

my $config = {};

# Dispatch table: output format name => function that saves the structure.
my %save_functions = (
    Excel => \&_save_excel,
    dump  => \&_dump_struct,
);

my $fdt_struct = {};

GetOptions( $config, 'database=s', 'file=s', 'save=s' );

if ( not exists $config->{

Duplicate data during a migration

26 April, 15:18, by manu, machine translated from French

When migrating from one information system to another, one is often tempted to take the opportunity to improve data quality. Although this is a project in itself, it may well be the "right time", provided we have the resources to do so. Human resources, of course, but also enough time to develop a methodology that can actually guarantee better quality.

This year some students embarked on this adventure for their graduation work, which involved migrating library catalogs. One particular problem quickly arose: the presence of duplicates in the authority lists. Catalogs are excellent places to measure how creative the human mind can be at not following the rules ;-)

Thus, in a library catalog we have a list of publishers, for example, and depending on the creativity of the cataloguers, but also on the number of people involved in managing these publishers, we will find more or fewer variants:

  • Ed O'Reilly
  • O'Reilly;
  • O'Reilly
  • E. O'Reilly
  • etc.

The ideal would therefore be to find all the duplicates and replace them with the correct form. But how do we find all these duplicates? Without a computer, the task is tedious:

  1. browse the authority list;
  2. establish a list of duplicates;
  3. choose the "correct" form;
  4. make all the necessary changes.

Faced with such situations, my first instinct is often to work out what I can computerize, or even automate. Here, it would be ideal to get a list of "possible duplicates", i.e. expressions that are close enough to arouse our suspicion. For this, computing offers several techniques:

  • an algorithm to compute the Levenshtein distance, that is to say, the number of elementary operations needed to turn a word M into a word P. Based on this algorithm, we can compare each entry in the authority list with all the others and keep the pairs whose Levenshtein distance is small (this threshold obviously acts like the mesh size of a net). Of course, this algorithm is available on CPAN: Text::Levenshtein for a version in Perl, and Text::LevenshteinXS for a version in C;
  • other techniques derived from the previous one exist, for example the algorithm behind agrep, a grep that allows approximate matching on strings. There is a Perl module that reproduces this behavior: String::Approx (also an XS module, thus based on C).
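
To make the first technique concrete, here is a minimal sketch in plain Perl, assuming nothing beyond core modules. It is not the prototype from this post: the function names levenshtein and possible_duplicates and the threshold value are illustrative assumptions, and on real data one of the CPAN modules named above would be the better choice.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(min);

# Classic dynamic-programming Levenshtein distance between two strings.
sub levenshtein {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my @prev = (0 .. scalar @t);    # distances from the empty prefix of $s
    for my $i (1 .. scalar @s) {
        my @cur = ($i);
        for my $j (1 .. scalar @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution (or match)
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Compare every entry with every other one and keep the pairs whose
# distance does not exceed $threshold (case is ignored).
sub possible_duplicates {
    my ($threshold, @entries) = @_;
    my @pairs;
    for my $i (0 .. $#entries) {
        for my $j ($i + 1 .. $#entries) {
            my $d = levenshtein(lc($entries[$i]), lc($entries[$j]));
            push @pairs, [ $entries[$i], $entries[$j], $d ] if $d <= $threshold;
        }
    }
    return @pairs;
}

# Sample list taken from the variants quoted above, plus one unrelated entry.
my @publishers = ("Ed O'Reilly", "O'Reilly;", "O'Reilly", "E. O'Reilly", "Addison-Wesley");
for my $pair (possible_duplicates(3, @publishers)) {
    printf "%-12s ~ %-12s (distance %d)\n", @$pair;
}
```

The threshold plays the role of the mesh size mentioned above: a low value only catches near-identical spellings, a higher one catches more variants at the price of more false positives. Comparing every pair is quadratic in the size of the list, which is fine for a prototype but may be slow on a large authority file.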

So the techniques exist, and "all that remains" is ... to search CPAN, for example with the keywords "group" or "similarity", which led me to String::Similarity::Group, based on String::Similarity, which relies on a noticeably different algorithm but on the same ideas as explained above. In short, once again CPAN saves me time and lets me develop a prototype quickly (without this module, I would have had to implement the group creation myself, which is certainly not complicated but takes a fair amount of thought).

Here is the prototype in question:

#!/usr/bin/env perl

use strict;
use warnings

Special Edition Powerhouse Perl - Linux Magazine

15 April, 19:47, by manu, machine translated from French

It had been a while since I had gone into a bookstore, but this visit was a success, since it let me find a few books to fill these Easter holidays, along with this special issue of Linux Magazine devoted to Perl: Perl Special Edition Powerhouse.

There are some nice articles in it, and it gave me some ideas to fuel the Perl part of this blog.

Stay tuned ...

Introduction to Plack talk in French Perl Workshop

13 April, 07:14, by miyagawa, machine translated from French

Plack is a Perl port of WSGI (Python) and Rack (Ruby). Its objective is to provide a common environment for web framework developers. It provides connectors for many web servers, as well as an environment for easily writing middleware. Plack is still young but has already been adopted by all the Perl frameworks (Catalyst, Dancer, Mojo, ...), and many middleware components are also available on CPAN.

via journeesperl.fr

frankcuny will talk about Plack at the French Perl Workshop 2010.

Saving the images of an HTML file

17 March, 22:36, by manu, machine translated from French

Some time ago, when I began reading on my ebook reader every day, I quickly wanted to be able to read articles from the web on it. After a quick look around, I found nothing at the time (this goes back quite a while now), so I set about developing a personal solution.

After some reflection, I developed the following tools:

Out of (bad) laziness, I never took the time to put these modules on CPAN, but that does not stop me from talking about them anyway.

So, in the case before us today, we'll look at HTML::Image::Save. Using it is as simple as possible:

Perl is a planet

15 March, 22:32, by manu, machine translated from French

A bit late, I have just read the post by Jean Véronis entitled "Ontologies: Perl is a planet in the Solar System" (an English version of the article is available).

And so, setting aside the intrinsic value of the post, long live Perl ;-)

The importance of the ecosystem of a language

8 March, 20:13, by manu, machine translated from French

While reading the latest "PragPub: The First Iteration" (No. 9, available on the PragProg website), in the article "JavaScript: It's Not Just for Browsers Any More" by Jason Huggins, I came across an interesting reflection:

When we choose a technology to write an application, we do not just choose the language, we also choose the list of available libraries. If a language has many useful libraries with a vibrant community around them, it's going to be easier to write your application in less time.

Well, in my case, it made me think of Perl and CPAN, but I'm sure others will read other things into it :)

Pearltrees, RDF & Perl

7 March, 14:54, by manu, machine translated from French

After some recent discussions, I am plunging into the arcana of RDF. Indeed, since Pearltrees offers the possibility of exporting bookmarks as RDF, my natural curiosity prompted me to find out what I could actually do with this file. In addition, @SebDeclercq was kind enough to send me his own backup, thus sparing me the tedious task of enriching my own Pearltrees.

So I have a file full of links, and now I want to exploit this information. How? As indicated in the post on Nicolas Cynober's blog, we can use an online SPARQL tool to manipulate the information. There you can submit your query, written in SPARQL, and get a response in different formats:

  • XML, with the option of supplying an XSLT script to transform the document (e.g. into XHTML);
  • JSON;
  • or just text.

So it is a very interesting tool, but it has to be used online, and personally I like having my tools directly available, regardless of my web access. So the question arises of what I have available directly on my machine. A trip through CPAN, and hop, here are some promising modules:

  • RDF::Redland;
  • RDF::Trine;
  • and, what I need to query the file via SPARQL: RDF::Query.

"It keeps up!

26 February, 22:16, by manu, machine translated from French

Recently, while browsing GitHub, I saw an interesting project: growlme. For those not familiar with Growl, it is a non-intrusive notification system. Imagine that you start burning a DVD and meanwhile work on an article or blog post. Traditionally, the burning software displays a small window when the burn finishes, interrupting what you were doing. With notification systems like Growl, a floating window appears in the top right of the screen to signal the end of the burn. Growl is only available on Mac, but there are similar systems on Linux (libnotify among others).

growlme launches a command line and keeps you informed of the outcome of its execution via Growl. As I often start lengthy processes from the command line and get on with other tasks in the meantime, I wanted the equivalent on my Linux machine.

The result is not very long and uses Desktop::Notify, that is to say the Perl interface to libnotify, IPC::Run to run the command, Getopt::Long to process the command-line parameters (note the use of Getopt::Long::Configure('pass_through') to leave the rest of @ARGV intact) and Sys::Hostname for the title of the notification. For the rest, I aped the original program.

#!/usr/bin/env perl

use strict;
use warnings;

use Desktop::Notify;
use IPC::Run qw(run);
use Getopt::Long;
use Sys::Hostname;

my $config = {
    message => 'Succeed!',
    fail    => 'FAILED',
    title   => hostname(),
};

# pass_through leaves unknown options (those of the wrapped command) in @ARGV
Getopt::Long::Configure('pass_through');
GetOptions($config, 'message=s', 'fail=s', 'title=s');

if (scalar(@ARGV) == 0) {
    die "$0: Must provide a command to execute\n";
}
else {
    my $notify = Deskto
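
As an aside on the pass_through setting mentioned above, here is a minimal sketch that uses only Getopt::Long (a core module). The helper name split_args and the sample command are assumptions made for illustration; the notification part is deliberately left out.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Separate the wrapper's own options from the command it has to run.
# With pass_through, options that Getopt::Long does not know are not
# reported as errors: they stay in the argument list.
sub split_args {
    my @args   = @_;
    my %config = (message => 'Succeed!', fail => 'FAILED');
    Getopt::Long::Configure('pass_through');
    GetOptionsFromArray(\@args, \%config, 'message=s', 'fail=s', 'title=s');
    return (\%config, \@args);
}

my ($config, $command) = split_args(qw(--message Done ls -l /tmp));
print "message: $config->{message}\n";
print "command: @$command\n";
```

One thing to keep in mind with this approach: Getopt::Long accepts unambiguous abbreviations by default, so an option of the wrapped command such as -t would be taken as --title. Choosing distinctive option names for the wrapper limits the risk.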

Pearltrees

24 February, 21:23, by manu, machine translated from French

Following a comment on his blog, @SebDeclercq explained the reasons for his choice of Pearltrees. In summary (and correct me if I misunderstood):

  1. he needs a tool to back up his bookmarks;
  2. this tool must be online so that it can be used from several machines;
  3. the proposed classification system must be effective, and the visual, tree-shaped classification met his expectations.

For my part, I share the need for a system to manage my bookmarks. I used Delicious for a while, but at some point I dropped the habit of storing the resources I deemed valuable there, in favor of a small tool of my own. In his reply, @SebDeclercq notes that he uses Pearltrees mainly through the Firefox extension. I had not taken the time to test it: first because my main browser is now Chromium, and also for lack of time. So I had tested the other tool offered by Pearltrees, namely the bookmarklet, but I was not really happy with it. It dropped the pearls (that is, the URLs you want to keep) into a basket, and you still had to "arrange" these pearls afterwards, and therefore use the Flash interface, which I find particularly unpleasant. In short, under these conditions, it was difficult to adopt Pearltrees in my toolbox.

But @SebDeclercq's article made me revisit the tool, so this time I took the time to launch Mozilla Firefox and install the extension. And so I could understand why it is a useful tool for managing one's bookmarks:

  • the obvious integration into the browser;
  • a button that lets you choose where to put your pearl, which removes the need to go through the site and thus avoids the Flash interface;
  • a button to open your Pearltrees.

Nevertheless, I am not adding Pearltrees to my toolbox, for the following reasons:

  • the lack of an API to this day: from my point of view, this is really the biggest black cloud over this tool. Indeed, in the Web 2.0 that is ours, and in the Web 3.0 or Web of data that is just peeking over the horizon, the presence of an API is essential! If I have trouble with the Flash interface, an API would have allowed me to develop the command-line interface that I like. And the RDF export will not take me far, since I have to log in to perform it;
  • the features of the tool do not go much beyond those of Delicious. Granted, the tree structure is cool, the display is interesting, and being able to "capture" a pearltree is really nice, but otherwise I am afraid of reproducing the same pattern as with Delicious;
  • besides this, I recently discovered, via a tweet from @MarioAsselin, a tool more in tune with my needs: Diigo. I have been using it for a week now, and although I still have to improve my personal workflow

Information Retrieval - Example of creating an index

21 February, 22:49, by manu, machine translated from French

As part of a course on digital knowledge management, I talk a little about information retrieval. Just the basics, but it seems important to know how a search engine works, especially for future information and documentation specialists. Among these basics, we look at what a search engine does with the text it receives to index. Although the theory is easy to understand, that does not stop me from adding a visual layer, which is why I wrote a little script to illustrate this indexing step.

The script was developed over a lunch break, just before giving the course, so I did not work on the ergonomics or layout of the tool. To build it quickly, I used the following tools:

Otherwise, the code is quite simple:

#!/usr/bin/env perl

use strict;
use warnings;

package MyView::Templates;
use Template::Declare::Tags;
use base 'Template::Declare';

# Form with a text area for the text to index, plus submit and reset buttons
private template form => sub {
    my $self  = shift;
    my $title = shift;

    div {
        attr { style => 'margin: auto; size: 15%', };
        form {
            attr {
                action => '/submit',
                method => 'POST',
            };
            textarea {
                attr {
                    cols  => '100',
                    rows  => '25',
                    name  => 'title',
                    style => 'float: left;'
                };
                $title;
            };
            input {
                attr {
                    name  => 'submit',
                    type  => 'submit',
                    value => 'Submit',
                    style => 'float: left;'
                };
            };
            input {
                attr {
                    name => 'reset',
                    type => 'reset',
                    value =>
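
Independently of the web interface above, the indexing step this post talks about can be sketched in a few lines of core Perl. This is an illustrative assumption about what "creating an index" means here (a simple inverted index, without stop words or stemming), not the script written for the course; the name build_index and the sample texts are made up for the example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Build an inverted index: each term points to the list of documents
# (identified by number) in which it appears.
sub build_index {
    my (%docs) = @_;
    my %index;
    for my $id (sort { $a <=> $b } keys %docs) {
        my %seen;
        for my $term (map { lc($_) } ($docs{$id} =~ /\w+/g)) {
            next if $seen{$term}++;    # list each document once per term
            push @{ $index{$term} }, $id;
        }
    }
    return \%index;
}

my $index = build_index(
    1 => 'Perl is a planet',
    2 => 'A planet in the Solar System',
);
for my $term (sort keys %$index) {
    printf "%-8s -> %s\n", $term, join(', ', @{ $index->{$term} });
}
```

A search then becomes a simple hash lookup on the term, which is the point a visual demonstration of indexing usually tries to get across.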

The reinvention of the wheel through MARC::Record

9 February, 21:20, by manu, machine translated from French

Recently on the perl4lib mailing list, a question was raised about whether a solution exists to split an overly large MARC file into several smaller files.

Like many users of MARC records, and programmers who manipulate them, I have faced this problem, and I developed a small quick solution:

#!/usr/bin/env perl

use strict;
use warnings;

use MARC::File::USMARC;
use MARC::Record;

use Getopt::Long;

# By default, the output files are named after the input file
my $config = { output => 'input' };

GetOptions($config, 'input=s', 'chunk=s', 'output=s', 'max=s');

# Both --input and --chunk are required
if (not exists $config->{input} or not exists $config->{chunk}) {
    die "Usage: $0 --input file --chunk size [--output file]\n";
}
else {
    run($config->{input}, $config->{output}, $config->{chunk}, $config->{max});
}

sub run {
    my ($input, $output, $chunk, $max) = @_;

    my $marcfile = MARC::File::USMARC->in($input);

    my $base  = $output eq 'input' ? $input : $output;
    my $fh    = create_file($base);
    my $cpt   = 0;    # records written to the current file
    my $total = 0;    # records read so far
    while (my $record = $marcfile->next) {
        $total++;
        last if defined $max and $total > $max;

        # Start a new file once the current one holds $chunk records
        if ($cpt == $chunk) {
            close $fh;
            $fh  = create_file($base);
            $cpt = 0;
        }

        print $fh $record->as_usmarc;
        $cpt++;
    }
    close $fh;
}

# Open the first "$output.NNN" file that does not exist yet
sub create_file {
    my ($output) = @_;
    my $cpt = 0;

    my $filename = sprintf('%s.%03d', $output, $cpt++);
    while (-e $filename) {
        $filename = sprintf('%s.%03d', $output, $cpt++);
    }

    open my $fh, '>', $filename or die "Cannot create $filename: $!\n";
    return $fh;
}

This tool is an example of a librarian's solution: a librarian should be able to program (if he or she wishes to program, of course). The algorithm used is far from complicated (although it makes a good exercise), and it simply exploits existing modules (long live CPAN!) for which documentation exists. In short, a fine example of laziness and specialization.

Nevertheless, a better solution than this MARC splitting script is undoubtedly the yaz-marcdump utility with its -C option
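
For readers who want to see the chunking logic on its own, here is a minimal sketch that does not use MARC::Record at all. It relies on one fact about the format, namely that every ISO 2709 (MARC) record ends with the byte 0x1D, and treats records as opaque strings. The function name chunk_records and the fake records are assumptions for the example; unlike the script above, it keeps the chunks in memory instead of writing files.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Group the records read from $fh into chunks of at most $size records.
# Nothing is parsed or validated: a record is whatever ends with 0x1D.
sub chunk_records {
    my ($fh, $size) = @_;
    local $/ = "\x1D";    # ISO 2709 record terminator
    my (@chunks, @current);
    while (defined(my $record = <$fh>)) {
        push @current, $record;
        if (@current == $size) {
            push @chunks, [@current];
            @current = ();
        }
    }
    push @chunks, [@current] if @current;    # last, possibly shorter, chunk
    return @chunks;
}

# Three fake "records" in an in-memory file; a real script would open
# the MARC file with the '<:raw' layer instead.
my $data = "record1\x1Drecord2\x1Drecord3\x1D";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @chunks = chunk_records($fh, 2);
close $fh;

for my $n (0 .. $#chunks) {
    printf "chunk %d holds %d record(s)\n", $n + 1, scalar @{ $chunks[$n] };
}
```

Going through MARC::Record, as the script above does, has the advantage of checking that each record can actually be decoded before it is written out.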

Happy New Year - Happy New Year 2010

4 January, 03:51, by Yann, machine translated from French

We wish you all the best for the new year.


Eh! Maelys, Caroline & Yann

Noel 2009 - I couldn't manage a better picture this year. I suck


Baby pictures are available on my Flickr photostream as usual.

A glance at the calendar

10 December, 18:33, by manu, machine translated from French

For several years now, an initiative of the Perl community, which can now be called a tradition, has taken place at this time of year: the Advent calendar. From the first day of December until Christmas (hence Advent), an article is published each day to explain sometimes a module, sometimes a technique, or sometimes an element of Perl monger culture.

To my knowledge, there are four Advent calendars in the Perl community this year. Each focuses on a particular project in the community:

I did a brief search to see whether communities of other programming languages had taken up the idea, but I found nothing really sustained:

  • Ruby had one in 2006 and 2008, but apparently nothing this year;
  • PHP had one in 2007, and has one this year;
  • I found nothing for Python;
  • and I stopped there! Feel free to leave a comment to point me to other such calendars.

Personally, I think it's an interesting initiative: first, it helps people learn a language or a particular piece of software, and it also serves as a kind of promotion for the subject of the calendar.

In short, in my view, much more interesting than the calendar of a famous tire brand, for example ;-)

A long time ago

30 November, 09:23, by manu, machine translated from French

It's been a while since I published anything on this blog, but this does not mean that Perl has not been helpful in the meantime. In fact, quite the opposite! At the moment, I find it especially useful for building prototypes to better understand certain technologies.

Thus, in the November 2009 issue of GNU/Linux Magazine France, there is an article ("Put a sphinx in your search engine!") on a tool I had only glanced at: Sphinx. I have not had time to finish the article (I'm late on everything, I tell you!), but the principle is to give Sphinx an entry point into a database via a query. The results of this query are then indexed by the external software (Sphinx in this case). Then, when we want to do a full-text search on this database, we first query Sphinx to get the identifiers of the matching data, which we can then retrieve directly from the database.
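
The two-step principle described above (ask the index for identifiers, then fetch the rows from the database) can be illustrated with a minimal sketch. Everything in it is a stand-in: two plain Perl hashes play the roles of the database and of the external index, the sample rows are made up, and no Sphinx API is used or implied.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Stand-in for the database: identifier => full record (sample data).
my %database = (
    1 => { title => 'Programming Perl' },
    2 => { title => 'Learning Perl' },
    3 => { title => 'Higher-Order Perl' },
);

# Stand-in for the external full-text index: term => identifiers.
# A real index would be built by the indexer from a database query.
my %fulltext_index = (
    perl        => [ 1, 2, 3 ],
    learning    => [2],
    programming => [1],
);

sub search {
    my ($index, $db, $term) = @_;
    my @ids = @{ $index->{ lc($term) } || [] };    # step 1: identifiers from the index
    return map { $db->{$_} } @ids;                 # step 2: rows from the database
}

print "$_->{title}\n" for search(\%fulltext_index, \%database, 'Perl');
```

The database is never scanned for the search itself; it is only asked for the rows whose identifiers the index returned.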

I do not have large enough databases at hand to see whether the performance promises are kept, but according to the author of the article, they are!

I therefore wonder about using Sphinx to create "next-generation" OPACs. The advantage is that simply adding Sphinx to the mix makes it possible to use a much richer query language than that of conventional OPACs.

I have already used this technique of indexing by an external program to make a Winisis database available via the Z39.50 protocol. I used Perl for this, with the following modules:

One more tool: CPAN::Mini::Webserver

10 November, 22:52, by manu, machine translated from French

Perl is really a tool I like having in my toolbox, and like many other Perl programmers, what I particularly appreciate is being able to count on the work of many others to help me solve my daily problems. Sometimes the service rendered is not so much help in completing a job as simply making it more comfortable.

Relying often on CPAN, I ended up keeping a copy of it on my hard drive, which I keep synchronized as regularly as possible. This job is handled by the excellent minicpan (CPAN::Mini). So I am in the pleasant situation of always having a CPAN archive at hand, whether I am connected to the Internet or not. But CPAN is not limited to this: it is also the site I consult to read documentation (yes, I know, perldoc is there for that), so what do I do when I'm on the road? It's simple: use minicpan_webserver (CPAN::Mini::Webserver). It relies on the minicpan configuration file and serves everything through a web server. I can now run searches, read the documentation and the tests, and I can even start an installation from this interface. In short, a supercharged CPAN! Give it a try!

Am I connected?

31 October, 22:40, by manu, machine translated from French

Most of the tools I develop are based on LWP, so I need web access to run them (a nice platitude!). Some of these tools are meant to be started by cron, which raises the question of what happens to the program when it starts while I am not connected. One way to address the problem is to modify the program so that it only starts processing when web access is operational. This is where LWP::Online comes to my aid. It lets me import a function, online(), which checks whether web access is present. If so, it returns a true value. LWP::Online verifies web access by checking for the copyright notices on a few sites such as Google and Yahoo!, so it adds some latency to your program, but it remains worthwhile for some tools.
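
Here is a minimal sketch of the guard described above. So that it runs without any network access or extra module, the connectivity check is passed in as a code reference and replaced by a stub; the helper name run_when_online is an assumption for the example. In a real cron script, one would load the module with use LWP::Online 'online'; and pass \&online as the first argument.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Run $job only if $is_online returns a true value; otherwise do nothing.
sub run_when_online {
    my ($is_online, $job) = @_;
    return unless $is_online->();
    return $job->();
}

# The stub below stands in for the connectivity check (an assumption made
# so that the sketch works offline).
my $result = run_when_online(sub { 1 }, sub { 'feeds fetched' });
print defined($result) ? "$result\n" : "offline, nothing done\n";
```

Keeping the check outside the job makes it easy to skip a cron run silently instead of letting every LWP request time out one after the other.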

A cookie for another

22 October, 21:13, by manu, machine translated from French

Recently, I found myself faced with a rather silly but nonetheless annoying problem. For some years now I have used the services of a certain site. I had to log in to this site recently, but

  1. I could not remember my password!
  2. resetting the password was difficult, because there was no way to remember the email address used to create the account!

Damn! What to do?

Well, long live laziness! Indeed, to use the services of this site, I had written a small robot in Perl with LWP. This tool had always worked fine, relying on a cookie. This cookie was in the native format of HTTP::Cookies, so I had to convert it to the Mozilla Firefox format.

No sooner said than done. A little research on the web, and I found an article by the Mongueurs explaining a cookie conversion, but from Mozilla to LWP. After some experimentation, I eventually adapted the program. Here it is:

#!/usr/bin/env perl

use strict;
use warnings;

use HTTP::Cookies;
use HTTP::Cookies::Mozilla;
use Getopt::Long;

my $config = {};

GetOptions($config, 'from=s', 'to=s');

die _usage() unless _valid_config($config);

# Load the jar in LWP format, then rebless it so that save() writes the Mozilla format
my $input_jar = HTTP::Cookies->new(file => $config->{from});
bless $input_jar, 'HTTP::Cookies::Mozilla';

$input_jar->save($config->{to});

sub _usage {
    return "Usage: $0 --from cookies.lwp --to cookies.mozilla\n";
}

sub _valid_config {
    my $config = shift;

    if (exists $config->{from} and exists $config->{to}) {
        return 1;
    }
    else {
        return 0;
    }
}

You then just call this program as follows: $ ./cookies_converter -f my_lwp_cookie.txt -t /path/to/mozilla/profile/cookies.sqlite

Then I could go to my account page to change my password and set an email address

Help me fill my buffer

19 October, 15:26, by manu, machine translated from French

While following a tweet, I discovered a nice little add-on for Emacs: the perl-completion.el project. It provides smart auto-completion, that is to say completion that is not limited to the contents of the open buffers. You can get help with module names, but also with the methods exported by these modules. In short, a must!

If you want to see more, there is a screencast that will let you appreciate the work done. There is also a git repository online.

Enjoy!

Indexing BackPAN

29 July, 00:30, machine translated from French

brian d foy. 15 August 2008