Sunday, September 15, 2013

Opening files in Perl

http://tw.perlmaven.com/open-and-read-from-files
Reading a file:

use strict;
use warnings;

my $filename = 'xxx.txt';
open(my $fh, '<:encoding(UTF-8)', $filename) or die "Could not open file '$filename': $!"; # $fh stands for filehandle

my $count = 0;

while (my $row = <$fh>) {
    chomp $row;
    print "$row\n";
    $count++;
}

close($fh);

(Note: the encoding layer above triggers a bug in Blogspot's syntax highlighting; switching from HTML view to Compose view garbles the code.)

http://ind.ntou.edu.tw/~dada/cgi/Perlsynx.htm
$_ The default input and pattern-searching space.
$! Contains the current value of errno.
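
To make the two special variables above concrete, here is a minimal sketch. The word list and the file path are made up for illustration, and the path is assumed not to exist on your machine.

```perl
use strict;
use warnings;

# $_ is the implicit loop variable and the default argument of many builtins.
my @lengths;
for ('apple', 'kiwi') {
    push @lengths, length;      # length with no argument measures $_
}
print "@lengths\n";             # prints "5 4"

# $! holds the system error message after a failed call such as open.
# Hypothetical path, assumed not to exist.
my $missing = '/no/such/dir/xxx.txt';
my $opened  = open(my $fh, '<', $missing);
my $error   = $opened ? '' : "$!";
print "open failed: $error\n" unless $opened;
```

Copying $! into a variable right after the failing call is a common habit, since later system calls may overwrite it.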

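Replacing text in a file:
http://stackoverflow.com/questions/4732937/how-do-i-read-a-file-line-by-line-while-modifying-lines-as-needed (using Tie::File)
The content filtered out of xxx.txt is stored as a hash ref ($v); bbb.txt is then processed line by line, replacing the second match of the city name on each city line.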

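use Tie::File;
use Data::Dumper;   # needed for the Dumper call at the end
my @file_array;
# "or", not "||": "||" binds to 'bbb.txt' (always true), so die could never run
tie @file_array, 'Tie::File', 'bbb.txt' or die "END! $!";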

my $country = '';
my $count = 0;
my $city = '';
my $no_match_ref;
my $no_match_ref_count = 0;

my $v = {
          'Bolivia' => {
                         'El Beni' => 'El Beni'
                       },          
          'Canada' => {
                        'Newfoundland' => 'Newfoundland',
                        'Yukon Territory' => 'Yukon Territory'
                      },
        };

for my $line (@file_array) {
#    s/測試/一二三/g;         # example: replace every "測試" with "一二三" in the file
    
    # country line
    if( $line =~ /v ===/ ){
        my @country_arr= split(/"/, $line);
        $country =  $country_arr[1]; 
    }
    
     #city line
    if( $line =~ /ss\(f,\s\d/ ){
        my @city_arr = split(/"/, $line);
#        print $line,"\n";
        $city = $city_arr[1];
#        print "country:$country, city:$city, deftag:$v->{$country}{$city}\n"; # - #832
        
        if($v->{$country}{$city}){
            my $index = 2;
            # \Q...\E keeps any regex metacharacters in the city name literal
            $file_array[$count] =~ s/(\Q$city\E)/--$index == 0 ? $v->{$country}{$city}:$1/ge; 
        }
        
        if(!$v->{$country}{$city}){
            $no_match_ref->{$country}{$city} = $city;
            $no_match_ref_count++;
        }
        
        # &use_reg_exp_to_match(); # not used, because I will use SQL instead
    }
    $count++;
    
}

print Dumper $no_match_ref;
print "\n no_match_ref counter:$no_match_ref_count"; # - #296


untie @file_array;
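
The "replace only the second match" trick in the loop above can be isolated into a small helper. This is a sketch only: replace_nth, the sample line, and the REPLACED marker are hypothetical, since the real format of bbb.txt is not shown here.

```perl
use strict;
use warnings;

# Replace only the Nth occurrence of $target in $text.
sub replace_nth {
    my ($text, $target, $replacement, $n) = @_;
    my $seen = 0;
    # /e evaluates the right-hand side as code for every match;
    # \Q...\E keeps regex metacharacters in $target literal.
    $text =~ s/(\Q$target\E)/++$seen == $n ? $replacement : $1/ge;
    return $text;
}

# Hypothetical input line and replacement value.
my $line = 'ss(f, 1, "El Beni", "El Beni");';
print replace_nth($line, 'El Beni', 'REPLACED', 2), "\n";
# prints: ss(f, 1, "El Beni", "REPLACED");
```

Counting up with ++$seen is equivalent to the countdown with --$index used above; every match other than the Nth is put back unchanged through $1.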

Counting the keys of a hash ref
$v = {
          'Bolivia' => {
                         'El Beni' => 'El Beni'
                       },          
          'Canada' => {
                        'Newfoundland' => 'Newfoundland',
                        'Yukon Territory' => 'Yukon Territory'
                      },
          ...
        };
$count = 0;
foreach my $val ( keys %{$v} ){ # Method 1: loop with foreach
  $count++;
}
print "\n$count\n";
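print scalar keys %{$v}; # Method 2: dereference explicitly; "keys $v" on a bare reference was experimental and no longer works since Perl 5.24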
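
The same idea extends to the inner level. As a sketch, using only the two countries shown above (the elided entries are left out), this counts both the countries and the cities nested under them:

```perl
use strict;
use warnings;

my $v = {
    'Bolivia' => { 'El Beni' => 'El Beni' },
    'Canada'  => {
        'Newfoundland'    => 'Newfoundland',
        'Yukon Territory' => 'Yukon Territory',
    },
};

# Top-level keys: one per country.
my $countries = scalar keys %{$v};

# Inner keys: sum the key count of each country's hash.
my $cities = 0;
$cities += scalar keys %{ $v->{$_} } for keys %{$v};

print "countries: $countries, cities: $cities\n";   # countries: 2, cities: 3
```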

Searching for Chinese characters with a regex in Sublime
http://stackoverflow.com/questions/1585914/matching-chinese-characters-with-regular-expressions-php
[\x{4e00}-\x{9fa5}] --> One char between 4E00 and 9FA5

http://www.regular-expressions.info/unicode.html
\p{Han} is the Perl syntax (untested)
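
A quick way to try \p{Han} in Perl itself is sketched below. The sample strings are made up, and the script file is assumed to be saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;                       # the string literals below are UTF-8

my @samples = ('perl', '開檔', 'open 檔案');

# Unicode script property: matches any Han character.
my @has_han = map { /\p{Han}/ ? 1 : 0 } @samples;
print "@has_han\n";             # prints "0 1 1"

# Explicit code point range, the same one used in the Sublime search.
my @in_range = map { /[\x{4e00}-\x{9fa5}]/ ? 1 : 0 } @samples;
print "@in_range\n";            # prints "0 1 1"
```

The two patterns agree on these samples, but \p{Han} also covers Han characters outside the 4E00 to 9FA5 range.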

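How do I output a unique list of the filenames of modified files?
Tools: cygwin, sublime (Komodo Edit 8 as the Perl editor)
1. Drag the folder onto the cygwin window so you can cd straight into that directory
2. $ ls -R work* > ls_files.txt
3. Open sublime, filter the filenames out of ls_files.txt, and save them as a new file, all_files.txt
4. Write filter_template.pl to open all_files.txt and reduce the modified templates to a unique list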
use strict;
use warnings;

use Data::Dumper;

open(my $fh, "<:encoding(UTF-8)", "all_files.txt") or die "file not exist";
my @my_array;
my $count = 0;
while (my $row = <$fh>) {
    chomp $row;
    push @my_array, $row;
    $count++;
}
close($fh);

sub uniq {
    return keys %{{ map { $_ => 1 } @_ }};
}

print join(" ", uniq(@my_array)), "\n";


5. $ perl filter_template.pl > result.txt
result.txt is the result. Each $row read from the file still ends with a line break (unresolved).
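
One possible cause of the leftover line break, not confirmed here, is Windows-style CRLF line endings: with the default $/, chomp removes only the "\n" and leaves the "\r" behind. A minimal sketch with a simulated line (the filename is hypothetical):

```perl
use strict;
use warnings;

# Simulated line as read from a file with Windows line endings.
my $row = "work/template.html\r\n";

chomp(my $chomped = $row);             # removes "\n" only; "\r" stays
(my $clean = $row) =~ s/\r?\n\z//;     # removes "\n" or "\r\n"

printf "after chomp: %d chars, after s///: %d chars\n",
    length $chomped, length $clean;
```

If that is the cause, using the substitution in place of chomp inside the while loop should give clean rows.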
References:
Pushing a value onto an array
http://perl.hcchien.org/ch03.html
push @my_array, $row;
How do I remove duplicate items from an array in Perl?
http://stackoverflow.com/questions/7651/how-do-i-remove-duplicate-items-from-an-array-in-perl
sub uniq {
    return keys %{{ map { $_ => 1 } @_ }};
}

@my_array = ("one","two","three","two","three");
print join(" ", @my_array), "\n";
print join(" ", uniq(@my_array)), "\n";
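
Note that the uniq above returns its items in arbitrary order, because they come back as hash keys. If the original order matters, a common alternative is the %seen idiom sketched here; uniq_ordered is an illustrative name, not part of the linked answer's code.

```perl
use strict;
use warnings;

# Keep the first occurrence of each item, in original order.
sub uniq_ordered {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @my_array = ("one", "two", "three", "two", "three");
print join(" ", uniq_ordered(@my_array)), "\n";   # prints "one two three"
```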



