Rantburg

2013-06-16 Economy
Search tool to find out who, what, where, and how they are related in tax haven islands
Posted by 3dc 2013-06-16 11:52|| || Front Page|| [4 views ]  Top

#1 An art restorer can rebuild a broken sculpture by thoroughly studying its unconnected pieces, working out how they once fit together, returning each to its original place and so recovering the full shape.

Much like an art restorer, the Investigative Unit at La Nación Costa Rica received, in November 2012, a device holding millions of pieces of data in different formats. The relational databases were scattered across more than 320 tables, with no original data dictionary to explain how they related.

These databases were parts of two larger, separate databases that had been fed for nearly 30 years by two companies: Singapore-based Portcullis TrustNet (PTN), and Commonwealth Trust Limited (CTL), based in the British Virgin Islands (BVI).

Both firms specialize in setting up offshore financial structures. They have helped tens of thousands of people create offshore companies and trusts, as well as hard-to-trace bank accounts.

The data were obtained by the International Consortium of Investigative Journalists (ICIJ), which chose the Investigative Unit at La Nación to process them and to develop the interactive application for the most ambitious cross-border investigative project in history.

The task did not start from zero. In the preceding months, UK journalist Duncan Campbell and programmer Matthew Fowler had made progress in understanding and documenting part of those relations.

Between January and April, the Investigative Unit's computer science engineer, Rigoberto Carvajal, thoroughly analyzed the data and, with advice from the UK team and data journalist Mar Cabra, reverse-engineered the original relations between tables, fields, codes and, ultimately, hundreds of thousands of records of companies and people.

As he started work, Carvajal faced a disorganized, scattered structure that had for years allowed incomplete data entry, duplicates, null values, needless repetition, missing data and poorly resolved relations.

Thousands of names of people and companies were duplicated because of minimal variations: a differing character, an abbreviation, a typing error, or a slightly different order of elements.

Had the data remained that way, the true links and relations of each separate element would never have been revealed through visualization. It would have been like varnishing the dirty pieces of a disassembled sculpture without first sanding them.

Part of the solution consisted of integrating the databases to bring together their similar entries and then organizing them so that the structure would be practical for visualization.

To do so, the La Nación team used Talend Open Studio for Data Integration, an open-source tool for ETL (extract, transform, load).

Talend hosted all of the processes: extracting the database tables, reorganizing their structure to combine similar records, converting them into a node-and-link structure and, finally, loading the unified nodes into a single database that would feed the public application.
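
To make the "node-and-link structure" step concrete, here is a minimal Python sketch. It is not the Talend job the team built; the row layout (person, company, role), the `Node` and `Link` types and the sample names are all made up for illustration.

```python
from collections import namedtuple

# Hypothetical record types; the real schema is not described in the article.
Node = namedtuple("Node", ["node_id", "kind", "name"])
Link = namedtuple("Link", ["source", "target", "role"])


def rows_to_graph(rows):
    """Turn flat (person, company, role) rows into unique nodes plus links."""
    nodes = {}   # (kind, name) -> Node, so each entity is created only once
    links = []

    def get_node(kind, name):
        key = (kind, name)
        if key not in nodes:
            nodes[key] = Node(len(nodes) + 1, kind, name)
        return nodes[key]

    for person, company, role in rows:
        p = get_node("person", person)
        c = get_node("company", company)
        links.append(Link(p.node_id, c.node_id, role))
    return list(nodes.values()), links


# Invented sample rows standing in for a joined officer/company table.
rows = [
    ("Jane Doe", "Acme Holdings Ltd", "director"),
    ("Jane Doe", "Blue Sea Trust", "beneficiary"),
    ("John Roe", "Acme Holdings Ltd", "shareholder"),
]
nodes, links = rows_to_graph(rows)
```

Because every entity becomes exactly one node, any duplicate that survives cleaning shows up as two disconnected nodes in the visualization, which is why the de-duplication had to come first.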

Procedures and algorithms were applied to de-duplicate the data. A library developed at the Massachusetts Institute of Technology (MIT) as a result of a project named Vicino played an important role in this task. The library was added to Talend Open Studio so that its functions could be applied within the data flow.

Also relevant was the use of the SIMIL function, which estimates the percentage of similarity between two text strings, based on the number of sub-strings they have in common.
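
The article does not give SIMIL's exact algorithm, so the following Python sketch uses the standard library's `difflib.SequenceMatcher` as a stand-in. It also scores two strings by the blocks of characters they share, but it should not be read as the function the team used. The helper name `similarity_percent` and the normalization step are assumptions.

```python
from difflib import SequenceMatcher


def similarity_percent(a, b):
    """Rough similarity of two names as a percentage (0 to 100).

    Stand-in for SIMIL: difflib scores strings by their matching blocks.
    Case and repeated whitespace are ignored before comparing (an assumption).
    """
    a_norm = " ".join(a.lower().split())
    b_norm = " ".join(b.lower().split())
    return 100.0 * SequenceMatcher(None, a_norm, b_norm).ratio()


# A one-letter typo should score far higher than an unrelated name.
typo_score = similarity_percent("Commonwealth Trust Limited",
                                "Comonwealth Trust Limited")
unrelated_score = similarity_percent("Commonwealth Trust Limited",
                                     "Blue Sea Holdings")
```

A score like this only suggests that two entries might be the same; a threshold still has to be chosen and the matches checked.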

With these algorithms, Carvajal merged several thousand separate records that referred, with complete certainty, to the same persons or companies and that had exactly the same addresses.

Following this merge, the links associated with each of those entities were finally tied together.
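
A conservative merge rule of that kind (identical address plus a near-identical name) could be sketched in Python as below. The record fields, the 0.9 threshold, the `name_key` normalization and the sample data are all assumptions for illustration, and `difflib` again stands in for the similarity functions the team actually used.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def name_key(name):
    """Lowercase, drop punctuation and sort tokens so word order is ignored."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " "
                      for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def merge_duplicates(records, threshold=0.9):
    """Group record ids that share an exact address and a very similar name."""
    by_address = defaultdict(list)
    for rec in records:
        by_address[rec["address"]].append(rec)

    groups = []
    for same_address in by_address.values():
        clusters = []  # each cluster holds records judged to be one entity
        for rec in same_address:
            for cluster in clusters:
                rep = cluster[0]  # compare against the cluster's first record
                score = SequenceMatcher(None, name_key(rec["name"]),
                                        name_key(rep["name"])).ratio()
                if score >= threshold:
                    cluster.append(rec)
                    break
            else:
                clusters.append([rec])
        groups.extend([r["id"] for r in c] for c in clusters)
    return groups


# Invented sample records.
records = [
    {"id": 1, "name": "Doe, Jane", "address": "1 Main St, Road Town"},
    {"id": 2, "name": "Jane Doe", "address": "1 Main St, Road Town"},
    {"id": 3, "name": "Jane Doe", "address": "9 Other Rd, Singapore"},
    {"id": 4, "name": "John Roe", "address": "1 Main St, Road Town"},
]
groups = merge_duplicates(records)
```

Requiring the address to match exactly keeps two different people with the same name apart (records 2 and 3 here), at the cost of missing duplicates whose addresses were typed differently.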
Posted by 3dc 2013-06-16 12:11||   2013-06-16 12:11|| Front Page Top

#2 The database includes entities incorporated in 10 offshore jurisdictions: British Virgin Islands, Cayman Islands, Cook Islands, Singapore, Hong Kong, Samoa, Seychelles, Mauritius, Labuan and Malaysia. The information comes from two offshore service firms: Singapore-based Portcullis TrustNet and BVI-based Commonwealth Trust Limited (CTL).
Posted by 3dc 2013-06-16 12:18||   2013-06-16 12:18|| Front Page Top

#3 How did they get the two companies to release their databases? That may be a story in itself.
Posted by tipover 2013-06-16 13:38||   2013-06-16 13:38|| Front Page Top

#4 Tipper - the story is on the site. They didn't. The database just appeared on a live device in their mailroom.
Posted by 3dc 2013-06-16 13:58||   2013-06-16 13:58|| Front Page Top
