An art restorer can rebuild a broken sculpture by carefully studying its scattered, unconnected pieces to work out how they once fit together, returning each to its original place until the full shape emerges again.
Like an art restorer, the Investigative Unit at La Nación Costa Rica faced a puzzle when, in November 2012, it received a device holding millions of records in different formats. The relational databases were scattered across more than 320 tables, with no original data dictionary to explain how they related.
These databases were parts of two larger, separate databases that two companies had populated for nearly 30 years: Singapore-based Portcullis TrustNet (PTN) and Commonwealth Trust Limited (CTL), based in the British Virgin Islands (BVI).
Both firms specialize in setting up offshore financial structures. They have helped tens of thousands of people create offshore companies and trusts, as well as hard-to-trace bank accounts.
The data were obtained by the International Consortium of Investigative Journalists (ICIJ), which chose the Investigative Unit at La Nación to process them and to develop the interactive application for the most ambitious cross-border investigative project in history.
The task did not start from scratch. In the preceding months, UK journalist Duncan Campbell and programmer Matthew Fowler had made progress in understanding and documenting some of those relations.
Between January and April, the Investigative Unit's computer science engineer, Rigoberto Carvajal, thoroughly analyzed the data and, with advice from the UK team and data journalist Mar Cabra, used reverse engineering to reveal the original relations between tables, fields and codes and, ultimately, among hundreds of thousands of records of companies and people.
When he started, Carvajal faced a disorganized, scattered structure that for years had allowed incomplete data entry, duplicates, null values, unnecessary repetition, missing data and poorly resolved relations.
Thousands of names of people and companies appeared more than once because of minor variations: a single differing character, abbreviations, typing errors, or a slightly different word order.
Had the data been left that way, visualization would never have revealed the true links and relations between the separate elements. It would have been like varnishing the dirty pieces of a disassembled sculpture without first sanding them.
Part of the solution consisted of integrating the databases to bring together similar entries, and then organizing them so that the structure would be practical for visualization.
To do so, the La Nación team used Talend Open Studio for Data Integration, an open source ETL (extract, transform, load) tool.
Talend hosted all of the processes: extracting the database tables, restructuring them to combine similar records, converting them into a node-and-link structure and, finally, loading the unified nodes into a single database that would feed the public application.
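The article does not describe the table layout or the Talend job itself, so the following is only a minimal Python sketch of the general idea of turning relational rows into nodes and links. It assumes three hypothetical tables: companies ("entities"), people ("officers"), and a roles table linking them. Source IDs are namespaced by table, because IDs from different tables can clash.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: int
    kind: str  # hypothetical kinds: "entity" (offshore company) or "officer" (person)
    name: str


@dataclass(frozen=True)
class Link:
    source: int  # node_id of the officer
    target: int  # node_id of the entity
    relation: str  # e.g. "director", "shareholder" (illustrative values)


def to_graph(entities, officers, roles):
    """Convert rows from hypothetical source tables into nodes and links.

    entities, officers: lists of (source_id, name) rows
    roles: list of (officer_source_id, entity_source_id, role) rows
    """
    nodes = {}
    key_to_node = {}  # (kind, source_id) -> node_id, since ids may clash across tables
    for kind, rows in (("entity", entities), ("officer", officers)):
        for source_id, name in rows:
            node_id = len(nodes)
            key_to_node[(kind, source_id)] = node_id
            nodes[node_id] = Node(node_id, kind, name)
    # Each role row becomes a link between the two nodes it references
    links = [
        Link(key_to_node[("officer", off_id)], key_to_node[("entity", ent_id)], role)
        for off_id, ent_id, role in roles
    ]
    return nodes, links
```

For example, `to_graph([(101, "Alpha Holdings Ltd")], [(7, "Jane Doe")], [(7, 101, "director")])` yields two nodes and a single "director" link from the person to the company.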
The team then applied procedures and algorithms to de-duplicate the data. A library developed by the Massachusetts Institute of Technology (MIT) as part of a project named Vicino played an important role in this task. It was added to Talend Open Studio so that its functions could be applied within the data flow.
The SIMIL function was also important: it estimates the percentage of similarity between two text strings based on the substrings they have in common.
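The article does not say which Vicino functions were used. As an illustration only, the sketch below uses fingerprint keying, a key-collision technique of the kind offered by clustering tools such as OpenRefine. It handles differences in case, punctuation, accents and word order, but not typos or abbreviations; those need a fuzzy comparison instead.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(name: str) -> str:
    """Normalize a name so that trivial variants collide on the same key."""
    # Decompose accented characters and drop the combining marks (é -> e)
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase and turn punctuation into spaces
    text = re.sub(r"[^\w\s]", " ", text.lower())
    # Sort unique tokens so that word order no longer matters
    return " ".join(sorted(set(text.split())))


def cluster_by_fingerprint(names):
    """Group names sharing a fingerprint; return only groups with duplicates."""
    groups = defaultdict(list)
    for name in names:
        groups[fingerprint(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]
```

With this key, "Doe, John" and "JOHN DOE" both reduce to "doe john" and fall into the same cluster, which a person would then review before merging.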
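SIMIL is commonly described as a Ratcliff/Obershelp ("gestalt pattern matching") measure. It finds the longest common substring, repeats the search on the unmatched pieces to its left and right, and scores twice the matched characters over the combined length of both strings. The following is our own minimal reimplementation under that assumption, not the function La Nación used. Python's `difflib.SequenceMatcher.ratio()` computes a closely related score on a 0 to 1 scale.

```python
def _matching_chars(a: str, b: str) -> int:
    """Count characters covered by recursively found common substrings."""
    if not a or not b:
        return 0
    # Longest common substring via dynamic programming over suffix lengths
    best_len, best_i, best_j = 0, 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_i, best_j = curr[j], i, j
        prev = curr
    if best_len == 0:
        return 0
    # best_i and best_j are exclusive end positions of the match
    a_start, b_start = best_i - best_len, best_j - best_len
    return (best_len
            + _matching_chars(a[:a_start], b[:b_start])  # unmatched left parts
            + _matching_chars(a[best_i:], b[best_j:]))   # unmatched right parts


def simil(a: str, b: str) -> float:
    """Similarity percentage: 100 * 2 * matched characters / total length."""
    if not a and not b:
        return 100.0
    return 200.0 * _matching_chars(a, b) / (len(a) + len(b))
```

For "PORTCULLIS" against the misspelled "PORTCULIS", the matched substrings are "PORTCUL" and "IS" (9 characters), giving 200 × 9 / 19 ≈ 94.7%.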
With these algorithms, Carvajal merged several thousand separate records that referred, with complete certainty, to the same people or companies and that had exactly the same addresses.
After this merge, the links associated with each of those entities could finally be connected.
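The merge rule in the article (same person or company, exactly the same address) can be sketched as follows. This is a hypothetical illustration: records are grouped ("blocked") by exact address so that names are compared only within a group, pairs above an assumed similarity threshold of 0.95 are joined with a union-find structure, and links are then re-pointed at the surviving record IDs. The record layout, the threshold and the use of `difflib` are assumptions, not details from the source.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def merge_duplicates(records, threshold=0.95):
    """records: (record_id, name, address) tuples -> {record_id: canonical_id}.

    threshold is a hypothetical cut-off on difflib's 0-1 similarity ratio.
    """
    parent = {rid: rid for rid, _, _ in records}

    def find(x):
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Block on the exact address: only records at the same address are compared
    by_address = defaultdict(list)
    for rid, name, address in records:
        by_address[address].append((rid, name.lower()))

    for group in by_address.values():
        for (id_a, name_a), (id_b, name_b) in combinations(group, 2):
            if SequenceMatcher(None, name_a, name_b).ratio() >= threshold:
                root_a, root_b = find(id_a), find(id_b)
                if root_a != root_b:
                    # Keep the smallest id as the canonical record
                    parent[max(root_a, root_b)] = min(root_a, root_b)
    return {rid: find(rid) for rid in parent}


def remap_links(links, canonical):
    """Point (source, target, relation) links at canonical ids, dropping repeats."""
    seen, result = set(), []
    for source, target, relation in links:
        link = (canonical.get(source, source), canonical.get(target, target), relation)
        if link not in seen:
            seen.add(link)
            result.append(link)
    return result
```

Blocking on the address keeps the pairwise comparison small. Comparing every name against every other across hundreds of thousands of records would be far more expensive.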
The database includes entities incorporated in 10 offshore jurisdictions: British Virgin Islands, Cayman Islands, Cook Islands, Singapore, Hong Kong, Samoa, Seychelles, Mauritius, Labuan and Malaysia. The information comes from two offshore service firms: Singapore-based Portcullis TrustNet and BVI-based Commonwealth Trust Limited (CTL).
Jodie Evans, 2008 Obama bundler and co-founder of the leftist anti-American group Code Pink, is in Yemen this week campaigning against the United States' efforts in fighting the war on terror. This is in keeping with Code Pink's decade-long history of working with terrorists and state sponsors of terrorism against the United States.
While in Yemen, Evans is raising money for families of Yemenis being held at the terrorist detention center at Guantanamo Bay, Cuba, and possibly the family of slain al Qaeda leader Anwar al-Awlaki.
Code Pink sent its top leadership, including Evans, Susan Medea Benjamin and former diplomat Col. Ann Wright (U.S. Army, Ret.), on this trip to the home of al Qaeda in the Arabian Peninsula for meetings with the Yemeni government, the U.S. Ambassador to Yemen Gerald M. Feierstein, Nasser al-Awlaki and other family members of U.S. drone strike targets, as well as Guantanamo detainees' families.
While Code Pink is an American 501(c)(3) tax-exempt group, Evans is also appealing for donations from international donors through Evans's Gmail address in support of the trip to Yemen.
On June 13, Evans posted a photo on Instagram and Twitter of the group's meeting with Nasser al-Awlaki, captioned: "Our @CODEPINK #yemen delegation with Anwar al-Awlaki's father Nasser, 'US killed my grandson, Why?'"
A multi-volume chronology and reference set detailing three years of the Mexican Drug War, from 2010 to 2012.
Rantburg.com and borderlandbeat.com correspondent and author Chris Covert presents his first non-fiction work, detailing the drug- and gang-related violence in Mexico.
Chris gives us Mexican press dispatches of drug and gang war violence over three years, presented in a multi-volume set intended to chronicle the death, violence and mayhem that have dominated Mexico for six years.