Rantburg

2025-02-07 Cyber
A better way to search US gov records

Their search systems (indexes) are built to find whole records or parts of records efficiently—but they aren’t designed to search for specific words inside a giant chunk of text.

Now, USA Spending is massive—about 1.5 terabytes—so if you want to find a specific word or phrase buried somewhere inside, the system struggles. It’s just not built for that kind of search.

What I did with http://DataRepublican.com was add a reverse index on top of as many USA Spending records as possible. This makes searching much faster. When I run it locally, it’s 40% faster than grep—and that includes all the extra work like unzipping files, handling the user interface, and managing search requests. In other words, I made it as fast and searchable as possible. As a bonus, it is all client side, so it can never get overloaded like USA Spending so frequently does.
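For readers unfamiliar with the term, a reverse (inverted) index maps each word to the records that contain it, so a search becomes a lookup instead of a scan over all the text. The following is only a minimal sketch of that general idea, not the actual DataRepublican.com implementation; the record IDs, sample text, and function names are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map every word to the set of record IDs whose text contains it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for word in tokenize(text):
            index[word].add(record_id)
    return index


def search(index, query):
    """Return the IDs of records containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set, then intersect with the rest.
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return matches


# Hypothetical sample records, not real USA Spending data.
records = {
    "r1": "Grant awarded for rural broadband expansion",
    "r2": "Subscription to Politico Pro news service",
    "r3": "Broadband subscription for field office",
}
index = build_index(records)
print(sorted(search(index, "broadband")))               # ['r1', 'r3']
print(sorted(search(index, "broadband subscription")))  # ['r3']
```

Building the index costs one pass over the data up front; after that, each query only touches the posting sets for the words in the query, which is why this kind of lookup can beat a full-text scan.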

So instead of mocking the fact that we’re "rediscovering" data, maybe ask yourself why it’s been made so hard to search for this information in the first place.
Posted by 3dc 2025-02-07 00:35|| || Front Page|| [11149 views ]  Top

#1 What Is a Support Vector Machine? Working, Types, and Examples

A Comprehensive Survey on Support Vector Machine in Data Mining Tasks: Applications & Challenges
Posted by Skidmark 2025-02-07 01:39||   2025-02-07 01:39|| Front Page Top

#2 SVMs can be numerically complex, or simply simple.
The above examples demonstrate a method of 'kmeans clustering' to differentiate sample groups.

Given a worded document, or data record, count the number of common words/fields. This should render a row:
word1   word2   word3   word4   …
count1  count2  count3  count4  …

Given multiple documents, you get multiple rows, one per document.
      word1   word2   word3   word4   …
doc1  count1  count2  count3  count4  …
doc2  count1  count2  count3  count4  …
doc3  count1  count2  count3  count4  …
doc4  count1  count2  count3  count4  …
doc5  count1  count2  count3  count4  …

The resulting matrix, built in a single pass through the data (computationally N*little o), yields a 'similarity score'. Documents, records, or sentences that share large counts of common words are similar and bear further examination.
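The word-count matrix described above can be sketched in a few lines. This is an illustration under assumptions, not the code referred to in this comment: it uses cosine similarity as the 'similarity score', which is one common choice, and it is plain word-count comparison rather than an SVM. The sample sentences and function names are invented.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count how often each lowercase word appears in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def count_matrix(docs):
    """Return (vocabulary, rows), with one row of word counts per document."""
    counts = [word_counts(doc) for doc in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[word] for word in vocab] for c in counts]
    return vocab, rows


def cosine(a, b):
    """Cosine similarity of two count rows; 0.0 if either row is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented sample sentences for illustration only.
docs = [
    "the state of the union is strong",
    "the state of our union is strong",
    "grants were awarded to contractors",
]
vocab, rows = count_matrix(docs)
print(cosine(rows[0], rows[1]) > cosine(rows[0], rows[2]))  # True
```

Pairs of rows with a score near 1 share most of their words and are the ones worth a closer look; a score of 0 means no words in common.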

In a past life, I coded a SVM to examine transcripts of Presidential 'State of the Union' speeches, sentence by sentence.
What I found was Obama's speechwriter(s) plagiarized Bill Clinton's. Obama's first differed from Clinton's first by only 13 words. The sentence count was the same.
Posted by Skidmark 2025-02-07 02:14||   2025-02-07 02:14|| Front Page Top

#3 That’s fascinating, Skidmark. And President Clinton, who is so proud of his brilliance, smiles and jokes with him in the friendliest manner. How infuriating that must be for him, not to mention the honourable former Senatrix, his wife. What else did your work reveal?

Separately, I wonder if any of the Democratic presidents was drinking from the USAID funding firehose.
Posted by trailing wife 2025-02-07 03:03||   2025-02-07 03:03|| Front Page Top

#4 ...funding firehose

DOT Sec. Sean Duffy Roasts Hillary Clinton: ‘DOGE Is Uncovering Your Family’s Obscene Grifting Via USAID’
Posted by Skidmark 2025-02-07 04:36||   2025-02-07 04:36|| Front Page Top

#5 ^#2
"...Obama's first differed from Clinton's first by only 13 words. The sentence count was the same..."

Wonder how Biden's compares?
Posted by NN2N1 2025-02-07 04:47||   2025-02-07 04:47|| Front Page Top

#6 ...would that be the video or the 'official' White House transcript?
Posted by Procopius2k 2025-02-07 06:52||   2025-02-07 06:52|| Front Page Top

#7 Anyone could have looked up the cost of the government’s Politico Pro subscriptions in 2024 or 2022 or 2020.

So if it was so simple, Mr. Lee, how come none of our crack journalists uncovered anything about government waste, except OrangeManBad?
Posted by Bobby 2025-02-07 07:34||   2025-02-07 07:34|| Front Page Top

#8 
Speaking of Declassified Records and Files....

JFK?
MLK?
RFK?
Where are those?
Posted by NN2N1 2025-02-07 08:18||   2025-02-07 08:18|| Front Page Top

#9 Wild.

Like watching an applied version of the TV show Silicon Valley.
Posted by swksvolFF 2025-02-07 20:55||   2025-02-07 20:55|| Front Page Top
