I want to date a Florida man
Florida is undoubtedly the most curious, wackiest and unusual state in the US, inhabited by the weirdest people ever.
You probably know the meme. Every day of the year, the enigmatic Florida Man seems to try his hand at something even more peculiar, violent, tragic or downright bizarre than the day before. Googling what the Florida Man did on your birthday is fun, but you must be curious what he does the rest of the year as well (at least I was).
Reconstructing the year of the legendary Florida Man
We perform a Data Science microproject, dabbling in some cloud provisioning, web scraping, multi-threading, string operations, regular expressions and word clouds to top it all off. It contains five phases that encompass the full project. In this project, we configure the search engine such that it incorporates [...] from the United States within the [...]. Once the Custom Search Engine is set up, we create a list of birthdays (yes, including that pesky February 29th).
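As a minimal sketch of the birthday list, the snippet below builds one query per calendar day. The query wording ("Florida Man" followed by month and day) and the function name birthday_queries are assumptions made for illustration; the original project may phrase its queries differently.

```python
from datetime import date, timedelta


def birthday_queries(prefix="Florida Man"):
    """Return one search query per calendar day, February 29 included."""
    # 2020 is a leap year, so iterating over it yields all 366 birthdays.
    # The year itself never appears in the query text.
    day = date(2020, 1, 1)
    queries = []
    while day.year == 2020:
        # %B gives the full month name (English under Python's default locale).
        queries.append(f"{prefix} {day.strftime('%B')} {day.day}")
        day += timedelta(days=1)
    return queries


queries = birthday_queries()
print(len(queries))  # 366
print(queries[59])   # Florida Man February 29
```

Iterating over a leap year is the simplest way to make sure February 29 is not forgotten.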
Having our list of queries ready, we are ready to rumble!
Well, almost ready. If we have to distribute the workload over four separate APIs anyway, we might as well do a bit of multi-threading. Ok, now we are ready to rumble. Fortunately, we are only interested in the headlines.
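One way to spread the queries over several APIs is sketched below, with one worker thread per credential. The names run_queries, api_keys and fake_fetch are hypothetical; fake_fetch stands in for the real HTTP request so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def run_queries(queries, api_keys, fetch):
    """Split the queries round-robin over the keys and run one thread per key."""
    chunks = [queries[i::len(api_keys)] for i in range(len(api_keys))]

    def worker(key, chunk):
        # Each thread only touches its own key, so per-key quotas stay separate.
        return {query: fetch(key, query) for query in chunk}

    results = {}
    with ThreadPoolExecutor(max_workers=len(api_keys)) as pool:
        for partial in pool.map(worker, api_keys, chunks):
            results.update(partial)
    return results


def fake_fetch(key, query):
    # Placeholder (assumption): a real version would call the search API here.
    return f"results for '{query}' via {key}"


demo = run_queries(
    ["q1", "q2", "q3", "q4", "q5"],
    ["key-a", "key-b", "key-c", "key-d"],
    fake_fetch,
)
print(demo["q5"])  # results for 'q5' via key-a
```

Because the work is dominated by waiting on network responses, threads are a reasonable fit here despite Python's global interpreter lock.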
The main problem we encounter is that the actual headlines (the titles as displayed in the Google search results) are often incomplete due to their length.
The solution we pick is to instead read the field og:title, which includes the full title. Unfortunately, not every website includes it, so we drop the results that lack it. Still, plenty of source material remains. Now we have a dictionary with only full titles; things are getting more and more readable.
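The og:title lookup could be sketched as follows. It assumes the search results arrive as a list of dictionaries in which the page metadata sits under pagemap and metatags; that layout, the function name full_titles and the sample data (invented for illustration) are all assumptions and should be checked against the actual API response.

```python
def full_titles(items):
    """Map each result link to its og:title, dropping results that lack one."""
    titles = {}
    for item in items:
        metatags = item.get("pagemap", {}).get("metatags", [])
        # Take the first non-empty og:title, if any metatag block provides one.
        og_title = next(
            (tags["og:title"] for tags in metatags if tags.get("og:title")),
            None,
        )
        if og_title:
            titles[item["link"]] = og_title.strip()
    return titles


# Made-up sample results, not real search output.
sample = [
    {
        "title": "Florida man arrested after ...",
        "link": "https://example.com/a",
        "pagemap": {"metatags": [{"og:title": "Florida man arrested after a long, made-up headline"}]},
    },
    {"title": "Florida man ...", "link": "https://example.com/b", "pagemap": {}},
]
print(full_titles(sample))
```

The second sample result has no og:title and is therefore dropped, mirroring the choice described above.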
With that, we also note that many headlines are not exactly what we are looking for. Time to perform a deeper analysis. First things first, we are going to reformat the headlines. Many headlines contain only lower case words, one headline uses double quotation marks whereas another uses single ones, some media outlets post their own name in the title… We want a nice, uniform format, so we apply string operations and regular expressions to certain patterns in the headlines and update them accordingly.
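To make the formatting step concrete, here is a small sketch that applies the three fixes mentioned above: capitalization, quotation marks and outlet names. The exact rules (capitalizing every word, preferring single quotes, treating " | " and " - " as outlet separators) are assumptions for illustration, not necessarily the rules used in the project.

```python
import re


def format_headline(headline):
    """Bring a headline into one uniform style (illustrative rules only)."""
    # Collapse repeated whitespace and trim the ends.
    text = " ".join(headline.split())
    # Drop a trailing outlet name such as " | Example News" or " - Example News".
    text = re.sub(r"\s+[|\-]\s+[^|\-]+$", "", text)
    # Use straight single quotes everywhere.
    text = re.sub('["“”‘’]', "'", text)
    # Capitalize the first letter of every word, also right after an opening quote.
    text = re.sub(
        r"(^|\s)(['(]?)([a-z])",
        lambda m: m.group(1) + m.group(2) + m.group(3).upper(),
        text,
    )
    return text


print(format_headline('  florida man steals "gator"   from zoo | Example News '))
# Florida Man Steals 'Gator' From Zoo
```

The separator pattern requires whitespace on both sides, so hyphenated words inside the headline are left alone.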
Formatting headlines before checking which ones we actually want to keep is arguably not the most efficient way to go, but having the headlines in a uniform format certainly helps when filtering. Besides, speed is not the key objective of this script anyway. As the Florida Man gained quite some traction in popular culture, many articles are about the phenomenon rather than the man himself.
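Headlines about the phenomenon can be dropped by checking each one against a set of taboo words. The sketch below shows the idea; the words in TABOO_WORDS and the sample headlines are made up for illustration and are not the set used in the project.

```python
import re

# Hypothetical taboo words; the actual project set is not reproduced here.
TABOO_WORDS = {"meme", "challenge", "birthday", "game"}


def is_on_topic(headline, taboo_words=TABOO_WORDS):
    """Return False if the headline contains any taboo word, ignoring case."""
    lowered = headline.lower()
    return not any(
        re.search(rf"\b{re.escape(word.lower())}\b", lowered)
        for word in taboo_words
    )


# Made-up headlines for demonstration.
headlines = [
    "The Florida Man CHALLENGE Takes Over Social Media",
    "Florida Man Drives Lawnmower Into Pool",
]
kept = [h for h in headlines if is_on_topic(h)]
print(kept)  # ['Florida Man Drives Lawnmower Into Pool']
```

Lower-casing both sides covers upper- and lower-case variants in one go, and the word boundaries avoid rejecting headlines that merely contain a taboo word as part of a longer word.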
A set of taboo words filters out such headlines; upper- and lower-case variants are considered as well. We are making progress. Now that we have a nice set of headlines, a final problem we run into is that of duplication. Similar or even identical headlines tend to make the news on consecutive days. To detect them we use the SequenceMatcher, which performs a fairly straightforward check to measure the overlap ratio between two sequences, but works quite well for our purposes.
As comparing each headline to the headlines for all future dates is quite computationally intensive, we limit the lookahead horizon to ten days. To check for meaningful overlaps, we remove the substring Florida Man as well as all special symbols, and make each word lower case. The overlap threshold seems low, but exact matches are often hard to find. The overlap check works well and removes many similar headlines, but does not completely get the job done.
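The duplicate check can be sketched with difflib.SequenceMatcher from the Python standard library. The function names, the list-of-lists data layout and the sample headlines are assumptions for illustration. The threshold is deliberately left as a required argument, and the 0.9 in the demo is an arbitrary value, not the one used in the project.

```python
import re
from difflib import SequenceMatcher


def normalize(headline):
    """Strip 'Florida Man', special symbols and case before comparing."""
    text = re.sub(r"florida man", "", headline, flags=re.IGNORECASE)
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return " ".join(text.split())


def drop_near_duplicates(headlines_by_day, threshold, horizon=10):
    """Remove later headlines that overlap with an earlier one within the horizon.

    headlines_by_day is a list with one list of headlines per day.
    """
    kept = [list(day) for day in headlines_by_day]
    for i in range(len(kept)):
        for headline in kept[i]:
            base = normalize(headline)
            # Only look ahead a limited number of days to keep the cost down.
            for j in range(i + 1, min(i + 1 + horizon, len(kept))):
                kept[j] = [
                    other for other in kept[j]
                    if SequenceMatcher(None, base, normalize(other)).ratio() < threshold
                ]
    return kept


# Made-up headlines for demonstration.
days = [
    ["Florida Man Steals Alligator From Mini Golf Course"],
    ["FLORIDA MAN steals alligator from mini golf course!",
     "Florida Man Calls Police To Report Missing Sandwich"],
]
print(drop_near_duplicates(days, threshold=0.9))
```

The earliest occurrence of a headline is kept and later near-copies are dropped, which is one possible tie-breaking choice among several.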
Occasionally, pairs of headlines that describe the same incident in different words seep through. For the human reader, it is clear both headlines pertain to the same incident. Although the headlines are semantically comparable, the SequenceMatcher obviously cannot verify this.
Natural Language Processing could be a nice solution in theory. Unfortunately, we only have our set of headlines, far too few to properly train a sophisticated neural network. Constructing a custom search for matching nouns sounds feasible.
At this point I decided the additional effort was not worth the potential benefits. Given the lack of real added value or purpose of this project, one should know when to stop. After all that hard work, we naturally want to see the final overview. Regular square word clouds are getting a bit stale though.
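Whatever its shape, a word cloud starts from word counts. The sketch below computes them with the standard library only; the stop-word list and the sample headlines are made up for illustration. The resulting counts could then be handed to the wordcloud package, for instance via WordCloud(mask=...).generate_from_frequencies(...), with the mask array loaded from an image of choice; that last step is not shown here.

```python
import re
from collections import Counter

# Hypothetical stop words; 'florida' and 'man' would otherwise dominate the cloud.
STOP_WORDS = {"florida", "man", "a", "an", "the", "to", "of", "in", "and",
              "for", "on", "with", "after", "from"}


def word_frequencies(headlines, stop_words=STOP_WORDS):
    """Count how often each non-stop word occurs across all headlines."""
    counts = Counter()
    for headline in headlines:
        words = re.findall(r"[a-z']+", headline.lower())
        counts.update(word for word in words if word not in stop_words)
    return counts


# Made-up headlines for demonstration.
freqs = word_frequencies([
    "Florida Man Steals Alligator",
    "Florida Man Rides Alligator To Store",
])
print(freqs.most_common(1))  # [('alligator', 2)]
```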
60 times Florida Man did something so crazy we had to read the headlines twice
And that concludes it: the year of the Florida Man summarized in one picture, as unintelligible as the man himself! Curious for more? The full project code can be found on my GitHub repository.
A Data Science microproject in Python to compile the top headlines per day.
Wouter van Heeswijk, PhD. Assistant professor in Financial Engineering and Operations Research. Writing about reinforcement learning, optimization problems, and data science.
Prepare: Visualize raw output, filter out headlines from query.
Analyze: Remove unsuitable headlines with a set of taboo words and a similarity check. Use string operations and regular expressions to produce uniformly styled output.
Report: Display sample output, visualize common words in a word cloud.
A year in the life of the Florida Man
Act: Write and publish a Medium article based on the project.
Takeaways
To study the life of the Florida Man, we performed a Data Science microproject ranging from data ingestion to reporting and visual presentation.
Data formatting combines string operations and regular expressions to convert the headlines into a uniform style.
Cycling through a set of taboo words removes headlines that are thematically outside the project scope.
The SequenceMatcher succeeds in removing most duplicate headlines.
An image mask is used to generate a custom-shaped word cloud, using the ImageColorGenerator.
The Florida Man is a strange guy.
Love is a battlefield wherever you are, but dating in Florida can have its own unique challenges.
Yes, it has a lot to do with the nightlife and the fast-paced environment, but I know you can still find love, since people in this city are asking for and wanting the same things from life and love.