Don’t do this. Don’t import random code you found on the internet into your own code: it is dangerous, because that code could end up running on your machine or a client’s machine, and you don’t want that. Always keep your code and modules in a version control system, and make sure you know exactly what you’re loading.
Today I was talking with some of the guys in a Discord server about Python, called Python Discord (which I definitely recommend you check out!), and stumbled upon a guy who was requesting assistance.
(Yeah… I’m theinquisitor.)
What he wanted to do was import a variable that contains a list object, which is perfectly fine. The only issue was that the variable lived on a web page at dumptext.com.
And to make things worse, when you inspect the web page’s source code you find that the text isn’t even raw text. I don’t know why anyone would add a raw button that doesn’t even return raw text.
This is a bad idea, but hey: I like to solve problems, and we will work with this scenario just for the fun of it, so let’s start with Solution #1.
The first solution consists of these steps:
- Request the web page’s HTML source code.
- Parse the HTML code with BeautifulSoup.
- Extract the contents of the HTML’s <pre></pre> tags, which contain the list we want.
- Save the list into a module called wordlist.py that we can import.
- Import the variable words from our newly created module.
Before we start, we have to create an empty module first that will hold our data:
$ touch wordlist.py
Now we proceed to create our main Python code, called solution1.py:
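The original listing is not reproduced here, so what follows is only a minimal sketch of the steps above, not the author's exact code. To keep it runnable without third-party packages it swaps BeautifulSoup for the standard library's html.parser, and it uses a hard-coded SAMPLE_HTML string (an assumption) in place of the real page request. The names PreExtractor, extract_pre_text and load_words are hypothetical.

```python
import importlib
import sys
import tempfile
from html.parser import HTMLParser
from pathlib import Path

# Assumption: a stand-in for the downloaded page source. In the real
# script this text would come from an HTTP request to the page.
SAMPLE_HTML = """
<html><body>
<pre>words = ['apple', 'banana', 'cherry']</pre>
</body></html>
"""


class PreExtractor(HTMLParser):
    """Collect the text found inside <pre>...</pre> tags."""

    def __init__(self):
        super().__init__()
        self._depth = 0   # greater than 0 while inside a <pre> element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.chunks.append(data)


def extract_pre_text(html):
    """Return the concatenated text of every <pre> block in `html`."""
    parser = PreExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


def load_words(html, module_dir):
    """Write the <pre> contents into wordlist.py, then reload the module."""
    module_dir = Path(module_dir)
    module_path = module_dir / "wordlist.py"
    module_path.touch()  # same effect as `touch wordlist.py`

    sys.path.insert(0, str(module_dir))
    wordlist = importlib.import_module("wordlist")  # still empty here

    # WARNING: reloading executes whatever text the page contained.
    module_path.write_text(extract_pre_text(html))
    wordlist = importlib.reload(wordlist)
    return wordlist.words


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        words = load_words(SAMPLE_HTML, tmp)
        print(type(words), words)
```

The sketch writes the module into a temporary directory so it does not leave a wordlist.py behind; the original approach creates the file next to the script instead.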
Now, we are already familiar with scraping packages such as BeautifulSoup, but what about importlib? importlib has a lot of utilities for working with imports. In this case we use it to reload a previously imported module, fresh as new, including the newly created variable holding our list. Remember that PEP 8 recommends putting your imports at the top of the file, right after the module docstring, not in the middle of the file.
Let’s see if our solution works well:
$ python solution1.py
It works! We can see that words is an object of the class list and that it prints successfully.
There’s only one thing… we shouldn’t do this. Importing random code from the Internet is not a great option, believe me. You could do this in a very specific scenario where you absolutely don’t have any other choice, the file is read-only, and you are its owner. If it’s a controlled file then fine, but realistically I don’t see that happening, so please, avoid this solution.
It was fun to write though.
This is a much better (and cleaner) solution. We’ll do the same scraping as before, but instead of creating and loading a module, we will pick everything inside a pair of single quotes ('') from the parsed HTML code with a regular expression and add those results to a list.
Let’s jump straight into the code:
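The original listing is missing here, so this is a minimal sketch of the regular-expression step only, not the author's exact code. It assumes the scraping has already produced the text of the page's pre block, represented by the hard-coded pre_text string. The pattern QUOTED and the function extract_words are hypothetical names.

```python
import re

# Assumption: `pre_text` stands for the text already scraped from the
# page's <pre> block.
pre_text = "words = ['apple', 'banana', 'cherry']"

# Hypothetical pattern: capture whatever sits between two single quotes.
QUOTED = re.compile(r"'([^']*)'")


def extract_words(text):
    """Return every single-quoted string in `text` as a plain list."""
    return [word for word in QUOTED.findall(text)]


if __name__ == "__main__":
    words = extract_words(pre_text)
    print(type(words), words)
```

Nothing from the page is executed here: the text is only searched, never imported. One limitation of this simple pattern is that words containing an apostrophe would be split incorrectly.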
Let’s see if it works correctly:
$ python solution2.py
That’s it! No need for stinkin’ modules or injecting random code into our code, no importing and no reloading:
- We imported the library re, which helps us search for strings using regular expressions (or RegEx).
- We use the same scraping technique as before.
- Now we define a RegEx pattern that matches everything inside a pair of single quotes.
- Create a list of words, where word is each of the matches found in the parsed HTML code.
And that’s all, you can now use your words in a much safer way.
If you’re interested in being part of a community, then I cannot recommend Python Discord enough. It has helped me learn so much about Python, people are really helpful and we’re always growing.
So please, be my guest and hop in this awesome community, here’s your invite: https://pythondiscord.com/invite
Everyone is welcome.