When developing or testing a data system, it often makes sense to try it out with some data that looks real, but isn’t real, just in case something goes wrong… It also means you can test as much as you want without having to expose any real data.

According to this article, "Synthetic data generation - a must-have skill for new data scientists", knowing how to create effective test data is one of those new skills folk are going to have to learn. (We’re about to start looking at producing a new machine learning course, so stumbling across that sort of possible requirement is quite timely…)

By chance, whilst searching for something else, I spotted this article describing pydbgen, a simple Python package for generating fake data tables to test simple database systems. A quick trawl turns up other packages for doing similar things, such as mimesis, or faker, which also inspired this more general R package, charlatan.

You can also generate numerical data with various statistical properties. For example, you can generate test datasets using SciKit Learn, and here’s one of my early attempts at generating 2D numerical data to demonstrate different correlation coefficients.

For text strings, generators are often referred to as ‘lorem ipsum’ generators (why?). For example, loremipsum or collective.loremipsum. Searching for “sentence generator” will also turn up some handy packages: markovify or markovipy, for example. If you prefer using neural network models, there are those too: textgenrnn.

If you like waffle, here’s a WaffleGenerator. If it’s technical waffle you want, SciGen is a classic pure fake computer science paper generator, though it may take a bit of digging to find the required dependencies to run it…

Sometimes a real world source document can be used to bootstrap the production of a related, fake, item. For example, this “semiautomated scientific survey writing tool” that will create a scientific review paper for you: HackingScience. (I wonder, are there educational possibilities there that may help draft materials for, or support researching, new courses?)

The state of the art in text generation was evidenced by a blog post from OpenAI that was doing the rounds this week. gpt-2 looks like a related repo, but with a smaller model and no examples in the README…

Another use of text is for testing OCR (optical character recognition) systems: TextRecognitionDataGenerator. If you need to put some text into images in order to test text extraction from images: SynthText.

If it’s faces you need, then deep learning networks may help. For example, stylegan generates the sort of faces you can see on ThisPersonDoesNotExist. And here are some tips on how to spot fake face photos…

One tried and trustworthy resource was created by I/O psychologist Richard Landers. I blogged about this one in 2013, and I've used his data generator for years. My new resource is from social psychologist Andrew Luttrell. They are both free and help you do your job: specific data for everything you teach in Intro Stats, like t-tests, ANOVA, correlation, and regression.

Nice thing about Richard's: It gives you options of several different units (days, money, etc.) AND vignettes that explain why this data was collected.

Nice things about Andrew's: It graphs out your results. You can specify the mean and SD for the groups being compared, which gives you more control over the units and lets you come up with your own back story.

Make up real fake data that replicates the findings of fake or actual research. Both data generators produce the conclusions for the test. So, give your students the links to create data sets and analyze them themselves. Handy for writing exams and homework questions.
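The idea of specifying a mean and SD for each group can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not how either Richard's or Andrew's generator actually works: the function names `make_group` and `welch_t`, the group labels, and the chosen means, SDs, and sample sizes are all made up for the example.

```python
import random
import statistics


def make_group(mean, sd, n, rng):
    """Draw n values from a normal distribution with the given mean and SD."""
    return [rng.gauss(mean, sd) for _ in range(n)]


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    standard_error = (var_a / len(a) + var_b / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Hypothetical back story: test scores for a control and a treatment group.
rng = random.Random(42)  # fixed seed so everyone gets the same fake data
control = make_group(mean=50.0, sd=10.0, n=30, rng=rng)
treatment = make_group(mean=58.0, sd=10.0, n=30, rng=rng)

for label, group in (("control", control), ("treatment", treatment)):
    print(f"{label}: mean={statistics.mean(group):.2f}, "
          f"sd={statistics.stdev(group):.2f}")
print(f"Welch t = {welch_t(treatment, control):.2f}")
```

Because the values are random draws, the sample means and SDs will only be close to the requested ones, which is arguably what you want for a realistic-looking homework data set. Changing the seed gives each student a different data set with the same underlying story.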