Commit graph

56 commits

Author SHA1 Message Date
AnonymousRandomPerson
9437f4cc7c Integrate whiteout scaling with merging 2022-05-21 16:57:19 -04:00
Hans5958
6e19e29983 Only extend on crawls 2022-05-21 08:14:58 +07:00
Cheng Hann Gan
d92eb66102 Revert "Revert "Removed unused contributors field from JSON""
This reverts commit 6a2b409fe5.

We need a proper discussion. Until then, this commit will stay.
2022-05-13 13:19:43 -04:00
Hans5958
6a2b409fe5 Revert "Removed unused contributors field from JSON"
This reverts commit 5deedbdea0.

ART, I'm not sure what is your intention, but it seems you are quite persistent. Please don't be distruptive.
2022-05-13 23:58:24 +07:00
Cheng Hann Gan
d90baab00a Revert "Revert "Removed unused contributors field from JSON""
This reverts commit 7f735f9abc.
2022-05-13 10:16:50 -04:00
Hans5958
7f735f9abc Revert "Removed unused contributors field from JSON"
This reverts commit 5deedbdea0.
2022-05-13 21:06:50 +07:00
Cheng Hann Gan
8921fdd0ad Fixed comma formatting in crawler 2022-05-12 12:54:27 -04:00
Cheng Hann Gan
5deedbdea0 Removed unused contributors field from JSON 2022-05-12 12:45:56 -04:00
Hans5958
710d4a7c6a Little rewrite on migrator, migrate submissions if needed, add writing log 2022-04-25 15:50:26 +07:00
Hans5958
dc5eda7ed6 Add missing spaces 2022-04-19 10:47:11 +07:00
Hans5958
d0d8add004 Use TXT instead for JSON for manual_atlas 2022-04-19 09:00:26 +07:00
Hans5958
37bfe79d3e Fix crawler and merger script, fix typo of formatter 2022-04-19 08:55:37 +07:00
Hans5958
0e5df81ea4 Edit instructions, move to top 2022-04-18 21:25:36 +07:00
Hans5958
3be58f2a02 Migrate contributors, move contributors to the end, simplify script
Forgot that there won't be submitted_by on the submissions
2022-04-18 21:21:02 +07:00
Hans5958
41ab1f0d20 Use different flair for edit 2022-04-16 22:25:11 +07:00
Hans5958
6faec6f11d Store and check read edit entry ids 2022-04-16 22:25:11 +07:00
Hans5958
1e1c007d31 Support editing on script, submitted_by to contributors, merge to atlas script 2022-04-16 22:25:10 +07:00
Hans5958
547ffc773e List ids that already been read in a seperate file 2022-04-10 21:34:06 +07:00
Hans5958
31101f8211 Forgot to avoid IndexError 2022-04-09 16:33:09 +07:00
Hans5958
a996899814 Add ensure_ascii=False 2022-04-09 16:33:09 +07:00
Hans5958
9053d6d106 Add docs on formatter, move path length checker to formatter 2022-04-09 16:33:09 +07:00
Hans5958
cc497d5178 Add forgotten parts on the port, move some parts for better git diff 2022-04-08 11:21:16 +07:00
Hans5958
121d8653a5 Update auth setup docs with more clarity
Also adapted from Nick

Co-authored-by: Nicolas Abram <abramlujan@gmail.com>
2022-04-08 11:09:35 +07:00
Hans5958
ad15cefb07 Remove zero width joiner before parsing 2022-04-08 11:03:18 +07:00
Hans5958
f04102f29c Port flair edit, small log change
Logic and port made/adapted from Nick's

Co-authored-by: Nicolas Abram <abramlujan@gmail.com>
2022-04-08 11:03:18 +07:00
Hans5958
4c4fb973f3 Fix trailing comma removal 2022-04-08 11:03:18 +07:00
Hans5958
33bda5f364 Expand error info 2022-04-08 11:03:17 +07:00
Hans5958
8913e781bf Forgot to silent on formatting 2022-04-08 11:03:17 +07:00
Hans5958
cd87520c87 Handle escaping escape characters 2022-04-08 11:03:17 +07:00
Hans5958
1da79b806b Take only the first object, remove escapes, add more info on errors 2022-04-08 11:03:17 +07:00
Hans5958
7571a92fd9 Simplify try-catch, assert that path length > 0, separator on fail file
Assertion migrated from ARP

Co-authored-by: Cheng Hann Gan <chenghanngan.us@gmail.com>
2022-04-08 11:03:14 +07:00
Hans5958
3a86822175 Happy little accidents 2022-04-08 11:03:12 +07:00
Hans5958
5a660759bf Improve and merge scripts, use JSON instead of regex 2022-04-08 11:03:12 +07:00
unknown
93e80dc7f9 New submissions, flair editing script 2022-04-07 17:00:30 -03:00
unknown
c7f4b927f5 Fix missing id in entry 2022-04-06 17:38:25 -03:00
unknown
7ad57eaef9 Make redditcrawl create a valid json for atlas_temp 2022-04-06 16:47:46 -03:00
unknown
f03a825c5d Fix crawler 2022-04-06 04:48:39 -03:00
unknown
443c210299 Fix Success counter 2022-04-05 22:33:57 -03:00
unknown
35672fdb47 Fix reddit crawl newline and quote handling 2022-04-05 19:00:31 -03:00
ash
2848f93f47 Merge branch 'master' into master 2022-04-05 22:38:13 +01:00
Nicolas Abram
2f45690040 Fix reddit crawling file encoding 2022-04-05 18:19:49 -03:00
ash
1eb81da6de reddit contributions 2022-04-05 20:57:08 +01:00
ash
6206596da1 fix and update crawler 2022-04-05 13:47:36 +01:00
Aeywoo
5560a2f788 Merge branch 'master' into redditcrawl-fix 2022-04-05 20:48:48 +09:30
Someone Somewhere
9e77c4231c Fix encoding issue 2022-04-05 14:15:30 +03:00
ash
1933a5aebd latest reddit updates and dupe prevention logic 2022-04-05 11:55:45 +01:00
ash
3e2f998116 update crawler to reject already submitted 2022-04-05 11:25:12 +01:00
ash
4d6fd79368 new submissions, and move to reddit-based ID 2022-04-05 08:39:51 +01:00
ash
91b2bb82c1 reddit contributions and improvements to redditcrawl.py 2022-04-05 07:58:54 +01:00
ash
c1a832389e add docstring to redditcrawl.py 2022-04-04 19:02:15 +01:00