Anonymous
sup /w/

I'm bored, so here's a toy that I've been working on. Essentially, it's an archiver/re-aggregator for /w/ and /wg/ so you can search old posts. It's a piece of shit right now but I have nothing better to do on my lunch break, so enjoy.

https://suigintou.desudesudesu.org/4scrape/index
>> Anonymous
great stuff
>> Anonymous
Security problem with certificates. Might want to fix that.
>> Anonymous
>>499606
It's because I'm using CAcert instead of paying $500+ to have it signed. CAcert is a free certificate service, but you typically have to add them as a root authority since they're not very widely trusted, so to speak.

That aside, I really only bothered with SSL for encryption, not identification, so I'm not that concerned about it.

>>499601
Thanks. Hopefully I'll actually add some useful features in the next few weeks (bounding searches to match only a specific format, resolution or aspect ratio).
>> Anonymous
GJ
>> Anonymous
AWESOME
>> Anonymous
Trojan.
>> Anonymous
>>499629
I wish. It's machine code.
>> Anonymous
Get a proper certificate and more content, then we'll talk.
>> Anonymous
>>499643
How much content do you want? Give me some figures and I'll tell you how much is currently in there :)
>> Anonymous
>>499643
Certificate issues were already mentioned in >>499611. If your browser pops up a warning, either accept the certificate permanently, or add CAcert to your list of trusted authorities. I'm not paying $50/year for a shitty low-level authentication -- I just wanted encryption for my own purposes.

Regarding content, it currently indexes 23,836 images across 40,624 posts. The images take 16GB of disk space. Approximately 600 new images and 1000 posts are scraped, processed and indexed each night.

So, if you want more content, post more content in /w/ and /wg/ and it'll get picked up and processed for the next day.
>> Anonymous
You sir, are a (not an) hero. Can someone sticky this please for the greater good?
>> Anonymous
>>499728
It appears to index only walls which have comments in the posts. Meaning walls without comments are not indexed, and walls with inaccurate comments are indexed erroneously.
>> Anonymous
OP:
what's this written in? php+mysql?
>> Anonymous
I have a suggestion: let me filter the wallpapers by resolution
>> Anonymous
>>499783
It indexes everything -- there's a link to a reconstruction of the threads on the image page to demonstrate this.

The current search mechanism operates as a fulltext search across the post comment and subject fields (a fulltext search means every word of at least three characters, excepting a list of stopwords like 'the', 'from', etc., is indexed).
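
A rough sketch of that indexing rule in Python, in case it helps picture it. The function name and the tiny stopword list are made up for illustration (the real list is whatever the database ships with), and the regex is just a stand-in for the database's own word splitting:

```python
import re

# Hypothetical stopword list for illustration only.
STOPWORDS = {"the", "from", "and", "for", "with"}
MIN_WORD_LEN = 3  # "every word of at least three characters"


def index_terms(text):
    """Return the set of lowercase words that would end up in the index."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {w for w in words if len(w) >= MIN_WORD_LEN and w not in STOPWORDS}
```

So a comment like "The moon from an old anime wallpaper" would be indexed under moon, old, anime and wallpaper, and nothing else.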

When you make a search, it pulls all posts which match the query, and all posts which are in a thread whose OP matches the query. From this set of posts, it grabs all of the images and presents that as the results.
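
Roughly, that selection step looks like the sketch below. It is a simplified in-memory version under stated assumptions: the `Post` tuple and its field names are invented here, and a plain substring test stands in for the actual fulltext match.

```python
from collections import namedtuple

# Hypothetical post record; field names are assumptions for this sketch.
Post = namedtuple("Post", "id thread_id is_op text image")


def search_images(posts, query):
    """Images from posts matching the query, plus every image in any
    thread whose OP matches the query."""
    q = query.lower()
    direct = {p.id for p in posts if q in p.text.lower()}
    op_threads = {p.thread_id for p in posts if p.is_op and p.id in direct}
    images = []
    for p in posts:
        wanted = p.id in direct or p.thread_id in op_threads
        if wanted and p.image and p.image not in images:
            images.append(p.image)
    return images
```

An image posted with no text still shows up if the thread's OP matched, which is why textless replies inside a well-labelled thread are findable while textless OPs are not.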

As you've noted, this is a bit of a hassle for posts which have an image but no text. There's a couple of features I haven't gotten around to re-implementing yet that somewhat mitigate this -- the first and most obvious is to include the original filename in the text search. This is useful because every image (excepting the 20k legacy images pulled from an older database) has that. Additionally, people (on /w/ and /wg/, at least) tend to organize their personal collection with filenames, so it'll likely be relevant to the image contents.

The other "feature" is that the image display page (which currently shows the image thumbnail and all the posts it was posted in) will have a small set of "related images" at the bottom. These images are chosen using metadata from two sources: scraped information (i.e., post contents) and stuff which can be extracted from the image itself (dominant colors, fuzzy analysis to find detexts, etc). So even if there isn't a comment associated with an image, I might be able to pull off enough metadata to make it properly searchable (or at least, some approximation of that).
>> Anonymous
>>499785
Python+MySQL :)

>>499787
On the to-do list; I'm hoping to have that particular feature done by tomorrow. Going to revamp the search (or just call the new one an "advanced search") which lets you constrain the results with a set of parameters (resolution, aspect ratio, other metadata, etc).
>> Anonymous
Very nice OP!

Just a few things that you could add:
Show image resolution and order pics by topic or smtn.

But even still, very nice!
>> Anonymous
Okay, limiting searches to resolution/aspect ratio is coded in. You can access that functionality by navigating to the "Advanced Search" from the index and putting values in there. Aspect ratio is compared to within 0.1, so if you search for "1.3" it'll pull up images with an aspect ratio of 1.2-1.4. I figured that was close enough.
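
The comparison is about as simple as it sounds. A minimal sketch (the function name is made up, and the real check presumably happens in the SQL query rather than in Python):

```python
def aspect_matches(width, height, wanted, tolerance=0.1):
    """True if width/height is within `tolerance` of the requested ratio.

    Values sitting exactly on the boundary are at the mercy of float
    rounding, so don't rely on them either way.
    """
    return abs(width / height - wanted) <= tolerance
```

With that, a search for 1.3 picks up both 1280x1024 (1.25) and 1024x768 (1.33), but not 1920x1200 (1.6).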

>>499817
Image sizes are now displayed in the thread view, and what little image metadata I've collected is shown on the image view page. Eventually there will be more shit in that box, but not tonight. It's time for me to start drinking.

I'll check back in the morning. Enjoy.
>> Anonymous
wow i love the idea ^^ ill definantly use the site now even if it isnt fully done
>> Anonymous
Hmm, server seems to be down, or is that just me?
>> Anonymous
great site, my suggestion would be to add some sort of ratings and display up to 100 per page

love it though
>> Anonymous
no, its a security certificate issue thing, at the bottom of the error page just go through the links to accept it

im on firefox 3 btw
>> Anonymous
awesome, great way to keep from having so many repost repost threads

keep working on it
>> Anonymous
>>499883
Ratings (or at least, explicit ratings) are an almost guaranteed "no". One of the major precepts of this project was that there'd be no user interaction. No tagging bullshit, no ratings, no nothing. "The system should be completely autonomous and figure everything out for itself."

Part of the logic behind that is that I don't really expect more than 10-20 people to be using it, and tagging/rating/viewing 700 images/day is a nightmare. That might eventually change, but it's definitely a very low priority right now.

I'll definitely add in the images per page option sometime this week though, thanks for reminding me about that :)
>> Anonymous
>>499891

np, i understand

ive always felt guilty posting on /w/ since i know what i'm asking for has been posted 100 times already
>> Anonymous
I give you kudos for this! *bookmarks*
>> Anonymous
Fuck-yea! THIS IS LEDGEND!! Seconded on the fave/bookmark!
>> Anonymous
Server not found
Firefox can't find the server at suigintou.desudesudesu.org.
>> Anonymous
we need this for /b/
>> Anonymous
Amazing, thank you.
>> Anonymous
Truly a nice piece of work.

One thing I would love to see is some kind of mass downloader interactivity (ie: DTA!, not4chan grab, chanmongler, etc), just in my small amount of time hitting the random button (a fantastic feature btw!) I found a couple of massive threads I will need to grab, either way really like what you've done here
>> Anonymous
Looks interesting, OP.

B.t.w. I never understood why browsers consider people more trustworthy just because they pay $50/y.
>> Anonymous
why the need for a SSL certificate?
>> AWESOME AWESOME
AWESOME goes in all fields.
>> Anonymous
Great/amazing work. Love it!
>> Anonymous
BUMP for great justice
>> Anonymous
>>499891
Damn. Was going to ask for a tag system. No user interaction period?

Making the system 100% autonomous will only go so far.
>> Anonymous
The advanced search is a nice touch, though personally I think it could be customized slightly more since it's only scraping /w/ and /wg/. Specifically what I mean:

* An option to search only one of the two boards (prime example: search for tits, 90% of the results are Code Gayass wallpapers. If you wanted stuff from /wg/ with actual real non-anime tits on it, you'd be disappointed.)
* As opposed to typing in the desired aspect ratio, how about check boxes for aspect ratios to include in the search results, with 5:4, 4:3, 8:5, and 16:9 being the four values. Those will be the most commonly searched aspect ratio values.
* Not search-related, but still needed: On the image info page, could you possibly add target="_blank" to the link to the actual image?
* If you've been saving each image's original file name, how about having them save with it when downloaded?
* Tagging would be awesome (and would help the search function find relevant images), but get the rest of the shit working first.

Now, I must continue my valiant quest to have on my hard drive every reasonably unique Yoko wallpaper ever made.
>> Anonymous
fix the certificate issue and then it's instant sticky.
>> Anonymous
omg I love this site
>> Anonymous
Awesome site OP
>> Anonymous
Thanks for all the praise -- it makes me feel all warm and fuzzy inside.

>>499942
One issue with my setup is that I'm using DDNS to map my server's IP to the domain. suigintou.desudesudesu.org is a CNAME record which maps to desu-suigintou.no-ip.org which returns the A record with the IP address. I think my IP address renews every 6 or 12 hours (or something silly like that), so if it gets assigned a new one there's like 5-10 minutes of downtime while the DDNS updates.
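
To picture the lookup chain, here is a toy resolver over a hand-written record table. The two hostnames are the real ones from above; the IP addresses are placeholders from the documentation range, and nothing here talks to an actual DNS server.

```python
# Toy record table: (name, record type) -> value. Addresses are placeholders.
RECORDS = {
    ("suigintou.desudesudesu.org", "CNAME"): "desu-suigintou.no-ip.org",
    ("desu-suigintou.no-ip.org", "A"): "192.0.2.10",
}


def resolve(name, records, max_hops=8):
    """Follow CNAME records until an A record turns up, or give up."""
    for _ in range(max_hops):
        if (name, "A") in records:
            return records[(name, "A")]
        if (name, "CNAME") not in records:
            return None
        name = records[(name, "CNAME")]
    return None
```

When the DDNS provider swaps the A record, anything that re-runs the whole chain sees the new address right away; a resolver that cached the old answer keeps handing out the stale one until its cache expires.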

>>499960
Wouldn't work. The scraper only visits the boards each night, so it's only viable on boards where threads last longer than a day, otherwise you'd end up with a lot of partial threads and other weirdness. The 4chanarchive is a much better solution for scraping/archiving boards like /b/.

>>500089
Everything on the internet should be encrypted, especially with some high-up people thinking that net neutrality is a "good" (or even feasible) idea.

>>500238
Yeah, I'm still debating about it. Eventually there might be some form of tag system, but I'll probably implement a scraper which scrapes user-input based on how you browse. For example, if you're viewing an image and click on one of the suggestion images on the bottom (which aren't there yet lol), it'll add weight to the link between the two images. So if a search pulls up one image there's a chance it'll pull up the linked image.
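
Something like this sketch is what I have in mind for the click-weighting idea. Nothing here exists yet; the class and method names are invented, and a real version would live in the database rather than in a dict.

```python
from collections import defaultdict


class RelatedImages:
    """Undirected, weighted links between image ids, bumped on each click."""

    def __init__(self):
        self.weights = defaultdict(int)

    def record_click(self, from_id, to_id):
        # Clicking from an image to itself carries no information.
        if from_id != to_id:
            self.weights[frozenset((from_id, to_id))] += 1

    def related(self, image_id, limit=5):
        """Linked image ids, heaviest link first (ties broken by id)."""
        scored = []
        for pair, weight in self.weights.items():
            if image_id in pair:
                (other,) = pair - {image_id}
                scored.append((weight, other))
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [other for _, other in scored[:limit]]
```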

Though a plain vanilla tag system might be easier to implement. I dunno.

>>500384
Lots of good suggestions!

The board selection thing is definitely going to happen. There was an issue before with how I did the search in a previous version which made it slow as hell, but I think I've managed to fix that a bit.

Also, if you search for "tits -geass" (without the quotes), it'll bring up what you're looking for.
>> Anonymous
>>500519 cont'd

With regards to the aspect ratio -- would you prefer checkboxes or a drop-down box? I definitely agree that something needs to be changed about that, since it's not really intuitive right now.

Since there are two links now on the image page (the image and the link in the metadata), Having one of them pop open a new window is no biggie.

Saving as the original filename is a problem though (as helpful as it would be). The issue is that there isn't a one-to-one mapping of images to filenames (and about 90% of the images were migrated from previous versions of 4scrape which didn't scrape the original image names). Since an image can be referenced by multiple posts, it can have multiple original filenames, and it gets kind of messy from there. The other (slightly less apparent) issue is that setting the filename requires me to pass the images through a script manually (instead of having Apache do it with sendfile) just so I can cram an extra header in there.
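
For the curious, the "extra header" is Content-Disposition. A minimal sketch of what the script would have to emit, assuming a "first filename seen wins" policy (that policy and the function name are made up here, not something 4scrape does):

```python
import re


def download_headers(original_names, fallback):
    """Build a Content-Disposition header suggesting a download filename.

    original_names: every original filename seen for this image (may be empty).
    fallback: the name to use when no original filename was scraped.
    """
    name = original_names[0] if original_names else fallback
    # Strip characters that could break the header or smuggle in a path.
    safe = re.sub(r'[\r\n"\\/]', "_", name)
    return {"Content-Disposition": 'attachment; filename="%s"' % safe}
```

Picking the first name is arbitrary, which is exactly the messy part: an image referenced by three posts has three equally valid candidates.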

With continued remarks about the certificate -- I'm considering purchasing a shitty cheap one off GoDaddy but I'm not sure if that'll fix it (are they a common root authority? I have no idea). Definitely won't happen before next Friday (payday!) because I has no money.

Finally, I won't get around to enacting the changes mentioned above until sometime tonight because I'm posting this from work.

Thanks for the feedback so far (yoko is mai waifu).
>> Anonymous
>>500001
With the more expensive certificates (ie, all the ones that are typically considered 'valid'), you have to do crazy shit paperwork with a lawyer and prove that you are who you say you are, etc.

With the cheap ones, not so much. It's one of the reasons I'm kind of worried about spending $15-50 on a cheap cert -- I'm not sure if it'll fix the problems some people have been having. I'll download Firefox 3 (since it seems to refuse the site if the certificate can't be verified wtf?) and see if I can dick it up or something.

>>499992
One of my friends suggested having it automatically gzip up threads or batches of images and uploading them to rapidshare automatically. That's probably never going to happen.

If you want to download all the images from a whole thread, I'd recommend picking up wget (I would, being a UNIX faggot). It's an awesome http/ftp fetcher. You'd invoke it something like

wget -r -l1 -nd -A jpg,png,gif -R '*thumb*' <url to reconstructed thread>

I might be a little off, but you get the point.
>> Anonymous
I love you, OP. So much.
>> Anonymous
>>500503

yeah!

>I'm not paying $50/year for a shitty low-level authentication -- I just wanted encryption for my own purposes.

then why do you use it.
>> Anonymous
>>500564
SSL provides two services - encryption (with keys negotiated through asymmetric cryptography) and authentication (through the public key infrastructure, PKI). Unfortunately, there is no way to pick and choose from it -- it's an all or nothing deal. I only care about encryption, which means I have to deal with PKI bullshit as a side-effect.

The PKI can be thought of as a pyramid of trust. At the top you've got the root certificate authorities. Each certificate is "signed" by a certificate authority, which in turn has its own cert signed by a higher authority up to the root authorities.

When your browser attempts to authenticate a certificate, it checks the signature on it using the issuing authority's certificate. It then checks the signature on the CA's certificate using that CA's own issuer, and so on up the chain until it hits an authority it trusts. Browsers typically come with a set of root CA certificates (like VeriSign, etc) which they trust implicitly, so with enterprise sites you don't typically have this problem.

So what this means is that either I need to get a certificate in one of those 'trusted by default' authorities, or people just have to add my certificate (or CAcert's root certificate) into their trusted list.
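
A toy version of that chain walk, just to show where it succeeds or fails. The certificate names are placeholders, and this only follows issuer links; a real browser also verifies each signature, the validity dates and so on.

```python
def chain_to_trusted_root(cert, issuer_of, trusted):
    """Walk issuer links from `cert` until a trusted authority is reached.

    Returns the chain as a list, or None if the walk dead-ends (unknown
    issuer, or a self-signed cert that is not in the trusted set).
    """
    chain = [cert]
    while chain[-1] not in trusted:
        issuer = issuer_of.get(chain[-1])
        if issuer is None or issuer in chain:
            return None
        chain.append(issuer)
    return chain
```

With a stock trust list the walk for my cert dead-ends at CAcert's self-signed root; add that root to the trusted set and the same walk succeeds.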

Note that authentication is not the same thing as encryption. I could give a shit about PKI and the integrity of the pyramid of trust. I do care about encryption, because I'm one of those weird staunch believers that everything should be encrypted.

It's never really been a problem for me, but I am considering either purchasing a certificate or providing a non-SSL version of the site (which is a pain, because I run lighttpd on port 80 and my code is based off mod_python, which is an Apache-only thing).

tl;dr will fix eventually.
>> Anonymous
>>500571
Basically, the authentication thing is what guards you against MITM attacks. Without guarding against that encryption is useless, which is probably why you cannot choose. So the best way to fix this would be to post your fingerprint somewhere out-of-band (the chance of a double MITM attack being much lower - unless you happen to live in China of course) and have the users verify the certificate and then trust it. That would fix the problem, it's just cumbersome. But I don't think you can prevent the cumbersomeness because MITM attacks are really insidious - you're basically not talking to who you think you're talking to.
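
A minimal sketch of that out-of-band check, assuming you already have the certificate as DER bytes and a fingerprint string posted somewhere else. SHA-256 is a choice made for this example, and the function names are made up:

```python
import hashlib
import hmac


def fingerprint(der_bytes):
    """Colon-separated uppercase SHA-256 fingerprint of a DER certificate."""
    digest = hashlib.sha256(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def matches_published(der_bytes, published):
    """Compare against a fingerprint obtained out-of-band,
    ignoring case, colons and spaces."""
    def normalize(text):
        return text.replace(":", "").replace(" ", "").upper()

    return hmac.compare_digest(
        normalize(fingerprint(der_bytes)), normalize(published)
    )
```

In the standard library, `ssl.get_server_certificate` returns the server's certificate as PEM and `ssl.PEM_cert_to_DER_cert` converts it to the bytes this expects.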
>> Anonymous
FUCK YES YOU ARE BETTER THAN KONACHAN
>> Anonymous
Just had a look, pretty awesome so far and the certificate issue actually keeps fags out lulz. Anyway, would displaying more images per page be a reasonable implementation?
>> Anonymous
>>500604
Yeah, it's on the to-implement list for when I finish up work.

I finally caved in for you faggots (it took what? a whole day?) and bought a certificate (GoDaddy was having a half-off sale). Let me know if there are issues with it -- I tested it on Windows with FF2 and FF3, and on FreeBSD with FF2.

Hopefully there won't be any more issues of that measure (until it expires a year from now), so I can get down and dirty with the code again :)
>> Anonymous
>>500606
Works fine on FF3 after I removed the exception.
>> Anonymous
>>499571
you, sir! mr. OP! i believe you have just won the game
>> Anonymous
Kookoo daddy.
>> Anonymous
>>500606

ok now it's good.
>> Anonymous
Just finished off most of the minor requested changes

* Clicking the image in the image view pops open a new window.
* Aspect ratio choice is now a drop-down.
* Searches now cover the original filenames, in addition to the post text and subject lines.
* Results per page is now selectable in the advanced search (max is currently 100).
* Now have a shitty favicon (a black wing to go with mai waifu suigintou).

I'm going to spend the rest of tonight (couple hours) hammering down the basics for the image suggestion bullshit which really is an important feature for browsing the images.
>> Anonymous
>>500112
Agreed, even though it didn't turn up any Ayane walls
>> Anonymous
Hey, OP. will you marry me?
>> ULTIMATE WALLPAPER OTAKU
OP, you are awesome. Always remember that. When you are having a shitty day, just remember: Everybody on /w/ & /wg/ loves you.

Suggestions
- Is there a script available that will automatically pick an image from a directory every so often and set it as wallpaper, say a couple of hours (Linux)? If not, if you made this, your status would be upgraded from FULL AWESOME to GOD.

- Work safe filter. User sees a 18+ image, ticks a box, image is added to a list. Optional "Include NSFW" filter at search. There isn't many NSFW wps that I could find by clicking "Random", but you know how it is.

- Same as above, but with "ANIMATION". I was browsing few, and I recognized a few animated wallpapers (As in animooted, not anime).

- Stay awesome. You are doing a good thing.

Have fun, my love.
>> Anonymous
OP, what's somewhere where I can send you some money? Paypal only. I'm scared to post my email on 4chan, but post a donate link on your website and I'll send you some money. If I'm lying, I'm dying.
>> Anonymous
>>500807
Well, it won't be a script to set the wallpaper, but I sure as hell can write a script to choose a random wallpaper (with a specific resolution even!) and give that to you. You can pretty easily tie that together with wget+cron+your wallpaper setter of choice to have it done hourly. I've been thinking about doing that for myself, so I'll post it (and the script I use on the client end) once I've got it working.
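
The server side of that would be something like the sketch below. It is not the actual script; the dict keys and the function name are assumptions, and the real thing would pick from the database instead of a list.

```python
import random


def pick_random(images, width=None, height=None, rng=random):
    """Pick a random image, optionally restricted to an exact resolution.

    images: list of dicts with "id", "width" and "height" keys (assumed shape).
    Returns None when nothing matches.
    """
    pool = [
        img for img in images
        if (width is None or img["width"] == width)
        and (height is None or img["height"] == height)
    ]
    return rng.choice(pool) if pool else None
```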

Worksafe filter is on my "I really should implement" list, and really shouldn't be that hard. The only thing I'm squeamish about is malicious users going through and ticking every image as NSFW, then having to deal with that. But again, I definitely do see the need (especially if you're pulling wallpapers from it on a laptop... at work).

I'll look into detecting animated GIFs with PIL (the image library I'm using). Picking those up shouldn't be too hard. Well, both of 'em, at least (the Nagato and the Laughing Man are the only animated wallpapers I think I've seen -- I guess there were a couple more now that I think about it).
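
For reference, here is one way to spot an animated GIF without any image library at all: walk the GIF block structure and count image descriptors. This is a stdlib-only sketch, not the PIL approach mentioned above, and the function names are made up.

```python
def count_gif_frames(data):
    """Count image frames in raw GIF bytes by walking the block structure."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    packed = data[10]              # flags byte of the logical screen descriptor
    pos = 13                       # 6-byte header + 7-byte screen descriptor
    if packed & 0x80:              # skip the global color table if present
        pos += 3 * (2 ** ((packed & 0x07) + 1))
    frames = 0
    while pos < len(data):
        block = data[pos]
        pos += 1
        if block == 0x3B:          # trailer: end of file
            break
        if block == 0x2C:          # image descriptor: one frame
            frames += 1
            local = data[pos + 8]  # flags byte after left/top/width/height
            pos += 9
            if local & 0x80:       # skip the local color table if present
                pos += 3 * (2 ** ((local & 0x07) + 1))
            pos += 1               # LZW minimum code size
        elif block == 0x21:        # extension: skip its label byte
            pos += 1
        else:
            raise ValueError("unexpected block type 0x%02x" % block)
        while True:                # skip data sub-blocks up to the terminator
            size = data[pos]
            pos += 1
            if size == 0:
                break
            pos += size
    return frames


def is_animated_gif(data):
    return count_gif_frames(data) > 1
```

With PIL the usual trick is shorter: open the image, call `seek(1)`, and treat an `EOFError` as "only one frame".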

I'm glad you guys are enjoying it :)
>> Anonymous
>>500810
Hah. Hold on to your money, I want to at least have it running for a week or two (and get some more of the features I consider important, ie, scraping the images for metadata) before I even think about pushing the collections basket around.
>> Anonymous
>>500817
Okay then. But as soon as that basket goes around, let me know. I have no idea how long it took you to do this (My work is in network maintenance, not web development) but you deserve a bit of change for your work here.
>> Anonymous
Sticky this. op, can i do you sexual favors please?
>> Anonymous
Dear OP

THANK YOU THANK YOU THANK YOU
>> Anonymous
>>500807

fuck that. keep the site as simple as possible, also what fucking job do you have that you browse 4chan all day at and care about a nsfw filter? UNDERAGE B&.

i suggested the results per page option yesterday and see it on there today :) thanks op.

this thread needs a sticky.
>> Anonymous
>>499571

OP, I saw the thread you posted in about 2 weeks ago. Ive been waiting for you to unviel your project ;)


Love it.
>> Anonymous
I'm not sure about the aspect ratio box, most widescreen wallpapers and computer monitors are 16:10, not 16:9.
>> TRIPPING BALLZ -=-=-=-=-=->
bookmarking that site.

site = win
>> Anonymous
>>501031
16:10 = 8:5
>> Anonymous
>>501059
Scratch that
8:5 option appears to be searching for AR 1.66 instead of 1.60
>> Anonymous
To search for 16:10
&a=1.6
>> Michi-San
Wonderful Site.
Adding 2 my bookmarks
>> Anonymous
Jumping on the I FUCKING LOVE YOU OP bandwagon.
Can I have your children?
>> Anonymous
my dick has grown a few centimeters

ty sir
>> Anonymous
>>501072
Your dick is now erected you faggot
>> Anonymous
>>500954
/w/ is NOT an adult board, so it does NOT have an age limit. I'd also like to try and browse 4scrape from work, and I would also like the NSFW filter.

tl;dr fuck you!
>> Anonymous
>>501120

The whole of 4chan is 18 and over, work safe or otherwise.
>> Anonymous
amazing work OP
you fucking rule
>> Anonymous
another thank you to the OP
>> Anonymous
>>501062
Whoops, sorry about that, must have fat-fingered something. 8:5 now properly searches for an aspect ratio of 1.60.

> NSFW flagging
I've decided that there *will* be a NSFW feature, but it won't be enabled by default -- you'll have to check a box in the advanced search page. And I guess I'll have to put a checkbox next to the random search button as well.

In any case, on the page of each image there'll be a button to mark it NSFW if it's not already so marked. I hate stupid voting systems etc, so all it will take is one person to click the button and the image is flagged. I'll throw some safety features in there to hopefully prevent abuse (ie, it'll record the IP address of the person who flags it, and if I find out that someone's dicking around I'll just ban the address and unflag all the images they flagged), but I figure it's the only simple way to implement it.
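
Sketched out, the bookkeeping is tiny. This is an in-memory illustration of the plan, not the real code; the class and method names are invented and the real version would be a couple of database columns.

```python
class NsfwFlags:
    """One click flags an image; remember who did it so it can be undone."""

    def __init__(self):
        self.flagged_by = {}   # image id -> IP address that flagged it
        self.banned = set()

    def flag(self, image_id, ip):
        """Flag an image. Returns False if the IP is banned or the image
        is already flagged."""
        if ip in self.banned or image_id in self.flagged_by:
            return False
        self.flagged_by[image_id] = ip
        return True

    def is_nsfw(self, image_id):
        return image_id in self.flagged_by

    def ban(self, ip):
        """Ban an address and unflag everything it flagged.
        Returns the sorted ids that were unflagged."""
        self.banned.add(ip)
        undone = sorted(i for i, who in self.flagged_by.items() if who == ip)
        for image_id in undone:
            del self.flagged_by[image_id]
        return undone
```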

Probably get around to doing that when I get off work, since I still don't have internet at home due to drama.
>> Anonymous
>>501153 to clarify, "not enabled by default" means that it will display tits unless you ask it nicely not to.
>> sage
op you are a king among men
>> Anonymous
Hail OP! *Salutes*
>> Anonymous
dude this is awesome, want for /hr/ and /s/
>> Anonymous
>>501245
/hr/ has been requested a couple of times; I think I'm going to keep it strictly /w/ and /wg/ for now, simply because having only wallpapers makes it a fairly well-purposed project.

And while I do kind of want /s/, /hc/, /e/, /h/ and /d/ to be indexed as well (the previous version of 4scrape did all of those), it just turned the whole thing into some sort of loli porn machine. While that's not necessarily a bad thing, it's not really where I want to take this project.

So as cool as it would be, it's not going to happen anytime soon.

As a further note, I am available for baby making if and only if you don't mind my small floppy penis.
>> Anonymous
this is definitly awseome. 1000 internets to you
>> Anonymous
OP YOU FUCKING SUOMG THIS IS AWESOME!
>> Anonymous
>>501269

Good reasons, I wish I could do it myself hehe.
Btw, the pagination is fucked up, you forgot to include the c= in the "next page" links.
So you can only get 20 results per page unless you alter the url yourself.
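
The usual way to avoid this kind of bug is to build the "next page" link from all of the current query parameters rather than listing them by hand. A small sketch: `c` and `a` are the parameter names seen in the site's URLs, while `q`, `p` and the path are guesses made up for this example.

```python
from urllib.parse import urlencode


def next_page_link(base, params, page_key="p"):
    """Carry every current parameter over and bump only the page number."""
    carried = dict(params)
    carried[page_key] = int(carried.get(page_key, 1)) + 1
    return base + "?" + urlencode(carried)
```

Because the whole parameter dict is copied, a newly added option like results-per-page survives pagination without anyone having to remember it.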
>> Anonymous
>>501331
Whoops, thanks for catching that. Should work as expected now.
>> Anonymous
>>501337

Awesome, one more thing, maybe you should add a total counter so you can see how huge the db is.
>> Anonymous
I just found all the wallpapers made by me :D
https://suigintou.desudesudesu.org/4scrape/image?i=24416
>> Anonymous
Whoops, broke Apache. Am rebooting, etc, hopefully that fixes the problem.
>> Anonymous
And it's back up. Now the goddamn httpd daemons only take 130MB each instead of the 260+MB they were eating earlier. Which means I've got plenty of RAM left to spawn a crapload of them hooray.
>> Anonymous
what.
>> Anonymous
Wow. I'd like some checkboxes of standard wp resolutions so I can indiscriminately suck up stuff from /w/. thx.
>> Anonymous
Can some kind mod please sticky this? This thread will only live by itself for so long.
>> Anonymous
>>501245
For /hr/, there's already this:
http://playground.studio404.org/4down/hr/
Too bad it doesn't have a search function.

>>501859
THEN WE'LL MAKE A SECOND THREAD.
>> Anonymous
>>501795
I'd argue that if you need to select multiple wallpaper sizes, you should search by aspect ratio instead. I don't think I'll turn the aspect ratio drop-down into a checkbox, nor do I think I'll turn the resolution boxes into a checkbox/dropdown.

>>501859
Mods? On MY 4chan?
>> Anonymous
>>501859
Also, I put a shitty comment box on 4scrape yesterday (since I figured this thread would die), and I'll check that every now-and-then. So if you need to leave a suggestion/complaint you can always do it there.
>> Anonymous
First, I want to say : great work

but I was wondering, why is this always offline ?
I was able to access it 2 times
Are you still working on it ?
>> Anonymous
do like
>> Anonymous
>>502061
Depends what you mean by "offline". If you get an error page, it typically means I'm working on it. If you can't access it, it's either one of two things -

* Network outage. One of the core routers flopped this afternoon, so the network was out for about 10 minutes before the warm failover came online.

* DNS issues. Because of how the network is set up, I linked the domain name to a DDNS address with a CNAME entry. So if you're using a trashy DNS server which caches queries and doesn't update often, you'll get old DNS entries which are no longer valid. You can solve this by using a good DNS server, like 4.2.2.2 or 128.100.100.128. This isn't likely to change unless I magically start making the extra $750 a month that a half-rack costs at the local datacenter.
>> Anonymous
I like the archive thing, but seriously...
/r/ and /b/ are That-a-way --->

Anime walls or GTFO.
>> Anonymous
>>502078
>> Anonymous
Wow. Just...wow. *bookmarks* OP, you are awesome!
>> Anonymous
>>502078
I think this thingy is significantly more useful to us /w/'ers than to the rest of 4chan.
>> Anonymous
This is an excellent website. Thanks to the OP for creating this.
>> Anonymous
wow keep the good work, how to you have made a channel ?
>> Anonymous
You are awesome.
>> Newfag
Truly awesome.
>> Anonymous
wow. looking for something like this for a while. epic win!
>> Anonymous
bookmarked, very nice one OP, thanks a lot.
>> Anonymous
Wow. This is great! OP is awesome!
>> Anonymous
FUCKING AWESOME
>> Anonymous
u win.
>> Anonymous
OP, I love you
>> Anonymous
WAY TO GO DICKFACE YOU JUST RENDERED /W/ AND /WG/ OBSOLETE!!
>> Anonymous
>>502614
And nothing of value was lost
>> DesuMaiWaifu !fuwjJTs3Zc
OP, you're awesome, and I'm bored and feeling helpful. Currently going through all the 4:3 and tagging any NSFW/Sketchy walls I see. Let's see how far I get..
>> Anonymous
GREAT!
>> Anonymous
still not sticked?
>> Anonymous
You good sir, is a god amongst men.
>> DesuMaiWaifu !fuwjJTs3Zc
Going through all these, I notice that there's a decent amount of duplicates. Any ideas for deleting them?

Also, there isn't a limit to how many tabs you can have open in FF right? I have ALOT of walls open.
>> Anonymous
Where's the "I only want to view sketchy/NSFW images" option? :D
>> Anonymous
Holy sweet chocolate baby jesus. Awesome.
>> Anonymous
You sir have the godlike status like 20 frags in a row while flashed.

With a single grenade.
>> Anonymous
>>502652
Every post is a repost is a repost is a repost.
>> DesuMaiWaifu !fuwjJTs3Zc
Halfway through and I can't take anymore.

/bump
>> Anonymous
Sticky this thread you fucking lazy whores
>> Anonymous
>>502652
Arguably, reposts could be due to the database returning an arbitrary ordering and fetching the same image more than once. The scraper md5's the entire image data and compares that to determine if the image is a repost; since it's hashing the entire file, EXIF changes and minor shit like that get past.
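
In sketch form (the function names are made up, and the real scraper keeps the hashes in the database rather than in a set):

```python
import hashlib


def image_key(data):
    """MD5 of the entire file, metadata included."""
    return hashlib.md5(data).hexdigest()


def is_repost(data, seen):
    """True if these exact bytes were seen before; otherwise remember them."""
    key = image_key(data)
    if key in seen:
        return True
    seen.add(key)
    return False
```

Change a single byte anywhere in the file, say an EXIF comment, and the hash changes, so the image counts as new even though the pixels are identical.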

For the most part though (especially for the stuff posted on /wg/) md5'ing works fairly well. A lot of 'reposts' are actually different resolutions of the same image, which hopefully get picked up by the image processing shit (and shown as 'recommendations' in the image view).

Also, thanks for taking the time to flag NSFW images. I'm sure people who requested that feature appreciate your hard work nyoro~n.
>> >>500807
>>502826

I requested the filter, but I don't need it. I just didn't want it to become a high-res porn dump.

I don't really have any suggestions anymore. Maybe some sort of sorting by thread/average image colour (for those who need their desktop a certain colour). or something. You keep adding things so quickly, it's hard to suggest anything. Pretty much anything you add will be welcomed though, so you don't have to worry about screwing anything up.

Cmon mods, sticky this shit already.
>> Anonymous
Dumb question, but has this been crossposted to /wg/? Someone should let them know about this wonderful new toy.
ALSO, STICKY PLEASE
>> Anonymous
>>503011
If it has, it wasn't by me. I'm not really a /wg/ person, someone just happened to suggest "hey scrape /wg/" really early on so I've just always been scraping it.

>>503007
One of these days I'm going to sit down and refactor the random/search code so that you can set constraints to the random thing (like, randomly browse only 1280x800 walls) and shit but I'm feeling really fucking lazy today.

I chose a really wonky way to store the average color (basically as an RGB triple, where each channel ranges from 0-8), and glancing at the database it looks like most of the average colors are some shade of gray (which I guess is to be expected). I dunno, I'll hack it or something.

I have a feeling that the median/mode are probably more meaningful than the mean, unless I sit my fatass down and compute standard deviations for each channel too or something. lol statistics.
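
To see why the mean washes out, here is a small sketch comparing the two on a toy set of pixels. The 0-8 quantization mirrors the storage format above, but the exact mapping and the function names are assumptions made for this example.

```python
from statistics import mean, median


def quantize(value, levels=9):
    """Map a 0-255 channel value onto 0..levels-1 (0-8 by default)."""
    return min(levels - 1, int(value) * levels // 256)


def summarize(pixels):
    """Per-channel mean and median of (r, g, b) pixels, both quantized."""
    channels = list(zip(*pixels))
    return {
        "mean": tuple(quantize(mean(c)) for c in channels),
        "median": tuple(quantize(median(c)) for c in channels),
    }
```

For a wall that is three parts pure red to two parts pure blue, the mean comes out as a muddy (5, 0, 3) while the median says (8, 0, 0), i.e. "red". Mix enough different colors and every mean drifts toward the middle.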
>> bump bump
nothing intelligent to add...but i can bump
>> Anonymous
I've skimmed the thread and don't see the answer to this, and although I know you said no user interaction....

1. Will there be any way to flag mismarked walls? (i.e. I searched "kenshin" to find ruroni kenshin walls and toward the end about 15 trigun papers pulled up)

2. Will there be a way to flag screencaps/progress updates of vectors? (found one of my own screencaps of a vector-in-progress under "reborn", complete with inkscape border, lol)

Love the site. Thank you so much :)
>> Anonymous
Nice
>> Anonymous
I know it's been said but I can't stress it enough; this thing is awesome, OP. Thanks very much for it.
>> Anonymous
>>503102
Nope, aside from the NSFW thing, I don't think I'll ever have any explicit user input. No tagging, no flagging; keeps life simple.
>> Anonymous
OP, thank you. God bless you. My question, how old are you? This takes technical skills to do, like programmnig shit, ya?
>> Anonymous
so technically, couldn't someone screw up the searching by making threads and posting the wrong walls in there? and how does that handle threads started with "pic not related" or multi-fandom threads or hijacked threads?
>> Anonymous
Wow anon, you are my new favorite person. GREAT WORK!!
>> Anonymous
>>499571
Thank you kind anon. I've already put it to good use.
>> Anonymous
This is truly a wonderful godly piece of website.

OP is awesome!
>> >>500807
Bump for well deserved sticky.

Also, https://suigintou.desudesudesu.org/4scrape/image?i=74960
>> anon
this is a new age in anon O_O a way where we dont have to be on 24/7 to get walls!!! its amazing!!!! sorta >.> total support from anon to anon
>> Anonymous
>>503196
Yeah, that would fuck up the search. I won't say it's not a fairly stupid system (because it is), but at the same time it's easy as shit for me to maintain. I think any organizational system is going to have its flaws; it's just a matter of which flaws you prefer. I have plans for a system which will (hopefully) rank the related walls higher on searches than unrelated ones, but that's still a long way off.

And arguably, any search system is vulnerable to poisoning.

>>503193
21, college dropout now working as a systems-level programmer for the research branch of the university library. They don't let me use C or Python at work (bunch of Ruby+Javafags here) so I get my jollies by writing shit like red-black trees and 4scrape in my free time. 4scrape isn't a brilliantly sophisticated application - the scraper is 650 lines of Python, the web-frontend 661 (not including templates and shit) - but I'm not going to say you could do it without a good amount of programming knowledge.

Also, cocks. Let's see if I can finally get that goddamn image decomposition algorithm to match a thumbnail with its fucking original image.
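
For anyone wondering what thumbnail matching can look like, here's a minimal sketch. It is NOT the image decomposition algorithm mentioned above (that one isn't described in this thread); it swaps in a much simpler stand-in: average each image down to a small grid of block means and pick the original whose grid is closest. Images are assumed to be 2D lists of grayscale values at least 4x4 in size, and every name here is hypothetical.

```python
def block_signature(image, grid=4):
    """Average the image down to a flat list of grid*grid block means."""
    height, width = len(image), len(image[0])
    signature = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            block = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            signature.append(sum(block) / len(block))
    return signature


def distance(sig_a, sig_b):
    """Mean absolute difference between two signatures (0 means identical)."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b)) / len(sig_a)


def best_match(thumbnail, originals):
    """Return the name of the original whose signature is closest to the thumbnail's."""
    thumb_sig = block_signature(thumbnail)
    return min(
        originals,
        key=lambda name: distance(thumb_sig, block_signature(originals[name])),
    )


# Toy data: an 8x8 horizontal gradient, an 8x8 vertical gradient,
# and a 4x4 thumbnail made by averaging 2x2 blocks of the horizontal one.
wide = [[x * 32 for x in range(8)] for y in range(8)]
tall = [[y * 32 for x in range(8)] for y in range(8)]
thumb = [[x * 64 + 16 for x in range(4)] for y in range(4)]

print(best_match(thumb, {"wide": wide, "tall": tall}))  # wide
```

Real thumbnails are resampled and recompressed, so the distance is never exactly zero in practice; you'd want a threshold rather than a straight minimum.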
>> Anonymous
/r/equesting a sticky for win.
>> Anonymous
Looks great, but it doesn't work?
>> Anonymous
Awesome, good job man !
>> TTX !!LDXtcFzmdPH
You totally have my respect. I'm learning Python too, and I'm enjoying it a lot!
>> Anonymous
Works now, sorry for asking.
Dude, bless you.
How far back in time do the archives go?
>> Anonymous
>>503704
Thanks for being descriptive enough for me to identify and fix a problem. I suspect you have DNS issues.

>>503731
Yeah, Python rox.

>>503732
Since early May, I think.
>> Anonymous
OP, can you do a scrape for /hc/ also >_>
>> Anonymous
>>503741
>rox

>>503737
>>_>

Get out. Really.
>> Anonymous
bumping and requesting sticky

and OP, great job on this
>> Anonymous
Sticky Request.
ALSO
>>499571
I can Has mIRC script for searching!? :o
>> Anonymous
>>503741
See >>501269

>>504119
Feel free to write one.
>> Anonymous
bump for sticky
>> Anonymous
>>504315
Anyone willing to write an mIRC script for 4scrape? not good with it... FAIL and AIDS occur when I try to script in mIRC.
>> Anonymous
>>504441
FAIL and AIDS were implied when mIRC was mentioned.
>> Anonymous
how does it deal with threads asking for certain characteristics, like "ITT: Girls with swords" or something like that.. would it pull up under that prompt?
>> Anonymous
this needs to be stuck
>> Anonymous
>>504479
really? I thought that was just you.
Good luck with AIDS.
>> Anonymous
this needs to be stuck
>> Anonymous
>>504574
Yes, but it doesn't handle "pic unrelated" (and related negations) yet.
>> Anonymous
its great.
>> Anonymous
>>499571
It's awesome! Thanks for this, I'm adding it to my favorites right now!
>> Anonymous
ITS OKAY I GUESS
>> Anonymous
This shit needs to be stickied. For, like, forever.
>> Anonymous
>>506150
Problem is, I don't think any mods visit /w/. I don't think /w/ even has a janitor.
>> Anonymous
>>506281
Maybe someone should ask moot over in /b/ or something... or someone of a high authority like that. this NEEDS to be stickied. it deserves a billion and eight internets and is just plain kickass.

look what I found when I searched for "Images"
>> Anonymous
Awesome site. My only comment is it would be nice if the wallpapers didn't automatically open in another tab when I click to see the full-sized ones. I hate that; if I wanted it in another tab, I'd click it to go to another tab.
>> Anonymous
>>506582
Yeah, I kind of agree with you (someone requested they open up in another tab by default). On the image page, if you click on the local path link (in the metadata box), it'll open up the image in the same window rather than in a new tab.

Short of having some kind of poll, it's one of those weird preference things. Since there's a way to get both functionalities (despite one being more prominent than the other), I think it's almost a moot point.

>>506548
Easiest way to reach mootykins is to just send him a lovely email. Even if he's reading the thread you're posting in, it's kind of hard to pick shit out of the noise.

The simple alternative is to just use a bookmark (I don't think dad would be mad if he found it).
>> Anonymous
Uh, how many of you tried searching "Feet"?

If you're eating, don't.
>> Anonymous
nice one OP.
this is relevant to my internets.
>> Anonymous
OP

you are almost God Of Win

your idea however is 100% Full of Win
>> Anonymous
Doesn't work as well as hoped.
>> Anonymous
>>506281
The last time /w/ had a janitor, perfectly good wallpapers got deleted for little or no reason. And with all the promotions around, either we have no mods or they're too busy fapping.

Also, >9000ing /r/ sticky
>> ?? Tripfag ?__?__? Ganondorf ?? !!48eQ+5JUVwg
>>507471
Also requesting a sticky.

Shit's amazing.