In a modern enterprise internet company, the performance tester needs to cover a large range of topics, requirements, tool sets and approaches. This page outlines, as fully as possible, all the areas to be considered for handover in one particular, typical organization. I'll start with lists and see where we get to with details.
Top level areas:
- Front end performance testing and client resource loading patterns
- Back end performance and services/caches/databases
- Front end loading requirements: page complete and 3rd party resources
- Back end requirements for services: read/write times under specified loads
- SLAs on 3rd party systems
- Facilita Forecast (deprecated, but old scripts may need analysing)
- HP LoadRunner
- General Linux tools and bash scripts
- AWS console and APIs
- Adobe analytics
- Cacti (server monitoring)
- bash (use Google for the tiny details! What a painful language to work with day to day...)
- Excel. This is limited for the large data sets we typically use, but can be useful on occasion.
Monitoring (providing monitors as a separate perf testing job)
- Monitoring of various performance metrics (see the Splunk pages)
- Monitoring system usage for SLAs
- Monitoring front end page load times
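Monitoring against an SLA usually comes down to checking a response-time percentile per transaction against an agreed limit. The Python sketch below assumes the timings (in milliseconds) have already been exported from the monitoring or test tool into a dict; the transaction names, the SLA values and the choice of the 95th percentile (nearest-rank method) are illustrative assumptions, not agreed figures.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def sla_breaches(timings_by_txn, sla_ms, pct=95):
    """Return {transaction: observed percentile} for every transaction over its SLA."""
    breaches = {}
    for txn, samples in timings_by_txn.items():
        limit = sla_ms.get(txn)
        if limit is None:
            continue  # no SLA agreed for this transaction
        observed = percentile(samples, pct)
        if observed > limit:
            breaches[txn] = observed
    return breaches


if __name__ == "__main__":
    # Hypothetical timings (ms) and SLA limits, for illustration only.
    timings = {
        "home_page": [420, 380, 510, 450, 1900, 400, 430, 470, 390, 410],
        "search": [900, 1100, 950, 1200, 1000, 980, 1050, 1300, 990, 1010],
    }
    slas = {"home_page": 2000, "search": 1200}
    print(sla_breaches(timings, slas))
```

Nearest-rank is used here because it always returns a value that was actually observed; some tools interpolate between samples, so expect small differences against their reports.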
Load Test Data profiles
- Production log file analysis
- Front end (pre-cache - Adobe Analytics) versus back end (post cache - server log files)
- GETs, PUTs, POSTs etc. Not everything is in the log files
- You need to liaise with devs/architecture/business around expected traffic on new systems or peaks that are likely to happen in the near future due to known events
- Specific business cases can arise due to publicized events - ad campaigns/competitions
- Be aware of cache hit ratios and real data selections by users
- Calculate request ratios across apps and services and layers
- You need to be pro-active
- You need to liaise with devs, dev-ops, team leads and project managers
- You need to have access to book environments, lock services (mocking), monitor systems
- You need to specify any data requirements to interested parties
- You need to coordinate everyone across the various systems so your testing is not interrupted and you do not break environments or interrupt others
- You need access to ramp up databases/build environments to specifications (1,2,3 app servers etc.)
- You need to be even more pro-active than in the earlier steps above. Coordinate everything you need: anything missing will ruin your testing
- Give people notice and chase up after. Double check anything critical
- Email all interested parties before/during/after test runs
- DO NOT run any performance tests without everything being signed off and ready. (I have seen junior perf testers bring down systems, point tests at the wrong environment and cause havoc)
- Wait for confirmation. Chase it up directly if anything is unclear to you, and ask again if you feel you need to. You should be the one coordinating everyone involved.
- Watch out for testing clashes. Is anyone else running a performance test that shares any of your resources? I am lucky just now: covering most projects myself makes this easier to keep an eye on. But in the past I have often found out halfway through a test that someone on the other side of the office was running tests without my knowledge. If testing is split across teams or offices, you need a system in place to cater for this.
- Reporting is critical to get your conclusions across and get any findings actioned
- Always provide a management summary that anyone/everyone can understand
- Provide details lower down. You must support any claims/findings with clear evidence.
- Keep as much analysis data and logging as possible until issues have been fixed
- Store key results data sets for future analysis
- Cover everything you report on - you may need to justify decisions
- Send emails about anything untoward or worthy of note. Even if they are not read, they are good to have as a record
- Take a note of the application versions you are running against. Later on, this documented information can be key, for example to prove that something worked in a particular earlier version.
- Go to stand-ups when you can. If you are across projects make sure you keep in touch with the teams if you can't make all the stand-ups
- It is a good idea to work with Jira but performance testing doesn't easily fit in the agile model so you may find a better way to work.
- One thing I do now is hold my own weekly meeting on upcoming performance work across projects. This allows you to plan resources for the week, book environments, and think about any DB updates or data files you may need to obtain
- Take care. Test tools can be at fault, even the amazing LoadRunner!
- If you have any collaborative monitoring (Dynatrace, built-in app metrics, post analysis of log files), make full use of it
- Use the teams around you. If you do not have access to systems that you need, talk to others, try and get access or get reports emailed out to you
- Get analysis reports from DBAs when required
- Within LoadRunner watch out for issues caused by data. If you have monitoring plugged in, make full use of it - graph server resources and use the correlation and cross results functionality
- Make sure all your scripts are checking correctly for pass/failure
- Talk to the devs about expected results and return codes when designing scripts
- Make sure your cache hit ratios are correct. Slow results can come from random data selection that just won't be seen in Prod.
- If you can, compare with current Prod systems
- If there are issues, try and pin them down to specific causes. See 'Finding Issues' for some examples
- If you are not experienced at this, involve other people from other teams
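The request ratios mentioned under load test data profiles can be pulled out of back end (post-cache) server logs with a few lines of Python. This is a minimal sketch that assumes Apache/NGINX common or combined log format; the ID-collapsing rule in `normalise_path` is a hypothetical grouping that will need adjusting for each application's URL scheme. Remember that these logs only show traffic that got past the caches, so front end figures from Adobe Analytics are still needed for the full picture.

```python
import re
from collections import Counter

# Request line and status code of an Apache/NGINX "common" or "combined" log entry.
LOG_LINE = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3})')


def normalise_path(path):
    """Drop the query string and collapse numeric path segments so similar URLs group together."""
    path = path.split("?", 1)[0]
    return re.sub(r"/\d+(?=/|$)", "/{id}", path)


def request_ratios(lines):
    """Return {(method, path): share of parsed requests}, most frequent first."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines in other formats
        method, path, _status = match.groups()
        counts[(method, normalise_path(path))] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.most_common()}


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /product/123?ref=home HTTP/1.1" 200 512',
        '203.0.113.6 - - [10/Oct/2023:13:55:37 +0000] "GET /product/456 HTTP/1.1" 200 498',
        '203.0.113.7 - - [10/Oct/2023:13:55:38 +0000] "POST /basket HTTP/1.1" 201 87',
        '203.0.113.8 - - [10/Oct/2023:13:55:39 +0000] "GET /product/789 HTTP/1.1" 304 0',
    ]
    for (method, path), share in request_ratios(sample).items():
        print(f"{method:6} {path:20} {share:.1%}")
```

The same counts, split by application or layer, give the cross-app and cross-service ratios needed for the load model.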
Tests to run
- There are lots of different systems out there and any number of components can cause performance issues
- Your job is to catch as many issues as you can before go live.
- Always try and run a soak test, the longer the better. This weeds out any longer term issues
- Try and run high load soak tests. This is aimed at stressing the app and DB rather than the hardware
- Scalability needs testing. Even apps that are specifically designed to be linearly scalable are often not (ask me about Citrix farms!)
- And to look for true server capacity you need to creep up on it. Slowly increase the load and run at steady state before deciding if that load can be sustained. Then up the load by 10% and repeat. It takes time but you get proper answers that way.
- Double, then quadruple, the servers and check each configuration with tests of decent length: maybe half an hour, maybe 3 hours each, depending on the apps and the data model
- Watch out for test tools! You can hit their limits and it's not always obvious. In AWS, watch out for the number of injectors: CPU and memory may look fine, but AWS may be limiting network bandwidth in the background. More injectors can fix this.
- Run tests to different layers if you can to isolate performance bottlenecks
- Watch out for caches: real data versus test data. With long-running tests it is very difficult to keep your cache hit ratios low enough, and that can skew your results by making things easier for the servers!
- In the past I would always run rendezvous tests. They are not needed so much for stateless web sites, but bear them in mind.
- Low level as well as high level tests. A lot of emphasis is put on high load tests, mainly because we are often focussed on app and server stability. However, there is also a place for low level tests, to check application response times under more normal (mid-day) loads, when the caches may not be hit so much.
- Of course you also want to run several standard tests. These are particularly useful for regression testing: various 1 or 2 hour tests that can become benchmarks. Results can then be compared directly between application versions etc.
- If at all possible, run the same tests several times, at different times of day. This allows for variations on the networks and servers, from traffic and batch jobs etc. that you may not be aware of
- For front end testing it's really good to run from home (with a local install of webpagetest) so you can get real live networks through a standard ISP.
- There is a whole other side to performance testing: development support and tuning. In these cases you can work closely with the devs and follow what they need to perfect their apps, usually Java memory tuning. You still need to bear data profiles etc. in mind, but tests may be built from very simple repetitive calls, perhaps one-line tests. Data is still important here for caching reasons, and the models can be discussed and designed together with the dev. It depends on their specific needs rather than the wider (final) business use
- If you think you've found an issue, double check it, try other tests around it, get the dev-ops to look at it (or whoever you are working most closely with - basically a second set of technical eyes)
- Keep an eye on the logs and on all the boxes' resources, including the test tools
- Don't be afraid to ask questions. Yes, you are technical, but don't let the devs put you down just because you don't know the inside of their app. In these technical relationships make it clear that you are on their side: you are there to give them confidence in their app. If they say it doesn't need performance testing, I often simply ask them whether they will sign it off themselves!
- If you do find issues, look in the logs and see if you can send a more detailed email. Devs like reading logs, so if you do too, that's a good start!
- Otherwise send as much detail as you can and if something can be reproduced, that is always a big advantage.
- A lot of performance testing is about relationships. The dev teams want to come to you and say 'perf test this for me this afternoon' but to do that you need all the above in place and working smoothly. And to get those reports out you need to be coordinated with all the other teams. Really this is just a matter of working out processes that work for you, getting all the agreed business communications right, fitting in with everyone else's booking systems etc. This does come with time so if you've just started, don't worry!
- Run benchmarks and try and design them so you have all the details, so months down the line you can be sure you are comparing like-for-like. These days I add transactions into the test just for documenting settings in the summary report. This makes things much quicker to check across runs.
- Watch out for functional issues! The number of times I have uncovered functional issues is surprising. Often this is because we are the only ones testing with concurrent users. Sometimes it's because of the sheer amount of data we push through the system: I may test with a million URLs and the problematic ones get highlighted, but the functional testers just can't cover that breadth. (I even had one app that didn't work at all with concurrent users, and I mean 2: it got a DB lock. It had simply never been tested under those circumstances until it got to me)
- Try and avoid production, almost at all costs! I have seen some terrible consequences. Load testing can kill systems. You only have to make a mistake in a scenario setting (yes it does happen, we are still carbon based life forms running the IT world!) and you can bring down sites and systems. Even here, during legitimate monitored testing, we did break a front facing CDN. It wasn't meant to break - it wasn't meant to behave like that - but still, if you can, avoid any business critical systems.
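The creep-up approach to finding true capacity can be planned and judged mechanically. A minimal sketch, assuming you record the error rate and 95th percentile response time for the steady-state part of each step; the 10% step comes from the text above, while the 1% error and 2000 ms limits are placeholder assumptions to be replaced with the project's own criteria.

```python
def step_schedule(start_users, max_users, step_pct=10):
    """User levels for a creep-up capacity test, each step step_pct% above the previous one."""
    levels = []
    users = start_users
    while users <= max_users:
        levels.append(users)
        # Always grow by at least one user so small loads still make progress.
        users = max(users + 1, round(users * (1 + step_pct / 100)))
    return levels


def sustained_capacity(step_results, max_error_rate=0.01, p95_limit_ms=2000):
    """Highest load level whose steady-state results stayed within the limits.

    step_results: (users, error_rate, p95_ms) tuples in ascending load order.
    Returns None if even the first step failed.
    """
    capacity = None
    for users, error_rate, p95_ms in step_results:
        if error_rate > max_error_rate or p95_ms > p95_limit_ms:
            break  # stop at the first step that could not be sustained
        capacity = users
    return capacity


if __name__ == "__main__":
    print(step_schedule(100, 150))
    # Hypothetical steady-state measurements for each step.
    results = [(100, 0.000, 800), (110, 0.002, 950), (121, 0.004, 1400), (133, 0.030, 2600)]
    print(sustained_capacity(results))
```

Stopping at the first failed step is deliberate: once the system has broken down at one level, results from heavier steps no longer say what load can be sustained.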
It turns out that performance testing is a lot about mathematical modelling. The main aim is to mimic user interaction with the applications, covering different types of user, different work flows (and specifically their ratios to each other), typical data selection and even ratio of usage across different applications if they share any resources - watch out for DBs being shared.
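A common way to turn work flow ratios into a scenario is Little's Law (N = X * R): with fixed pacing, each virtual user starts one iteration every R seconds, so N virtual users deliver X = N / R iterations per second. The sketch below is a minimal illustration of that arithmetic; the work flow names, the 7,200 per hour peak, the percentages and the pacing values are hypothetical, and it assumes each iteration finishes within its pacing interval.

```python
def vusers_for_mix(peak_per_hour, mix_pct, pacing_s):
    """Virtual users per work flow needed to reach a peak hourly rate (Little's Law, N = X * R).

    peak_per_hour: total work flow iterations per hour at the modelled peak.
    mix_pct: {work flow: integer percentage of those iterations}, summing to 100.
    pacing_s: {work flow: seconds between iteration starts for one virtual user};
              each iteration is assumed to finish within its pacing interval.
    """
    if sum(mix_pct.values()) != 100:
        raise ValueError("work flow percentages must sum to 100")
    plan = {}
    for flow, pct in mix_pct.items():
        # X = peak_per_hour * pct / 100 / 3600 iterations per second; N = X * pacing.
        # Integer ceiling division keeps the result exact and never under-provisions.
        numerator = peak_per_hour * pct * pacing_s[flow]
        plan[flow] = -(-numerator // (100 * 3600))
    return plan


if __name__ == "__main__":
    # Hypothetical peak and mix, e.g. as agreed with the business or BA.
    mix = {"browse": 75, "search": 20, "checkout": 5}
    pacing = {"browse": 60, "search": 90, "checkout": 180}
    print(vusers_for_mix(7200, mix, pacing))
```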
- Talk to all the stakeholders
- Get walk-throughs of work flows
- Talk to the business (or BA) about importance and ratios of work flows
- Only include the work flows you have to - bear in mind the cost/benefit of your work
- Work flows may need to be included because of sheer volume
- Some work flows may be very occasional but need to be modelled because they are business critical
- Keep work flow scripts simple and try and keep to 5 or 6 per project
- Let the functional testers provide test coverage. You are focussed on performance and server stability etc.
- Data modelling is critical. You must try to hit production cache ratios, and this isn't always obvious: some caches are hidden deep within applications or DBs. The best way to be sure is to analyse production log files for real data patterns.
- Sometimes you can replay logs - or at least pull out the data and use it directly.
- Other times you need to apply curves to data you have gleaned from DBs; there are some methods for doing this on this site
- If you are looking at new apps, have discussions with devs and BAs and try to apply your typical users to this app, as will be expected on go live.
- All of this is critical because if your models are wrong, your answers will be wrong.
- When it comes to shared resources, watch out for different peak times across different apps.
- Apps can be run in isolation, particularly with the benchmark approach, but often you need to run several apps concurrently but with different peak models depending on what you want to look at
- Something else: every performance test should be designed to answer a particular question, and the answer should be useful, i.e. the results will actually be used. Otherwise there is no need to run that test.
- When the devs ask you to perf test something, dig a little deeper and find those questions so that you can design your tests accordingly.
- And these questions should then be addressed in your results report and in particular in the management summary
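One way to "apply curves" to data gleaned from a DB is to draw IDs with a skewed popularity, so that a few items take most of the traffic as they do in production, rather than selecting uniformly at random. The sketch below uses a Zipf-like curve purely as an assumption (it is not necessarily the method used elsewhere on this site); the exponent `s` should be tuned until the share of requests going to the most popular items matches what production log analysis shows.

```python
import random


def zipf_weights(n, s=1.0):
    """Weight 1 / rank**s for popularity ranks 1..n (rank 1 = most requested)."""
    return [1 / rank ** s for rank in range(1, n + 1)]


def top_k_share(weights, k):
    """Expected fraction of requests that go to the k most popular items."""
    return sum(weights[:k]) / sum(weights)


def pick_ids(ids_by_popularity, count, s=1.0, seed=42):
    """Draw test data ids with a skewed, Zipf-like popularity instead of uniformly.

    ids_by_popularity must be ordered most popular first (e.g. from production log counts).
    A fixed seed makes the generated data file repeatable between runs.
    """
    rng = random.Random(seed)
    weights = zipf_weights(len(ids_by_popularity), s)
    return rng.choices(ids_by_popularity, weights=weights, k=count)


if __name__ == "__main__":
    ids = [f"product-{i}" for i in range(1, 10001)]
    weights = zipf_weights(len(ids))
    print(f"Top 100 of {len(ids)} ids take {top_k_share(weights, 100):.0%} of requests")
    print(pick_ids(ids, 5))
```

The `top_k_share` figure is only a rough guide to the hit ratio of a cache holding those k items; check it against the real cache statistics from the environment.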
Maintenance of scripts and data
- As projects develop the scripts need to change with them
- This may be changes to endpoints or it could mean additional functionality
- This requires you to be on top of all the development work that can affect your remit
- You must keep up with the Jiras and make sure any changes are conveyed to you
- And benchmarks may need resetting to take into account acceptable changes in performance. Of course these requirement changes need sign-off from the relevant stakeholders
- Watch out particularly for CI test packs. These run automatically and can easily be taken for granted. Over time they are likely to creep away from true form
- Also, data needs maintaining separately.
- Data creeps quite quickly, so quickly that I have had to design the CI setup to combat it. Even so, the core data files on CI projects need looking at every month or so.
- Other projects typically need looking at more frequently (I have built design strategies into the CI that are not typically present in everyday projects)
- As data goes out of date, the number of errors in a test run increases. This is OK as you can see the details and sign them off as known data issues. HOWEVER, this can easily drown out real errors, AND it clogs up the log files (in all systems), so you really need to keep these known errors to a minimum. The easiest results to analyse are those with known 100% good data, so any issues can clearly be seen
- When it comes to CD (down the line, with Bamboo), this data maintenance issue will be even more critical. With no user intervention in the pipeline, someone must take responsibility for keeping on top of it
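Data creep can be caught before a run rather than discovered in the error counts. This minimal sketch assumes the test data is a CSV file with an `id` column and that a current list of valid IDs can be exported from the environment's database; the column name and the 1% tolerance are hypothetical and would be set per project. A CI job could run a check like this and refuse to start the test when the file has drifted too far.

```python
import csv
import io


def check_data_file(data_file, valid_ids, id_column="id", max_stale_fraction=0.01):
    """Compare a test data file with the ids that currently exist in the target environment.

    Returns (stale_ids, stale_fraction, ok); ok is False when more than
    max_stale_fraction of the rows refer to ids that have gone out of date.
    """
    ids = [row[id_column] for row in csv.DictReader(data_file)]
    stale = [i for i in ids if i not in valid_ids]
    fraction = len(stale) / len(ids) if ids else 0.0
    return stale, fraction, fraction <= max_stale_fraction


if __name__ == "__main__":
    # Hypothetical inputs: in practice open the project's data file and load
    # valid_ids from a fresh export of the environment's database.
    data = io.StringIO("id,name\n101,alpha\n102,beta\n103,gamma\n104,delta\n")
    current = {"101", "102", "104"}
    stale, fraction, ok = check_data_file(data, current, max_stale_fraction=0.1)
    status = "OK" if ok else "refresh data before running"
    print(f"{len(stale)} stale row(s) ({fraction:.0%}): {stale}; {status}")
```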
CI setup - LoadRunner
CI setup - webpagetest
Webpagetest local setup and configuration and update strategy
Load test tools and controllers and injectors
Local LoadRunner test tool box for script development and small investigations (has license installed)
Projects - old and current and their documentation and data models and specific requirements
Version control and backup procedures
HP support account
Project documentation (and benchmarks) on confluence
Performance test tab in the main Jenkins server
This web site! (for reference). My phone number!
(And please be careful with all this. Look after the scripts and scenarios, and all the CI configurations. Watch out for webpagetest: it can be delicate, but it is brilliant underneath. Keep the teams sweet, keep on top of everything, and it will all run fine for years to come.)